Workshop Data

Dataset Overview

This workshop uses a soil microbiome dataset collected at multiple facilities over a time series. Samples target the V4 hypervariable region of the 16S rRNA gene and were sequenced on an Illumina platform with paired-end chemistry.

The data are split across two sequencing runs (run 2 and run 3), which are imported separately and merged during the DADA2 denoising step.

Key Files

| File | Description |
| --- | --- |
| manifest_run2.txt | Import manifest for sequencing run 2 |
| manifest_run3.txt | Import manifest for sequencing run 3 |
| reads_run2/ | Demultiplexed paired-end FASTQ files for run 2 |
| reads_run3/ | Demultiplexed paired-end FASTQ files for run 3 |
| metadata_q2_workshop.txt | Sample metadata (tab-separated) |
| metadata_q2_workshop_noECs.txt | Metadata with environmental controls removed (used in ANCOM-BC) |
| 2024.09.backbone.v4.nb.qza | GreenGenes2 Naive Bayes taxonomic classifier (V4 region) |
| 2022.10.backbone.sepp-reference.qza | GreenGenes2 SEPP reference for phylogenetic placement |
| tree_gg2.qza | Pre-built phylogenetic tree (use if you skip the SEPP step) |

Metadata Columns

| Column | Description |
| --- | --- |
| sample_type | Sample type (e.g., soil, control) |
| facility | Collection facility |
| add_0c | Accumulated degree days at time of collection (time + temperature proxy) |
| host_subject_id | Individual subject identifier for longitudinal tracking |
| host_subject_id_sample_type | Combined subject + sample type identifier |
| type_days | Combined sample type and day grouping variable |
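
Before starting an analysis, it can help to confirm that your copy of the metadata file contains the columns listed above. The following is a minimal sketch, not part of the workshop materials: the function name `missing_metadata_columns` is hypothetical, and it assumes only that the file is tab-separated with the column names on its first line. The real file may contain more columns than the ones documented here.

```python
import csv

# Columns documented in the table above. The metadata file may hold others.
EXPECTED_COLUMNS = [
    "sample_type",
    "facility",
    "add_0c",
    "host_subject_id",
    "host_subject_id_sample_type",
    "type_days",
]


def missing_metadata_columns(metadata_path, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are absent from the header row.

    Assumes a tab-separated file whose first line holds the column names.
    """
    with open(metadata_path, newline="") as handle:
        header = next(csv.reader(handle, delimiter="\t"))
    return [name for name in expected if name not in header]
```

Calling `missing_metadata_columns("metadata_q2_workshop.txt")` from the directory holding the file returns an empty list when every documented column is present.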

Downloadable Files

Workshop files are hosted on Alpine and copied to your working directory throughout the tutorials. See the Downloads page for any files made available for local download.

Manifest File Format

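QIIME 2 manifest files are tab-separated, with a header row naming the columns sample-id, forward-absolute-filepath, and reverse-absolute-filepath. Each following row gives one sample ID and the absolute paths to its forward and reverse FASTQ files. The PairedEndFastqManifestPhred33V2 format expects Phred+33 quality encoding, which is standard for modern Illumina data.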
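
The workshop supplies ready-made manifests, but if you need one for your own reads, the layout described above is straightforward to generate. The sketch below is illustrative only. The function name `build_manifest` is hypothetical, and it assumes that forward and reverse files are marked by `_R1` and `_R2` in the file name, that files end in `.fastq.gz`, and that the sample ID is the part of the name before `_R1`. Adjust these to match your own naming scheme.

```python
import csv
from pathlib import Path

MANIFEST_HEADER = [
    "sample-id",
    "forward-absolute-filepath",
    "reverse-absolute-filepath",
]


def build_manifest(reads_dir, manifest_path, fwd_tag="_R1", rev_tag="_R2"):
    """Write a paired-end manifest for the FASTQ files in reads_dir.

    Assumption: each sample has one file containing fwd_tag and one
    containing rev_tag, with otherwise identical names.
    """
    reads_dir = Path(reads_dir).resolve()  # the manifest needs absolute paths
    rows = []
    for fwd in sorted(reads_dir.glob(f"*{fwd_tag}*.fastq.gz")):
        # Sample ID is the text before the first occurrence of fwd_tag.
        sample_id = fwd.name.split(fwd_tag)[0]
        rev = fwd.with_name(fwd.name.replace(fwd_tag, rev_tag, 1))
        if not rev.exists():
            raise FileNotFoundError(f"No reverse read file for {fwd.name}")
        rows.append((sample_id, str(fwd), str(rev)))

    with open(manifest_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(MANIFEST_HEADER)
        writer.writerows(rows)
    return rows
```

For example, `build_manifest("reads_run2", "my_manifest_run2.txt")` would write one row per sample found in `reads_run2/`. Sample IDs that themselves contain `_R1` would be truncated by this simple split, so check the output before importing.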