Workshop Data
Dataset Overview
This workshop uses a soil microbiome dataset collected from multiple facilities across a time series. The amplicons target the V4 hypervariable region of the 16S rRNA gene and were sequenced on an Illumina platform using paired-end chemistry.
The data are split across two sequencing runs (run 2 and run 3), which are imported separately and merged during the DADA2 denoising step.
Key Files
| File | Description |
|---|---|
| `manifest_run2.txt` | Import manifest for sequencing run 2 |
| `manifest_run3.txt` | Import manifest for sequencing run 3 |
| `reads_run2/` | Demultiplexed paired-end FASTQ files for run 2 |
| `reads_run3/` | Demultiplexed paired-end FASTQ files for run 3 |
| `metadata_q2_workshop.txt` | Sample metadata (tab-separated) |
| `metadata_q2_workshop_noECs.txt` | Metadata with environmental controls removed (used in ANCOM-BC) |
| `2024.09.backbone.v4.nb.qza` | GreenGenes2 Naive Bayes taxonomic classifier (V4 region) |
| `2022.10.backbone.sepp-reference.qza` | GreenGenes2 SEPP reference for phylogenetic placement |
| `tree_gg2.qza` | Pre-built phylogenetic tree (use if you skip the SEPP step) |
Metadata Columns
| Column | Description |
|---|---|
| `sample_type` | Sample type (e.g., soil, control) |
| `facility` | Collection facility |
| `add_0c` | Accumulated degree days at time of collection (time + temperature proxy) |
| `host_subject_id` | Individual subject identifier for longitudinal tracking |
| `host_subject_id_sample_type` | Combined subject + sample type identifier |
| `type_days` | Combined sample type and day grouping variable |
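
Before starting an analysis it can help to confirm that a metadata file actually contains the columns listed above. The following is a minimal sketch using only the Python standard library. The `missing_columns` helper, the `sample-id` identifier column, and the demo rows are illustrative assumptions, not part of the workshop materials.

```python
import csv
import io

# Columns documented in the table above.
DOCUMENTED_COLUMNS = [
    "sample_type",
    "facility",
    "add_0c",
    "host_subject_id",
    "host_subject_id_sample_type",
    "type_days",
]


def missing_columns(handle):
    """Return the documented columns absent from a tab-separated metadata file."""
    reader = csv.DictReader(handle, delimiter="\t")
    present = reader.fieldnames or []
    return [name for name in DOCUMENTED_COLUMNS if name not in present]


# Hypothetical in-memory metadata with only some of the columns present.
demo = (
    "sample-id\tsample_type\tfacility\tadd_0c\n"
    "X1\tsoil\tA\t120\n"
)
missing = missing_columns(io.StringIO(demo))
print(missing)
```

To check the real file, pass an open file handle instead, for example `open("metadata_q2_workshop.txt", newline="")`.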
Downloadable Files
Workshop files are hosted on Alpine and copied to your working directory throughout the tutorials. See the Downloads page for any files made available for local download.
Manifest File Format
QIIME 2 manifest files are tab-separated, with a header row naming the columns sample-id, forward-absolute-filepath, and reverse-absolute-filepath. The PairedEndFastqManifestPhred33V2 format expects Phred+33 quality encoding, which is standard for modern Illumina data.
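
As an illustration of that layout, here is a minimal sketch of a manifest sanity check written with the Python standard library. It is not a QIIME 2 tool: the `check_manifest` helper, the sample IDs, and the file paths are hypothetical. It assumes plain absolute POSIX paths and does not expand environment variables such as `$PWD`.

```python
import csv
import io
from pathlib import PurePosixPath

# Header described above for paired-end manifests.
EXPECTED_HEADER = [
    "sample-id",
    "forward-absolute-filepath",
    "reverse-absolute-filepath",
]


def check_manifest(handle):
    """Return a list of human-readable problems found in a manifest."""
    reader = csv.reader(handle, delimiter="\t")
    problems = []
    header = next(reader, None)
    if header != EXPECTED_HEADER:
        return [f"unexpected header: {header}"]
    seen = set()
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        if not row:
            continue  # skip blank lines
        if len(row) != 3:
            problems.append(f"line {line_no}: expected 3 columns, got {len(row)}")
            continue
        sample_id, forward, reverse = row
        if sample_id in seen:
            problems.append(f"line {line_no}: duplicate sample-id {sample_id}")
        seen.add(sample_id)
        for path in (forward, reverse):
            if not PurePosixPath(path).is_absolute():
                problems.append(f"line {line_no}: path is not absolute: {path}")
    return problems


# Hypothetical manifest: the second sample has a relative forward path.
example = (
    "sample-id\tforward-absolute-filepath\treverse-absolute-filepath\n"
    "S1\t/data/reads_run2/S1_R1.fastq.gz\t/data/reads_run2/S1_R2.fastq.gz\n"
    "S2\treads_run2/S2_R1.fastq.gz\t/data/reads_run2/S2_R2.fastq.gz\n"
)
problems = check_manifest(io.StringIO(example))
print(problems)
```

A check like this only looks at the table structure. It does not confirm that the FASTQ files exist or that their quality scores are Phred+33 encoded.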