Day 3¶
Full Cow Dataset Pipeline
Today is the all-cow day: a guided, hands-on run-through of every step you learned across Days 1–2, applied end-to-end to the cow dataset. You will not see new concepts here; instead, you'll consolidate the workflow by driving it yourself, from raw reads through diversity and differential abundance, with two instructor-led check-ins along the way.
How to use this page
Each Phase below is a numbered chunk of work with the condensed commands you need. If you get stuck on the why behind a step, jump back to the corresponding Day 1 or Day 2 page (linked in each phase). The two ⏸ Check-in sections are deliberate pause points; wait for the group before proceeding.
Phase 0: Setup¶
Request an interactive session and load QIIME2:
sinteractive --reservation=microbiome --ntasks=4 --time=06:00:00
module purge
module load qiime2/2026.1_amplicon
Make a fresh working directory for the cow run:
mkdir -p /scratch/alpine/$USER@colostate.edu/cow_pipeline
cd /scratch/alpine/$USER@colostate.edu/cow_pipeline
Cow data location
The cow dataset (manifest, reads, metadata) lives in the workshop class folder on Alpine. Your instructor will share the exact path; copy the manifest, reads, and metadata file into your working directory before starting Phase 1.
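The copy itself is a couple of commands once you have the path. A sketch is below, where `CLASS_DIR` is a placeholder for the path your instructor shares; the first three lines fabricate a stand-in source folder so the sketch runs as written, and you should skip them and set `CLASS_DIR` to the real path instead:

```shell
# Stand-in for the real class folder (skip these three lines on Alpine;
# set CLASS_DIR to the path your instructor shares):
CLASS_DIR=./fake_class_dir
mkdir -p "$CLASS_DIR/reads"
touch "$CLASS_DIR/manifest_cow.txt" "$CLASS_DIR/metadata_cow.txt"

# Copy the manifest, metadata, and reads into your working directory:
cp "$CLASS_DIR/manifest_cow.txt" "$CLASS_DIR/metadata_cow.txt" .
cp -r "$CLASS_DIR/reads" .
ls manifest_cow.txt metadata_cow.txt reads
```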
Intro: Cow Dataset Overview¶
Before running anything, your instructor will walk through:
- Study design: what samples were collected, how, and why
- Metadata columns: which variables you'll use as grouping factors
- Sequencing setup: primers, region, run structure (single run vs. multi-run merge)
Open the metadata file and the manifest in a text editor and skim them so you know what you're working with.
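If you prefer the terminal to a text editor, a couple of quick checks cover the essentials. The tiny manifest below is a fabricated stand-in so the commands run anywhere; point `head` and `tail` at your real manifest_cow.txt instead:

```shell
# Build a two-sample stand-in manifest (same layout as a real
# PairedEndFastqManifestPhred33V2 manifest; the paths are fake):
printf 'sample-id\tforward-absolute-filepath\treverse-absolute-filepath\n' > manifest_demo.txt
printf 'cowA\t/scratch/fake/cowA_R1.fastq.gz\t/scratch/fake/cowA_R2.fastq.gz\n' >> manifest_demo.txt
printf 'cowB\t/scratch/fake/cowB_R1.fastq.gz\t/scratch/fake/cowB_R2.fastq.gz\n' >> manifest_demo.txt

# Header row: the column names must match the manifest format exactly
head -n 1 manifest_demo.txt
# Sample count: total lines minus the header row
tail -n +2 manifest_demo.txt | wc -l
```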
Phase 1: Import, Demultiplex & Sequence Quality¶
Reference: Day 1: Importing Sequences
Import the paired-end manifest as a SampleData[PairedEndSequencesWithQuality] artifact:
qiime tools import \
--type 'SampleData[PairedEndSequencesWithQuality]' \
--input-path manifest_cow.txt \
--input-format PairedEndFastqManifestPhred33V2 \
--output-path demux_cow.qza
Summarize per-sample read counts and per-base quality:
qiime demux summarize \
--i-data demux_cow.qza \
--o-visualization demux_cow.qzv
Open demux_cow.qzv at view.qiime2.org. Look at:
- Per-sample read counts: are any samples extremely shallow?
- Per-base quality plots: where does the forward read quality crash? The reverse?
You'll use these to pick truncation lengths in Phase 2.
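A back-of-the-envelope check helps when weighing candidate truncation lengths: DADA2 requires the truncated forward and reverse reads to still overlap by at least 12 bp for merging. The sketch below assumes a roughly 250 bp V4 amplicon and uses placeholder truncation values; swap in the numbers you're actually considering:

```shell
# Overlap remaining after truncation = trunc_f + trunc_r - amplicon length.
# All three values below are placeholders for illustration, not recommendations.
AMPLICON=250
TRUNC_F=200
TRUNC_R=150
OVERLAP=$(( TRUNC_F + TRUNC_R - AMPLICON ))
echo "overlap after truncation: ${OVERLAP} bp (DADA2 needs >= 12)"
```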
Stop here
Don't move into DADA2 yet. We'll regroup as a class for Check-in #1 before continuing.
⏸ Check-in #1: Outputs & Choosing Quality Thresholds¶
Regroup as a class with your demux_cow.qzv open. Be ready to discuss:
- What forward and reverse truncation lengths are you considering, and why?
- Are there samples you would drop before DADA2? On what evidence?
- How does this run's quality compare to the Day 1 dataset?
This is also the right time to flag any errors from Phase 1 before they propagate.
Phase 2: DADA2 Denoising, Quality Checks & Taxonomy¶
Reference: Day 1: Denoising with DADA2, Day 1: Taxonomic Classification
Run DADA2 with the truncation lengths you settled on at Check-in #1:
qiime dada2 denoise-paired \
--i-demultiplexed-seqs demux_cow.qza \
--p-trunc-len-f <FWD> \
--p-trunc-len-r <REV> \
--p-n-threads 4 \
--o-table table_cow.qza \
--o-representative-sequences seqs_cow.qza \
--o-denoising-stats dada2_stats_cow.qza
Visualize denoising stats and the feature table summary:
qiime metadata tabulate \
--m-input-file dada2_stats_cow.qza \
--o-visualization dada2_stats_cow.qzv
qiime feature-table summarize \
--i-table table_cow.qza \
--m-sample-metadata-file metadata_cow.txt \
--o-visualization table_cow.qzv
Filter long amplicons before classification
The cow dataset can carry occasional long off-target amplicons that degrade classifier behavior. After DADA2, filter the representative sequences to your expected V4 amplicon length range before passing them to the classifier. One option is RESCRIPt's length filter (RESCRIPt is included in the amplicon distribution); fill in the range for this run:
qiime rescript filter-seqs-length \
--i-sequences seqs_cow.qza \
--p-global-min <MIN_LEN> \
--p-global-max <MAX_LEN> \
--o-filtered-seqs seqs_cow_filtered.qza \
--o-discarded-seqs seqs_cow_discarded.qza
Classify the filtered ASVs against the Greengenes2 V4 classifier:
qiime feature-classifier classify-sklearn \
--i-reads seqs_cow_filtered.qza \
--i-classifier 2024.09.backbone.v4.nb.qza \
--o-classification taxonomy_cow.qza
qiime metadata tabulate \
--m-input-file taxonomy_cow.qza \
--o-visualization taxonomy_cow.qzv
Generate a taxa bar plot for a quick visual sanity check:
qiime taxa barplot \
--i-table table_cow.qza \
--i-taxonomy taxonomy_cow.qza \
--m-metadata-file metadata_cow.txt \
--o-visualization taxa_barplot_cow.qzv
⏸ Check-in #2: DADA2 Outputs & Taxonomy QA¶
Have dada2_stats_cow.qzv, table_cow.qzv, and taxa_barplot_cow.qzv open. Discuss:
- What fraction of input reads survived DADA2? Any sample-level outliers?
- Does the taxonomy bar plot look biologically plausible (expected dominant phyla, low host contamination)?
- How many reads per sample do you have to work with, and what does that suggest for a rarefaction depth?
Phase 3: Rarefaction & Phylogenetic Tree¶
Reference: Day 2: Rarefaction, Day 2: Phylogenetic Tree
Generate alpha rarefaction curves to confirm your sampling depth choice:
qiime diversity alpha-rarefaction \
--i-table table_cow.qza \
--m-metadata-file metadata_cow.txt \
--p-max-depth <PICK_DEPTH> \
--o-visualization alpha_rarefaction_cow.qzv
Kick off the SEPP tree job; it takes a while, so start it as soon as you can and let it run in the background while you continue:
qiime fragment-insertion sepp \
--i-representative-sequences seqs_cow_filtered.qza \
--i-reference-database 2022.10.backbone.sepp-reference.qza \
--p-threads 4 \
--o-tree tree_cow.qza \
--o-placements placements_cow.qza
If the tree job fails or runs long
Your instructor has a pre-built tree_cow.qza they can hand out instead. Don't block on this: start the job, move on, and pick up the pre-built tree if yours hasn't finished by the time you need it in Phase 4.
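One pattern for launching the long step while keeping your terminal free is `nohup ... &` with output sent to a log file. In this sketch, `sleep 2` stands in for the `qiime fragment-insertion sepp` command above so the lines run anywhere:

```shell
# Launch the long-running job in the background, logging to a file
# ('sleep 2 && echo ...' is a stand-in for the real sepp command):
nohup sh -c 'sleep 2 && echo "tree done"' > sepp_cow.log 2>&1 &
SEPP_PID=$!
echo "tree job running as PID ${SEPP_PID}; progress goes to sepp_cow.log"

# ...continue with Phase 3 while it runs...

# At the point you actually need the tree, wait and check the log:
wait "${SEPP_PID}"
cat sepp_cow.log
```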
Phase 4: Core Metrics & α-Diversity¶
Reference: Day 2: Alpha Diversity
Compute the core diversity metrics at your chosen sampling depth:
qiime diversity core-metrics-phylogenetic \
--i-phylogeny tree_cow.qza \
--i-table table_cow.qza \
--p-sampling-depth <DEPTH> \
--m-metadata-file metadata_cow.txt \
--output-dir core_metrics_cow
Test α-diversity differences across your primary grouping variable:
qiime diversity alpha-group-significance \
--i-alpha-diversity core_metrics_cow/shannon_vector.qza \
--m-metadata-file metadata_cow.txt \
--o-visualization core_metrics_cow/shannon_group_significance.qzv
qiime diversity alpha-group-significance \
--i-alpha-diversity core_metrics_cow/faith_pd_vector.qza \
--m-metadata-file metadata_cow.txt \
--o-visualization core_metrics_cow/faith_pd_group_significance.qzv
Phase 5: β-Diversity¶
Reference: Day 2: Beta Diversity
Test community-level differences with PERMANOVA:
qiime diversity beta-group-significance \
--i-distance-matrix core_metrics_cow/unweighted_unifrac_distance_matrix.qza \
--m-metadata-file metadata_cow.txt \
--m-metadata-column <GROUP_COLUMN> \
--p-pairwise \
--o-visualization core_metrics_cow/unweighted_unifrac_group_significance.qzv
Open core_metrics_cow/unweighted_unifrac_emperor.qzv to view the PCoA. Color the points by your grouping variables: does separation in the ordination match the PERMANOVA p-values?
Phase 6: Alternative Analyses (ANCOM-BC2)¶
Reference: Day 2: Differential Abundance (ANCOM-BC2)
Filter the table for ANCOM-BC2: first drop samples below your sampling depth, then drop rare features:
qiime feature-table filter-samples \
--i-table table_cow.qza \
--p-min-frequency <DEPTH> \
--o-filtered-table table_cow_filt.qza
qiime feature-table filter-features \
--i-table table_cow_filt.qza \
--p-min-frequency 50 \
--p-min-samples 4 \
--o-filtered-table table_cow_filt_abund.qza
Collapse to species (level 7) and run ANCOM-BC2 on your primary group:
qiime taxa collapse \
--i-table table_cow_filt_abund.qza \
--i-taxonomy taxonomy_cow.qza \
--p-level 7 \
--o-collapsed-table table_cow_l7.qza
qiime composition ancombc \
--i-table table_cow_l7.qza \
--m-metadata-file metadata_cow.txt \
--p-formula '<GROUP_COLUMN>' \
--o-differentials ancombc_cow_group.qza
🏁 Wrap-up: Results Review & Troubleshooting¶
Regroup as a class with all your .qzv files open at view.qiime2.org. We'll work through:
- What does the cow data actually show? Compare across tables.
- Where did people make different parameter choices, and how did that change the result?
- Common pitfalls hit during the day — what would you do differently next time?
Open Q&A & Workshop Wrap-up¶
Free time with all instructors. Bring your own data questions, debugging issues, or anything from earlier in the workshop you want to revisit.
This completes the workshop. See Resources for downloads, the glossary, and the Publication Plots in R self-study reference.