Taxonomic Classification

We classify ASVs by comparing them to the GreenGenes2 reference database using a pre-trained Naive Bayes classifier. We then visualize taxonomic composition and remove features assigned to mitochondria, chloroplasts, and a dataset-specific contaminant.

Get the Taxonomic Classifier

Copy the pre-trained GreenGenes2 V4 classifier:

cp /pl/active/courses/2025_summer/CSU_2025/q2_workshop_final/QIIME2/2024.09.backbone.v4.nb.qza .

About the Classifier

This is a Naive Bayes classifier trained on the GreenGenes2 (2024.09) backbone sequences trimmed to the V4 hypervariable region. It is specific to the primer set and read length used in this workshop. For your own data you will need a classifier trained on the appropriate region.

Classify ASVs

qiime feature-classifier classify-sklearn \
  --i-reads seqs.qza \
  --i-classifier 2024.09.backbone.v4.nb.qza \
  --o-classification taxonomy_gg2.qza

Inspect the Classification

qiime metadata tabulate \
  --m-input-file taxonomy_gg2.qza \
  --o-visualization taxonomy_gg2.qzv
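
As a command-line complement to browsing the visualization, the taxonomy can also be exported and searched for terms of interest. This is a minimal sketch, not part of the workshop instructions: the helper name count_hits and the output directory taxonomy_gg2_export are hypothetical, and the search terms are only examples. It assumes that exporting a taxonomy artifact with qiime tools export writes a taxonomy.tsv file into the output directory.

```shell
# Count taxonomy rows whose annotation contains a term, ignoring case.
# count_hits is a hypothetical helper, not a QIIME 2 command.
count_hits() {
  # $1 = search term, $2 = path to an exported taxonomy.tsv
  grep -i -c -- "$1" "$2" || true
}

# Export only when QIIME 2 and the artifact are available in this shell.
if command -v qiime >/dev/null 2>&1 && [ -f taxonomy_gg2.qza ]; then
  qiime tools export \
    --input-path taxonomy_gg2.qza \
    --output-path taxonomy_gg2_export   # hypothetical directory name
  for term in mitochondria chloroplast; do
    echo "$term: $(count_hits "$term" taxonomy_gg2_export/taxonomy.tsv)"
  done
fi
```

A nonzero count for a term suggests that ASVs with that annotation are present and worth looking for in the barplots.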

Group Samples for Barplots

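Group samples by the type_days metadata column (sample type combined with timepoint) using the mean-ceiling mode, which averages each feature's counts across the samples in a group and rounds the result up to a whole number: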

qiime feature-table group \
  --i-table table.qza \
  --m-metadata-file metadata_q2_workshop.txt \
  --m-metadata-column type_days \
  --p-mode mean-ceiling \
  --p-axis sample \
  --o-grouped-table table_type_days.qza
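
To make the mean-ceiling arithmetic concrete, here is a small worked example using awk. The counts are made-up numbers for a single ASV in three samples that share one type_days value; they are not taken from the workshop data, and awk only illustrates the calculation rather than reproducing how QIIME 2 computes it.

```shell
# Toy counts for one ASV across three samples in the same type_days group
# (hypothetical numbers, not from the workshop data).
counts="3 4 6"

# mean-ceiling: average the counts, then round up to the next whole number.
grouped=$(echo "$counts" | awk '{
  sum = 0
  for (i = 1; i <= NF; i++) sum += $i
  mean = sum / NF
  ceil = (mean == int(mean)) ? mean : int(mean) + 1
  print ceil
}')

echo "mean-ceiling of [$counts] = $grouped"
```

Here the mean is 13 / 3 ≈ 4.33, which rounds up to 5, so the grouped table keeps whole-number counts.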

Taxonomy Barplots

qiime taxa barplot \
  --i-table table_type_days.qza \
  --i-taxonomy taxonomy_gg2.qza \
  --o-visualization taxa_barplot_type_days.qzv

Remove Mitochondria, Chloroplasts, and Contaminants

Open taxa_barplot_type_days.qzv and look for taxa that are host-derived rather than microbial. Mitochondrial and chloroplast 16S rRNA genes amplify with 16S primers, but these organelles are not bacteria. sp004296775 is a GreenGenes2 identifier for a contaminant present in this dataset.

Filter the grouped table. The value of --p-exclude is left blank for you to fill in with the taxa you identified, as a comma-separated list with no spaces:

qiime taxa filter-table \
  --i-table table_type_days.qza \
  --i-taxonomy taxonomy_gg2.qza \
  --p-exclude \
  --o-filtered-table table_type_days_nomitochloro.qza

--p-exclude: Comma-separated list of strings to match against taxonomy annotations (case-insensitive substring match). Features whose taxonomy contains any of these strings are removed from the table.
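
The substring matching described for --p-exclude can be sketched with standard shell tools. This is only an illustration. The exclusion list is an assumption built from the three taxa named in this section, and the taxonomy strings are made up in a GreenGenes2-like style rather than taken from the workshop data. Here grep stands in for the matching; it is not how QIIME 2 implements it.

```shell
# Assumed exclusion list, based on the taxa named in this section.
exclude="mitochondria,chloroplast,sp004296775"

# Made-up taxonomy strings (illustrative only, not real workshop output).
toy_taxonomy='d__Bacteria; p__Proteobacteria; o__Rickettsiales; f__Mitochondria
d__Bacteria; p__Cyanobacteria; c__Cyanobacteriia; o__Chloroplast
d__Bacteria; p__ExamplePhylum; g__ExampleGenus; s__ExampleGenus sp004296775
d__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales'

# Turn the comma-separated list into an alternation, then drop every line
# that contains any term, ignoring case.
pattern=$(echo "$exclude" | tr ',' '|')
kept=$(printf '%s\n' "$toy_taxonomy" | grep -E -i -v "$pattern")

echo "$kept"
```

Only the Lactobacillales line survives, because each of the other three contains one of the terms regardless of capitalization. If these terms match what you see in your own barplot, the same string could be supplied as the --p-exclude value; check the spelling against your taxonomy before relying on it.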

Also filter the full (ungrouped) table; this version will be used in downstream R analyses. Apply the same exclusion criteria you identified above (mitochondria, chloroplasts, and the dataset-specific contaminant), again as a comma-separated list with no spaces:

qiime taxa filter-table \
  --i-table table.qza \
  --i-taxonomy taxonomy_gg2.qza \
  --p-exclude \
  --o-filtered-table table_nomitochloro.qza

--p-exclude: Comma-separated list of taxonomy substring matches. Same logic as the grouped table filter above.

Taxonomy Barplots Without Contaminants

qiime taxa barplot \
  --i-table table_type_days_nomitochloro.qza \
  --i-taxonomy taxonomy_gg2.qza \
  --o-visualization taxa_barplot_type_days_nomitochloro.qzv

Outputs

File                                     Type           Description
taxonomy_gg2.qza                         Artifact       Taxonomic assignments for all ASVs
taxonomy_gg2.qzv                         Visualization  Tabulated taxonomy with confidence scores
table_type_days.qza                      Artifact       Feature table grouped by sample type + day
taxa_barplot_type_days.qzv               Visualization  Taxonomy barplot (all reads)
table_type_days_nomitochloro.qza         Artifact       Grouped table, contaminants removed
table_nomitochloro.qza                   Artifact       Full table, contaminants removed
taxa_barplot_type_days_nomitochloro.qzv  Visualization  Taxonomy barplot (contaminants removed)

This completes Day 1. Continue to Day 2, Community & Advanced Analyses.