Machine Learning

QIIME 2's sample-classifier plugin provides supervised machine learning tools for classification and regression on microbiome feature tables. Here we ask two questions:

  1. Can we predict the facility where a sample was collected? (classification)
  2. Can we predict the accumulated degree days (ADD) at time of collection? (regression)

Classification

Predict Facility

Train a Random Forest classifier to predict facility from microbial community composition:

qiime sample-classifier classify-samples \
  --i-table table_nomitochloro_1500_abund_L7.qza \
  --m-metadata-file metadata_q2_workshop.txt \
  --m-metadata-column facility \
  --p-random-state 123 \
  --p-n-jobs 1 \
  --output-dir sample_classifier_results_facility
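
Before training, it can help to confirm that the target column exists and to see how many samples fall into each class, since a class with very few samples leaves little to train or test on. The sketch below is a hypothetical helper (count_classes is not part of QIIME 2). It assumes the metadata file is tab-separated with the header on its first line, and it skips any later line that starts with `#`, such as a `#q2:types` row.

```shell
# Hypothetical helper, not a QIIME 2 command.
# Prints "<value><TAB><count>" for each distinct value in a metadata column.
# Assumption: tab-separated file, header on line 1.
# The body runs in a subshell ( ... ) so its variables stay local.
count_classes() (
  file="$1"
  column="$2"
  awk -F'\t' -v col="$column" '
    NR == 1 {
      # Locate the requested column in the header row.
      for (i = 1; i <= NF; i++) if ($i == col) idx = i
      next
    }
    /^#/ || !idx { next }        # skip comment/directive rows
    $idx != "" { n[$idx]++ }     # ignore empty (missing) values
    END {
      if (!idx) print "column not found: " col > "/dev/stderr"
      for (v in n) print v "\t" n[v]
    }
  ' "$file" | LC_ALL=C sort
)

# Usage (file and column names taken from the command above):
#   count_classes metadata_q2_workshop.txt facility
```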

Random State

Setting --p-random-state 123 seeds the random number generator, which makes the results reproducible. Any integer works; reuse the same value whenever you need to reproduce the exact same train/test split and model.
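
The effect of a fixed seed can be illustrated outside QIIME 2. The sketch below uses a hypothetical helper, seeded_shuffle, that orders sample IDs using awk's seeded random number generator. It is only an analogy for how a seed fixes a random train/test split: QIIME 2 uses its own random number generator, and the order produced here may differ between awk implementations.

```shell
# Hypothetical helper for illustration only; QIIME 2 does not use awk.
# Usage: seeded_shuffle SEED ID [ID ...]
# Prints the IDs in a pseudo-random order that is fixed by SEED.
seeded_shuffle() (
  seed="$1"
  shift
  printf '%s\n' "$@" |
    # Attach a seeded random key to each ID, sort by the key, drop the key.
    awk -v seed="$seed" 'BEGIN { srand(seed) } { printf "%.10f\t%s\n", rand(), $0 }' |
    LC_ALL=C sort -k1,1n |
    cut -f2
)

# Usage: running this twice with the same seed prints the same order.
#   seeded_shuffle 123 S1 S2 S3 S4 S5
#   seeded_shuffle 123 S1 S2 S3 S4 S5
```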

Visualize Top Features

Facility Classifier

Generate a heatmap of the normalized abundance of the 100 most important features, with samples grouped by facility:

qiime sample-classifier heatmap \
  --i-table table_nomitochloro_1500_abund_L7.qza \
  --i-importance sample_classifier_results_facility/feature_importance.qza \
  --m-sample-metadata-file metadata_q2_workshop.txt \
  --m-sample-metadata-column facility \
  --p-group-samples \
  --p-feature-count 100 \
  --o-heatmap sample_classifier_results_facility/heatmap_100_features.qzv \
  --o-filtered-table sample_classifier_results_facility/filtered_table_100_features.qza

Regression

Predict Accumulated Degree Days

Train a Random Forest regressor to predict the continuous variable add_0c (accumulated degree days):

qiime sample-classifier regress-samples \
  --i-table table_nomitochloro_1500_abund_L7.qza \
  --m-metadata-file metadata_q2_workshop.txt \
  --m-metadata-column add_0c \
  --p-estimator RandomForestRegressor \
  --p-n-estimators 20 \
  --p-random-state 123 \
  --output-dir sample_regressor_results_ADD
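
Regression needs a numeric target column, so a stray text value such as "NA" in add_0c is worth catching before training. The following sketch is a hypothetical pre-flight check (non_numeric_count is not part of QIIME 2). It assumes a tab-separated metadata file with the header on its first line, treats empty cells as missing rather than invalid, and only recognizes plain decimal or scientific notation.

```shell
# Hypothetical helper, not a QIIME 2 command.
# Prints how many non-empty values in a metadata column are not plain numbers.
# Assumption: tab-separated file, header on line 1.
non_numeric_count() (
  file="$1"
  column="$2"
  awk -F'\t' -v col="$column" '
    NR == 1 {
      # Locate the requested column in the header row.
      for (i = 1; i <= NF; i++) if ($i == col) idx = i
      next
    }
    /^#/ || !idx { next }        # skip comment/directive rows
    $idx != "" && $idx !~ /^-?[0-9]+(\.[0-9]*)?([eE][-+]?[0-9]+)?$/ { bad++ }
    END {
      if (!idx) {
        print "column not found: " col > "/dev/stderr"
        exit 1
      }
      print bad + 0
    }
  ' "$file"
)

# Usage (file and column names taken from the command above):
#   non_numeric_count metadata_q2_workshop.txt add_0c
```

A result of 0 suggests every non-missing value in the column parses as a number under these assumptions.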

Inspect Feature Importances

ADD Regressor

qiime metadata tabulate \
  --m-input-file sample_regressor_results_ADD/feature_importance.qza \
  --o-visualization sample_regressor_results_ADD/feature_importance.qzv
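
As an alternative to the interactive table, the ranked importances can be read on the command line. The sketch below is a hypothetical helper (top_features is not part of QIIME 2) that works on a file already exported with qiime tools export. It assumes the exported file is tab-separated with one header row, feature IDs in the first column, and importance scores in the second; the file name importance.tsv and the directory exported_importance in the usage comment are also assumptions, so check the export directory for the actual name.

```shell
# Hypothetical helper, not a QIIME 2 command.
# Usage: top_features FILE [N]
# Prints the N (default 10) highest-scoring rows as "<feature><TAB><importance>".
# Assumption: tab-separated file, one header row, ID in column 1, score in column 2.
top_features() (
  file="$1"
  n="${2:-10}"
  # Put the score first in fixed-point form so a plain numeric sort
  # also handles values written in scientific notation.
  awk -F'\t' 'NR > 1 && $0 !~ /^#/ && $2 != "" { printf "%.10f\t%s\n", $2, $1 }' "$file" |
    LC_ALL=C sort -k1,1nr |
    head -n "$n" |
    awk -F'\t' '{ print $2 "\t" $1 }'
)

# Usage (output directory and file name are assumptions):
#   qiime tools export \
#     --input-path sample_regressor_results_ADD/feature_importance.qza \
#     --output-path exported_importance
#   top_features exported_importance/importance.tsv 20
```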

Outputs

| File | Type | Description |
|---|---|---|
| sample_classifier_results_facility/ | Directory | Classifier results (accuracy, confusion matrix, feature importance) |
| sample_classifier_results_facility/heatmap_100_features.qzv | Visualization | Heatmap of top 100 features |
| sample_classifier_results_facility/filtered_table_100_features.qza | Artifact | Feature table subset to top 100 features |
| sample_regressor_results_ADD/ | Directory | Regressor results (R², scatter plot, feature importance) |
| sample_regressor_results_ADD/feature_importance.qzv | Visualization | Tabulated feature importances for ADD |

Next: Network Primer