
Getting Data Into R

The last block of Day 2 is a short demo of the bridge between QIIME2 and R: how to export the artifacts you've just produced into flat TSV / CSV files that R can ingest directly. The full export reference (every metric and every command you'll likely want for publication-quality figures) lives on the Publication Plots in R (Self-Study) page; work through it on your own afterward.

Why Export?

QIIME2 artifacts (.qza) and visualizations (.qzv) are zip archives that wrap the underlying data in provenance and metadata. Base R can't read them directly, so we use qiime tools export to unpack an artifact and write the data inside it out as a TSV.

Demo: Export an α-Diversity Vector

Pick one alpha metric you computed in Phase 4 of the core metrics step. Here we export Shannon as an example:

qiime tools export \
  --input-path core_metrics/shannon_vector.qza \
  --output-path output_for_R

mv output_for_R/alpha-diversity.tsv output_for_R/shannon_vector.tsv

The rename matters: every alpha-vector export writes to alpha-diversity.tsv by default, so rename each file right after exporting it, or the next metric you export will overwrite it.

Demo: Export a β-Diversity Distance Matrix

qiime tools export \
  --input-path core_metrics/bray_curtis_distance_matrix.qza \
  --output-path output_for_R

mv output_for_R/distance-matrix.tsv output_for_R/bray_curtis_distance_matrix.tsv

The same rename pattern applies: every distance-matrix export writes to distance-matrix.tsv by default.
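Because alpha vectors and distance matrices follow the same export-then-rename pattern, you can wrap it in a small shell function and export every metric in one pass. The sketch below is one way to do that, not part of the workshop materials. The function names export_and_rename and export_all_metrics are hypothetical. The artifact names assume the default outputs of qiime diversity core-metrics-phylogenetic saved under core_metrics/, as in the demos above, so adjust the lists if your run or QIIME2 release names them differently.

```shell
# Hypothetical helpers (not from the workshop materials).
# Assumes artifacts named like the core-metrics-phylogenetic defaults,
# stored under core_metrics/.

# Export one artifact, then immediately rename its default TSV so the
# next export cannot overwrite it.
# Usage: export_and_rename <artifact_stem> <default_tsv_name> [output_dir]
export_and_rename() {
  local stem="$1" default_name="$2" outdir="${3:-output_for_R}"
  qiime tools export \
    --input-path "core_metrics/${stem}.qza" \
    --output-path "$outdir" || return 1
  mv "$outdir/$default_name" "$outdir/${stem}.tsv"
}

# Export the four alpha vectors and four distance matrices,
# stopping at the first failure instead of silently skipping a metric.
export_all_metrics() {
  local stem
  for stem in shannon_vector faith_pd_vector \
              observed_features_vector evenness_vector; do
    export_and_rename "$stem" alpha-diversity.tsv || return 1
  done
  for stem in bray_curtis_distance_matrix jaccard_distance_matrix \
              weighted_unifrac_distance_matrix unweighted_unifrac_distance_matrix; do
    export_and_rename "$stem" distance-matrix.tsv || return 1
  done
}
```

Run export_all_metrics from the directory that contains core_metrics/. The resulting file names (shannon_vector.tsv, bray_curtis_distance_matrix.tsv, and so on) match the ones used in the R snippets below.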

Pulling Into R

Once you have the TSVs, load them in R as ordinary tab-delimited files. For an alpha vector:

shannon <- read.table(
  "output_for_R/shannon_vector.tsv",
  header = TRUE, sep = "\t", check.names = FALSE
)

For a distance matrix:

bc <- as.matrix(read.table(
  "output_for_R/bray_curtis_distance_matrix.tsv",
  header = TRUE, row.names = 1, sep = "\t", check.names = FALSE
))

Join them to your metadata on the sample-ID column and you're set up for ggplot2, vegan, phyloseq, or whatever downstream package you prefer.
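Before joining in R, it can be worth checking that every exported sample actually has a metadata row. If you join with merge(), its default keeps only matching rows, so a mistyped sample ID silently drops that sample from your plots. A minimal shell sketch follows. It assumes a QIIME2-style metadata TSV whose header is on the first line, whose sample IDs sit in the first column, and whose directive lines (such as #q2:types) start with '#'. The function name missing_from_metadata and the file name metadata.tsv are hypothetical.

```shell
# Hypothetical helper: print sample IDs that appear in an exported TSV
# but have no row in the metadata file.
# Usage: missing_from_metadata <export.tsv> <metadata.tsv>
missing_from_metadata() {
  local export_tsv="$1" metadata_tsv="$2" tmp_ids tmp_meta
  tmp_ids=$(mktemp) || return 1
  tmp_meta=$(mktemp) || { rm -f "$tmp_ids"; return 1; }
  # Export: skip the header row, keep the first column (sample IDs)
  tail -n +2 "$export_tsv" | cut -f1 | LC_ALL=C sort -u > "$tmp_ids"
  # Metadata: skip the header row and '#' directive lines, keep sample IDs
  tail -n +2 "$metadata_tsv" | grep -v '^#' | cut -f1 | LC_ALL=C sort -u > "$tmp_meta"
  # Lines unique to the first file = exported samples missing from metadata
  LC_ALL=C comm -23 "$tmp_ids" "$tmp_meta"
  rm -f "$tmp_ids" "$tmp_meta"
}
```

For example, missing_from_metadata output_for_R/shannon_vector.tsv metadata.tsv prints nothing when every exported sample is covered. The same call works on a distance-matrix export, because its first column (after the header row) also holds the sample IDs.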

Going Further

The Publication Plots in R (Self-Study) page contains export commands for:

  • All four α-diversity vectors (Shannon, Faith's PD, observed features, evenness)
  • All four β-diversity distance matrices (Bray-Curtis, Jaccard, weighted & unweighted UniFrac)
  • The taxonomy + feature table joined into a single tabulated visualization
  • ANCOM-BC2 differential abundance results
  • Random Forest feature importances

It also walks you through copying the workshop's visualizations.Rmd and R.Rproj into your working directory, knitting the report, and producing the figures from the lecture slides.


This completes Day 2. Continue to Day 3: Cow Dataset Pipeline.