Getting Data Into R¶
The last block of Day 2 is a short demo of the bridge between QIIME2 and R: how to export the artifacts you've just produced into flat TSV / CSV files that R can ingest directly. The full export reference (every metric and every command you'll likely want for publication-quality figures) lives on the Publication Plots in R (Self-Study) page; work through it on your own afterward.
Why Export?¶
QIIME2 artifacts (.qza) and visualizations (.qzv) are zip archives with provenance and metadata wrapped around the underlying data. R can't read them directly, so we use qiime tools export to unwrap an artifact and write its inner data out as a flat file (a TSV for the diversity outputs used here).
Demo: Export an α-Diversity Vector¶
Pick one alpha metric you computed in Phase 4 of the core metrics step. Export Shannon as an example:
qiime tools export \
  --input-path core_metrics/shannon_vector.qza \
  --output-path output_for_R
mv output_for_R/alpha-diversity.tsv output_for_R/shannon_vector.tsv
The rename matters: every alpha export writes to alpha-diversity.tsv by default, so you have to rename right after export or each subsequent metric will overwrite the last.
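When you export more than one alpha metric, the export-then-rename step can be wrapped in a small loop so that no metric overwrites another. The following is a minimal sketch, not the workshop's own script: the function name `export_alpha_vectors` is hypothetical, and it assumes the vectors sit in `core_metrics/` under the file stems listed in the loop. Adjust the list to whatever you actually computed.

```shell
# Hypothetical helper: export each alpha-diversity vector to its own TSV.
# Usage: export_alpha_vectors <dir containing the .qza files> <output dir>
export_alpha_vectors() {
  in_dir=$1
  out_dir=$2
  mkdir -p "$out_dir"
  # Assumed file stems; edit to match the metrics you computed.
  for metric in shannon faith_pd observed_features evenness; do
    qza="$in_dir/${metric}_vector.qza"
    if [ ! -f "$qza" ]; then
      echo "skipping $metric: $qza not found" >&2
      continue
    fi
    qiime tools export --input-path "$qza" --output-path "$out_dir" || return 1
    # Rename immediately so the next export cannot overwrite this file.
    mv "$out_dir/alpha-diversity.tsv" "$out_dir/${metric}_vector.tsv" || return 1
  done
}

# Example call, using the same paths as the demo above:
# export_alpha_vectors core_metrics output_for_R
```

Metrics whose .qza file is missing are skipped with a message on stderr, so the loop is safe to run even if you only computed some of them.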
Demo: Export a β-Diversity Distance Matrix¶
qiime tools export \
  --input-path core_metrics/bray_curtis_distance_matrix.qza \
  --output-path output_for_R
mv output_for_R/distance-matrix.tsv output_for_R/bray_curtis_distance_matrix.tsv
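The same rename pattern applies: every distance-matrix export writes to distance-matrix.tsv, so rename it before you export the next matrix.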
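Before moving on to R, it can be worth a quick check that an exported matrix has the shape you expect. This is a small sketch, assuming the file is tab-separated with sample IDs in the header row and in the first column (the layout the R code later on this page relies on). `check_square_matrix` is a hypothetical helper name.

```shell
# Hypothetical helper: confirm a tab-separated distance matrix is square.
# Usage: check_square_matrix <matrix.tsv>
check_square_matrix() {
  awk -F '\t' '
    NR == 1     { n = NF - 1; next }   # header: empty corner cell + n sample IDs
    NF != n + 1 { bad = 1 }            # each data row: one ID + n distances
                { rows++ }
    END {
      if (NR == 0 || bad || rows != n) {
        print "not square: " rows + 0 " rows, " n + 0 " columns"
        exit 1
      }
      print "ok: " n " x " n
    }
  ' "$1"
}

# Example call, using the file produced by the demo above:
# check_square_matrix output_for_R/bray_curtis_distance_matrix.tsv
```

The function returns a non-zero exit status when the row and column counts disagree, so it can also be used as a guard in a script.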
Pulling Into R¶
Once you have the TSVs, load them in R as ordinary tab-delimited files. For an alpha vector:
shannon <- read.table(
  "output_for_R/shannon_vector.tsv",
  header = TRUE, sep = "\t",
  check.names = FALSE  # keep column names exactly as written in the file
)
For a distance matrix:
# row.names = 1 uses the first column (the sample IDs) as row names
bc <- as.matrix(read.table(
  "output_for_R/bray_curtis_distance_matrix.tsv",
  header = TRUE, row.names = 1, sep = "\t", check.names = FALSE
))
Join them to your metadata on the sample-ID column and you're set up for ggplot2, vegan, phyloseq, or whatever downstream package you prefer.
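Mismatched sample IDs can leave a join with fewer rows than you expect, so a quick comparison before you open R may save some debugging. The sketch below lists IDs that appear in an exported TSV but not in the first column of a metadata file. Both the helper name `missing_from_metadata` and the file name `metadata.tsv` are assumptions for illustration; substitute your own metadata path.

```shell
# Hypothetical helper: print sample IDs found in an exported TSV but
# absent from the first column of a metadata TSV.
# Usage: missing_from_metadata <exported.tsv> <metadata.tsv>
missing_from_metadata() {
  awk -F '\t' '
    NR == FNR { if (FNR > 1) known[$1] = 1; next }  # first file read: metadata IDs
    FNR > 1 && !($1 in known) { print $1 }          # second file read: exported IDs
  ' "$2" "$1"
}

# Example call (metadata.tsv is an assumed file name):
# missing_from_metadata output_for_R/shannon_vector.tsv metadata.tsv
```

Empty output means every exported sample ID was found in the metadata. The header row of each file is skipped, and both files must be non-empty for the comparison to work.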
Going Further¶
The Publication Plots in R (Self-Study) page contains export commands for:
- All four α-diversity vectors (Shannon, Faith's PD, observed features, evenness)
- All four β-diversity distance matrices (Bray-Curtis, Jaccard, weighted & unweighted UniFrac)
- The taxonomy + feature table joined into a single tabulated visualization
- ANCOM-BC2 differential abundance results
- Random Forest feature importances
It also walks you through copying the workshop's visualizations.Rmd and R.Rproj into your working directory, knitting the report, and producing the figures from the lecture slides.
This completes Day 2. Continue to Day 3: Cow Dataset Pipeline.