Glossary

Term	Definition
ADD (Accumulated Degree Days)	A cumulative measure of temperature exposure over time. Used in this workshop as a continuous proxy for time since sample collection.
Alpha Diversity	A measure of microbial diversity within a single sample. Includes richness, evenness, and sometimes phylogenetic relationships depending on the metric.
Alpha Correlation	An analysis that tests for relationships between alpha diversity metrics and continuous metadata variables.
ANCOM-BC (Analysis of Compositions of Microbiomes with Bias Correction)	A statistical method for identifying differentially abundant taxa between groups that accounts for the compositional nature of microbiome data.
Alpine Cluster	A high-performance computing (HPC) system used for running bioinformatics analyses.
Alpine OnDemand	A web portal interface used to access Alpine computing resources through a browser.
Amplicon	A DNA region amplified by PCR for sequencing, such as the V4 region of the 16S rRNA gene.
Barcode (Index Sequence)	A short DNA sequence added to a sample that uniquely identifies its reads during demultiplexing.
Barcoded PCR	A PCR strategy in which primers contain sample-specific barcode sequences.
Beta Diversity	Diversity between samples, describing how different microbial communities are from one another.
Categorical Variable	A metadata variable consisting of discrete groups (e.g., treatment or location).
Classifier	A trained machine-learning model that predicts taxonomy of unknown sequences using a reference database.
Chimera	A PCR artifact formed when fragments from multiple DNA templates combine into a single sequence.
Chimera Filtering	The process of detecting and removing chimeric sequences from a dataset.
Chloroplast Sequences	Plant-derived sequences that appear in 16S datasets because chloroplasts evolved from cyanobacteria.
Compressed Sequencing Files (.fastq.gz)	Raw sequencing data files compressed using gzip to reduce storage requirements.
Compute Node	A non-interactive computing resource where submitted jobs are executed.
Continuous Variable	A numeric metadata variable with a range of values (e.g., temperature or time).
DADA2 (Divisive Amplicon Denoising Algorithm 2)	A bioinformatics method that models sequencing errors to infer true biological sequences (ASVs) from amplicon data.
Demultiplexing (Demux)	The process of separating pooled sequencing reads into individual samples using barcode sequences.
Denoising	The process of correcting sequencing errors, removing artifacts, and generating ASVs from raw sequencing reads.
Diversity Metric	Any quantitative measure describing microbial diversity within or between samples.
Faith's PD (Faith's Phylogenetic Diversity)	The total branch length of a phylogenetic tree spanning taxa in a sample; incorporates evolutionary relationships.
FASTQ File	A sequencing file format storing DNA sequences and their quality scores.
Feature Table	A matrix of samples × features (ASVs or taxa) containing counts or frequencies.
GreenGenes2 (GG2)	A curated 16S rRNA reference database used for taxonomic classification and phylogenetic placement.
Importing	The process of converting raw sequencing data into QIIME 2 artifacts (.qza files).
Interactive Node	A computing session where commands are run interactively (e.g., via sinteractive).
Job Script	A shell script submitted to a cluster containing resource requests and commands for execution.
Kruskal–Wallis Test	A nonparametric, rank-based statistical test of whether two or more independent groups differ in the distribution of a variable.
LME (Linear Mixed Effects Model)	A statistical model that accounts for both fixed and random effects in longitudinal or repeated-measures data.
Manifest File	A file mapping sample IDs to FASTQ file paths for QIIME 2 import.
Mitochondrial Sequences	Eukaryote-derived sequences that appear in 16S datasets because mitochondria evolved from bacteria.
Multiplexing	The pooling of multiple samples in one sequencing run to reduce cost.
Naive Bayes Classifier	A machine learning algorithm used for taxonomic classification of sequences based on a reference database.
OTU (Operational Taxonomic Unit)	A cluster of sequences grouped by similarity (typically 97% identity); largely replaced by ASVs.
Paired-End Reads	Sequencing reads generated from both ends of a DNA fragment.
Partition	A queue of compute resources with specific hardware and usage policies in SLURM.
PCoA (Principal Coordinates Analysis)	An ordination method that visualizes similarities/differences between samples based on distance matrices.
PERMANOVA (Permutational Multivariate Analysis of Variance)	A statistical test for differences in community composition using distance matrices.
Phred Score (Q Score)	A logarithmic measure of base-call accuracy (e.g., Q30 = 99.9% accuracy, or a 1 in 1,000 chance of error).
Pielou's Evenness	A measure of how evenly taxa are distributed within a sample.
Phylogenetic Diversity	A diversity metric that incorporates evolutionary relationships between taxa.
QIIME 2	An open-source microbiome bioinformatics platform for analyzing amplicon sequencing data.
Quality Filtering	Removal of low-quality reads or bases based on quality scores.
QZA / QZV	QIIME 2 artifact (.qza) and visualization (.qzv) file formats.
Rarefaction	Subsampling reads to equal sequencing depth across samples.
Rarefaction Curve	A plot showing how observed diversity changes with sequencing depth.
Reference Database	A curated collection of sequences with known taxonomy used for classification (e.g., GreenGenes2).
Relative Abundance	The proportion of reads assigned to a taxon relative to total reads in a sample.
Read Merging	Combining overlapping forward and reverse reads into a single sequence.
Representative Sequences	The sequence representing each ASV after denoising.
Richness	The number of unique taxa present in a sample.
sbatch	The SLURM command used to submit jobs to a cluster queue.
SEPP (SATé-enabled Phylogenetic Placement)	A method for placing sequences into a reference phylogenetic tree.
Shannon Entropy	A diversity metric that accounts for both richness and evenness.
Shebang (#!/bin/bash)	The first line of a shell script defining the interpreter.
Single-End Reads	Sequencing reads generated from one end of a DNA fragment.
SLURM (Simple Linux Utility for Resource Management)	A workload manager used to schedule jobs on HPC systems.
Saturation of the Rarefaction Curve	The point at which additional sequencing depth no longer increases observed diversity.
Taxa Bar Plot	A stacked bar plot showing relative abundance of microbial taxa across samples.
Taxa Filtering	Removal of unwanted taxa (e.g., plant or host DNA) from datasets.
Taxonomic Classification	Assignment of taxonomy to sequences using a reference database.
Taxonomy	The hierarchical classification system for organisms (kingdom → species).
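Trim Left (--p-trim-left)	Removes a fixed number of bases from the start (5' end) of each sequencing read.
Truncation (--p-trunc-len)	Cuts each read at a fixed position, discarding the bases beyond it at the 3' end. In DADA2, reads shorter than this length are discarded.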
UniFrac	A phylogenetic beta diversity metric based on shared branch lengths between samples.
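
The Phred Score entry can be made concrete with a little arithmetic. A Phred score Q relates to the probability p that a base call is wrong by Q = -10 · log10(p). The sketch below is illustrative only: the function names are hypothetical, and the quality-string decoder assumes the common Phred+33 ASCII encoding used in FASTQ files.

```python
import math


def phred_to_error_prob(q):
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Phred score corresponding to an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)


def decode_quality_string(quality, offset=33):
    """Convert a FASTQ quality string to a list of Phred scores.

    Assumes Phred+33 encoding (offset=33); older files may use a different offset.
    """
    return [ord(ch) - offset for ch in quality]


if __name__ == "__main__":
    # Print each score next to its implied base-call accuracy.
    for q in decode_quality_string("I?5"):
        accuracy = 1 - phred_to_error_prob(q)
        print(f"Q{q}: {accuracy:.4%} accuracy")
```

Under this relationship, each 10-point increase in Q corresponds to a tenfold drop in error probability.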
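
The Richness, Shannon Entropy, and Pielou's Evenness entries describe related quantities, and a small worked example may help show how they fit together. This is a minimal sketch computed from one sample's feature counts, not the QIIME 2 implementation. The function names are hypothetical, and the log base (2 here) is an assumption; other tools may use a different base.

```python
import math


def richness(counts):
    """Number of features (e.g., ASVs) observed at least once in the sample."""
    return sum(1 for c in counts if c > 0)


def shannon_entropy(counts, base=2):
    """Shannon entropy H = -sum(p_i * log(p_i)) over observed features."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("sample has no reads")
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in proportions) / math.log(base)


def pielou_evenness(counts, base=2):
    """Pielou's J = H / log(S), where S is richness.

    J is undefined when only one feature is observed; this sketch returns NaN.
    """
    s = richness(counts)
    if s <= 1:
        return float("nan")
    max_entropy = math.log(s) / math.log(base)
    return shannon_entropy(counts, base) / max_entropy


if __name__ == "__main__":
    even_sample = [10, 10, 10, 10]
    skewed_sample = [97, 1, 1, 1]
    for name, sample in [("even", even_sample), ("skewed", skewed_sample)]:
        print(name, richness(sample), shannon_entropy(sample), pielou_evenness(sample))
```

Both example samples have the same richness, but the skewed sample has lower entropy and evenness because one feature dominates the reads.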
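
The Rarefaction and Relative Abundance entries can also be illustrated with a short sketch. The code below subsamples one sample's reads without replacement to a fixed depth and converts counts to proportions. It is a simplified illustration under stated assumptions rather than the QIIME 2 implementation; the function names, the dictionary layout, and the feature IDs are hypothetical.

```python
import random
from collections import Counter


def rarefy(counts, depth, seed=0):
    """Subsample reads without replacement to exactly `depth` reads.

    counts: dict mapping feature ID -> read count for one sample.
    Raises ValueError if the sample has fewer reads than `depth`.
    """
    total = sum(counts.values())
    if depth > total:
        raise ValueError("sample has fewer reads than the requested depth")
    # Expand counts into one entry per read, then draw `depth` of them.
    pool = [feature for feature, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)
    return dict(Counter(rng.sample(pool, depth)))


def relative_abundance(counts):
    """Convert read counts to proportions of the sample's total reads."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("sample has no reads")
    return {feature: n / total for feature, n in counts.items()}


if __name__ == "__main__":
    sample = {"ASV1": 50, "ASV2": 30, "ASV3": 20}
    print(rarefy(sample, depth=40, seed=1))
    print(relative_abundance(sample))
```

Samples with fewer reads than the chosen depth cannot be rarefied to that depth, which is why the sketch raises an error for them rather than returning a partial result.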