In this study, we examined intestinal microbial communities from five herbivorous reef fish species in two families: Sparisoma viride, Sparisoma aurofrenatum, and Scarus taeniopterus from the Labridae, and Acanthurus tractus and Acanthurus coeruleus from the Acanthuridae.

Definitions & Abbreviations

  • Amplicon Sequence Variant (ASV): An exact sequence variant, analogous to an OTU but with single-nucleotide resolution.
  • Differentially abundant (DA) feature: A taxon, ASV, or other feature that is disproportionately abundant in one group of samples and statistically different from the other groups.

Goals of the Study

  1. Assess the taxonomic composition of intestinal communities from herbivorous reef fish.
  2. Determine the diversity of these communities and their similarity/dissimilarity.
  3. Identify differentially abundant ASVs across the host species.
  4. Predict the specificity of differentially abundant ASVs.

Workflow Overview

No 1. Field Observations

In the first section we run some analyses on field-based behavioral assays of the different herbivorous reef fish species.

No 2. DADA2 Workflow

In this part we process the raw 16S rRNA read data, including assessing read quality, filtering reads, correcting errors, and inferring amplicon sequence variants (ASVs).

No 3. Data Preparation

Next we go through the steps of defining sample groups, creating phyloseq objects, removing unwanted samples, and removing contaminant ASVs. Various parts of this section can easily be modified to perform different analyses. For example, if you were only interested in a specific taxon or group of samples, you could change the code here to create new phyloseq objects.
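To give a feel for what these preparation steps do, here is a minimal sketch in base R on a made-up count table. It is not the phyloseq code from this workflow. The objects `counts`, `meta`, and `contaminants` are hypothetical names and values invented for illustration.

```r
# Made-up ASV count table (rows = samples, columns = ASVs).
counts <- matrix(
  c(12, 0, 3, 7,
     8, 1, 0, 5,
     0, 9, 2, 6,
     1, 0, 0, 4),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("S1", "S2", "S3", "S4"),
                  c("ASV1", "ASV2", "ASV3", "ASV4"))
)

# Made-up sample metadata defining the sample groups.
meta <- data.frame(
  sample = c("S1", "S2", "S3", "S4"),
  group  = c("host_A", "host_A", "host_B", "control"),
  stringsAsFactors = FALSE
)

# Hypothetical list of ASVs flagged as contaminants.
contaminants <- c("ASV4")

# 1. Drop unwanted samples (here, the control sample).
keep_samples <- meta$sample[meta$group != "control"]

# 2. Drop contaminant ASVs.
keep_asvs <- setdiff(colnames(counts), contaminants)

clean <- counts[keep_samples, keep_asvs, drop = FALSE]

# 3. Drop any ASVs left with zero reads after filtering.
clean <- clean[, colSums(clean) > 0, drop = FALSE]
```

With phyloseq objects, the analogous operations are typically done with functions such as `subset_samples()`, `subset_taxa()`, and `prune_taxa()`.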

No 4. Composition & Diversity

Here we assess taxonomic composition, alpha diversity, and beta diversity. Phyloseq offers many options for assessing diversity, including several alpha diversity metrics, additional ordination and distance methods, and so on. You can play around with these settings to see how they affect the results.
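To make the alpha and beta diversity ideas concrete, here is a minimal base-R sketch on a made-up count table. It is not the phyloseq code used in this workflow. The matrix `toy_counts` and the helpers `shannon()` and `bray_curtis()` are hypothetical names introduced only for illustration.

```r
# Toy ASV count table: rows are samples, columns are ASVs (made-up numbers).
toy_counts <- matrix(
  c(10, 10,  0,  0,
     5,  5,  5,  5,
     0,  0, 20,  0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("fish_A", "fish_B", "fish_C"),
                  c("ASV1", "ASV2", "ASV3", "ASV4"))
)

# Shannon index (alpha diversity) for one sample:
# -sum(p * log(p)) over the ASVs present in that sample.
shannon <- function(x) {
  p <- x[x > 0] / sum(x)
  -sum(p * log(p))
}

# Bray-Curtis dissimilarity (beta diversity) between two samples:
# 0 means identical counts, 1 means no shared ASVs.
bray_curtis <- function(x, y) {
  sum(abs(x - y)) / sum(x + y)
}

# Alpha diversity per sample.
alpha <- apply(toy_counts, 1, shannon)

# Pairwise Bray-Curtis dissimilarities between all samples.
n <- nrow(toy_counts)
beta <- matrix(0, n, n,
               dimnames = list(rownames(toy_counts), rownames(toy_counts)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    beta[i, j] <- bray_curtis(toy_counts[i, ], toy_counts[j, ])
  }
}
```

In this toy table, the evenly spread sample (fish_B) has the highest Shannon value and the single-ASV sample (fish_C) has a value of zero. In phyloseq, functions such as `estimate_richness()`, `distance()`, and `ordinate()` cover the same ground with many more metric choices.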

No 5. Differentially Abundant ASVs

We wanted to understand how ASVs partitioned across host species. We also wanted to assess the specificity of each ASV to determine habitat preference. To our knowledge there is no quantitative way to do this. The only attempt we are aware of was MetaMetaDB, but it is based on a 454 database and no longer seems to be in active development. So we used an approach based on the work of Sullam et al.: first identifying differentially abundant ASVs, then searching for the closest database hits, and finally using phylogenetic analysis and top-hit metadata (isolation source, natural host) to infer habitat preference.
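As a rough illustration of the first step, flagging differentially abundant ASVs, the sketch below runs a Kruskal-Wallis test per ASV on a made-up relative-abundance table and corrects the p-values for multiple testing. This is a generic stand-in and not necessarily the test used in this workflow. The data, the host group labels, and the 0.05 cutoff are all assumptions made for the example.

```r
# Made-up relative abundances: 9 samples (3 per hypothetical host group) x 3 ASVs.
host <- factor(rep(c("host_A", "host_B", "host_C"), each = 3))
toy_abund <- cbind(
  ASV1 = c(0.50, 0.55, 0.60, 0.01, 0.02, 0.03, 0.04, 0.05, 0.06),  # enriched in host_A
  ASV2 = c(0.10, 0.30, 0.20, 0.30, 0.10, 0.20, 0.20, 0.10, 0.30),  # similar in all hosts
  ASV3 = c(0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.70, 0.75, 0.80)   # enriched in host_C
)

# One Kruskal-Wallis test per ASV (column), comparing abundance across host groups.
p_raw <- apply(toy_abund, 2, function(x) kruskal.test(x, host)$p.value)

# Benjamini-Hochberg correction for testing many ASVs at once.
p_adj <- p.adjust(p_raw, method = "BH")

# ASVs passing the (assumed) significance cutoff.
da_asvs <- names(p_adj)[p_adj < 0.05]
```

A test like this only says that an ASV differs somewhere among the groups. Working out which host it is enriched in, and what its closest relatives are, still needs the follow-up steps described above.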

No 6. Synthesis

In this section we pull together the results and try to make sense of the microbiomes from these herbivorous reef fish. How are ASVs partitioning across hosts? How similar are these ASVs to sequences from other studies? What can these patterns tell us about host specificity?


This section contains information on a) other analyses & visualizations, b) tools & resources used in this workflow, c) submitting sequencing data to public archives, and d) the specific R packages & versions used in this workflow.

All tables and figures presented herein are named as they appeared in the original publication. We also include many additional data products that were not part of the original publication.

Color & Graphics

Throughout this workflow we are going to rely on color to help us tell a story. We will use color to delineate host fish species and to delineate microbial taxa. Microbial diversity is pretty vast and it can be difficult to display all of this diversity in a single, static figure.

Many of us perceive color and/or differences in color, well, differently. So when designing figures it is important to use a) relatively few colors and b) a palette that is friendly to a variety of people. For our figures, we generated a palette based on Bang Wong’s scheme described in this paper. Wong’s scheme uses contrasting colors that can be distinguished by a range of people. Consider that roughly 8% of men (and a much smaller share of women) are color blind. So what do you think? Do you want Keanu Reeves to understand your figures or not?

Wong’s scheme is conservative: there are only 7 colors. We added black, grey, and a blueish white to give us some wiggle room (we cheated a little). Others have developed 12- and 15-color palettes and these are worth looking into, but be careful, because figures with too many colors can inhibit our ability to discern patterns. This conservative palette forced us to choose carefully when deciding which taxa to target or how many groups to display. To keep things simple, we created two palettes: one for microbial taxa (friend_pal) with all the colors and another for the five host fish species (samp_pal). The fish palette is just a subset of the full palette. Here is the code:

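#Full palette: Wong's colors plus grey, a blueish white, and black
friend_pal <- c("#009E73", "#D55E00", "#F0E442",
                "#CC79A7", "#56B4E9", "#E69F00",
                "#0072B2", "#7F7F7F", "#B6DBFF",
                "#000000") #black (assumed hex value; the text lists black among the added colors)

#Fish palette (a subset of the full palette)
samp_pal <- c("#CC79A7", "#0072B2", "#009E73",
              "#56B4E9", "#E69F00")

#Helper to preview a palette as a strip of color swatches
cols <- function(a) image(1:10, 1, as.matrix(1:10),
                          col=a, axes=FALSE, xlab="", ylab="")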
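One way to put the fish palette to work is to name each color after a host species, so the same fish always gets the same color in every figure. The sketch below is an illustration only: the pairing of colors to species is an assumption made for this example, not necessarily the pairing used in our figures, and `host_species` and `rgb_vals` are hypothetical names.

```r
# Fish palette, repeated here so the example is self-contained.
samp_pal <- c("#CC79A7", "#0072B2", "#009E73",
              "#56B4E9", "#E69F00")

# The five host species from this study (alphabetical order).
host_species <- c("Acanthurus coeruleus", "Acanthurus tractus",
                  "Scarus taeniopterus", "Sparisoma aurofrenatum",
                  "Sparisoma viride")

# Assumed pairing: colors are assigned to species in the order listed above.
names(samp_pal) <- host_species

# Sanity checks: every entry parses as a color and no color is used twice.
rgb_vals <- col2rgb(samp_pal)
stopifnot(ncol(rgb_vals) == length(host_species),
          !anyDuplicated(samp_pal))

# Look up the color for one species by name.
samp_pal[["Sparisoma viride"]]
```

A named vector like this can be passed directly to plotting functions that accept named color values, for example `scale_fill_manual(values = samp_pal)` in ggplot2.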

There is a great article on Coloring for Colorblindness by David Nichols that has an interactive color picker and recommendations for accessible palettes. This is also a really cool site for looking at color combinations. Both resources are highly recommended.