On this page you will find information on obtaining various data, data products, and processing scripts. Of course, all the R code is embedded in the workflows on the website as well.
To start from the beginning, you will need the DADA2 workflow and the sequence data.
The data in the ENA (primers removed) are the input for the DADA2 workflow, but if you prefer to remove primers yourself, download the raw data instead.
To run the phyloseq workflow only, download this file:
10.6084/m9.figshare.7357178: DOI for the phyloseq workflow. It includes the output from the DADA2 workflow, the phyloseq script, and other necessary input files.
The input file for this workflow is the output from the DADA2 workflow (combo_pipeline.rdata). Some additional files needed for certain analyses are also included.
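A minimal sketch for loading the DADA2 output before starting the phyloseq workflow, assuming combo_pipeline.rdata has been downloaded into your working directory. Loading into a separate environment keeps the saved objects from silently overwriting anything already in your session. The helper name load_rdata_env is hypothetical, and the sketch makes no assumption about which objects the file contains; it simply reports their names.

```r
# Hypothetical helper: load an .rdata file into its own environment
# so the restored objects do not overwrite objects in the global session.
load_rdata_env <- function(path) {
  if (!file.exists(path)) {
    stop("File not found: ", path)
  }
  env <- new.env()
  # load() returns (invisibly) the names of the objects it restored
  loaded <- load(path, envir = env)
  message("Loaded ", length(loaded), " object(s): ",
          paste(loaded, collapse = ", "))
  env
}

# Usage (assumes the figshare file is in the working directory):
# dada2_out <- load_rdata_env("combo_pipeline.rdata")
# ls(dada2_out)
```

You can then pull individual objects out with get("name", envir = dada2_out) once you know what the file holds.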
Additional data products.
Accessing the R Code only.
The R code is available by clicking on the code button in the menu bar or here. Please note that this R code is pulled from all the .Rmd files. It has not been tested independently of the R Markdown workflows, so use it at your own risk. In other words, the code works in the R Markdown format, but the complete pipeline has not been tested using just this code. I used knitr::purl() to pull the code from the R Markdown files.
I did this for you in case you wanted the code without hearing me drone on about colors, zoomable figures, or Keanu Reeves. The first part is the DADA2 workflow and the second part is the phyloseq workflow. Commands that are commented out are things I tried but could never get to work. Any line that starts like this: ## ---- is the code chunk name and details.
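If you want to regenerate a combined script yourself, here is a minimal sketch of the knitr::purl() step, assuming the knitr package is installed. It purls each .Rmd file in the order given and concatenates the results, so listing the DADA2 file first reproduces the "DADA2 first, phyloseq second" layout. The helper name purl_workflows and the file names in the usage line are hypothetical, not the names used on this site.

```r
# Hypothetical helper: extract R code from several R Markdown files
# with knitr::purl() and concatenate it into one script, in order.
purl_workflows <- function(rmd_files, out_file) {
  if (!requireNamespace("knitr", quietly = TRUE)) {
    stop("The knitr package is required")
  }
  parts <- vapply(rmd_files, function(rmd) {
    tmp <- tempfile(fileext = ".R")
    # documentation = 1 (the default) keeps the "## ----" chunk header lines
    knitr::purl(rmd, output = tmp, documentation = 1, quiet = TRUE)
    paste(readLines(tmp), collapse = "\n")
  }, character(1))
  writeLines(parts, out_file)
  invisible(out_file)
}

# Usage with hypothetical file names:
# purl_workflows(c("dada2.Rmd", "phyloseq.Rmd"), "workflow_code.R")
```

Setting documentation = 0 instead would drop the ## ---- chunk headers and leave only the code.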
Submitting sequence data to nucleotide archives
We submitted our data to the European Nucleotide Archive (ENA). The ENA prefers sequence data with primers removed rather than raw reads, so we submitted the trimmed FASTQ files. You can find these data under the study accession number PRJEB28397. The raw files are on our figshare site (see above).
To submit to the ENA you need two data tables (plus your sequence data). One file describes the samples and the other file describes the sequencing data.
You can download these data tables here:
Step-by-step instructions for submitting to the ENA
Run md5 on all the tar.gz files.
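MD5 checksums let the archive confirm that each file arrived intact. If you would rather stay in R than use the md5 command-line tool, here is a minimal sketch using base R's tools::md5sum(), assuming all the tar.gz files sit in one directory. It writes one "checksum  filename" line per file, the same layout the GNU md5sum tool prints. The helper name write_md5_table and the output file name md5_checksums.txt are hypothetical choices, not part of the ENA requirements.

```r
# Hypothetical helper: compute MD5 checksums for every .tar.gz file in a
# directory with tools::md5sum() and save them as "<checksum>  <filename>".
write_md5_table <- function(dir = ".",
                            out_file = file.path(dir, "md5_checksums.txt")) {
  files <- list.files(dir, pattern = "\\.tar\\.gz$", full.names = TRUE)
  if (length(files) == 0) {
    stop("No .tar.gz files found in ", dir)
  }
  sums <- tools::md5sum(files)  # named character vector, names = file paths
  lines <- paste0(unname(sums), "  ", basename(names(sums)))
  writeLines(lines, out_file)
  invisible(sums)
}

# Usage (hypothetical directory name):
# write_md5_table("ena_upload")
```

The returned vector is handy if you need to paste the checksums into the submission tables rather than read them back from the text file.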