On this page you will find information on obtaining various data, data products, and processing scripts. Of course, all the R code is embeded in the workflows on the website as well.

To start from the beginning you will need the DADA2 workflow and the raw data.

The data in the ENA (primers removed) is the input for the DADA2 workflow but if you prefer to remove primers yourself, then please download the raw data instead.

To run the phyloseq workflow only, download this file:

10.6084/m9.figshare.7357178: DOI for the phyloseq workflow. Includes output from the DADA2 workflow, the phyloseq script, and other necessary input files.

The input file for this workflow is the output from the DADA2 workflow (combo_pipeline.rdata). There are also some additional files included that are needed for some of the analyses.

Additional data products.

Accessing the R Code only.

The R code is available by clicking on the code button in the menu bar or here. Please note that this R code is pulled from all the .Rmd files. This has not been tested independent of the R Markdown workflows so Use at Your Own Risk. In other words, the code works in the Rmarkdown format but the complete pipeline has not been tested using just this code. I used knitr::purl() to pull the code from the Rmarkdown file. I did this for you just in case you wanted the code and not hear me drone on about colors, zoomable figures, or Keanu Reeves. The first part is the DADA2 workflow and the second part is the phyloseq workflow. Commands that are commented out are things I tried that I could never get to work. Any line that starts like this: ## ---- is the code chuck name and details.

Submitting sequence data to nucleotide archives

We submitted out data to the European Nucleotide Archive (ENA). The ENA does not like RAW data and prefers to have primers removed. So we submitted the trimmed Fastq files to the ENA. You can find these data under the study accession number PRJEB28397. The RAW files on our figshare site (see above).

To submit to the ENA you need two data tables (plus your sequence data). One file describes the samples and the other file describes the sequencing data.

You can download these data tables here:

Description of sample data

Description of sequence data

Step by step instructions for submitting to the ENA

  1. go to https://www.ebi.ac.uk/ena/submit and select Submit to ENA.
  2. Login or Register.
  3. Go to New Submission tab and select Register study (project).
  4. Hit Next
  5. Enter details and hit Submit.
  6. Next, Select Checklist. This will be specific to the type of samples you have and basically will create a template so you can add your sample metadata. For this study I chose GSC MIxS host associated
  7. Next
  8. Now go through and select/deselect fields as needed. Note, some fields are mandatory.
  9. Once finished, hit Next to fill in any details that applay to All samples.
  10. Fill in the sheet
  11. Hit the Next button, change the number of samples, and download the sheet. (This is a little messy and you just need to wade through it)
  12. Once everything looks good and uploaded, click Next to get to the Run page.
  13. Select Two Fastq files (Paired) and Download the template.
  14. Before filling out the form, gzip .gz all the trimmed fastq files (these are what you submit)
  15. Then run md5 on all the tar.gz files.
  16. Upload all the fastq files. There are different options for this step.
  17. Fill in the sheet including md5 checksum values.
  18. Upload and submit the sheet.

Edit this page