Commit 860c5029 authored by Overduin, Sam, committed by Overduin, Sam

Documentation in

Signed-off-by: Overduin, Sam <>
parent 50bbcb3f
<font size=20>__taxaSPAdes Manual__</font>
# Introduction
TaxaSPAdes is a pipeline built on top of cloudSPAdes and metaSPAdes that uses the taxonomy of edges in the assembly graph during the scaffolding phase of the assembly. Currently, taxaSPAdes requires linked reads and a metagenomic dataset to work.
The taxonomy of an edge is assigned in two steps: the reads are first annotated with taxonomy, and the taxonomy of the reads mapped to that edge is then combined with a lowest common ancestor (LCA) algorithm.
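To illustrate the LCA idea, here is a minimal Python sketch. It is not the taxaSPAdes implementation: the function names, the root-first lineage representation and the toy taxon IDs are all assumptions made for this example. Each read contributes the lineage of its taxon, and the edge receives the deepest taxon shared by all of those lineages.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical toy lineages, root first. The IDs are made up for illustration;
# real lineages would come from the NCBI taxonomy.
TOY_LINEAGES: Dict[int, List[int]] = {
    30: [1, 10, 20, 30],      # genus A
    31: [1, 10, 20, 30, 31],  # species A1
    32: [1, 10, 20, 30, 32],  # species A2
    41: [1, 10, 20, 40, 41],  # species B1 (different genus, same family 20)
}


def lowest_common_ancestor(lineages: Sequence[Sequence[int]]) -> Optional[int]:
    """Return the deepest taxon shared by all root-first lineages, or None."""
    lca = None
    # zip walks the lineages level by level and stops at the shortest one.
    for level in zip(*lineages):
        if len(set(level)) != 1:
            break
        lca = level[0]
    return lca


def edge_taxon(read_taxa: Sequence[int],
               lineages: Dict[int, List[int]]) -> Optional[int]:
    """Combine the taxa of the reads mapped to one edge into a single taxon."""
    known = [lineages[t] for t in read_taxa if t in lineages]
    if not known:
        return None  # no annotated reads on this edge
    return lowest_common_ancestor(known)
```

With these toy lineages, reads from two species of the same genus yield that genus, while reads from different genera fall back to their shared family.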
For more information please contact the author, Sam Overduin.
# Overview
The pipeline consists of 3 steps that should be run manually: read taxonomy assignment with Kraken2, read annotation with scripts/ and taxaSPAdes.
## Kraken2
1. Read taxonomy assignment with Kraken2.
This step requires Kraken2 to be installed already with a database. If this is not the case, download Kraken2 from [here](, install and set up the database according to the instructions [here](
If Kraken2 is installed and added to PATH, run the following:
kraken2 --db $kraken_db_location --threads 8 --confidence 0.2 --output $kraken_out_file $input_reads.fastq.gz
Using ```--confidence 0.2``` is important for the quality of taxonomic assignment in later steps.
It is possible to use a tool other than Kraken2 to taxonomically annotate the reads. In that case, check which columns of its output file contain the read header and the taxonomy ID.
## Read annotation
2. Read annotation with scripts/
This step adds taxonomic annotation to the reads in the form of TaxaTree:
Make sure you have a working python3 installation in your PATH.
This script requires ete3. If it is not installed, install it with:
pip3 install ete3
The first time the annotation script is run, it downloads the NCBI taxonomy database to your $HOME, so the first run takes longer than later ones.
Run the annotation script with:
chmod +x scripts/
scripts/ -i $input_reads.fastq.gz -k $kraken_out_file -o $reads_with_taxatree.fastq.gz
Optional arguments are:
```--main_ranks or -m```
Flag to keep only major ranks (domain, kingdom, phylum, class, order, family, genus, species) when annotating the taxonomy.
```--taxa_pos or -tp```
Defines 0-based index of column containing taxID in kraken_file. Useful if using a different tool than Kraken2. Default=2.
```--read_pos or -rp```
Defines 0-based index of column containing read header in kraken_file. Useful if using a different tool than Kraken2. Default=1.
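The effect of these options can be sketched in Python. This is an illustrative sketch only, not the code of the annotation script: `read_assignments`, `keep_main_ranks` and the `(rank, name)` lineage representation are hypothetical. It assumes a tab-separated classification file in which the taxID column holds a plain integer, as in standard Kraken2 output (classification flag, read header, taxID, ...).

```python
from typing import Dict, Iterable, List, Tuple

# Major ranks as listed for --main_ranks above.
MAIN_RANKS = ("domain", "kingdom", "phylum", "class",
              "order", "family", "genus", "species")


def read_assignments(lines: Iterable[str],
                     read_pos: int = 1,
                     taxa_pos: int = 2) -> Dict[str, int]:
    """Map read header -> taxID using 0-based column indices."""
    assignments: Dict[str, int] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= max(read_pos, taxa_pos):
            continue  # skip lines that lack the requested columns
        assignments[fields[read_pos]] = int(fields[taxa_pos])
    return assignments


def keep_main_ranks(lineage: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Drop every (rank, name) entry whose rank is not a major rank."""
    return [(rank, name) for rank, name in lineage if rank in MAIN_RANKS]
```

For a tool that writes the read header in the first column and the taxID in the second, the equivalent call would be `read_assignments(lines, read_pos=0, taxa_pos=1)`.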
## TaxaSPAdes
3. Run taxaSPAdes with the annotated reads
Run taxaSPAdes with the following parameters:
``` bash
--meta --gemcode-12 1 $reads_with_taxatree.fastq.gz --taxonomy
```
Be sure that a linked read (barcoded) library such as 10X is used and the assembly is a metagenome assembly. Otherwise, taxaSPAdes will not work. Any additional parameters from SPAdes that do not conflict are allowed.
To install taxaSPAdes, download the source code and build with:
``` bash
tar -xzf SPAdes-3.13.0.tar.gz
cd SPAdes-3.13.0
```
Make sure you have the following dependencies installed:
- g++ (version 5.3.1 or higher)
- cmake (version 2.8.12 or higher)
- zlib
- libbz2
and build it with the following script:
``` bash
```
SPAdes will be built in the directory `./bin`. If you wish to install SPAdes into another directory, you can specify the full path of the destination folder by running the following command in `bash` or `sh`:
``` bash
PREFIX=<destination_dir> ./
```
For the rest of the SPAdes Manual see below.
<font size=20>__SPAdes 3.13.0 Manual__</font>
......@@ -117,6 +117,11 @@ install(PROGRAMS "${CMAKE_CURRENT_SOURCE_DIR}/../"
COMPONENT runtime)
# taxaSPAdes script installation
install(PROGRAMS "${CMAKE_CURRENT_SOURCE_DIR}/../scripts/"
COMPONENT runtime)
COMPONENT runtime)