Commit 639b008b authored by Carlos de Lannoy

add readme

parent f03ea634
README.md 0 → 100644
# MinION & MiSeq 16S classification pipelines
Carlos de Lannoy
Scripts to classify MinION and MiSeq 16S reads using several commonly used classification tools.
__Raw, unoptimized and thoroughly untested, these come with no warranties and serve mainly as my own documentation at the moment.__
## contents
## Requirements
- git
- python>=3.6
- miniconda3
- [kraken2](https://ccb.jhu.edu/software/kraken2/index.shtml?t=downloads), make sure the following are in your path:
  - kraken2-build
- [centrifuge](https://ccb.jhu.edu/software/centrifuge/manual.shtml), make sure the following are in your path:
  - centrifuge-build
  - centrifuge-kreport
```bash
git clone https://git.wur.nl/lanno001/metaminion.git && cd metaminion
conda env create -f minionmeta_conda.yml && source activate minionmeta
```
## Database building
Classification was done using the NCBI 16S database, found [here (retrieved 30/03/2019)](ftp://ftp.ncbi.nlm.nih.gov/blast/db/16SMicrobial.tar.gz).
Most tools require some form of conversion.
For centrifuge and qiime2, convert a fasta of assemblies of organisms (headers must contain only the NCBI accession ID!) to the respective
DB formats using:
```bash
# For centrifuge
python ncbi2centrifuge.py --in-fasta assemblies.fasta --entrez-email your_mail --out-dir output_directory
# For qiime2
python ncbi2qiime.py --in-fasta assemblies.fasta --entrez-email your_mail --out-dir output_directory
```
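Both scripts expect FASTA headers that contain only the NCBI accession ID. A minimal sketch of one way to get there, assuming the accession is the first whitespace-delimited token of each header (as in typical NCBI FASTA downloads). The function name `strip_headers` and the example accession are hypothetical and not part of this repo:

```python
def strip_headers(fasta_text):
    """Reduce every FASTA header to its first whitespace-delimited token."""
    out_lines = []
    for line in fasta_text.splitlines():
        if line.startswith('>'):
            # Assumption: the accession ID is the first token after '>'
            fields = line[1:].split()
            out_lines.append('>' + (fields[0] if fields else ''))
        else:
            # Sequence lines are passed through unchanged
            out_lines.append(line)
    return '\n'.join(out_lines) + '\n'


# Illustrative record only; the accession below is made up
example = '>NR_000001.1 Hypothetical organism 16S ribosomal RNA\nACGT\n'
print(strip_headers(example), end='')
```

Write the result to a new file and pass that file as `--in-fasta`.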
kraken2 has its own database conversion utility. First truncate the reads, e.g. with qiime2; I used the truncated reads
output by qiime_make_classifier.sf (see below). Then:
```bash
kraken2-build --download-taxonomy --db location/of/db
```
## Classification
### Qiime2
First construct qiime2 classifier:
```bash
# tax_tsv is generated by ncbi2qiime.py
snakemake -s qiime_make_classifier.sf --config \
fasta=reads_dir \
tax_tsv=tax_table.tsv \
out_dir=directory_to_write_to \
fwd_primer=forward_primer_sequence \
rev_primer=reverse_primer_sequence \
trunc_len=length_of_region_after_truncation
For MinION I used the following settings:
```yaml
fwd_primer: AGAGTTTGATCCTGGCTCAG
rev_primer: GGTTACCTTGTTACGACTT
trunc_len: 1465
```
For MiSeq:
```yaml
out_dir: classifier_wd
fwd_primer: CCTACGGGNGGCWGCAG
rev_primer: GACTACHVGGGTATCTAATCC
trunc_len: 444
```
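If you prefer to drive the workflows from a script, the `--config key=value` pairs can be assembled programmatically. A minimal sketch, assuming the MinION settings listed above; `build_snakemake_cmd` is a hypothetical helper, not part of this repo, and the path-like values are the same placeholders as in the command above:

```python
import shlex


def build_snakemake_cmd(snakefile, config):
    """Return the argv list for `snakemake -s <snakefile> --config k=v ...`."""
    cmd = ['snakemake', '-s', snakefile, '--config']
    cmd += [f'{key}={value}' for key, value in config.items()]
    return cmd


# MinION settings from this README; file and directory names are placeholders
minion_config = {
    'fasta': 'reads_dir',
    'tax_tsv': 'tax_table.tsv',
    'out_dir': 'directory_to_write_to',
    'fwd_primer': 'AGAGTTTGATCCTGGCTCAG',
    'rev_primer': 'GGTTACCTTGTTACGACTT',
    'trunc_len': 1465,
}

cmd = build_snakemake_cmd('qiime_make_classifier.sf', minion_config)
# Print a shell-safe version of the command for inspection
print(' '.join(shlex.quote(arg) for arg in cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell line continuations altogether, so comments cannot accidentally break the command.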
To classify MinION reads:
```bash
snakemake -s qiime_analysis_minion.sf --config \
fastq_dir=MinION_reads/dir/ \
classifier=classifier.qza \
out_dir=output_dir \
metadata=meta_data.tsv
```
To classify MiSeq reads:
```bash
snakemake -s qiime_analysis_illumina.sf --config \
fastq_dir=reads_dir \
classifier=classifier.qza \
out_dir=output_dir \
metadata=meta_data.tsv \
trim_fwd=6 \
trim_rev=8 \
trunc_fwd=240 \
trunc_rev=210
```
A metadata file is [included](resources/meta_data_reads.tsv). If you make your own, it should look like this (tab-separated):
```tsv
sample_name sequencing_technique barcode sample_number type
#q2:types categorical categorical categorical categorical
I18-1139-65_S124_L001_001 Miseq None 65 Moneymaker
I18-1139-66_S125_L001_001 Miseq None 66 Moneymaker
I18-1139-71_S327_L001_001 Miseq None 71 Pimpinellifolium
...
```
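A metadata file in this layout can also be generated from a script. A minimal sketch, assuming tab-separated output and that every column is declared `categorical`, as in the example above; `make_metadata_tsv` is a hypothetical helper, not part of this repo:

```python
import csv
import io


def make_metadata_tsv(columns, rows):
    """Render metadata text: header row, '#q2:types' row, then one row per sample."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter='\t', lineterminator='\n')
    writer.writerow(['sample_name'] + columns)
    # Assumption: all columns are categorical, as in the example above
    writer.writerow(['#q2:types'] + ['categorical'] * len(columns))
    for row in rows:
        # Each row holds the sample name followed by one value per column
        if len(row) != len(columns) + 1:
            raise ValueError(f'expected {len(columns) + 1} fields, got {len(row)}: {row}')
        writer.writerow(row)
    return buf.getvalue()


columns = ['sequencing_technique', 'barcode', 'sample_number', 'type']
rows = [['barcode01', 'MinION', '01', '135', 'Pimpinellifolium']]
print(make_metadata_tsv(columns, rows), end='')
```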
Visualization artifacts can be uploaded to https://view.qiime2.org for viewing.
### Centrifuge
Classify using centrifuge and produce a kraken2-style report (readable in Pavian) and qiime2 artifacts.
For MinION:
```bash
snakemake -s centrifuge_analysis_minion.sf --config \
fastq_dir=fastq_reads_dir \
centrifuge_db=centrifuge_db \
out_dir=output_dir \
metadata=metadata_file \
script_location=absolute_path_of_this_repo
```
For MiSeq:
```bash
snakemake -s centrifuge_analysis_illumina.sf --config \
fastq_dir=fastq_reads_dir \
centrifuge_db=centrifuge_db \
out_dir=output_dir \
metadata=metadata_file \
script_location=absolute_path_of_this_repo
```
### Kraken2
There are no workflows for kraken2, as it is straightforward enough.
For MinION and MiSeq:
```bash
kraken2 \
--db db/ncbi_16s \
--threads 6 \
--output kraken2_output \
--report kraken2_report \
--minimum-base-quality 7 \
reads_fastq
```
Kraken2 reports can be visualized using Pavian.
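For a quick look without Pavian, the report can also be parsed directly. A minimal sketch, assuming the standard six-column, tab-separated Kraken2 report layout (percentage, clade reads, directly assigned reads, rank code, NCBI taxid, indented name); `species_counts` is a hypothetical helper and the read counts in the sample are made up:

```python
def species_counts(report_text):
    """Return {species name: clade read count} from a Kraken2-style report."""
    counts = {}
    for line in report_text.splitlines():
        fields = line.split('\t')
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        _pct, clade_reads, _taxon_reads, rank, _taxid, name = fields[:6]
        # Rank code 'S' marks species-level entries
        if rank == 'S':
            counts[name.strip()] = int(clade_reads)
    return counts


# Illustrative report fragment; the numbers are not real results
sample = (
    ' 60.00\t6\t0\tG\t561\t    Escherichia\n'
    ' 60.00\t6\t6\tS\t562\t      Escherichia coli\n'
    ' 40.00\t4\t4\tS\t1423\t      Bacillus subtilis\n'
)
print(species_counts(sample))
```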
### Blast
Blast MinION reads against a local db. Optionally, if you sequenced a mock community, provide a list of species
('Genus species', one species per line) to draw up a bar chart.
```bash
# --in-dir may contain multiple multifastas, one per sample
python get_best_blast_hit.py \
--db blast_db \
--in-dir input_dir/multi_fasta_reads \
--out-dir output_dir \
--cutoff min_number_reads_for_positive_id \
--mock-composition optional_list_mock_species.txt
```
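The idea behind `--cutoff` can be illustrated as follows. This is only a sketch of the concept and not the code in `get_best_blast_hit.py`: it assumes one best-hit species name per read and treats the cutoff as inclusive, both of which are assumptions. The function names are hypothetical:

```python
from collections import Counter


def positive_ids(best_hits, cutoff):
    """Keep species whose number of best-hit reads is at least `cutoff`."""
    counts = Counter(best_hits)
    return {species: n for species, n in counts.items() if n >= cutoff}


def compare_to_mock(found, mock_species):
    """Split the mock species into those detected and those missed."""
    detected = sorted(s for s in mock_species if s in found)
    missed = sorted(s for s in mock_species if s not in found)
    return detected, missed


# Made-up best hits for five reads
hits = ['Escherichia coli'] * 3 + ['Bacillus subtilis', 'Listeria monocytogenes']
found = positive_ids(hits, cutoff=2)
print(found)
print(compare_to_mock(found, ['Escherichia coli', 'Bacillus subtilis']))
```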
@@ -9,12 +9,14 @@ parser = argparse.ArgumentParser(description='Translate centrifuge classificatio
parser.add_argument('--in-class', type=str, required=True, nargs='+',
help='centrifuge classification')
parser.add_argument('--out-class', type=str, required=True, help='qiime2 classification')
+parser.add_argument('--entrez-email', type=str, required=True,
+help='Your email address, used by ncbi to warn you if you are hogging resources')
parser.add_argument('--no-header', action='store_true',
help='Return table without header')
args = parser.parse_args()
ncbi = NCBITaxa()
-Entrez.email = "carlos.delannoy@wur.nl"
+Entrez.email = args.entrez_email
saved_ranks = {'superkingdom': 'su__', 'kingdom': 'k__', 'phylum': 'p__', 'class': 'c__',
'order': 'o__', 'family': 'f__', 'genus': 'g__', 'species': 's__'}
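To illustrate how the `saved_ranks` prefixes can be used, here is a minimal sketch that turns a rank-to-name mapping into a prefixed taxonomy string. The `'; '` separator, the handling of missing ranks (empty name after the prefix), and the function name `lineage_to_taxonomy` are assumptions for illustration, not necessarily what the script does:

```python
saved_ranks = {'superkingdom': 'su__', 'kingdom': 'k__', 'phylum': 'p__', 'class': 'c__',
               'order': 'o__', 'family': 'f__', 'genus': 'g__', 'species': 's__'}


def lineage_to_taxonomy(lineage):
    """Join rank names into one string, each preceded by its rank prefix."""
    # Ranks absent from `lineage` keep their prefix with an empty name
    parts = [prefix + lineage.get(rank, '') for rank, prefix in saved_ranks.items()]
    return '; '.join(parts)


lineage = {'superkingdom': 'Bacteria', 'phylum': 'Proteobacteria',
           'genus': 'Escherichia', 'species': 'Escherichia coli'}
print(lineage_to_taxonomy(lineage))
```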
......
name: minionmeta
channels:
- qiime2/label/r2019.7
- bioconda
- conda-forge
- defaults
dependencies:
- _libgcc_mutex=0.1=main
- _r-mutex=1.0.1=anacondar_1
- absl-py=0.7.1=py36_0
- arb-bio-tools=6.0.6=haa8b8d8_8
- asn1crypto=0.24.0=py36_1003
- astor=0.7.1=py_0
- atomicwrites=1.3.0=py_0
- attrs=19.1.0=py_0
- backcall=0.1.0=py_0
- bibtexparser=1.1.0=py_0
- binutils_impl_linux-64=2.31.1=h6176602_1
- binutils_linux-64=2.31.1=h6176602_8
- bioconductor-biobase=2.42.0=r351h14c3975_1
- bioconductor-biocgenerics=0.28.0=r351_1
- bioconductor-biocparallel=1.16.6=r351h1c2f66e_0
- bioconductor-biostrings=2.50.2=r351h14c3975_0
- bioconductor-dada2=1.10.0=r351hf484d3e_0
- bioconductor-delayedarray=0.8.0=r351h14c3975_0
- bioconductor-genomeinfodb=1.18.1=r351_0
- bioconductor-genomeinfodbdata=1.2.1=r351_0
- bioconductor-genomicalignments=1.18.1=r351h14c3975_0
- bioconductor-genomicranges=1.34.0=r351h14c3975_0
- bioconductor-iranges=2.16.0=r351h14c3975_0
- bioconductor-rsamtools=1.34.0=r351hf484d3e_0
- bioconductor-s4vectors=0.20.1=r351h14c3975_0
- bioconductor-shortread=1.40.0=r351hf484d3e_0
- bioconductor-summarizedexperiment=1.12.0=r351_0
- bioconductor-xvector=0.22.0=r351h14c3975_0
- bioconductor-zlibbioc=1.28.0=r351h14c3975_0
- biom-format=2.1.7=py36h3010b51_1002
- biopython=1.74=py36h516909a_0
- blas=2.11=openblas
- blast=2.9.0=pl526h979a64d_3
- bleach=3.1.0=py_0
- bokeh=1.3.1=py36_0
- boost=1.68.0=py36h8619c78_1001
- boost-cpp=1.68.0=h11c811c_1000
- bwidget=1.9.11=0
- bz2file=0.98=py_0
- bzip2=1.0.8=h516909a_0
- c-ares=1.15.0=h516909a_1001
- ca-certificates=2019.6.16=hecc5488_0
- cachecontrol=0.12.5=py_0
- cairo=1.16.0=h18b612c_1001
- certifi=2019.6.16=py36_1
- cffi=1.12.3=py36h8022711_0
- chardet=3.0.4=py36_1003
- click=7.0=py_0
- cryptography=2.7=py36h72c5cf5_0
- curl=7.65.3=hf8cf82a_0
- cutadapt=2.4=py36h14c3975_0
- cycler=0.10.0=py_1
- dbus=1.13.6=he372182_0
- deblur=1.1.0=py36_0
- decorator=4.4.0=py_0
- defusedxml=0.5.0=py_1
- dnaio=0.3=py36h14c3975_1
- docutils=0.14=py36_0
- dropbox=5.2.1=py36_0
- ecdsa=0.13=py36_1
- emperor=1.0.0b19=py36_0
- entrez-direct=11.0=pl526_0
- entrypoints=0.3=py36_1000
- ete3=3.1.1=py_1
- expat=2.2.5=he1b5a44_1003
- fastcluster=1.1.25=py36h637b7d7_1000
- fasttree=2.1.10=0
- filechunkio=1.6=py36_0
- fontconfig=2.13.1=he4413a7_1000
- freetype=2.10.0=he983fc9_1
- ftputil=3.2=py36_0
- future=0.17.1=py36_1000
- gast=0.2.2=py_0
- gcc_impl_linux-64=7.3.0=habb00fd_1
- gcc_linux-64=7.3.0=h553295d_8
- gettext=0.19.8.1=hc5be6a0_1002
- gfortran_impl_linux-64=7.3.0=hdf63c60_1
- gfortran_linux-64=7.3.0=h553295d_8
- glib=2.58.3=h6f030ca_1002
- gmp=6.1.2=hf484d3e_1000
- gneiss=0.4.5=py_0
- gnutls=3.6.5=hd3a4fd2_1002
- graphite2=1.3.13=hf484d3e_1000
- grpcio=1.16.1=py36hf8bcb03_1
- gsl=2.5=h294904e_0
- gst-plugins-base=1.14.5=h0935bb2_0
- gstreamer=1.14.5=h36ae1b5_0
- gxx_impl_linux-64=7.3.0=hdf63c60_1
- gxx_linux-64=7.3.0=h553295d_8
- h5py=2.9.0=nompi_py36hcafd542_1103
- harfbuzz=2.4.0=h37c48d4_1
- hdf5=1.10.4=nompi_h3c11f04_1106
- hdmedians=0.13=py36h3010b51_1000
- icu=58.2=hf484d3e_1000
- idna=2.8=py36_1000
- ijson=2.3=py_1
- importlib_metadata=0.18=py36_0
- ipykernel=5.1.1=py36h5ca1d4c_0
- ipython=7.7.0=py36h5ca1d4c_0
- ipython_genutils=0.2.0=py_1
- ipywidgets=7.5.1=py_0
- iqtree=1.6.11=he860b03_0
- jedi=0.14.1=py36_0
- jinja2=2.10.1=py_0
- joblib=0.13.2=py_0
- jpeg=9c=h14c3975_1001
- jsonschema=3.0.1=py36_0
- jupyter_client=5.3.1=py_0
- jupyter_core=4.4.0=py_0
- keras-applications=1.0.7=py_1
- keras-preprocessing=1.0.9=py_1
- kiwisolver=1.1.0=py36hc9558a2_0
- kraken-biom=1.0.1=py_2
- krb5=1.16.3=h05b26f9_1001
- libarbdb=6.0.6=haa8b8d8_8
- libblas=3.8.0=11_openblas
- libcblas=3.8.0=11_openblas
- libcurl=7.65.3=hda55be3_0
- libedit=3.1.20170329=hf8c457e_1001
- libffi=3.2.1=he1b5a44_1006
- libgcc=7.2.0=h69d50b8_2
- libgcc-ng=9.1.0=hdf63c60_0
- libgfortran-ng=7.3.0=hdf63c60_0
- libiconv=1.15=h516909a_1005
- liblapack=3.8.0=11_openblas
- liblapacke=3.8.0=11_openblas
- libopenblas=0.3.6=h6e990d7_6
- libpng=1.6.37=hed695b0_0
- libprotobuf=3.9.1=h8b12597_0
- libsodium=1.0.17=h516909a_0
- libssh2=1.8.2=h22169c7_2
- libstdcxx-ng=9.1.0=hdf63c60_0
- libtiff=4.0.10=h57b8799_1003
- libuuid=2.32.1=h14c3975_1000
- libxcb=1.13=h14c3975_1002
- libxml2=2.9.9=h13577e0_2
- libxslt=1.1.32=hae48121_1003
- lockfile=0.12.2=py_1
- lxml=4.4.1=py36h7ec2d77_0
- lz4-c=1.8.3=he1b5a44_1001
- mafft=7.310=hf484d3e_2
- make=4.2.1=h14c3975_2004
- markdown=3.1.1=py_0
- markupsafe=1.1.1=py36h14c3975_0
- matplotlib=3.1.1=py36_0
- matplotlib-base=3.1.1=py36hfd891ef_0
- mistune=0.8.4=py36h14c3975_1000
- mock=3.0.5=py36_0
- more-itertools=7.2.0=py_0
- msgpack-python=0.6.1=py36h6bb024c_0
- natsort=6.0.0=py_0
- nbconvert=5.5.0=py_0
- nbformat=4.4.0=py_1
- ncurses=6.1=hf484d3e_1002
- nettle=3.4.1=h1bed415_1002
- networkx=2.3=py_0
- nose=1.3.7=py36_1002
- notebook=6.0.0=py36_0
- numpy=1.17.0=py36h95a1406_0
- olefile=0.46=py_0
- openjdk=11.0.1=h516909a_1016
- openssl=1.1.1c=h7b6447c_1
- packaging=19.0=py_0
- pandas=0.24.2=py36hb3f55d8_0
- pandoc=2.7.3=0
- pandocfilters=1.4.2=py_1
- pango=1.40.14=he7ab937_1005
- paramiko=1.18.4=py36h0d01e40_0
- parso=0.5.1=py_0
- patsy=0.5.1=py_0
- pcre=8.41=hf484d3e_1003
- perl=5.26.2=h516909a_1006
- perl-app-cpanminus=1.7044=pl526_1
- perl-archive-tar=2.32=pl526_0
- perl-base=2.23=pl526_1
- perl-business-isbn=3.004=pl526_0
- perl-business-isbn-data=20140910.003=pl526_0
- perl-carp=1.38=pl526_3
- perl-common-sense=3.74=pl526_2
- perl-compress-raw-bzip2=2.086=pl526hf484d3e_0
- perl-compress-raw-zlib=2.086=pl526h6bb024c_1
- perl-constant=1.33=pl526_1
- perl-data-dumper=2.173=pl526_0
- perl-digest-hmac=1.03=pl526_3
- perl-digest-md5=2.55=pl526_0
- perl-encode=2.88=pl526_1
- perl-encode-locale=1.05=pl526_6
- perl-exporter=5.72=pl526_1
- perl-exporter-tiny=1.002001=pl526_0
- perl-extutils-makemaker=7.36=pl526_1
- perl-file-listing=6.04=pl526_1
- perl-file-path=2.16=pl526_0
- perl-file-temp=0.2304=pl526_2
- perl-html-parser=3.72=pl526h6bb024c_5
- perl-html-tagset=3.20=pl526_3
- perl-html-tree=5.07=pl526_1
- perl-http-cookies=6.04=pl526_0
- perl-http-daemon=6.01=pl526_1
- perl-http-date=6.02=pl526_3
- perl-http-message=6.18=pl526_0
- perl-http-negotiate=6.01=pl526_3
- perl-io-compress=2.086=pl526hf484d3e_0
- perl-io-html=1.001=pl526_2
- perl-io-socket-ssl=2.066=pl526_0
- perl-io-zlib=1.10=pl526_2
- perl-json=4.02=pl526_0
- perl-json-xs=2.34=pl526h6bb024c_3
- perl-libwww-perl=6.39=pl526_0
- perl-list-moreutils=0.428=pl526_1
- perl-list-moreutils-xs=0.428=pl526_0
- perl-lwp-mediatypes=6.04=pl526_0
- perl-lwp-protocol-https=6.07=pl526_4
- perl-mime-base64=3.15=pl526_1
- perl-mozilla-ca=20180117=pl526_1
- perl-net-http=6.19=pl526_0
- perl-net-ssleay=1.88=pl526h90d6eec_0
- perl-ntlm=1.09=pl526_4
- perl-parent=0.236=pl526_1
- perl-pathtools=3.75=pl526h14c3975_1
- perl-scalar-list-utils=1.50=pl526h14c3975_0
- perl-socket=2.027=pl526_1
- perl-storable=3.15=pl526h14c3975_0
- perl-test-requiresinternet=0.05=pl526_0
- perl-time-local=1.28=pl526_1
- perl-try-tiny=0.30=pl526_1
- perl-types-serialiser=1.0=pl526_2
- perl-uri=1.76=pl526_0
- perl-www-robotrules=6.02=pl526_3
- perl-xml-namespacesupport=1.12=pl526_0
- perl-xml-parser=2.44_01=pl526ha1d75be_1002
- perl-xml-sax=1.02=pl526_0
- perl-xml-sax-base=1.09=pl526_0
- perl-xml-sax-expat=0.51=pl526_3
- perl-xml-simple=2.25=pl526_1
- perl-xsloader=0.24=pl526_0
- pexpect=4.7.0=py36_0
- pickleshare=0.7.5=py36_1000
- pigz=2.3.4=0
- pillow=6.0.0=py36he7afcd5_0
- pip=19.2.1=py36_0
- pixman=0.38.0=h516909a_1003
- pluggy=0.12.0=py_0
- prometheus_client=0.7.1=py_0
- prompt_toolkit=2.0.9=py_0
- protobuf=3.9.1=py36he1b5a44_0
- psutil=5.6.3=py36h516909a_0
- pthread-stubs=0.4=h14c3975_1001
- ptyprocess=0.6.0=py_1001
- py=1.8.0=py_0
- pycparser=2.19=py36_1
- pycrypto=2.6.1=py36h14c3975_9
- pygments=2.4.2=py_0
- pyopenssl=19.0.0=py36_0
- pyparsing=2.3.0=py_0
- pyqt=5.9.2=py36hcca6a23_2
- pyrsistent=0.15.4=py36h516909a_0
- pysftp=0.2.9=py36_0
- pysocks=1.7.0=py36_0
- pytest=5.0.1=py36_1
- python=3.6.7=h357f687_1005
- python-dateutil=2.8.0=py_0
- pytz=2019.2=py_0
- pyzmq=18.0.2=py36h1768529_2
- q2-alignment=2019.7.0=py36_0
- q2-composition=2019.7.0=py36_0
- q2-cutadapt=2019.7.0=py36_0
- q2-dada2=2019.7.0=py36_0
- q2-deblur=2019.7.0=py36_0
- q2-demux=2019.7.0=py36_0
- q2-diversity=2019.7.0=py36_0
- q2-emperor=2019.7.0=py36_0
- q2-feature-classifier=2019.7.0=py36_0
- q2-feature-table=2019.7.0=py36_0
- q2-fragment-insertion=2019.7.0=py36_0
- q2-gneiss=2019.7.0=py36_0
- q2-longitudinal=2019.7.0=py36_0
- q2-metadata=2019.7.0=py36_0
- q2-phylogeny=2019.7.0=py36_0
- q2-quality-control=2019.7.0=py36_0
- q2-quality-filter=2019.7.0=py36_0
- q2-sample-classifier=2019.7.1=py36_0
- q2-taxa=2019.7.0=py36_0
- q2-types=2019.7.0=py36_0
- q2-vsearch=2019.7.0=py36_0
- q2cli=2019.7.0=py36_0
- q2templates=2019.7.0=py36_0
- qiime2=2019.7.0=py36_0
- qt=5.9.7=h52cfd70_2
- r-assertthat=0.2.1=r35h6115d3f_1
- r-backports=1.1.4=r35hcdcec82_1
- r-base=3.5.1=h08e1455_1008
- r-bh=1.69.0_1=r35h6115d3f_1
- r-bitops=1.0_6=r35hcdcec82_1003
- r-cli=1.1.0=r35h6115d3f_1
- r-cluster=2.1.0=r35h9bbef5b_1
- r-colorspace=1.4_1=r35hcdcec82_1
- r-crayon=1.3.4=r35h6115d3f_1002
- r-data.table=1.12.2=r35hcdcec82_1
- r-digest=0.6.20=r35h0357c0b_1
- r-ellipsis=0.2.0.1=r35hcdcec82_1
- r-fansi=0.4.0=r35hcdcec82_1001
- r-formatr=1.7=r35h6115d3f_1
- r-futile.logger=1.4.3=r35h6115d3f_1002
- r-futile.options=1.0.1=r35h6115d3f_1001
- r-ggplot2=3.2.0=r35h6115d3f_1
- r-glue=1.3.1=r35hcdcec82_1
- r-gtable=0.3.0=r35h6115d3f_2
- r-hwriter=1.3.2=r35h6115d3f_1002
- r-labeling=0.3=r35h6115d3f_1002
- r-lambda.r=1.2.3=r35h6115d3f_1001
- r-lattice=0.20_38=r35hcdcec82_1002
- r-latticeextra=0.6_28=r35h6115d3f_1002
- r-lazyeval=0.2.2=r35hcdcec82_1
- r-magrittr=1.5=r35h6115d3f_1002
- r-mass=7.3_51.4=r35hcdcec82_1
- r-matrix=1.2_17=r35hcdcec82_1
- r-matrixstats=0.54.0=r35hcdcec82_1001
- r-mgcv=1.8_28=r35hcdcec82_1
- r-munsell=0.5.0=r35h6115d3f_1002
- r-nlme=3.1_140=r35h9bbef5b_1
- r-permute=0.9_5=r35_1
- r-pillar=1.4.2=r35h6115d3f_2
- r-pkgconfig=2.0.2=r35h6115d3f_1002
- r-plyr=1.8.4=r35h0357c0b_1003
- r-r6=2.4.0=r35h6115d3f_2
- r-rcolorbrewer=1.1_2=r35h6115d3f_1002
- r-rcpp=1.0.2=r35h0357c0b_0
- r-rcppparallel=4.4.3=r35h0357c0b_2
- r-rcurl=1.95_4.12=r35hcdcec82_1
- r-reshape2=1.4.3=r35h0357c0b_1004
- r-rlang=0.4.0=r35hcdcec82_1
- r-scales=1.0.0=r35h0357c0b_1002
- r-snow=0.4_3=r35h6115d3f_1001
- r-stringi=1.4.3=r35h0357c0b_2
- r-stringr=1.4.0=r35h6115d3f_1
- r-tibble=2.1.3=r35hcdcec82_1
- r-utf8=1.1.4=r35hcdcec82_1001
- r-vctrs=0.2.0=r35hcdcec82_1
- r-vegan=2.5_5=r35h9bbef5b_1
- r-viridislite=0.3.0=r35h6115d3f_1002
- r-withr=2.1.2=r35h6115d3f_1001
- r-zeallot=0.1.0=r35h6115d3f_1001
- raxml=8.2.12=h14c3975_1
- readline=8.0=hf8c457e_0
- requests=2.22.0=py36_1
- scikit-bio=0.5.5=py36h3010b51_1000
- scikit-learn=0.21.2=py36hcdab131_1
- seaborn=0.9.0=py_1
- send2trash=1.5.0=py_0
- setuptools=41.0.1=py36_0
- sina=1.6.0=hc7f9b0f_0
- sip=4.19.8=py36hf484d3e_1000
- snakemake=3.13.3=py36_0
- sortmerna=2.0=he860b03_4
- sqlite=3.29.0=hcee41ef_0
- statsmodels=0.10.1=py36hc1659b7_0
- tbb=2019.7=hc9558a2_0
- tensorboard=1.13.1=py36_0
- tensorflow=1.13.1=py36h90a7d86_1
- tensorflow-estimator=1.13.0=py_0
- termcolor=1.1.0=py_2
- terminado=0.8.2=py36_0
- testpath=0.4.2=py_1001
- tk=8.6.9=hed695b0_1002
- tktable=2.10=h555a92e_1
- tornado=6.0.3=py36h516909a_0
- traitlets=4.3.2=py36_1000
- tzlocal=2.0.0=py_0
- unifrac=0.10.0=py36h6bb024c_1
- urllib3=1.25.3=py36_0
- vsearch=2.7.0=1
- wcwidth=0.1.7=py_1
- webencodings=0.5.1=py_1
- werkzeug=0.15.5=py_0
- wheel=0.33.4=py36_0
- widgetsnbextension=3.5.1=py36_0
- wrapt=1.11.2=py36h7b6447c_0
- xopen=0.7.3=py_0
- xorg-kbproto=1.0.7=h14c3975_1002
- xorg-libice=1.0.10=h516909a_0
- xorg-libsm=1.2.3=h84519dc_1000
- xorg-libx11=1.6.8=h516909a_0
- xorg-libxau=1.0.9=h14c3975_0
- xorg-libxdmcp=1.1.3=h516909a_0
- xorg-libxext=1.3.4=h516909a_0
- xorg-libxrender=0.9.10=h516909a_1002
- xorg-renderproto=0.11.1=h14c3975_1002
- xorg-xextproto=7.3.0=h14c3975_1002
- xorg-xproto=7.0.31=h14c3975_1007
- xz=5.2.4=h14c3975_1001
- yaml=0.1.7=h14c3975_1001
- zeromq=4.3.2=he1b5a44_2
- zipp=0.5.1=py_0
- zlib=1.2.11=h516909a_1005
- zstd=1.4.0=h3b9ef0a_0
- pip:
  - deepbinner==0.2.0
  - keras==2.2.4
  - porechop==0.2.4
  - pyyaml==5.1
  - scipy==1.2.1
  - six==1.12.0
  - taxiphy==0.1
prefix: /home/WUR/lanno001/miniconda3/envs/minionmeta
@@ -12,6 +12,8 @@ parser = argparse.ArgumentParser(description='Collect and store all NCBI resourc
'index.')
parser.add_argument('--in-fasta', type=str, required=True,
help='fasta file containing reads with only accession id as header')
+parser.add_argument('--entrez-email', type=str, required=True,
+help='Your email address, used by ncbi to warn you if you are hogging resources')
parser.add_argument('--out-dir', type=str, required=True, help='output directory')
args = parser.parse_args()
@@ -20,7 +22,7 @@ table_fn = f'{out_dir}conversion_table.tsv'
failed_fn = f'{out_dir}reads_failed.txt'
fasta_fn = f'{out_dir}reads.fasta'
-Entrez.email = "carlos.delannoy@wur.nl"
+Entrez.email = args.entrez_email
skip_entry = False
print_threshold = 20
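Note that argparse exposes `--entrez-email` as the attribute `args.entrez_email` (hyphens become underscores), and that the attribute must be used unquoted; a quoted `"args.entrez_email"` would pass that literal string instead of the address. A minimal, self-contained sketch; the address is a placeholder and the arguments are passed as an explicit list so the snippet runs without a command line:

```python
import argparse

parser = argparse.ArgumentParser(description='Demonstrate the --entrez-email option')
parser.add_argument('--entrez-email', type=str, required=True,
                    help='Your email address, used by ncbi to warn you if you are hogging resources')

# An explicit argument list stands in for sys.argv; the address is a placeholder
args = parser.parse_args(['--entrez-email', 'user@example.org'])

# Correct: use the parsed value, not a string containing the variable name
email = args.entrez_email
print(email)
```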
......
@@ -11,6 +11,8 @@ parser = argparse.ArgumentParser(description='Construct a feature table tsv from
'that is accepted by qiime2.')
parser.add_argument('--in-fasta', type=str, required=True,
help='fasta file containing reads with only accession id as header')
+parser.add_argument('--entrez-email', type=str, required=True,
+help='Your email address, used by ncbi to warn you if you are hogging resources')
parser.add_argument('--out-dir', type=str, required=True, help='output directory')
args = parser.parse_args()
@@ -22,7 +24,7 @@ fasta_taxid_fn = f'{out_dir}reads_taxid.fasta'
taxid_fn = f'{out_dir}taxa.txt'
ncbi = NCBITaxa()
-Entrez.email = "carlos.delannoy@wur.nl"
+Entrez.email = args.entrez_email
saved_ranks = {'superkingdom': 'su__', 'kingdom': 'k__', 'phylum': 'p__', 'class': 'c__',
'order': 'o__', 'family': 'f__', 'genus': 'g__', 'species': 's__'}
......
sample_name sequencing_technique barcode sample_number type
#q2:types categorical categorical categorical categorical
I18-1139-65_S124_L001_001 Miseq None 65 Moneymaker
I18-1139-66_S125_L001_001 Miseq None 66 Moneymaker
I18-1139-67_S126_L001_001 Miseq 03 67 Moneymaker
I18-1139-68_S127_L001_001 Miseq None 68 Moneymaker
I18-1139-69_S128_L001_001 Miseq 04 69 Pimpinellifolium
I18-1139-70_S326_L001_001 Miseq None 70 Pimpinellifolium
I18-1139-71_S327_L001_001 Miseq None 71 Pimpinellifolium
I18-1141-33_S158_L001_001 Miseq None 154 Bulk
I18-1141-34_S159_L001_001 Miseq 05 130 Bulk
I18-1141-35_S160_L001_001 Miseq 06 131 Bulk
I18-1139-65_S118_16S_001 Hiseq None 65 Moneymaker
I18-1139-66_S119_16S_001 Hiseq None 66 Moneymaker
I18-1139-67_S120_16S_001 Hiseq 03 67 Moneymaker
I18-1139-68_S121_16S_001 Hiseq None 68 Moneymaker
I18-1139-69_S122_16S_001 Hiseq 04 69 Pimpinellifolium
I18-1139-70_S320_16S_001 Hiseq None 70 Pimpinellifolium
I18-1139-71_S321_16S_001 Hiseq None 71 Pimpinellifolium
I18-1141-33_S152_16S_001 Hiseq None 129 Bulk
I18-1141-34_S153_16S_001 Hiseq 05 130 Bulk
I18-1141-35_S154_16S_001 Hiseq 06 131 Bulk
I18-1141-36_S161_L001_001 Miseq 07 132 Bulk
I18-1141-37_S162_L001_001 Miseq 08 133 Pimpinellifolium
I18-1141-38_S163_L001_001 Miseq 09 134 Pimpinellifolium
I18-1141-40_S165_L001_001 Miseq 11 136 Pimpinellifolium
I18-1141-37_S156_L002_001 Hiseq 08 133 Pimpinellifolium
I18-1141-40_S159_L002_001 Hiseq 11 136 Pimpinellifolium
I18-1141-38_S157_L001_001 Hiseq 09 134 Pimpinellifolium
I18-1141-36_S155_L001_001 Hiseq 07 132 Bulk
I18-1141-38_S157_L002_001 Hiseq 09 134 Pimpinellifolium
I18-1141-37_S156_L001_001 Hiseq 08 133 Pimpinellifolium
I18-1141-40_S159_L001_001 Hiseq 11 136 Pimpinellifolium
I18-1141-36_S155_L002_001 Hiseq 07 132 Bulk
mock1_Undetermined_S0_L001_001_218 Miseq None None Mock
mock1_Undetermined_S0_L001_001_219 Miseq None None Mock
mock2_Undetermined_S0_L001_001_218 Miseq None None Mock
mock2_Undetermined_S0_L001_001_219 Miseq None None Mock
barcode01 MinION 01 135 Pimpinellifolium
barcode02 MinION 02 66 Moneymaker
barcode03 MinION 03 67 Moneymaker
barcode04 MinION 04 69 Pimpinellifolium
barcode05 MinION 05 130 Bulk
barcode06 MinION 06 131 Bulk
barcode07 MinION 07 132 Bulk
barcode08 MinION 08 133 Pimpinellifolium
barcode09 MinION 09 134 Pimpinellifolium
barcode10 MinION 10 65 Moneymaker
barcode11 MinION 11 136 Pimpinellifolium
barcode12 MinION 12 None Mock