Commit c85cf611 authored by Lensing, Kim

Update README.md

parent ef8d639f
@@ -20,12 +20,7 @@ In the end, in addition to your assembly and variant calling results, you'll als
 - [MaSuRCA (polca)](https://github.com/alekseyzimin/masurca) - polish assembly
 - Python - get assembly stats
 - [Minimap2](https://github.com/lh3/minimap2) - map long reads to reference. Genome alignment
-- [Samtools](http://www.htslib.org/) - sort and index mapped reads and vcf files
-- [Longshot](https://github.com/pjedge/longshot) - variant calling with nanopore reads
-- [Bwa-mem2](https://github.com/bwa-mem2/bwa-mem2) - map short reads to reference
-- [Freebayes](https://github.com/freebayes/freebayes) - variant calling using short reads
-- [bcftools](https://samtools.github.io/bcftools/bcftools.html) - vcf statistics
-- R - [pafCoordsDotPlotly](https://github.com/tpoorten/dotPlotly) - plot genome alignment
+- R - [pafCoordsDotPlotly](https://github.com/tpoorten/dotPlotly) - plot genome alignment, to compare the new assembly with another genome
 | ![DAG](/workflow.png) |
 |:--:|
@@ -68,7 +63,7 @@ MIN_ALIGNMENT_LENGTH: 10000
 MIN_QUERY_LENGTH: 50000
``` ```
 - LONGREADS - name of file with long reads. This file should be in the working directory (where this config and the Snakefile are)
-- SHORTREADS - paths to short reads fq.gz
+- SHORTREADS - paths to short reads fq.gz of sample, used for polishing (if no short reads available leave it empty)
 - GENOME_SIZE - approximate genome size ```haploid genome size (bp)(e.g. '3e9' for human genome)``` from [longstitch](https://github.com/bcgsc/longstitch#full-help-page)
 - PREFIX - prefix for the created files
 - OUTDIR - directory where snakemake will run and where the results will be written to
@@ -93,6 +88,10 @@ cat /path/to/fastq/directory/*.fastq > <name of file>.fq
 gzip <name of file>.fq
``` ```
+If no short reads of the sample are available for polishing, remove "after" from `busco_cat` in the rule all:
+expand("busco_{prefix}_{busco_cat}_polish_{lineage}/short_summary.specific.{lineage}.busco_{prefix}_{busco_cat}_polish_{lineage}.txt", prefix=PREFIX, lineage=BUSCO_LINEAGE, busco_cat = ["before"]),
 #### Run the pipeline on the cluster:
 First test the pipeline with a dry run: `snakemake -np`. This will show you the steps and commands that will be executed. Check the commands and file names to see if there’s any mistake. If all looks ok, you can now run your pipeline
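The effect of dropping "after" from the BUSCO targets can be sketched in plain Python. This is only an illustration under stated assumptions, not code from the Snakefile: `busco_targets` is a hypothetical helper, `itertools.product` stands in for Snakemake's `expand`, the prefix and lineage values in the usage lines are made-up examples, and an empty `SHORTREADS` value is taken to mean that no short reads are available.

```python
from itertools import product

# Path template copied from the `rule all` line in the README.
BUSCO_TEMPLATE = (
    "busco_{prefix}_{busco_cat}_polish_{lineage}/"
    "short_summary.specific.{lineage}."
    "busco_{prefix}_{busco_cat}_polish_{lineage}.txt"
)


def busco_targets(prefix, lineages, shortreads):
    """Hypothetical helper: list the BUSCO summary files to request."""
    # The README says to drop "after" when no short reads are available
    # for polishing, so only the "before" summary is requested in that case.
    categories = ["before", "after"] if shortreads else ["before"]
    return [
        BUSCO_TEMPLATE.format(prefix=prefix, busco_cat=cat, lineage=lineage)
        for lineage, cat in product(lineages, categories)
    ]


if __name__ == "__main__":
    # Example values only; use the PREFIX and BUSCO_LINEAGE from your config.
    for path in busco_targets("sample1", ["vertebrata_odb10"], shortreads=[]):
        print(path)
```

Running a dry run (`snakemake -np`) after editing the rule all shows whether only the "before" BUSCO summary is still requested.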
@@ -109,14 +108,7 @@ The most important files are and directories are:
 - **results** directory that contains
 - **{prefix}_oneline.k32.w100.ntLink-arks.longstitch-scaffolds.fa.PolcaCorrected.fa** final assembly
 - assembly_stats_\<prefix>.txt file with assembly statistics for the final assembly
-- **variant_calling** directory with variant calling VCF files with long and short reads, as well as VCF stats
-- {prefix}_shortreads.vcf.gz
-- {prefix}_shortreads.vcf.gz.stats
-- {prefix}_longreads.vcf.gz
-- {prefix}_longreads.vcf.gz.stats
-Both the short reads and the long reads variant calling VCFs are filtered for `QUAL > 20`. Freebayes (short read var calling) is ran with parameters `--use-best-n-alleles 4 --min-base-quality 10 --min-alternate-fraction 0.2 --haplotype-length 0 --ploidy 2 --min-alternate-count 2`. For more details check the Snakefile.
 - **genome_alignment** directory with results and figure from whole genome alignment
 - {prefix}_{species}.png
 - **mapped** directory that contains the bam file with long reads mapped to the new assembly
@@ -126,3 +118,5 @@ Both the short reads and the long reads variant calling VCFs are filtered for `Q
 - short_summary.specific.{lineage}.{prefix}_after_polish.txt"
 - **other_files** - directory containing other files created during the pipeline
 - **assembly** - directory containing files created during the assembly step
+If you want to do variant calling in the same pipeline with short and long reads, use: https://git.wur.nl/kim.lensing/nanopore-assembly2