Update README.md

Commit c85cf611 authored 10 months ago by Lensing, Kim
--- a/README.md
+++ b/README.md
@@ -20,12 +20,7 @@ In the end, in addition to your assembly and variant calling results, you'll als
 - [MaSuRCA (polca)](https://github.com/alekseyzimin/masurca) - polish assembly
 - Python - get assembly stats
 - [Minimap2](https://github.com/lh3/minimap2) - map long reads to reference. Genome alignment
- [Samtools](http://www.htslib.org/) - sort and index mapped reads and vcf files
+- R - [pafCoordsDotPlotly](https://github.com/tpoorten/dotPlotly) - plot genome alignment, to compare the new assembly with another genome
- [Longshot](https://github.com/pjedge/longshot) - variant calling with nanopore reads
- [Bwa-mem2](https://github.com/bwa-mem2/bwa-mem2) - map short reads to reference
- [Freebayes](https://github.com/freebayes/freebayes) - variant calling using short reads
- [bcftools](https://samtools.github.io/bcftools/bcftools.html) - vcf statistics
- R - [pafCoordsDotPlotly](https://github.com/tpoorten/dotPlotly) - plot genome alignment 
 | ![DAG](/workflow.png) |
 |:--:|
@@ -68,7 +63,7 @@ MIN_ALIGNMENT_LENGTH: 10000
 MIN_QUERY_LENGTH: 50000
 ```
 - LONGREADS - name of file with long reads. This file should be in the working directory (where this config and the Snakefile are)
- SHORTREADS - paths to short reads fq.gz
+- SHORTREADS - paths to the sample's short reads (fq.gz), used for polishing (leave it empty if no short reads are available)
 - GENOME_SIZE - approximate genome size ```haploid genome size (bp)(e.g. '3e9' for human genome)``` from [longstitch](https://github.com/bcgsc/longstitch#full-help-page)
 - PREFIX -  prefix for the created files
 - OUTDIR - directory where snakemake will run and where the results will be written to  
@@ -93,6 +88,10 @@ cat /path/to/fastq/directory/*.fastq > <name of file>.fq
 gzip <name of file>.fq
 ```
+If no short reads of the sample are available for polishing, remove "after" from `busco_cat` in the rule all, so that the line reads:
+        expand("busco_{prefix}_{busco_cat}_polish_{lineage}/short_summary.specific.{lineage}.busco_{prefix}_{busco_cat}_polish_{lineage}.txt", prefix=PREFIX, lineage=BUSCO_LINEAGE, busco_cat = ["before"]),
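To illustrate what this change does to the requested targets, here is a minimal sketch that emulates Snakemake's `expand()` in plain Python, so it runs without Snakemake installed. The helper `expand_targets` and the values `sample` and `vertebrata_odb10` are hypothetical, chosen for illustration only; the pattern itself is copied from the rule all line above.

```python
from itertools import product

# Pattern copied from the rule all line in the README.
PATTERN = (
    "busco_{prefix}_{busco_cat}_polish_{lineage}/"
    "short_summary.specific.{lineage}."
    "busco_{prefix}_{busco_cat}_polish_{lineage}.txt"
)


def expand_targets(pattern, **wildcards):
    """Hypothetical stand-in for Snakemake's expand().

    Formats the pattern once for every combination of wildcard values.
    A plain string is treated as a single value.
    """
    names = list(wildcards)
    value_lists = [
        [value] if isinstance(value, str) else list(value)
        for value in wildcards.values()
    ]
    return [
        pattern.format(**dict(zip(names, combo)))
        for combo in product(*value_lists)
    ]


# Hypothetical config values, for illustration only.
PREFIX = "sample"
BUSCO_LINEAGE = ["vertebrata_odb10"]

# With short reads: BUSCO targets before and after polishing.
with_short = expand_targets(
    PATTERN, prefix=PREFIX, lineage=BUSCO_LINEAGE, busco_cat=["before", "after"]
)

# Without short reads: only the "before" target is requested.
without_short = expand_targets(
    PATTERN, prefix=PREFIX, lineage=BUSCO_LINEAGE, busco_cat=["before"]
)

for target in without_short:
    print(target)
```

With `busco_cat = ["before"]`, no "after polish" BUSCO summary is requested, so the pipeline is not asked for an output that depends on short-read polishing.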
 #### Run the pipeline on the cluster:
 First test the pipeline with a dry run: `snakemake -np`. This will show you the steps and commands that will be executed. Check the commands and file names to see if there’s any mistake. If all looks ok, you can now run your pipeline
@@ -109,14 +108,7 @@ The most important files are and directories are:
 - **results** directory that contains
  - **{prefix}_oneline.k32.w100.ntLink-arks.longstitch-scaffolds.fa.PolcaCorrected.fa** final assembly
  - assembly_stats_\<prefix>.txt file with assembly statistics for the final assembly
-  - **variant_calling** directory with variant calling VCF files with long and short reads, as well as VCF stats
-    - {prefix}_shortreads.vcf.gz
-    - {prefix}_shortreads.vcf.gz.stats
-    - {prefix}_longreads.vcf.gz
-    - {prefix}_longreads.vcf.gz.stats
-Both the short reads and the long reads variant calling VCFs are filtered for `QUAL > 20`. Freebayes (short read variant calling) is run with parameters `--use-best-n-alleles 4 --min-base-quality 10 --min-alternate-fraction 0.2 --haplotype-length 0 --ploidy 2 --min-alternate-count 2`. For more details check the Snakefile.
  - **genome_alignment** directory with results and figure from whole genome alignment
    - {prefix}_{species}.png 
 - **mapped** directory that contains the bam file with long reads mapped to the new assembly
@@ -126,3 +118,5 @@ Both the short reads and the long reads variant calling VCFs are filtered for `Q
  - short_summary.specific.{lineage}.{prefix}_after_polish.txt"
 -  **other_files** - directory containing other files created during the pipeline
 -  **assembly** - directory containing files created during the assembly step
+If you want to do variant calling with short and long reads in the same pipeline, use: https://git.wur.nl/kim.lensing/nanopore-assembly2