diff --git a/docs/source/tutorial/tutorial_part4.rst b/docs/source/tutorial/tutorial_part4.rst
index 955e5509449143bdbeed6d498f205f02d211145e..2244653c566c677fe252b1338080f8d74c9ebdb9 100644
--- a/docs/source/tutorial/tutorial_part4.rst
+++ b/docs/source/tutorial/tutorial_part4.rst
@@ -27,16 +27,6 @@ the ones in the performance test, so don't reuse those here!
        yeast_panva.tar.gz>`_ [3.4K]
      - 10 genomes
      - 4G
-   * - Bacteria
-     - `Pectobacterium <https://www.bioinformatics.nl/pangenomics/data/
-       pecto_panva.tar.gz>`_ [12.9K]
-     - 197 genomes, phenotypes
-     - 18G
-   * - Plants
-     - `Arabidopsis <https://www.bioinformatics.nl/pangenomics/data/
-       ara_panva.tar.gz>`_ [869M]
-     - 25 genomes, 30 accessions, phenotypes
-     - 39G


 Steps to generate PanVA input
@@ -46,7 +36,7 @@ These example cases run through the following steps:

 1: Downloading publicly available data
   * Acquire genome and structural annotation data
-  * Accession data for arabidopsis, pectobacterium, and yeast
+  * Accession data for pectobacterium and yeast
 2: Preprocessing the data for Pantools
   * Filtering the minimum sequence size of genomes in the FASTA file
   * Filtering the minimum ORF size of CDS features in the annotation
@@ -67,7 +57,8 @@ These example cases run through the following steps:
   * Preprocessing data for PanVA
   * Set up the PanVA instance

-This tutorial contains instructions to create a pangenome and PanVA instance for different species. Every package contains a README
+This tutorial contains instructions to create a pangenome
+and PanVA instance for different species. Every package contains a README
 with all the exact commands, so make sure to check those if you're stuck.
 Both Snakemake pipelines used in this workflow create conda environments in
 the data package directory. If you want to re-use these pipelines for
@@ -78,7 +69,7 @@ where the conda environments will be stored.
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Goal:
   * Acquire genome and structural annotation data
-  * Accession data for arabidopsis, pectobacterium, and yeast
+  * Accession data for pectobacterium and yeast

 1.1: Download the data package
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -95,7 +86,8 @@ RAM-disk/SSD in the configs.

 1.2: Download the raw data
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
-To download corresponding raw-data for each of the above-linked packages, follow the first steps of the instructions laid out in the respective README file.
+To download corresponding raw-data for each of the above-linked packages,
+follow the first steps of the instructions laid out in the README file.
 For all packages, those can be found at the root of the decompressed TAR-files.

@@ -131,8 +123,10 @@ versions <= 3.11).

    $ mamba create -c conda-forge -c bioconda -n snakemake snakemake

-If you are using an ARM-based machine (such as an M4-based Mac), make sure to make the new environment compatible with Intel-based packages.
-Many dependencies in conda are not yet compatible with ARM systems. Consider for example installing *Rosetta 2*
+If you are using an ARM-based machine (such as an M4-based Mac), make sure to
+make the new environment compatible with Intel-based packages.
+Many dependencies in conda are not yet compatible with ARM systems.
+Consider for example installing *Rosetta 2*

 2.2.2: Activate or create Snakemake *(Silicon based machines)*
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -146,20 +140,28 @@ Please use this command to set up your environment:

    $ CONDA_SUBDIR=osx-64 mamba create -c conda-forge -c bioconda -n snakemake snakemake

-This command ensures that packages are downloaded for an Intel-based architecture. Afterwards, restart your shell with the "Open using Rosetta"-setting enabled.
-To do this via the GUI, go to "Applications"/Utilities/Terminal" and click on "Get Info". Select the option to start the terminal with Rosetta!
+This command ensures that packages are downloaded for an Intel architecture.
+Afterwards, restart your shell with the "Open using Rosetta"-setting enabled.
+For this, go to "Applications"/Utilities/Terminal" and click on "Get Info".
+Select the option to start the terminal with Rosetta!

-2.3: Filter the raw data and create functional annotations for extracted protein-sequences
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+2.3: Filter raw data and create functional annotations for protein-sequences
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-Filter the raw data and create protein sequences from the root of your data-package:
+Filter the raw data and create protein sequences:

 .. code:: bash

    $ snakemake --use-conda --snakefile pantools-qc-pipeline/workflow/Snakefile --configfile config/<target-dataset>_qc.yaml --cores <threads>

-3 & 4: Constructing and annotating a pangenome using Pantools & running the necessary analysis steps in order to create a PanVA instance
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+3: Constructing and annotating a pangenome using Pantools
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The
+
+4: Running the necessary analysis steps in order to create a PanVA instance
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
 Goal (3, pangenome construction):
   * Build the pangenome
   * Add structural annotations, functional annotations and phenotypes
@@ -186,13 +188,13 @@ above.

 3.2: Run PanTools to generate a pangenome
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-All analyses to create a complete pangenome happen together with those analyses specific for PanVA.
-Those are started with the same command, outlined below.
+All analyses to create a complete pangenome happen together with those analyses
+specific for PanVA.
Those are started with the same command, outlined below.

 4.1: Run PanTools to for PanVA-specific analyses
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-The snakemake rule PanVA *(panva)* runs all Pantools-functions to create a complete PanVA instance.
+The snakemake rule panva runs all functions to create a PanVA instance.
 This step covers therefore both step 3 and step 4 in one command.

 .. code:: bash
@@ -219,7 +221,8 @@ the proper format for PanVA.

 5.2: Create a conda environment for the script
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Make sure to create an environment that can deal with Intel-based dependencies if you are on a silicon-based Mac.
+Make sure to create an environment that can deal with
+Intel-based dependencies if you are on a silicon-based Mac.

 .. code:: bash