Nanopore Quality Control
Overview
This module is designed to function both as a standalone MAG Nanopore quality control pipeline and as a component of the larger CAMP metagenome analysis pipeline. As such, it is both self-contained (ex. instructions are included for setting up a versioned environment) and compatible with other CAMP modules (ex. it ingests and spawns standardized input/output config files).
The CAMP Nanopore quality control module performs initial QC on raw input reads, including read trimming, read filtering, and removal of human reads.
Installation
1. Clone repo from GitHub.
2. Set up the conda environment using configs/conda/nanopore-quality-control.yaml.
3. Make sure the installed pipeline works correctly. pytest only generates temporary outputs, so no files should be created.
cd camp_nanopore-quality-control
conda env create -f configs/conda/nanopore-quality-control.yaml
conda activate nanopore-quality-control
pytest .tests/unit/
Using the Module
Input: /path/to/samples.csv provided by the user.
Output: An output config file that summarizes the module’s outputs.
- /path/to/work/dir/nanopore-quality-control/final_reports/samples.csv for ingestion by the next module (ex. quality-checking)
Structure:
└── workflow
    ├── Snakefile
    ├── nanopore-quality-control.py
    ├── utils.py
    └── __init__.py
- workflow/nanopore-quality-control.py: Click-based CLI that wraps the snakemake and unit test generation commands for clean management of parameters, resources, and environment variables.
- workflow/Snakefile: The snakemake pipeline.
- workflow/utils.py: Sample ingestion and work directory setup functions, and other utility functions used in the pipeline and the CLI.
- Make your own samples.csv based on the template in configs/samples.csv. Sample test data can be found in test_data/.
  - ingest_samples in workflow/utils.py expects Illumina reads in FastQ (may be gzipped) form and de novo assembled contigs in FastA form.
  - samples.csv requires either absolute paths or paths relative to the directory that the module is being run in.
- Update the relevant parameters in configs/parameters.yaml.
- Update the computational resources available to the pipeline in resources.yaml.
- To run CAMP on the command line, use the following, where /path/to/work/dir is replaced with the absolute path of your chosen working directory, and /path/to/samples.csv is replaced with your copy of samples.csv.
  - The default number of cores available to Snakemake is 1, which is enough for test data but should probably be adjusted to 10+ for a real dataset.
  - Relative or absolute paths to the Snakefile and/or the working directory (if you’re running elsewhere) are accepted!
python /path/to/camp_nanopore-quality-control/workflow/nanopore-quality-control.py \
(-c max_number_of_local_cpu_cores) \
-d /path/to/work/dir \
-s /path/to/samples.csv
Note: This setup allows the main Snakefile to live outside of the work directory.
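Because samples.csv accepts either absolute paths or paths relative to the directory the module is run from, it can help to confirm that every listed file resolves before launching a run. The following is a minimal sketch and not part of the module: the function name find_missing_paths is hypothetical, and it assumes a header row, a sample name in the first column, and file paths in every remaining column. It does not reproduce what ingest_samples in workflow/utils.py actually does.

```python
import csv
from pathlib import Path


def find_missing_paths(samples_csv, run_dir="."):
    """Return (sample_name, path) pairs whose files cannot be found.

    Assumption (not taken from the module): the CSV has a header row, the
    first column is the sample name, and all other columns are file paths.
    """
    run_dir = Path(run_dir)
    missing = []
    with open(samples_csv, newline="") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header row
        for row in reader:
            if not row:
                continue  # ignore blank lines
            sample_name, *paths = row
            for raw in paths:
                path = Path(raw.strip())
                # Relative paths are resolved against the run directory;
                # absolute paths are used as given.
                resolved = path if path.is_absolute() else run_dir / path
                if not resolved.is_file():
                    missing.append((sample_name, raw.strip()))
    return missing
```

Calling find_missing_paths("/path/to/samples.csv") from the directory where the module will be run returns an empty list when every file is found.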
- To run CAMP on a job submission cluster (for now, only Slurm is supported), use the following.
  - --slurm is an optional flag that submits all rules in the Snakemake pipeline as sbatch jobs.
  - In Slurm mode, the -c flag refers to the maximum number of sbatch jobs submitted in parallel, not the pool of cores available to run the jobs. Each job will request the number of cores specified by threads in configs/resources/slurm.yaml.
sbatch -J jobname -o jobname.log << "EOF"
#!/bin/bash
python /path/to/camp_nanopore-quality-control/workflow/nanopore-quality-control.py --slurm \
(-c max_number_of_parallel_jobs_submitted) \
-d /path/to/work/dir \
-s /path/to/samples.csv
EOF
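Since -c caps the number of parallel sbatch jobs in Slurm mode rather than the core pool, the peak number of cores requested at once depends on both -c and the per-job threads value. A rough sketch of that arithmetic follows; the function name and the numbers are made up for illustration, and it assumes every concurrently running job requests the same thread count, which may not hold if rules differ.

```python
def peak_cores_requested(max_parallel_jobs, threads_per_job):
    """Upper bound on cores requested at once in Slurm mode.

    Illustrative only: assumes all concurrently running jobs request the
    same number of threads.
    """
    if max_parallel_jobs < 1 or threads_per_job < 1:
        raise ValueError("both values must be positive integers")
    return max_parallel_jobs * threads_per_job


# Example with made-up numbers: -c 10 with 8 threads per job
print(peak_cores_requested(10, 8))  # prints 80
```

This is worth checking against the cluster's per-user limits before raising -c.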
Extending the Module
We love to see it! This module was partially envisioned as a dependable, prepackaged sandbox for developers to test their shiny new tools in.
These instructions are meant for developers who have made a tool and want to integrate or demo its functionality as part of the standard Nanopore quality control workflow, or developers who want to integrate an existing tool.
1. Write a module rule that wraps your tool and integrates its input and output into the pipeline.
   - This is a great Snakemake tutorial for writing basic Snakemake rules.
   - If you’re adding new tools from an existing YAML, use conda env update --file configs/conda/existing.yaml --prune.
   - If you’re using external scripts and resource files that i) cannot easily be integrated into either utils.py or parameters.yaml, and ii) are not as large as databases that would justify an externally stored download, add them to workflow/ext/ or workflow/ext/scripts/ and use rule external_rule as a template to wrap them.
2. Update the make_config rule in workflow/Snakefile to check for your tool’s output files. Update samples.csv to document its output if downstream modules/tools are meant to ingest it.
   - If you plan to integrate multiple tools into the module that serve the same purpose but with different input or output requirements (ex. for alignment, Minimap2 for Nanopore reads vs. Bowtie2 for Illumina reads), you can toggle between these different ‘streams’ by setting the final files expected by make_config using the example function workflow_mode.
   - Update the description of the samples.csv input fields in the CLI script workflow/nanopore-quality-control.py.
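The idea of toggling between 'streams' by changing the final files that make_config expects can be sketched as a small selector function. This is a hypothetical illustration only: the function name expected_final_files, the mode names, and the file suffixes are placeholders, and the actual workflow_mode function in workflow/Snakefile may be structured differently.

```python
from pathlib import Path


def expected_final_files(mode, samples, out_dir):
    """Pick the final files that make_config should wait for, per stream.

    Hypothetical sketch: the mode names and file suffixes are placeholders,
    not the ones used by workflow_mode in workflow/Snakefile.
    """
    suffix_by_mode = {
        "nanopore": "minimap2.bam",  # placeholder for a long-read stream
        "illumina": "bowtie2.bam",   # placeholder for a short-read stream
    }
    if mode not in suffix_by_mode:
        raise ValueError(f"unknown mode: {mode!r}")
    suffix = suffix_by_mode[mode]
    return [str(Path(out_dir) / f"{name}.{suffix}") for name in samples]
```

A function along these lines could be called when building the inputs of make_config, so that only the selected stream's outputs are required and the rules of the other stream are never triggered.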
3. If applicable, update the default conda config using conda env export > configs/conda/nanopore-quality-control.yaml with your tool and its dependencies.
   - If there are dependency conflicts, make a new conda YAML under configs/conda and specify its usage in specific rules using the conda option (see first_rule for an example).
4. Add your tool’s installation and running instructions to the module documentation and (if applicable) add the repo to your Read the Docs account + turn on the Read the Docs service hook.
5. Run the pipeline once through to make sure everything works using the test data in test_data/ if appropriate, or your own appropriately-sized test data. Then, generate unit tests to ensure that others can sanity-check their installations.
   - Note: Python functions imported from utils.py into Snakefile should be debugged on the command-line first before being added to a rule because Snakemake doesn’t port standard output/error well when using run:.
python /path/to/camp_nanopore-quality-control/workflow/nanopore-quality-control.py (--unit_test) \
-d /path/to/work/dir \
-s /path/to/samples.csv
6. Increment the version number of the modular pipeline.
bump2version --allow-dirty --commit --tag major workflow/__init__.py \
--current-version A.C.E --new-version B.D.F
7. If you want your tool integrated into the main CAMP pipeline, send a pull request and we’ll have a look at it ASAP!
   - Please make it clear what your tool intends to do by including a summary in the commit/pull request (ex. “Release X.Y.Z: Integration of tool A, which does B to C and outputs D”).
Credits
This package was created with Cookiecutter as a simplified version of the project template.
Free software: MIT License