QuaC
🦆🦆 Don't duck that QC thingy 🦆🦆
Note
In a past life, QuaC used a different remote Git management provider, UAB GitLab. It was migrated to GitHub in January 2023, and the GitLab version has been archived.
What is QuaC?
QuaC is a Snakemake-based pipeline that runs several QC tools on WGS/WES samples and then summarizes their results using pre-defined, configurable QC thresholds.
In summary, QuaC performs the following:
- Runs several QC tools using BAM and VCF files as input. At our center CGDS, these files are produced as part of the small variant caller pipeline.
- Using the QuaC-Watch tool, it performs a QC checkup based on the expected thresholds for certain QC metrics and summarizes the results for easier human consumption.
- Aggregates QC output as well as QuaC-Watch output using MultiQC, at both the sample level and the project level.
- Optionally, the QuaC-Watch and QC aggregation steps mentioned above can accept pre-run results from a few QC tools (fastqc, fastq-screen, Picard's MarkDuplicates) when run with the flag `--include_prior_qc`.
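The threshold-based checkup described above can be illustrated with a minimal sketch. This is not QuaC-Watch's actual code or config format: the metric names, threshold values, and function names below are all hypothetical and chosen only to show the idea of comparing each QC metric against an expected range and rolling the results up into one status.

```python
from typing import Dict, Optional, Tuple

# Hypothetical thresholds: metric name -> (minimum, maximum).
# None means that side of the range is unbounded. These values are
# illustrative only and are NOT taken from QuaC-Watch's configs.
Thresholds = Dict[str, Tuple[Optional[float], Optional[float]]]

EXAMPLE_THRESHOLDS: Thresholds = {
    "mean_coverage": (30.0, None),
    "percent_duplication": (None, 20.0),
    "contamination_freemix": (None, 0.02),
}


def check_metrics(metrics: Dict[str, float],
                  thresholds: Thresholds = EXAMPLE_THRESHOLDS) -> Dict[str, str]:
    """Label each configured metric as 'pass', 'fail', or 'missing'."""
    results = {}
    for name, (low, high) in thresholds.items():
        if name not in metrics:
            results[name] = "missing"
            continue
        value = metrics[name]
        within_range = (low is None or value >= low) and (high is None or value <= high)
        results[name] = "pass" if within_range else "fail"
    return results


def summarize(results: Dict[str, str]) -> str:
    """Overall status: 'pass' only if every configured metric passed."""
    return "pass" if all(status == "pass" for status in results.values()) else "fail"
```

A sample with high duplication would then be flagged on that metric alone, which mirrors the kind of per-metric summary that makes QC output easier for a human to scan.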
CGDS users only
- At CGDS, BAM and VCF files produced by the small variant caller pipeline are used as input to QuaC.
- The outputs of fastqc, fastq-screen, and Picard's MarkDuplicates, which QuaC accepts when run with the flag `--include_prior_qc`, are produced by this small_variant_caller_pipeline.
Info
QuaC is built for use with human WGS/WES data. If you would like to use it with non-human data, please modify the pipeline as needed, especially the thresholds used in the QuaC-Watch configs.
QC tools
Tools run by QuaC
QuaC quacks using the tools listed below:
Tool | Use | QC Type |
---|---|---|
Qualimap | Summarizes several alignment metrics using BAM file | BAM quality |
Picard-CollectMultipleMetrics | Summarizes alignment metrics from BAM file using several modules | BAM quality |
Picard-CollectWgsMetrics | Collects metrics about coverage and performance using BAM file | BAM quality |
mosdepth | Fast alignment depth calculation using BAM file | BAM quality |
indexcov | Estimates coverage from BAM index for GS (Skipped in exome mode) | BAM quality |
covviz | Identifies large, coverage-based anomalies for GS using Indexcov output (Skipped in exome mode) | BAM quality |
bcftools stats | Summarizes VCF file stats | VCF quality |
verifybamid | Estimates within-species (i.e., cross-sample) contamination using BAM file | Within-species contamination |
somalier | Estimation of sex, ancestry and relatedness using BAM file | Sex, ancestry and relatedness estimation |
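
As a rough illustration of the table above, the sketch below lists the tools and filters out the ones marked as skipped in exome mode. It is only a reading aid: the `exome` parameter and the `tools_for_mode` function are hypothetical and do not reflect how QuaC's Snakemake rules are actually written.

```python
from typing import List, Tuple

# (tool name, whether it runs in exome mode), transcribed from the table above.
TOOLS: List[Tuple[str, bool]] = [
    ("Qualimap", True),
    ("Picard-CollectMultipleMetrics", True),
    ("Picard-CollectWgsMetrics", True),
    ("mosdepth", True),
    ("indexcov", False),  # skipped in exome mode
    ("covviz", False),    # skipped in exome mode (depends on indexcov output)
    ("bcftools stats", True),
    ("verifybamid", True),
    ("somalier", True),
]


def tools_for_mode(exome: bool) -> List[str]:
    """Return the tool names expected to run for the given mode (hypothetical helper)."""
    return [name for name, runs_in_exome in TOOLS if runs_in_exome or not exome]
```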
Optional QC output consumed by QuaC
Optionally, QuaC can also use QC results produced by the tools listed below when run with the flag `--include_prior_qc`.
Tool | Use | QC Type |
---|---|---|
fastqc | Performs QC on raw sequence reads data (FASTQ) | FASTQ quality |
FastQ Screen | Screens FASTQ for other-species contamination | FASTQ quality |
Picard's MarkDuplicates | Determines level of read duplication on BAM files | BAM quality |
CGDS users only
- At CGDS, these optional tools were run by our small_variant_caller_pipeline.
Citing QuaC
If you use QuaC, please cite:
Gajapathy et al., (2023). QuaC: A Pipeline Implementing Quality Control Best Practices for Genome Sequencing and Exome Sequencing Data. Journal of Open Source Software, 8(90), 5313, https://doi.org/10.21105/joss.05313