QuaC

🦆🦆 Don't duck that QC thingy 🦆🦆

Note

In a past life, QuaC was hosted on a different remote Git provider, UAB GitLab. It was migrated to GitHub in January 2023, and the GitLab version has been archived.

What is QuaC?

QuaC is a Snakemake-based pipeline that runs several QC tools on WGS/WES samples and then summarizes their results using predefined, configurable QC thresholds.

In summary, QuaC performs the following:

  • Runs several QC tools using BAM and VCF files as input.
  • Uses the QuaC-Watch tool to check selected QC metrics against expected thresholds and to summarize the results for easier human consumption.
  • Aggregates QC output, as well as QuaC-Watch output, using MultiQC at both the sample level and the project level.
  • Optionally, the QuaC-Watch and QC aggregation steps above can also accept pre-run results from a few QC tools (FastQC, FastQ Screen, and Picard's MarkDuplicates) when QuaC is run with the flag --include_prior_qc.
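The threshold checkup and project-level summary described above can be pictured with a minimal Python sketch. It assumes each sample's metrics have already been parsed into a dictionary. The metric names (`mean_coverage`, `percent_duplication`, `contamination_freemix`), the cutoffs, and the `pass`/`fail`/`missing` labels are hypothetical; they are not QuaC-Watch's actual config keys, values, or output format. The sketch only shows the general idea: compare each metric to a min/max bound, then collapse the per-metric results into one status per sample.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Threshold:
    """Hypothetical QC rule: a metric passes if it lies within [minimum, maximum]."""
    minimum: Optional[float] = None
    maximum: Optional[float] = None


# Hypothetical metric names and cutoffs for illustration only; these are NOT
# QuaC-Watch's real config keys or recommended values.
EXAMPLE_THRESHOLDS: Dict[str, Threshold] = {
    "mean_coverage": Threshold(minimum=30.0),
    "percent_duplication": Threshold(maximum=20.0),
    "contamination_freemix": Threshold(maximum=0.03),
}


def check_sample(metrics: Dict[str, float],
                 thresholds: Dict[str, Threshold]) -> Dict[str, str]:
    """Label each thresholded metric as 'pass', 'fail', or 'missing'."""
    results: Dict[str, str] = {}
    for name, rule in thresholds.items():
        value = metrics.get(name)
        if value is None:
            results[name] = "missing"  # tool output absent or metric not parsed
        elif rule.minimum is not None and value < rule.minimum:
            results[name] = "fail"
        elif rule.maximum is not None and value > rule.maximum:
            results[name] = "fail"
        else:
            results[name] = "pass"
    return results


def summarize_project(per_sample: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Collapse per-metric labels into a single status per sample."""
    summary: Dict[str, str] = {}
    for sample, results in per_sample.items():
        labels = set(results.values())
        if "fail" in labels:
            summary[sample] = "fail"
        elif "missing" in labels:
            summary[sample] = "incomplete"
        else:
            summary[sample] = "pass"
    return summary


# Example run with made-up metric values for two samples.
samples = {
    "sample_A": {"mean_coverage": 35.2, "percent_duplication": 8.1,
                 "contamination_freemix": 0.01},
    "sample_B": {"mean_coverage": 22.0, "percent_duplication": 9.4},
}
per_sample = {name: check_sample(m, EXAMPLE_THRESHOLDS)
              for name, m in samples.items()}
print(summarize_project(per_sample))
# {'sample_A': 'pass', 'sample_B': 'fail'}
```

Here `sample_B` fails on coverage (22.0 is below the hypothetical 30.0 cutoff), and its missing contamination value would only downgrade it to `incomplete` if nothing had failed outright.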

CGDS users only

  • At CGDS, BAM and VCF files produced by the small variant caller pipeline are used as input to QuaC.
  • The outputs of FastQC, FastQ Screen, and Picard's MarkDuplicates, which QuaC accepts when run with the flag --include_prior_qc, are also produced by this small_variant_caller_pipeline.

Info

QuaC is built for human WGS/WES data. To use it with non-human data, modify the pipeline as needed, especially the thresholds used in the QuaC-Watch configs.

QC tools

Tools run by QuaC

QuaC quacks using the tools listed below:

| Tool | Use | QC Type |
|---|---|---|
| Qualimap | Summarizes several alignment metrics using the BAM file | BAM quality |
| Picard-CollectMultipleMetrics | Summarizes alignment metrics from the BAM file using several modules | BAM quality |
| Picard-CollectWgsMetrics | Collects metrics about coverage and performance using the BAM file | BAM quality |
| mosdepth | Fast alignment depth calculation using the BAM file | BAM quality |
| indexcov | Estimates coverage from the BAM index for GS (skipped in exome mode) | BAM quality |
| covviz | Identifies large, coverage-based anomalies for GS using indexcov output (skipped in exome mode) | BAM quality |
| bcftools stats | Summarizes VCF file stats | VCF quality |
| verifybamid | Estimates within-species (i.e., cross-sample) contamination using the BAM file | Within-species contamination |
| somalier | Estimates sex, ancestry, and relatedness using the BAM file | Sex, ancestry, and relatedness estimation |
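As a rough illustration of how exome mode trims the tool list in the table, the Python sketch below filters out the genome-only steps. The lowercase tool labels and the `tools_to_run` helper are hypothetical; they are not QuaC's Snakemake rule names or its actual mode-switching logic.

```python
from typing import List

# Hypothetical lowercase labels for the tools in the table above; these are
# NOT QuaC's actual Snakemake rule names.
ALL_TOOLS = (
    "qualimap",
    "picard_collectmultiplemetrics",
    "picard_collectwgsmetrics",
    "mosdepth",
    "indexcov",
    "covviz",
    "bcftools_stats",
    "verifybamid",
    "somalier",
)

# Per the table, indexcov and covviz are skipped in exome mode
# (covviz consumes indexcov output, so the two are dropped together).
GENOME_ONLY = frozenset({"indexcov", "covviz"})


def tools_to_run(exome: bool) -> List[str]:
    """Return the tools to run, dropping genome-only steps in exome mode."""
    return [tool for tool in ALL_TOOLS if not (exome and tool in GENOME_ONLY)]


print(tools_to_run(exome=True))
```

Keeping the genome-only set separate from the full list puts the exome/genome difference in one place, so the remaining seven tools run unchanged in either mode.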

Optional QC output consumed by QuaC

Optionally, QuaC can also use QC results produced by the tools listed below when run with the flag --include_prior_qc.

| Tool | Use | QC Type |
|---|---|---|
| FastQC | Performs QC on raw sequence read data (FASTQ) | FASTQ quality |
| FastQ Screen | Screens FASTQ files for other-species contamination | FASTQ quality |
| Picard's MarkDuplicates | Determines the level of read duplication in BAM files | BAM quality |

CGDS users only

  • At CGDS, these optional tools were run by our small_variant_caller_pipeline.

Citing QuaC

If you use QuaC, please cite:

Gajapathy et al., (2023). QuaC: A Pipeline Implementing Quality Control Best Practices for Genome Sequencing and Exome Sequencing Data. Journal of Open Source Software, 8(90), 5313, https://doi.org/10.21105/joss.05313