
google / deepsomatic

DeepSomatic is an analysis pipeline that uses a deep neural network to call somatic variants from tumor-normal and tumor-only sequencing data.


DeepSomatic


DeepSomatic is an extension of the deep learning-based variant caller
DeepVariant that takes aligned reads
(in BAM or CRAM format) from tumor and normal data, produces pileup image
tensors from them, classifies each tensor using a convolutional neural network,
and finally reports somatic variants in a standard VCF or gVCF file.
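
To make the "pileup image tensor" idea concrete, here is a toy sketch. It is not DeepSomatic's encoding: the base codes, the `toy_pileup` helper, the read tuples, and the way tumor and normal rows are stacked are all assumptions made for illustration, and real inputs would come from BAM or CRAM files rather than hard-coded strings.

```python
# Toy illustration only: DeepSomatic's real pileup images use their own
# channels and encoding, which this README does not describe.
BASE_CODES = {"A": 1, "C": 2, "G": 3, "T": 4}  # 0 means "no base here"


def toy_pileup(reads, window_start, window_len):
    """Build a rows=reads, columns=positions matrix of base codes.

    `reads` is a list of (start_position, sequence) pairs, a hypothetical
    stand-in for alignments that would normally be read from BAM or CRAM.
    """
    matrix = []
    for start, seq in reads:
        row = [0] * window_len
        for offset, base in enumerate(seq):
            col = start + offset - window_start
            if 0 <= col < window_len:
                row[col] = BASE_CODES.get(base, 0)
        matrix.append(row)
    return matrix


tumor_reads = [(100, "ACGT"), (102, "GTTA")]
normal_reads = [(101, "CGTA")]

# Assumption for this sketch: tumor and normal rows are stacked into one
# matrix that a classifier could take as a single input.
image = toy_pileup(tumor_reads, 100, 6) + toy_pileup(normal_reads, 100, 6)
for row in image:
    print(row)
```

Each row is one read and each column one reference position in the window, so a site where tumor rows disagree with normal rows shows up as a column-wise difference.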

DeepSomatic supports somatic variant-calling from tumor-normal and tumor-only
sequencing data.

Code availability

DeepSomatic is integrated with
DeepVariant so that it benefits from the high-quality
end-to-end testing and feature development of DeepVariant.

Here are the scripts that describe the core components of DeepSomatic:

Integrating DeepSomatic within DeepVariant helps to maintain
high-quality code health with integrated testing and feature development.

Case studies

The following case studies show example runs for supported technologies:

Tumor-normal case studies

Tumor-only case studies

For details on runtime and accuracy expectations, please see the DeepSomatic metrics page.

How to Cite

If you use DeepSomatic in your work, please cite:

DeepSomatic: Accurate somatic small variant discovery for multiple sequencing technologies

How to run DeepSomatic

# Assumes BIN_VERSION, INPUT_DIR, and OUTPUT_DIR are already set in the shell.
# Flag notes:
#   --model_type: one of WGS, WES, PACBIO, ONT, FFPE_WGS, FFPE_WES,
#                 WGS_TUMOR_ONLY, PACBIO_TUMOR_ONLY, ONT_TUMOR_ONLY
#   --ref: path to the reference FASTA file
#   --reads_normal / --reads_tumor: paths to the normal and tumor BAM files
#   --output_vcf / --output_gvcf: paths to the output VCF and gVCF files
#   --num_shards: total number of threads to use
#   --logging_dir: log output directory
#   --regions: region of the genome; if omitted, the whole genome is processed
#   --use_default_pon_filtering: set to true for default PON filtering in tumor-only calling
#   --dry_run: default is false; if true, commands are printed but not executed
sudo docker run \
  -v "${INPUT_DIR}":"${INPUT_DIR}" \
  -v "${OUTPUT_DIR}":"${OUTPUT_DIR}" \
  google/deepsomatic:"${BIN_VERSION}" \
  run_deepsomatic \
  --model_type=WGS \
  --ref="${INPUT_DIR}"/REF.fasta \
  --reads_normal="${INPUT_DIR}"/normal.bam \
  --reads_tumor="${INPUT_DIR}"/tumor.bam \
  --output_vcf="${OUTPUT_DIR}"/OUTPUT.vcf.gz \
  --output_gvcf="${OUTPUT_DIR}"/OUTPUT.g.vcf.gz \
  --sample_name_tumor="tumor" \
  --sample_name_normal="normal" \
  --num_shards="$(nproc)" \
  --logging_dir="${OUTPUT_DIR}"/logs \
  --intermediate_results_dir="${OUTPUT_DIR}"/intermediate_results_dir \
  --regions=chr1 \
  --use_default_pon_filtering=false \
  --dry_run=false

Please follow the Quick Start for more
details on the different setups available for
DeepSomatic, such as Docker and Singularity.
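
When the same invocation has to be generated for several samples, it can help to assemble the argument list in code. The sketch below is one possible way to do that, using only a subset of the flags shown in the command above. The `build_deepsomatic_argv` helper, the `/data/...` paths, and the `X.Y.Z` version placeholder are hypothetical, and leaving out `--reads_normal` for tumor-only models is an assumption to check against the Quick Start.

```python
import shlex

# Model types listed in the command's flag notes.
MODEL_TYPES = {
    "WGS", "WES", "PACBIO", "ONT", "FFPE_WGS", "FFPE_WES",
    "WGS_TUMOR_ONLY", "PACBIO_TUMOR_ONLY", "ONT_TUMOR_ONLY",
}


def build_deepsomatic_argv(bin_version, input_dir, output_dir, model_type,
                           reads_tumor, reads_normal=None, use_pon=False):
    """Return a docker argv for one run (hypothetical helper).

    reads_normal=None is treated as a tumor-only run; that the flag can
    simply be left out is an assumption of this sketch.
    """
    if model_type not in MODEL_TYPES:
        raise ValueError(f"unknown model type: {model_type}")
    argv = [
        "docker", "run",
        "-v", f"{input_dir}:{input_dir}",
        "-v", f"{output_dir}:{output_dir}",
        f"google/deepsomatic:{bin_version}",
        "run_deepsomatic",
        f"--model_type={model_type}",
        f"--ref={input_dir}/REF.fasta",
        f"--reads_tumor={reads_tumor}",
        f"--output_vcf={output_dir}/OUTPUT.vcf.gz",
        f"--use_default_pon_filtering={'true' if use_pon else 'false'}",
        "--dry_run=true",  # print the underlying commands without executing
    ]
    if reads_normal is not None:
        argv.append(f"--reads_normal={reads_normal}")
    return argv


tumor_only = build_deepsomatic_argv(
    bin_version="X.Y.Z",  # placeholder, substitute a real release tag
    input_dir="/data/input",
    output_dir="/data/output",
    model_type="WGS_TUMOR_ONLY",
    reads_tumor="/data/input/tumor.bam",
    use_pon=True,
)
print(shlex.join(tumor_only))
```

Building a list rather than a single string keeps each flag as its own argument, so the result can be passed to `subprocess.run` without shell quoting issues, and `shlex.join` is only used to print a copy-pasteable form.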

Example output

DeepSomatic uses the FILTER column of the VCF to report identified germline and
somatic variants. The description of each filter can be found in the header:

##FILTER=<ID=PASS,Description="All filters passed">
##FILTER=<ID=RefCall,Description="Genotyping model thinks this site is reference.">
##FILTER=<ID=LowQual,Description="Confidence in this variant being real is below calling threshold.">
##FILTER=<ID=NoCall,Description="Site has depth=0 resulting in no call.">
##FILTER=<ID=GERMLINE,Description="Non somatic variants">
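
The header lines above can be read programmatically. The following is a minimal sketch, assuming the simple `ID=...,Description="..."` layout shown here; the regular expression and the `parse_filter_headers` helper are illustrative and do not cover every header form the VCF specification allows.

```python
import re

# Matches the simple two-key FILTER header layout shown above.
FILTER_RE = re.compile(r'^##FILTER=<ID=([^,]+),Description="(.*)">$')


def parse_filter_headers(lines):
    """Map each FILTER ID to its description (hypothetical helper)."""
    filters = {}
    for line in lines:
        match = FILTER_RE.match(line.strip())
        if match:
            filters[match.group(1)] = match.group(2)
    return filters


header = [
    '##FILTER=<ID=PASS,Description="All filters passed">',
    '##FILTER=<ID=RefCall,Description="Genotyping model thinks this site is reference.">',
    '##FILTER=<ID=LowQual,Description="Confidence in this variant being real is below calling threshold.">',
    '##FILTER=<ID=NoCall,Description="Site has depth=0 resulting in no call.">',
    '##FILTER=<ID=GERMLINE,Description="Non somatic variants">',
]

filters = parse_filter_headers(header)
for filter_id, description in filters.items():
    print(f"{filter_id}: {description}")
```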

For example, the variants reported below:

#CHROM  POS     ID  REF ALT QUAL    FILTER      INFO    FORMAT              SAMPLE_NAME
chr1    14001   .   A   G   3.7     GERMLINE    .       GT:GQ:DP:AD:VAF:PL  0/0:4:8:4,4:0.5:1,0,34
chr1    14002   .   T   A   0       RefCall     .       GT:GQ:DP:AD:VAF:PL  0/0:51:60:57,2:0.0333333:0,51,58
chr1    14003   .   C   G   43.8    PASS        .       GT:GQ:DP:AD:VAF:PL  1/1:43:74:0,74:1:43,52,0

In this example:

  • The variant with GERMLINE FILTER status is identified as a germline variant.
  • The variant with RefCall FILTER status is homozygous for the reference allele.
  • The variant with PASS FILTER status is a somatic variant.
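
The interpretation above can be applied in code by reading the FILTER column and pairing the FORMAT keys with the sample values. This is a minimal sketch over the three example records only; the `parse_record` helper and the `LABELS` mapping are hypothetical names, and a real workflow would more likely rely on an established VCF library. In these three rows the reported VAF equals the alternate-allele depth (second AD value) divided by DP, which the loop prints alongside for comparison.

```python
# The three example records, tab-separated as in a real VCF body.
RECORDS = [
    "chr1\t14001\t.\tA\tG\t3.7\tGERMLINE\t.\tGT:GQ:DP:AD:VAF:PL\t0/0:4:8:4,4:0.5:1,0,34",
    "chr1\t14002\t.\tT\tA\t0\tRefCall\t.\tGT:GQ:DP:AD:VAF:PL\t0/0:51:60:57,2:0.0333333:0,51,58",
    "chr1\t14003\t.\tC\tG\t43.8\tPASS\t.\tGT:GQ:DP:AD:VAF:PL\t1/1:43:74:0,74:1:43,52,0",
]

# Hypothetical labels for this sketch, following the bullet list above.
LABELS = {"PASS": "somatic", "GERMLINE": "germline", "RefCall": "reference"}


def parse_record(line):
    """Split one VCF body line and pair FORMAT keys with the sample values."""
    chrom, pos, _id, ref, alt, _qual, flt, _info, fmt, sample = line.split("\t")[:10]
    call = dict(zip(fmt.split(":"), sample.split(":")))
    return {"chrom": chrom, "pos": int(pos), "ref": ref, "alt": alt,
            "filter": flt, "label": LABELS.get(flt, "other"), "call": call}


records = [parse_record(line) for line in RECORDS]
somatic = [r for r in records if r["filter"] == "PASS"]

for r in records:
    alt_depth = int(r["call"]["AD"].split(",")[1])
    depth = int(r["call"]["DP"])
    print(r["chrom"], r["pos"], r["label"], r["call"]["VAF"], alt_depth / depth)
```

Keeping only records whose FILTER is PASS, as `somatic` does, is the usual way to extract the somatic calls from this output.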

Prerequisites

  • Unix-like operating system (cannot run on Windows)
  • Python 3.10

Contribution Guidelines

Please open a pull request if
you wish to contribute to DeepSomatic. Note, we have not set up the
infrastructure to merge pull requests externally. If you agree, we will test and
submit the changes internally and mention your contributions in our
release notes. We apologize
for any inconvenience.

If you have any difficulty using DeepSomatic, feel free to
open an issue. If you have
general questions not specific to DeepSomatic, we recommend that you post on a
community discussion forum such as BioStars.

License

BSD-3-Clause license

Disclaimer

This is not an official Google product.

NOTE: the content of this research code repository (i) is not intended to be a
medical device; and (ii) is not intended for clinical use of any kind, including
but not limited to diagnosis or prognosis.
