************************************
Structural and Copy Number variants
************************************

Depending on the sequencing type, BALSAMIC is currently running the following structural and copy number variant callers:

.. list-table:: SV CNV callers
   :widths: 25 25 25 25 25
   :header-rows: 1

   * - Variant caller
     - Sequencing type
     - Analysis type
     - Somatic/Germline
     - Variant type
   * - AscatNgs
     - WGS
     - tumor-normal
     - somatic
     - CNV
   * - CNVkit
     - TGA, WES
     - tumor-normal, tumor-only
     - somatic
     - CNV
   * - Delly
     - TGA, WES, WGS
     - tumor-normal, tumor-only
     - somatic
     - SV, CNV
   * - Manta
     - TGA, WES, WGS
     - tumor-normal, tumor-only
     - somatic, germline
     - SV
   * - TIDDIT
     - WGS
     - tumor-normal, tumor-only
     - somatic
     - SV
   * - CNVpytor
     - WGS
     - tumor-only
     - somatic
     - CNV

Further details about a specific caller can be found in its documentation. Links to the repositories and articles for the SV and CNV callers are listed in `bioinfo softwares `_.

From BALSAMIC version >= 10.0.0, it is mandatory to provide the gender of the sample for CNV analysis.

**Pre-merge Filtrations**
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The copy number variants identified using ascatNgs and `dellycnv` are converted to deletions and duplications before they are merged using `SVDB` with `--bnd_distance = 5000` (distance between end points for the variants from different callers) and `--overlap = 0.80` (percentage of overlapping bases for the variants from different callers).

Tumor and normal calls in `TIDDIT` are merged using `SVDB` with `--bnd_distance 500` and `--overlap = 0.80`.

Using a custom-made script, "filter_SVs.py", soft-filters are added to the calls based on the presence of the variant in the normal, with the goal of retaining only somatic variants as PASS.

.. list-table:: SV filters
   :widths: 25 25 40
   :header-rows: 1

   * - Variant caller
     - Filter added
     - Filter expression
   * - TIDDIT
     - high_normal_af_fraction
     - (AF_N_MAX / AF_T_MAX) > 0.25
   * - TIDDIT
     - max_normal_allele_frequency
     - AF_N_MAX > 0.25
   * - TIDDIT
     - normal_variant
     - AF_T_MAX == 0 and ctg_t == False
   * - TIDDIT
     - in_normal
     - ctg_n == True and AF_N_MAX == 0 and AF_T_MAX <= 0.25

Further information regarding the TIDDIT tumor-normal filtration:

Translocation variants are represented by 2 BNDs in the VCF, which allows for mixed assignment of soft-filters. A soft-filter is therefore assigned to a translocation only if neither BND is PASS.

**Post-merge Filtrations**
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

`SVDB` prioritizes the merging of variants from SV and CNV callers to fetch position and genotype information, in the following order:

.. list-table:: SVDB merge caller priority order
   :widths: 25 25 25 25
   :header-rows: 1

   * - TGA, WES tumor-normal
     - TGA, WES tumor-only
     - WGS tumor-normal
     - WGS tumor-only
   * - | 1. manta
       | 2. dellysv
       | 3. cnvkit
       | 4. dellycnv
     - | 1. manta
       | 2. dellysv
       | 3. cnvkit
       | 4. dellycnv
     - | 1. manta
       | 2. dellysv
       | 3. ascat
       | 4. dellycnv
       | 5. tiddit
     - | 1. manta
       | 2. dellysv
       | 3. dellycnv
       | 4. tiddit
       | 5. cnvpytor

The merged `SNV.somatic..svdb.vcf.gz` file retains, for each variant, all the information from the caller that identified it. The variants are then annotated using `ensembl-vep`. The SweGen frequencies and the frequency of observed structural variants from clinical normal samples are annotated using `SVDB`.

The following filters apply to both tumor-normal and tumor-only samples, in addition to caller-specific filters.

*SWEGENAF*: SweGen Allele Frequency

::

    SWEGENAF <= 0.02  (or) SWEGENAF == "."

*Frq*: Frequency of observation of the variants from normal `Clinical` samples

::

    Frq <= 0.02  (or) Frq == "."

The variants scored as `PASS` or `MaxDepth` are included in the final vcf file (`SNV.somatic..svdb..filtered.pass.vcf.gz`).

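
The two frequency expressions above can be illustrated with a minimal Python sketch. It assumes the INFO values have already been extracted as raw strings. The function names and the dictionary layout are hypothetical and are not taken from BALSAMIC, whose own filtering may be implemented differently. Treating an absent key like "." is also an assumption of this sketch.

```python
# Illustrative sketch only; not BALSAMIC's implementation.
MAX_FREQ = 0.02  # threshold taken from the filter expressions above


def passes_frequency(value: str, max_freq: float = MAX_FREQ) -> bool:
    """Return True if the annotation is missing (".") or <= max_freq."""
    if value == ".":
        return True
    return float(value) <= max_freq


def passes_population_filters(info: dict) -> bool:
    """Apply both the SWEGENAF and the Frq expression to one variant.

    `info` is a hypothetical mapping of INFO keys to raw string values.
    Assumption: a key that is absent is treated like ".".
    """
    return all(
        passes_frequency(info.get(key, "."))
        for key in ("SWEGENAF", "Frq")
    )


if __name__ == "__main__":
    print(passes_population_filters({"SWEGENAF": "0.01", "Frq": "."}))
    print(passes_population_filters({"SWEGENAF": "0.30", "Frq": "0.0"}))
```

A variant is kept only when both expressions hold, so a common SweGen variant is removed even if it has never been observed in the clinical normal samples.
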
The following command can be used to fetch the variants identified by a specific caller from the merged structural and copy number variants.

::

    zgrep -E "#|" <*.svdb.vcf.gz>

**Genome Reference Files**
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

**How to generate genome reference files for ascatNGS**

Detailed information is available from the `ascatNGS `_ documentation.

The file *SnpGcCorrections.tsv* is prepared from the 1000 Genomes SNP panel.

**GC correction file:**

The first step is to download the 1000 Genomes SNP file and convert it from .vcf to .tsv. The detailed procedure for this step is available from `ascatNGS-reference-files `_ (Human reference files from 1000 genomes VCFs).

.. code-block::

    # 1000 Genomes phase 3 VCF (GRCh37, Ensembl release 83)
    export TG_DATA=ftp://ftp.ensembl.org/pub/grch37/release-83/variation/vcf/homo_sapiens/1000GENOMES-phase_3.vcf.gz

Followed by:

.. code-block::

    # Keep SNVs on chromosomes 1-22, X and Y with MAF >= 0.05, at least 1000 bp apart
    curl -sSL $TG_DATA | zgrep -F 'E_Multiple_observations' | grep -F 'TSA=SNV' |\
    perl -ane 'next if($F[0] !~ m/^\d+$/ && $F[0] !~ m/^[XY]$/);
    next if($F[0] eq $l_c && $F[1]-1000 < $l_p); $F[7]=~m/MAF=([^;]+)/;
    next if($1 < 0.05); printf "%s\t%s\t%d\n", $F[2],$F[0],$F[1];
    $l_c=$F[0]; $l_p=$F[1];' > SnpPositions_GRCh37_1000g.tsv

--or--

.. code-block::

    # Same selection, with the MAF check applied before the spacing check
    curl -sSL $TG_DATA | zgrep -F 'E_Multiple_observations' | grep -F 'TSA=SNV' |\
    perl -ane 'next if($F[0] !~ m/^\d+$/ && $F[0] !~ m/^[XY]$/); $F[7]=~m/MAF=([^;]+)/;
    next if($1 < 0.05); next if($F[0] eq $l_c && $F[1]-1000 < $l_p);
    printf "%s\t%s\t%d\n", $F[2],$F[0],$F[1]; $l_c=$F[0]; $l_p=$F[1];'\
    > SnpPositions_GRCh37_1000g.tsv

The second step is to use the *SnpPositions.tsv* file to generate the *SnpGcCorrections.tsv* file. For more details, see `ascatNGS-convert-snppositions `_.

.. code-block::

    ascatSnpPanelGcCorrections.pl genome.fa SnpPositions.tsv > SnpGcCorrections.tsv

**Attention:**

**BALSAMIC >= v11.0.0 removes unmapped reads from the bam and cram files for all the workflows.**
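
For readers less familiar with Perl, the SNP selection performed by the one-liners in the GC correction step can be sketched in Python as follows. This is an illustrative re-expression under stated assumptions, not part of ascatNGS or BALSAMIC: the function name `select_snps` and the constants are hypothetical, and records without a parsable `MAF=` value are skipped here, which may differ from the Perl behaviour.

```python
import re

# Illustrative sketch only; names and constants are hypothetical.
REQUIRED_TAGS = ("E_Multiple_observations", "TSA=SNV")  # the two grep -F filters
MIN_MAF = 0.05      # minimum minor allele frequency
MIN_SPACING = 1000  # minimum distance in bp between retained SNPs

CHROM_RE = re.compile(r"^(\d+|[XY])$")  # autosomes, X and Y only
MAF_RE = re.compile(r"MAF=([^;]+)")


def select_snps(vcf_lines):
    """Yield (snp_id, chrom, pos) for records that pass all checks."""
    last_chrom, last_pos = None, None
    for line in vcf_lines:
        if line.startswith("#"):
            continue
        if not all(tag in line for tag in REQUIRED_TAGS):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue
        chrom, snp_id, info = fields[0], fields[2], fields[7]
        pos = int(fields[1])
        if not CHROM_RE.match(chrom):
            continue
        match = MAF_RE.search(info)
        if match is None:
            continue  # assumption: records without MAF are skipped
        try:
            maf = float(match.group(1))
        except ValueError:
            continue  # assumption: non-numeric MAF values are skipped
        if maf < MIN_MAF:
            continue
        # Enforce spacing relative to the last SNP that was kept
        if chrom == last_chrom and pos - MIN_SPACING < last_pos:
            continue
        yield snp_id, chrom, pos
        last_chrom, last_pos = chrom, pos
```

Each yielded tuple corresponds to one row of the tab-separated SNP positions file (SNP ID, chromosome, position).
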