BALSAMIC.utils package

Submodules

BALSAMIC.utils.cli module

class BALSAMIC.utils.cli.CaptureStdout[source]

Bases: list

Captures stdout.
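
A list subclass that captures stdout is commonly written as a context manager. The following is a minimal sketch of that pattern, not the BALSAMIC implementation; the class name CaptureStdoutSketch and its internals are assumptions made for illustration.

```python
import sys
from io import StringIO


class CaptureStdoutSketch(list):
    """Hypothetical stand-in: collects printed lines into the list itself."""

    def __enter__(self):
        # Swap sys.stdout for an in-memory buffer while the block runs.
        self._original_stdout = sys.stdout
        sys.stdout = self._buffer = StringIO()
        return self

    def __exit__(self, *exc_info):
        # Store each captured line as one list element, then restore stdout.
        self.extend(self._buffer.getvalue().splitlines())
        del self._buffer
        sys.stdout = self._original_stdout


with CaptureStdoutSketch() as captured:
    print("first line")
    print("second line")
```

After the with block, captured behaves like an ordinary list of the printed lines.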

class BALSAMIC.utils.cli.SnakeMake[source]

Bases: object

Builds a snakemake command from CLI options.

Params:

  • case_name – analysis case name
  • working_dir – working directory for snakemake
  • configfile – sample configuration file (json), output of balsamic-config-sample
  • run_mode – run mode: cluster or local shell run
  • cluster_config – cluster config json file
  • scheduler – slurm command constructor
  • log_path – log file path
  • script_path – file path for slurm scripts
  • result_path – result directory
  • qos – QOS for sbatch jobs
  • account – scheduler (e.g. slurm) account
  • mail_user – email address to send job run status to
  • forceall – add the '--forceall' option for snakemake
  • run_analysis – run the pipeline
  • use_singularity – use singularity
  • singularity_bind – Singularity bind path
  • singularity_arg – Singularity arguments to pass to snakemake
  • sm_opt – additional snakemake options

build_cmd()[source]
BALSAMIC.utils.cli.add_doc(docstring)[source]

A decorator for adding docstring. Taken shamelessly from stackexchange.
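
A docstring-adding decorator is usually a small closure. The sketch below shows the general pattern under the assumption that the decorator simply assigns the given text to the wrapped function; add_doc_sketch is a hypothetical name, not the BALSAMIC function.

```python
def add_doc_sketch(docstring):
    """Hypothetical decorator factory that attaches a docstring."""

    def document(func):
        # Overwrite the function's docstring and return it unchanged otherwise.
        func.__doc__ = docstring
        return func

    return document


@add_doc_sketch("Return the sum of a and b.")
def add(a, b):
    return a + b
```

This is useful when the docstring text is built at runtime, for example from a shared template.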

BALSAMIC.utils.cli.convert_defaultdict_to_regular_dict(inputdict: dict)[source]

Recursively convert defaultdict to dict.
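
As an illustration of the recursive conversion, the sketch below walks a nested mapping and rebuilds it from plain dicts. The helper name to_regular_dict is an assumption; the real function may treat lists or other containers differently.

```python
from collections import defaultdict


def to_regular_dict(value):
    """Hypothetical helper: rebuild nested mappings as plain dicts."""
    if isinstance(value, dict):  # defaultdict is a dict subclass
        return {key: to_regular_dict(item) for key, item in value.items()}
    return value


nested = defaultdict(lambda: defaultdict(list))
nested["sample"]["reads"].append("R1")
plain = to_regular_dict(nested)
```

The result no longer creates keys on lookup, which makes it safer to serialize or compare.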

BALSAMIC.utils.cli.createDir(path, interm_path=[])[source]

Creates a directory, recursively checking whether it exists; otherwise increments the number

Creates symlinks for provided files in analysis/fastq directory. Identifies file prefix pattern, and also creates symlinks for the second read file, if needed

BALSAMIC.utils.cli.find_file_index(file_path)[source]
BALSAMIC.utils.cli.generate_graph(config_collection_dict, config_path)[source]

Generate DAG graph using snakemake stdout output

BALSAMIC.utils.cli.get_bioinfo_tools_list(conda_env_path) → dict[source]

Parses the names and versions of bioinfo tools used by BALSAMIC from config YAML into a dict

BALSAMIC.utils.cli.get_config(config_name)[source]

Return a string path for config file.

BALSAMIC.utils.cli.get_fastq_bind_path(fastq_path: pathlib.Path) → [][source]

Takes a path with symlinked fastq files. Returns unique paths to parent directories for Singularity bind

BALSAMIC.utils.cli.get_file_extension(file_path)[source]
BALSAMIC.utils.cli.get_file_status_string(file_to_check)[source]

Checks if the file exists and returns a string with a checkmark or a red cross mark if it exists or doesn’t exist, respectively. Always assumes the file doesn’t exist unless proven otherwise.
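
The "assume missing unless proven otherwise" logic can be sketched as follows. The function name, the bracketed output format, and the plain Unicode marks are assumptions for illustration; the real function may colorize its output or return additional values.

```python
from pathlib import Path


def file_status_string_sketch(file_to_check):
    """Hypothetical helper: prefix a path with a check or cross mark."""
    status = "\u2717"  # cross mark: assume the file is missing by default
    if Path(file_to_check).exists():
        status = "\u2713"  # check mark: existence was confirmed
    return f"[{status}] {file_to_check}"


existing = file_status_string_sketch(".")
missing = file_status_string_sketch("definitely/not/a/real/path.txt")
```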

BALSAMIC.utils.cli.get_from_two_key(input_dict, from_key, by_key, by_value, default=None)[source]

Given two keys with list of values of same length, find matching index of by_value in from_key from by_key.

from_key and by_key should both exist

BALSAMIC.utils.cli.get_panel_chrom(panel_bed) → list[source]

Returns a set of chromosomes present in PANEL BED
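
Collecting chromosomes from a BED file amounts to reading the first column of each record. The sketch below is an assumption-laden illustration: it takes an iterable of lines rather than a file path so that it is self-contained, skips common BED header lines, and returns a sorted list. None of these choices are confirmed by the source.

```python
def panel_chromosomes_sketch(bed_lines):
    """Hypothetical helper: unique chromosome names from BED records."""
    chromosomes = set()
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and the usual BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chromosomes.add(line.split()[0])
    return sorted(chromosomes)


bed = [
    "# example panel",
    "1\t100\t200\tgeneA",
    "1\t300\t400\tgeneA",
    "2\t50\t150\tgeneB",
    "X\t10\t90\tgeneC",
]
chroms = panel_chromosomes_sketch(bed)
```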

BALSAMIC.utils.cli.get_sample_dict(tumor, normal) → dict[source]

Concatenates sample dicts for all provided files

BALSAMIC.utils.cli.get_sample_names(filename, sample_type)[source]

Creates a dict with sample prefix, sample type, and readpair suffix

BALSAMIC.utils.cli.get_schedulerpy()[source]

Returns a string path for scheduler.py

BALSAMIC.utils.cli.get_snakefile(analysis_type, sequencing_type='targeted')[source]

Return a string path for variant calling snakefile.

BALSAMIC.utils.cli.iterdict(dic)[source]

dictionary iteration - returns generator
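
One plausible reading of "dictionary iteration" is a generator that descends into nested dictionaries and yields leaf key/value pairs. The sketch below shows that reading; iterdict_sketch is a hypothetical name and the real function may yield a different shape.

```python
def iterdict_sketch(dic):
    """Hypothetical generator: yield (key, value) for every non-dict leaf."""
    for key, value in dic.items():
        if isinstance(value, dict):
            # Recurse into nested dictionaries instead of yielding them.
            yield from iterdict_sketch(value)
        else:
            yield key, value


leaves = list(iterdict_sketch({"a": 1, "b": {"c": 2, "d": {"e": 3}}}))
```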

BALSAMIC.utils.cli.merge_dict_on_key(dict_1, dict_2, by_key)[source]

Merge two lists of dictionaries based on a key
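
A key-based merge of two lists of dictionaries can be sketched as below. The name merge_dict_on_key_sketch and the returned list are assumptions; entries that share the same value for by_key are combined, with later entries overriding earlier ones on conflicting fields.

```python
def merge_dict_on_key_sketch(dict_1, dict_2, by_key):
    """Hypothetical helper: combine entries that share the same by_key value."""
    merged = {}
    for entry in list(dict_1) + list(dict_2):
        # Group by the key's value and fold each entry's fields together.
        merged.setdefault(entry[by_key], {}).update(entry)
    return list(merged.values())


left = [{"id": 1, "x": "a"}, {"id": 2, "x": "b"}]
right = [{"id": 1, "y": "c"}]
combined = merge_dict_on_key_sketch(left, right, "id")
```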

BALSAMIC.utils.cli.merge_json(*args)[source]

Takes a list of JSON files and merges them together.

Input: list of JSON files

Output: dictionary of merged JSON
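
The merge can be illustrated with a shallow update over each loaded file. This is a sketch under the assumption that top-level keys from later files override earlier ones; merge_json_sketch and the example file contents are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def merge_json_sketch(*json_paths):
    """Hypothetical helper: shallow-merge the top-level keys of JSON files."""
    merged = {}
    for json_path in json_paths:
        with open(json_path) as handle:
            merged.update(json.load(handle))  # later files win on conflicts
    return merged


with tempfile.TemporaryDirectory() as tmp:
    first = Path(tmp) / "first.json"
    second = Path(tmp) / "second.json"
    first.write_text(json.dumps({"QC": {"adapter_trim": True}}))
    second.write_text(json.dumps({"vcf": {"caller": "example"}}))
    merged_config = merge_json_sketch(first, second)
```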

BALSAMIC.utils.cli.recursive_default_dict()[source]

Recursively create defaultdict.
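
A recursive defaultdict is typically built by passing the factory function to itself. The sketch below shows that idiom; recursive_default_dict_sketch is a hypothetical name used to avoid implying this is the exact BALSAMIC code.

```python
from collections import defaultdict


def recursive_default_dict_sketch():
    """Hypothetical factory: every missing key yields another such dict."""
    return defaultdict(recursive_default_dict_sketch)


tree = recursive_default_dict_sketch()
# Intermediate levels are created on demand during assignment.
tree["analysis"]["case"]["status"] = "done"
```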

BALSAMIC.utils.cli.singularity(sif_path: str, cmd: str, bind_paths: list) → str[source]

Run within container

Executes input command string via Singularity container image

Parameters:
  • sif_path – Path to singularity image file (sif)
  • cmd – A string for series of commands to run
  • bind_paths – a list of paths to bind within container
Returns:

A sanitized Singularity cmd

Raises:

BalsamicError – An error occurred while creating cmd
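
To make the shape of such a command concrete, the sketch below assembles a `singularity exec` call with one `--bind` option per path and shell-quotes every part. The layout of the command (including running cmd through `bash -c`) and the name singularity_cmd_sketch are assumptions; the actual function may build and sanitize its command differently.

```python
import shlex


def singularity_cmd_sketch(sif_path, cmd, bind_paths):
    """Hypothetical helper: build a quoted 'singularity exec' command string."""
    parts = ["singularity", "exec"]
    for bind_path in bind_paths:
        parts.extend(["--bind", bind_path])
    parts.append(sif_path)
    # Assumption: the command string is handed to bash inside the container.
    parts.extend(["bash", "-c", cmd])
    return " ".join(shlex.quote(part) for part in parts)


command = singularity_cmd_sketch("image.sif", "echo hello", ["/data"])
```

Quoting each part keeps a multi-word cmd intact as a single argument to bash.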

BALSAMIC.utils.cli.validate_fastq_pattern(sample)[source]

Finds the correct filename prefix from file path, and returns it. An error is raised if sample name has invalid pattern

BALSAMIC.utils.cli.write_json(json_out, output_config)[source]

BALSAMIC.utils.exc module

exception BALSAMIC.utils.exc.BalsamicError(message)[source]

Bases: Exception

Base exception for BALSAMIC.

BALSAMIC.utils.models module

class BALSAMIC.utils.models.AnalysisModel[source]

Bases: pydantic.main.BaseModel

Pydantic model containing workflow variables

case_id

Field(required); string case identifier

analysis_type

Field(required); string literal [single, paired]
  • single – if only tumor samples are provided
  • paired – if both tumor and normal samples are provided

sequencing_type

Field(required); string literal [targeted, wgs]
  • targeted – if capture kit was used to enrich specific genomic regions
  • wgs – if whole genome sequencing was performed

analysis_dir

Field(required); existing path where to save files

fastq_path

Field(optional); Path where fastq files will be stored

script

Field(optional); Path where snakemake scripts will be stored

log

Field(optional); Path where logs will be saved

result

Field(optional); Path where BALSAMIC output will be stored

benchmark

Field(optional); Path where benchmark report will be stored

dag

Field(optional); Path where DAG graph of workflow will be stored

BALSAMIC_version

Field(optional); Current version of BALSAMIC

config_creation_date

Field(optional); Timestamp when config was created

Raises:

ValueError – When analysis_type is set to any value other than [single, paired, qc], or when sequencing_type is set to any value other than [wgs, targeted]

class Config[source]

Bases: object

validate_all = True
classmethod analysis_type_literal(value) → str[source]
classmethod datetime_as_string(value)[source]
classmethod dirpath_always_abspath(value) → str[source]
classmethod parse_analysis_to_benchmark_path(value, values, **kwargs) → str[source]
classmethod parse_analysis_to_dag_path(value, values, **kwargs) → str[source]
classmethod parse_analysis_to_fastq_path(value, values, **kwargs) → str[source]
classmethod parse_analysis_to_log_path(value, values, **kwargs) → str[source]
classmethod parse_analysis_to_result_path(value, values, **kwargs) → str[source]
classmethod parse_analysis_to_script_path(value, values, **kwargs) → str[source]
classmethod sequencing_type_literal(value) → str[source]
class BALSAMIC.utils.models.BalsamicConfigModel[source]

Bases: pydantic.main.BaseModel

Summarizes config models in preparation for export

QC

Field(QCmodel); variables relevant for fastq preprocessing and QC

vcf

Field(VCFmodel); variables relevant for variant calling pipeline

samples

Field(Dict); dictionary containing samples submitted for analysis

reference

Field(Dict); dictionary containing paths to reference genome files

panel

Field(PanelModel(optional)); variables relevant to PANEL BED if capture kit is used

bioinfo_tools

Field(BioinfoToolsModel); dictionary of bioinformatics software and their versions used for the analysis

singularity

Field(Path); path to singularity container of BALSAMIC

conda_env_yaml

Field(Path(CONDA_ENV_YAML)); path where Balsamic configs can be found

rule_directory

Field(Path(RULE_DIRECTORY)); path where snakemake rules can be found

classmethod abspath_as_str(value)[source]
classmethod transform_path_to_dict(value)[source]
class BALSAMIC.utils.models.BioinfoToolsModel[source]

Bases: pydantic.main.BaseModel

Holds versions of current bioinformatic tools used in analysis

class BALSAMIC.utils.models.PanelModel[source]

Bases: pydantic.main.BaseModel

Holds attributes of PANEL BED file if provided

capture_kit

Field(str(Path)); string representation of path to PANEL BED file

chrom

Field(list(str)); list of chromosomes in PANEL BED

Raises:

ValueError – When the capture_kit argument is set but is not a valid path

classmethod path_as_abspath_str(value)[source]
class BALSAMIC.utils.models.QCModel[source]

Bases: pydantic.main.BaseModel

Contains settings for quality control and pre-processing

picard_rmdup

Field(bool); whether duplicate removal is to be applied in the workflow

adapter

Field(str(AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT)); adapter sequence to trim

quality_trim

Field(bool); whether quality trimming is to be performed in the workflow

adapter_trim

Field(bool); whether adapter trimming is to be performed in the workflow

umi_trim

Field(bool); whether UMI trimming is to be performed in the workflow

min_seq_length

Field(str(int)); minimum sequence length cutoff for reads

umi_trim_length

Field(str(int)); length of UMI to be trimmed from reads

Raises:

ValueError – When the input in min_seq_length or umi_trim_length cannot be interpreted as an integer and coerced to a string

class Config[source]

Bases: object

validate_all = True
classmethod coerce_int_as_str(value)[source]
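
The integer-to-string coercion described for min_seq_length and umi_trim_length can be sketched in plain Python. This stand-alone function is an assumption for illustration only; the real validator is a pydantic classmethod and its error message may differ.

```python
def coerce_int_as_str_sketch(value):
    """Hypothetical helper: accept anything int() accepts, return it as str."""
    try:
        return str(int(value))
    except (TypeError, ValueError) as error:
        # Re-raise as ValueError so callers see one consistent error type.
        raise ValueError(f"{value!r} is not a valid integer") from error


as_string = coerce_int_as_str_sketch(25)
from_text = coerce_int_as_str_sketch("5")
```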
class BALSAMIC.utils.models.SampleInstanceModel[source]

Bases: pydantic.main.BaseModel

Holds attributes for samples used in analysis

file_prefix

Field(str); basename of sample pair

sample_type

Field(str; alias=type); type of sample [tumor, normal]

readpair_suffix

Field(List); currently always set to [1, 2]

Raises:

ValueError – When sample_type is set to any value other than [tumor, normal]

classmethod sample_type_literal(value)[source]
class BALSAMIC.utils.models.VCFAttributes[source]

Bases: pydantic.main.BaseModel

General purpose filter to manage various VCF attributes

This class handles three parameters for filtering variants: a tag_value, a filter_name, and the VCF field they apply to.

E.g. AD=VCFAttributes(tag_value=5, filter_name="balsamic_low_tumor_ad", field="INFO"): a value of 5 from the INFO field, and the filter_name will be balsamic_low_tumor_ad.

tag_value

float

filter_name

str

field

str

class BALSAMIC.utils.models.VCFModel[source]

Bases: pydantic.main.BaseModel

Contains VCF config

class BALSAMIC.utils.models.VarCallerFilter[source]

Bases: pydantic.main.BaseModel

General-purpose model for variant caller filters

This class handles attributes and filter for variant callers

AD

VCFAttributes (required); minimum allelic depth

AF_min

VCFAttributes (optional); minimum allelic fraction

AF_max

VCFAttributes (optional); maximum allelic fraction

MQ

VCFAttributes (optional); minimum mapping quality

DP

VCFAttributes (optional); minimum read depth

varcaller_name

str (required); variant caller name

filter_type

str (required); filter name for variant caller

analysis_type

str (required); analysis type e.g. tumor_normal or tumor_only

description

str (required); comment section for description

class BALSAMIC.utils.models.VarcallerAttribute[source]

Bases: pydantic.main.BaseModel

Holds variables for variant caller software

mutation

mutation_type

Raises:

ValueError – When a value other than [somatic, germline] is passed in the mutation field, or when a value other than [SNV, CNV, SV] is passed in the mutation_type field

classmethod mutation_literal(value) → str[source]
classmethod mutation_type_literal(value) → str[source]

BALSAMIC.utils.rule module

BALSAMIC.utils.rule.get_chrom(panelfile)[source]

Input: a panel bedfile

Output: list of chromosomes in the bedfile

BALSAMIC.utils.rule.get_conda_env(yaml_file, pkg)[source]

Retrieve conda environment for package from a predefined yaml file

Input: balsamic_env

Output: string of the conda env that contains the package

BALSAMIC.utils.rule.get_picard_mrkdup(config)[source]

Input: sample config file output from BALSAMIC

Output: mrkdup or rmdup strings
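
Given that QCModel documents a boolean picard_rmdup field, the choice between the two strings can be sketched as a lookup on the QC section of the config. The config layout (a "QC" key holding "picard_rmdup") and the default of "mrkdup" are assumptions, and get_picard_mrkdup_sketch is a hypothetical name.

```python
def get_picard_mrkdup_sketch(config):
    """Hypothetical helper: pick 'rmdup' only when removal is enabled."""
    # Assumption: default to marking duplicates unless removal is requested.
    if config.get("QC", {}).get("picard_rmdup") is True:
        return "rmdup"
    return "mrkdup"


remove = get_picard_mrkdup_sketch({"QC": {"picard_rmdup": True}})
mark = get_picard_mrkdup_sketch({"QC": {"picard_rmdup": False}})
```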

BALSAMIC.utils.rule.get_result_dir(config)[source]

Input: sample config file from BALSAMIC

Output: string of result directory path

BALSAMIC.utils.rule.get_rule_output(rules)[source]

get list of existing output files from a given workflow

Parameters:
  • rule_names – a list of rule names in the workflow. If no rules are given, then it will get all rules from the workflow.
  • rules – snakemake rules object
Returns:

list of tuples (file_name, rule_name, wildcard) for rules

Return type:

output_files

BALSAMIC.utils.rule.get_rule_output_raw(rules, output_file_wildcards={})[source]

get list of all possible output files from a given workflow

Parameters:
  • rules – snakemake rules object
  • output_file_wildcards – a dictionary with wildcards as keys and values as list of wildcard values
Returns:

list of tuples (file_name, rule_name, wildcard) for rules

Return type:

output_files

BALSAMIC.utils.rule.get_sample_type(sample, bio_type)[source]

Input: sample dictionary from BALSAMIC’s config file

Output: list of sample type id
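
Selecting sample ids by type can be sketched as a filter over the sample dictionary. The layout assumed here (sample id mapped to a dict with a "type" entry, matching the type alias documented for SampleInstanceModel) and the example ids are assumptions, as is the name get_sample_type_sketch.

```python
def get_sample_type_sketch(sample, bio_type):
    """Hypothetical helper: ids of samples whose 'type' equals bio_type."""
    return [
        sample_id
        for sample_id, attributes in sample.items()
        if attributes.get("type") == bio_type
    ]


samples = {
    "S1": {"type": "tumor", "file_prefix": "S1_R"},
    "S2": {"type": "normal", "file_prefix": "S2_R"},
    "S3": {"type": "tumor", "file_prefix": "S3_R"},
}
tumor_ids = get_sample_type_sketch(samples, "tumor")
```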

BALSAMIC.utils.rule.get_script_path(script_name: str)[source]

Retrieves script path where name is matching {{script_name}}.

BALSAMIC.utils.rule.get_threads(cluster_config, rule_name='__default__')[source]

Retrieves threads from cluster config, or returns the default value of 8
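
The fallback behaviour can be sketched with nested dictionary lookups. The key name "n" for the thread count is an assumption about the cluster config layout, and get_threads_sketch is a hypothetical name; only the default of 8 comes from the description above.

```python
def get_threads_sketch(cluster_config, rule_name="__default__"):
    """Hypothetical helper: thread count for a rule, falling back to 8."""
    # Assumption: each rule entry stores its thread count under the key "n".
    return cluster_config.get(rule_name, {}).get("n", 8)


cluster = {"__default__": {"n": 1}, "align": {"n": 16}}
align_threads = get_threads_sketch(cluster, "align")
unknown_threads = get_threads_sketch(cluster, "not_a_rule")
```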

BALSAMIC.utils.rule.get_vcf(config, var_caller, sample)[source]

Input: BALSAMIC config file

Output: list of vcf files

Module contents