Pileup Module¶

Welcome to the technical documentation for the pileup module. This page documents each rule within the module in detail, stating its inputs, its outputs (and how they relate to other rules), its configurable parameters, and the software it uses. Where needed, it also explains what a particular rule does and gives examples.

The schema below is a visual representation of the individual module steps and how they are related.

Third-party software used¶

Tag lines were taken from the developers' websites (code repository or manual).

Name	License	Tag line	More info
ASCII-style alignment pileups	Apache 2.0	"Generates ASCII-style pileups of read alignments in one or more BAM files for one or more genomic regions."	code
SAMtools	MIT	"[...] suite of programs for interacting with high-throughput sequencing data"	code / manual / publication

Configuration file¶

Some parameters within the workflow can be modified. Refer to the configuration template for a detailed explanation of each option.

Pileup Workflow¶

`finish_pileup`¶

Target rule as required by Snakemake.

Local rule

Input

(Workflow output) Empty text file (.txt); from create_per_library_ascii_pileups and create_per_run_ascii_pileups
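By default, Snakemake builds the first rule of a workflow and works backwards from the files that rule requests, so a target rule whose inputs are the completion flags of the pileup rules is enough to trigger them. The sketch below only illustrates that idea: the directory layout (`results/pileups/...`) and the flag name `done.txt` are hypothetical, and the real file names are defined in the workflow's Snakefile.

```python
from pathlib import Path
from typing import List


def finish_pileup_inputs(libraries: List[str], out_dir: str = "results/pileups") -> List[str]:
    """Hypothetical completion flags a target rule such as finish_pileup could request."""
    base = Path(out_dir)
    # One empty flag file per library (per-library pileups) ...
    flags = [base / "libraries" / lib / "done.txt" for lib in libraries]
    # ... plus one for the pileups of the whole run.
    flags.append(base / "run" / "done.txt")
    return [flag.as_posix() for flag in flags]
```

Requesting these flags, rather than the pileup directories themselves, lets Snakemake track directory-producing rules through a single file each.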

`create_empty_bed`¶

Create an empty BED file if the user has not provided one.

OPTIONAL RULE. This rule is executed if, and only if, the user has not provided, in the configuration file, a BED file with the regions on which the ASCII-style alignment pileups must be performed.

Condition

config_template.yaml
- bed_file: BED6 file with all the desired annotation regions to perform the ASCII-style alignment pileups on. (default: None)

Output

Empty BED file (.bed); used in create_per_library_ascii_pileups, create_per_run_ascii_pileups and/or create_per_condition_ascii_pileups
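The condition this rule encodes can be sketched in plain Python. `resolve_bed_file` and the fallback path are hypothetical names used only for illustration; the workflow itself expresses the condition through its Snakemake rules and the `bed_file` option.

```python
from pathlib import Path
from typing import Optional, Union


def resolve_bed_file(bed_file: Optional[str], fallback: Union[str, Path]) -> Path:
    """Return the configured BED file, or write an empty one if bed_file is unset.

    Hypothetical helper: bed_file mirrors the `bed_file` option (None when not set),
    fallback is an assumed location for the placeholder file.
    """
    if bed_file:
        # The user supplied regions: use the file as-is.
        return Path(bed_file)
    placeholder = Path(fallback)
    placeholder.parent.mkdir(parents=True, exist_ok=True)
    placeholder.write_text("")  # zero-byte BED file, i.e. no regions
    return placeholder
```

A zero-byte BED file is still a valid input for the downstream rules; it simply describes zero regions.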

`compress_reference_genome`¶

Compress the processed genome with trimmed IDs using bgzip (distributed with HTSlib, the library on which SAMtools is built).

Required to perform the ASCII-style alignment pileups.

Input

Genome sequence, trimmed IDs (.fa); from trim_genome_seq_ids

Output

Genome sequence, trimmed IDs, bgzipped (.fa.bz); used in create_per_library_ascii_pileups, create_per_run_ascii_pileups and/or create_per_condition_ascii_pileups
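bgzip writes the BGZF format: a series of independent gzip members (blocks) of at most 64 KiB each, every one carrying a `BC` extra field that stores its compressed size, followed by a fixed empty end-of-file block. Because blocks can be decompressed independently, tools such as `samtools faidx` can index a bgzipped FASTA and fetch single regions without decompressing the whole file, which is presumably why the pileup rules require this format. The pure-Python sketch below only illustrates the block layout; the workflow itself calls bgzip, and `bgzf_block` and `bgzip_bytes` are hypothetical names.

```python
import struct
import zlib

# bgzip puts at most 0xFF00 (65280) uncompressed bytes into one block.
MAX_BLOCK_INPUT = 0xFF00

# Fixed 28-byte empty block that marks the end of a BGZF file.
BGZF_EOF = bytes.fromhex("1f8b08040000000000ff0600424302001b0003000000000000000000")


def bgzf_block(data: bytes) -> bytes:
    """Compress one chunk into a BGZF block: a gzip member with a 'BC' extra field."""
    compressor = zlib.compressobj(6, zlib.DEFLATED, -15)  # raw deflate stream
    cdata = compressor.compress(data) + compressor.flush()
    bsize = 18 + len(cdata) + 8 - 1  # total block size minus one
    header = struct.pack(
        "<4BI2BH2BHH",
        31, 139, 8, 4,  # gzip magic, deflate method, FEXTRA flag set
        0, 0, 255,      # MTIME, XFL, OS (unknown)
        6,              # XLEN: bytes of extra data
        66, 67, 2,      # subfield 'B','C' with a 2-byte payload ...
        bsize,          # ... holding the block size
    )
    trailer = struct.pack("<II", zlib.crc32(data) & 0xFFFFFFFF, len(data))
    return header + cdata + trailer


def bgzip_bytes(data: bytes) -> bytes:
    """Split data into chunks, compress each as a BGZF block and append the EOF block."""
    blocks = [
        bgzf_block(data[i:i + MAX_BLOCK_INPUT])
        for i in range(0, len(data), MAX_BLOCK_INPUT)
    ]
    return b"".join(blocks) + BGZF_EOF
```

Since every block is an ordinary gzip member, standard gzip readers (for example Python's `gzip.decompress`) can still read the whole file sequentially; the per-block sizes are what make random access possible.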

`create_per_library_ascii_pileups`¶

Create ASCII-style pileups of all the desired annotated regions for each library, using ASCII-style alignment pileups.

A directory containing the ASCII-style pileups is created for each library. If no BED file is provided, the pileups' output directories will only contain an empty file.

Input

Genome sequence, trimmed IDs, bgzipped (.fa.bz); from compress_reference_genome
miRNA annotations, mapped chromosome name(s) (.gff3); from map_chr_names
(Workflow output) Alignments file, uncollapsed, sorted (.bam); from sort_uncollapsed_reads_bam_by_position
(Workflow output) BAM index file (.bam.bai); from index_uncollapsed_reads_bam
Annotated genomic regions (.bed); from workflow input files or create_empty_bed

Output

(Workflow output) Empty text file (.txt)
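Each pileup is computed over the intervals listed in the BED6 file. BED coordinates are 0-based and half-open, whereas SAMtools-style region strings (`chrom:start-end`) are 1-based and inclusive, so the start has to be shifted by one. The hypothetical helpers below parse BED6 lines and perform that conversion; the region name in the test is made up. An empty BED file, such as the one written by create_empty_bed, yields no regions, consistent with the output directories then holding only an empty file.

```python
from typing import Iterable, List, NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive (BED convention)
    end: int     # 0-based, exclusive (BED convention)
    name: str
    strand: str


def parse_bed6(lines: Iterable[str]) -> List[Region]:
    """Parse BED6 lines; blank, comment, 'track' and 'browser' lines are skipped."""
    regions = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end, name, _score, strand = line.rstrip("\n").split("\t")[:6]
        regions.append(Region(chrom, int(start), int(end), name, strand))
    return regions


def to_region_string(region: Region) -> str:
    """Convert a BED interval to a 1-based, inclusive 'chrom:start-end' region string."""
    return f"{region.chrom}:{region.start + 1}-{region.end}"
```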

`create_per_run_ascii_pileups`¶

Create ASCII-style pileups of all the desired annotated regions for the whole run, using ASCII-style alignment pileups.

If no BED file is provided, the pileups' output directory will only contain an empty file.

Input

Genome sequence, trimmed IDs, bgzipped (.fa.bz); from compress_reference_genome
miRNA annotations, mapped chromosome name(s) (.gff3); from map_chr_names
(Workflow output) Alignments file, uncollapsed, sorted (.bam); from sort_uncollapsed_reads_bam_by_position
(Workflow output) BAM index file (.bam.bai); from index_uncollapsed_reads_bam
Annotated genomic regions (.bed); from workflow input files or create_empty_bed

Output

(Workflow output) Empty text file (.txt)
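As a rough illustration of what an ASCII-style pileup shows, the toy renderer below prints the reference window and, under it, one row per read, with `.` for a matching base and the read's own base for a mismatch. For a run-level pileup, the reads of all libraries would be pooled into a single list. This is a simplified sketch under stated assumptions (ungapped reads only, no insertions, deletions or clipping; hypothetical library names) and does not reproduce the exact output format of the ASCII-style alignment pileups tool.

```python
from typing import List, Tuple


def ascii_pileup(reference: str, ref_start: int, reads: List[Tuple[int, str]]) -> str:
    """Toy ASCII pileup: reference on top, then one row per read.

    reference: bases of the displayed window; ref_start: 0-based position of reference[0];
    reads: (0-based start, ungapped sequence) pairs, e.g. pooled from all libraries.
    Matching bases are shown as '.', mismatches as the read base.
    """
    rows = [reference]
    for start, seq in sorted(reads):
        row = [" "] * len(reference)
        for i, base in enumerate(seq):
            pos = start - ref_start + i
            if 0 <= pos < len(reference):  # clip bases outside the window
                row[pos] = "." if base == reference[pos] else base
        rows.append("".join(row).rstrip())
    return "\n".join(rows)


# Reads from two hypothetical libraries pooled for a run-level view.
lib_1 = [(100, "ACGT")]
lib_2 = [(102, "GTTC")]
print(ascii_pileup("ACGTACGT", 100, lib_1 + lib_2))
# ACGTACGT
# ....
#   ..T.
```

The mismatch column (`T` against a reference `A`) stands out immediately, which is the main appeal of this kind of text-based view.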

`create_per_condition_ascii_pileups`¶

Create ASCII-style pileups of all the desired annotated regions for each library subset, if any are provided, using ASCII-style alignment pileups.

OPTIONAL RULE. The ASCII-style pileups for each annotated region are made if, and only if, at least one library subset is specified in the configuration file. Otherwise, this rule will not be executed, and no output will be generated.

Input

Genome sequence, trimmed IDs, bgzipped (.fa.bz); from compress_reference_genome
miRNA annotations, mapped chromosome name(s) (.gff3); from map_chr_names
(Workflow output) Alignments file, uncollapsed, sorted (.bam); from sort_uncollapsed_reads_bam_by_position
(Workflow output) BAM index file (.bam.bai); from index_uncollapsed_reads_bam
Annotated genomic regions (.bed); from workflow input files or create_empty_bed

Parameters

config_template.yaml
- lib_dict: Dictionary mapping arbitrary condition names (keys) to the library names whose alignment pileups are aggregated (values; MUST correspond to names in the samples table) (default: None)

Output

Empty text file (.txt)
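`lib_dict` maps each condition to a list of libraries. The sketch below, with a hypothetical function name and hypothetical library and BAM names, shows how such a mapping could be checked against the samples table and turned into one group of BAM files per condition. With no subsets configured it returns an empty mapping, mirroring the rule being skipped.

```python
from typing import Dict, List, Optional


def group_bams_by_condition(
    lib_dict: Optional[Dict[str, List[str]]],
    bam_by_library: Dict[str, str],
) -> Dict[str, List[str]]:
    """Map each condition in lib_dict to the BAM files of its libraries.

    lib_dict mirrors the `lib_dict` option; bam_by_library is a hypothetical
    lookup from library names (as in the samples table) to sorted BAM paths.
    """
    if not lib_dict:
        return {}  # no subsets configured: nothing to pile up
    requested = {lib for libs in lib_dict.values() for lib in libs}
    unknown = sorted(requested - set(bam_by_library))
    if unknown:
        raise ValueError(f"Libraries not in the samples table: {', '.join(unknown)}")
    return {
        condition: [bam_by_library[lib] for lib in libs]
        for condition, libs in lib_dict.items()
    }
```

Failing early on unknown library names surfaces typos in the configuration before any pileup job is scheduled.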
