Pileup Module¶
Welcome to the technical documentation for the pileup module. This page documents each rule in the module: its inputs and outputs (and how they relate to other rules), its configurable parameters, and the software it uses. Where needed, explanations and examples clarify what a particular rule does.
The schema below is a visual representation of the individual module steps and how they are related.
Third-party software used¶
Tag lines were taken from the developers' websites (code repository or manual).
| Name | License | Tag line | More info |
|---|---|---|---|
| ASCII-style alignment pileups | Apache 2.0 | "Generates ASCII-style pileups of read alignments in one or more BAM files for one or more genomic regions." | code |
| SAMtools | MIT | "[...] suite of programs for interacting with high-throughput sequencing data" | code / manual / publication |
Configuration file¶
Some parameters within the workflow can be modified. Refer to the configuration template for a detailed explanation of each option.
Pileup Workflow¶
finish_pileup¶
Target rule as required by Snakemake.
Local rule
(Workflow output) Empty text file (.txt); from create_per_library_ascii_pileups and create_per_run_ascii_pileups
create_empty_bed¶
Create an empty BED file if the user has not provided one.
OPTIONAL RULE. This rule is executed if, and only if, the configuration file does not specify a BED file with the regions to create ASCII-style alignment pileups for.
- config_template.yaml
bed_file: BED6 file with all the desired annotation regions to perform the ASCII-style alignment pileups on. (Default: None)
Empty BED file (.bed); used in
create_per_library_ascii_pileups,
create_per_run_ascii_pileups and/or
create_per_condition_ascii_pileups
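The fallback described above can be sketched in Python as follows. This is a minimal illustration, not the workflow's actual rule: the function name `resolve_bed_file` and the file name `empty_file.bed` are hypothetical, and only the `bed_file` option comes from the configuration template.

```python
from pathlib import Path
from typing import Optional


def resolve_bed_file(configured_bed: Optional[str], out_dir: Path) -> Path:
    """Return the BED file to use for the pileups.

    If the configuration provides a BED file, use it unchanged.
    Otherwise create an empty placeholder so downstream steps
    always receive a BED path.
    """
    if configured_bed:
        return Path(configured_bed)
    out_dir.mkdir(parents=True, exist_ok=True)
    empty_bed = out_dir / "empty_file.bed"  # hypothetical file name
    empty_bed.touch()  # zero-byte file: no regions to pile up
    return empty_bed
```

With an empty BED file, the pileup rules still run but have no regions to process.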
compress_reference_genome¶
Compress the processed genome with trimmed IDs using bgzip with SAMtools.
Required to perform the ASCII-style alignment pileups.
Genome sequence, trimmed IDs (.fa); from
trim_genome_seq_ids
Genome sequence, trimmed IDs, bgzipped (.fa.bz); used in
create_per_library_ascii_pileups,
create_per_run_ascii_pileups and/or
create_per_condition_ascii_pileups
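As a rough illustration of this step, the sketch below calls `bgzip` from Python. bgzip writes block-compressed (BGZF) output, which allows indexed random access to the sequence. The function names are hypothetical and the exact command used by the workflow may differ; the only assumption about the tool is its `-c` option, which writes to standard output and leaves the input file in place.

```python
import subprocess
from pathlib import Path
from typing import List


def bgzip_command(genome: Path) -> List[str]:
    """Build the bgzip call; -c streams the compressed data to stdout."""
    return ["bgzip", "-c", str(genome)]


def compress_genome(genome: Path, compressed: Path) -> None:
    """Compress `genome` into `compressed` (requires bgzip on the PATH)."""
    with compressed.open("wb") as handle:
        # check=True raises CalledProcessError if bgzip exits with an error.
        subprocess.run(bgzip_command(genome), stdout=handle, check=True)
```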
create_per_library_ascii_pileups¶
Create ASCII-style pileups for all the desired annotated regions across libraries using ASCII-style alignment pileups.
A directory containing the ASCII-style pileups is created for each library. If no BED file is provided, the pileups' output directories will only contain an empty file.
- Genome sequence, trimmed IDs, bgzipped (.fa.bz); from compress_reference_genome
- miRNA annotations, mapped chromosome name(s) (.gff3); from map_chr_names
- (Workflow output) Alignments file, uncollapsed, sorted (.bam); from sort_uncollapsed_reads_bam_by_position
- (Workflow output) BAM index file (.bam.bai); from index_uncollapsed_reads_bam
- Annotated genomic regions (.bed); from workflow input files or create_empty_bed
(Workflow output) Empty text file (.txt)
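To make the role of the BED file concrete, here is a small sketch that reads BED6 records and converts each one into a 1-based `chrom:start-end` region string. It is an illustration only and is not taken from the workflow or from the pileup tool; `Bed6Record`, `parse_bed6`, and `to_region_string` are hypothetical names. It relies on the BED convention that coordinates are 0-based and half-open.

```python
from typing import List, NamedTuple


class Bed6Record(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    name: str
    score: str
    strand: str


def parse_bed6(text: str) -> List[Bed6Record]:
    """Parse tab-separated BED6 lines, skipping blank and header lines."""
    records = []
    for line in text.splitlines():
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 6:
            raise ValueError(f"expected 6 tab-separated fields: {line!r}")
        chrom, start, end, name, score, strand = fields[:6]
        records.append(Bed6Record(chrom, int(start), int(end), name, score, strand))
    return records


def to_region_string(record: Bed6Record) -> str:
    """Convert 0-based half-open BED coordinates to a 1-based inclusive region."""
    return f"{record.chrom}:{record.start + 1}-{record.end}"
```

An empty BED file yields an empty list of records, which matches the case where no regions are piled up.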
create_per_run_ascii_pileups¶
Create ASCII-style pileups for all the desired annotated regions for the whole run using ASCII-style alignment pileups.
If no BED file is provided, the pileups' output directory will only contain an empty file.
- Genome sequence, trimmed IDs, bgzipped (.fa.bz); from compress_reference_genome
- miRNA annotations, mapped chromosome name(s) (.gff3); from map_chr_names
- (Workflow output) Alignments file, uncollapsed, sorted (.bam); from sort_uncollapsed_reads_bam_by_position
- (Workflow output) BAM index file (.bam.bai); from index_uncollapsed_reads_bam
- Annotated genomic regions (.bed); from workflow input files or create_empty_bed
(Workflow output) Empty text file (.txt)
create_per_condition_ascii_pileups¶
Create ASCII-style pileups for all the desired annotated regions across the different library subsets, if provided, using ASCII-style alignment pileups.
OPTIONAL RULE. The ASCII-style pileups for each annotated region are made if, and only if, at least one library subset is specified in the configuration file. Otherwise, this rule will not be executed, and no output will be generated.
- Genome sequence, trimmed IDs, bgzipped (.fa.bz); from compress_reference_genome
- miRNA annotations, mapped chromosome name(s) (.gff3); from map_chr_names
- (Workflow output) Alignments file, uncollapsed, sorted (.bam); from sort_uncollapsed_reads_bam_by_position
- (Workflow output) BAM index file (.bam.bai); from index_uncollapsed_reads_bam
- Annotated genomic regions (.bed); from workflow input files or create_empty_bed
- config_template.yaml
lib_dict: Dictionary mapping arbitrary condition names (keys) to the library names to aggregate alignment pileups for (values; MUST correspond to names in the samples table) (Default: None)
Empty text file (.txt)
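The `lib_dict` requirement can be checked with a few lines of Python. The sketch below is a hypothetical helper, not part of the workflow: it returns no conditions when `lib_dict` is unset (the case in which this rule does not run) and rejects library names that are missing from the samples table. The condition and library names in the usage note are invented for illustration.

```python
from typing import Dict, List, Optional, Set


def conditions_to_pile_up(
    lib_dict: Optional[Dict[str, List[str]]],
    sample_names: Set[str],
) -> Dict[str, List[str]]:
    """Return the condition-to-libraries mapping to create pileups for."""
    if not lib_dict:
        # No library subset configured: nothing to do for this rule.
        return {}
    for condition, libraries in lib_dict.items():
        unknown = sorted(set(libraries) - sample_names)
        if unknown:
            raise ValueError(
                f"condition {condition!r} lists libraries that are not "
                f"in the samples table: {unknown}"
            )
    return dict(lib_dict)
```

For example, `conditions_to_pile_up({"treated": ["lib_1", "lib_2"]}, {"lib_1", "lib_2", "lib_3"})` returns the mapping unchanged, while a library name absent from the samples table raises a `ValueError`.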