Skip to content

module mirna_quantification.py

Quantify miRNAs and corresponding isomiRs.

Read the input SAM file, calculate the contribution sum of the intersecting alignments for each feature and write the result in tab-delimited format. The following sections will cover what the SAM file must be like, how the counting is done, and how the output can look like. The final section contain examples covering the main modes.

EXPECTED INPUT

The SAM file must contain only the alignments that intersect with canonical miRNAs and/or isomiRs. For each alignment, the intersecting feature name(s) must be stored in a tag. If the tag determined by the CLI argument --tag is empty, the alignment is ignored. In addition, the SAM file must be sorted by the tag storing the feature names. Optionally, the SAM file can contain collapsed reads, as, e.g., produced by the fastx_collapser tool of the FASTX-Toolkit. Finally, read names can also contain the alignment NH vale (see READ NAME FORMAT section).

FEATURE NAME FORMAT

The name of the intersecting feature(s) has to follow the format:

feat_name|5p-shift|3p-shift|CIGAR|MD|READ_SEQ
  • For isomiRs, the whole name is used
  • For canonical miRNAs, the feat_name is used

A feature is classified as canonical if the 5p and 3p shifts are both 0 and the CIGAR and MD strings are the same. If there are different features names separated by a semi-colon, they are treated as a unique isomiR name. For example, the feature name feat_name_1|0|0|23M|23|read_seq would represent a canonical miRNA and the feature names feat_name_2|0|0|22M|0A0A2T17|read_seq and feat_name_3|0|0|22M|22|read_seq;feat_name_4|1|0|23M|17A3T0C0|read_seq would represent isomiRs.

READ NAME FORMAT

The read name must have one of the following formats:

  • NAME
  • NAME_NH
  • NAME-COUNT
  • NAME-COUNT_NH

where:

  • NAME is an arbitrary unique name
  • COUNT is the number of identical reads that were collapsed
  • NH is the alignment NH tag.

For example, the query name my_query_name-13_2 would indicate that the read represented by this alignment occurred 13 times before collapsing and the alignment's NH value is 2.

Only if COUNT is in the name the CLI flag --collapsed can be set. Likewise, the CLI flag --nh can only be set if NH is in the name.

COUNT TABULATION

The count for each feature is the sum of all the intersecting alignment's contribution. The appropriate contribution is the ratio between the number of reads and the alignment's NH.

The CLI flags --collapsed and --nh determine if these values are taken from the alignment's name:

  • If --collapsed is not set, the number of reads is considered to be 1
  • If --nh is not set, the NH value is obtained from the NH tag
  • If the NH tag is missing, its value is considered to be 1

TABULATED FEATURES

Which kind of features will appear in the output file is determined by the CLI argument --mir-list. The three possible options are:

  1. mirna to only include canonical miRNAs
  2. isomir to only include isomiRs
  3. mirna and isomir to include both canonical miRNAs and isomiRs

OUTPUT FORMAT

The output is tab-delimited file named mir_counts_LIB, where LIB is the read's library name set in the CLI option --lib.

If the SAM file is empty, an empty file is produced.

By default each row contains the feature name and its partial count.

Three extra columns can be added by using the CLI flags --count, --len and --read-ids:

  • If --count is set, the output table will contain the number of best alignments for each feature
  • If --len is set, the output table will contain the read's length
  • If both --count and --len are set, the count will always be followed by the read's length
  • If --read-ids is set, a semicolon-separated list of all alignment IDs that overlap with the feature will always be in the last column

Examples

Example 1 | Feature name stored in the tag XN

Input:

SAM with intersecting feature names in stored in the tag XN

Command:

mirna_quantification.py SAM --tag XN

Output:

hsa-miR-516b-5p                                     3.0
hsa-miR-517-5p|-1|0|23M|22T|ACCTCTAGATGGAAGCACTGTCG 0.6000000000000001

Example 2 | Quantify mature miRNA

Input:

SAM meeting the minimal characteristics

Command:

mirna_quantification.py SAM --mir-list mirna

Output:

hsa-miR-517b-3p     1.3333333333333333
hsa-miR-518b        1.0

Example 3 | Quantify isomiR

Input:

SAM meeting the minimal characteristics

Command:

mirna_quantification.py SAM --mir-list isomir

Output:

hsa-miR-512-3p|0|1|23M|22C|AAGTGCTGTCATAGCTGAGGTAA  1.6666666666666665
hsa-miR-517-5p|-1|0|23M|22T|ACCTCTAGATGGAAGCACTGTCG 0.6000000000000001

Example 4 | Collapsed reads with NH tag in the name

Input:

SAM with read names following the format NAME-COUNT_NH

Command:

mirna_quantification.py SAM --collpased --nh

Output:

hsa-miR-516b-5p                                     3.0
hsa-miR-517-5p|-1|0|23M|22T|ACCTCTAGATGGAAGCACTGTCG 0.6000000000000001

Example 5 | Collapsed reads

Input:

SAM with read names following the format NAME-COUNT

Command:

mirna_quantification.py SAM --collapsed

Output:

hsa-miR-516b-5p                                     3.0
hsa-miR-517-5p|-1|0|23M|22T|ACCTCTAGATGGAAGCACTGTCG 0.6000000000000001

Example 6 | Reads with the NH tag in their name

Input:

SAM with read names following the format NAME_NH

Command:

mirna_quantification.py SAM --nh

Output:

hsa-miR-516b-5p                                     3.0
hsa-miR-517-5p|-1|0|23M|22T|ACCTCTAGATGGAAGCACTGTCG 0.6000000000000001

Example 7 | Quantify mature miRNAs and add read IDs to the final table

Input:

SAM with read names following the format NAME_NH

Command:

mirna_quantification.py SAM --read-ids --nh --mir-list mirna

Output:

hsa-miR-498-5p      4.333333333333333   270396_3
hsa-miR-1323        12.0                673650_2;673650_2

Example 8 | Quantify isomiRs, add isomiR length and absolute counts

Input:

SAM meeting the minimal characteristics

Command:

mirna_quantification.py SAM --count --len --mir-list isomir

Output:

hsa-miR-512-3p|0|1|23M|22C0|AAGTGCTGTCATAGCTGAGGTAA     4.333333333333333  6 23
hsa-miR-512-3p|0|1|23M|3T18C0|AAGGGCTGTCATAGCTGAGGTAA   12.0               8 19

function parse_arguments

parse_arguments()

Command-line arguments parser.


function collapsed_nh_contribution

collapsed_nh_contribution(aln: AlignedSegment)  float

Get the contribution of the alignment to the overall count.

The contribution is computed as the ratio of the number of reads collapsed in the alignment and the NH value. It is assumed that the alignment query name contains the number of collapsed reads as well as the NH value in the format NAME-COUNT_NH.

Arguments:

  • aln: Alignment to which the overall contribution is calculated

Returns: Contribution of alignment to overall count


function collapsed_contribution

collapsed_contribution(aln: AlignedSegment)  float

Get the contribution of the alignment to the overall count.

The contribution is computed as the ratio of the number of reads collapsed in the alignment and the value stored in the NH tag. If the tag is missing, the NH value is 1. It is assumed that the alignment query name contains the number of collapsed reads in the format NAME-COUNT.

Arguments:

  • aln: Alignment to which the overall contribution is calculated

Returns: Contribution of alignment to overall count


function nh_contribution

nh_contribution(aln: AlignedSegment)  float

Get the contribution of the alignment to the overall count.

The contribution is computed as the ratio of the number of reads collapsed in the alignment and the value stored in the NH tag. If the tag is missing, the NH value is 1. It is assumed that the alignment query name contains the NH value in the format NAME_NH.

Arguments:

  • aln: Alignment to which the overall contribution is calculated

Returns: Contribution of alignment to overall count


function contribution

contribution(aln: AlignedSegment)  float

Get the contribution of the alignment to the overall count.

The contribution is computed as the ratio of the number of reads collapsed in the alignment and the value stored in the NH tag. If the tag is missing, the overall contribution is 1.

Argumens:

  • aln: Alignment to which the overall contribution is calculated

Returns: Contribution of alignment to overall count


function get_name

get_name(pre_name: str)  list[str]

Get the final name for the species name.

Take a string and processes it to obtain the final name for the species and the type of miRNA the string belongs to. Only the feat_name is returned if the 3p-shift and 5p-shift are 0 and the CIGAR and MD are the same. Otherwise, the whole input string is returned.

Arguments:

  • pre_name: string with the format feat_name|5-shift|3-shift|CIGAR|MD|read_seq

Returns: list with the species name to be found in the final table and its type


function write_output

write_output(
    name: str,
    species: list[str],
    mir_list: list[str],
    mirna_out: Path
)  None

Write to the output the correct miRNA type.

Arguments:

  • name: miRNA species type
  • species: List of candidate miRNA species to write in the output file
  • mir_list: List of miRNA types to have in the output table
  • mirna_out: Path to the output file

function main

main(args)  None

Quantify miRNAs and corresponding isomiRs.