module primir_quantification.py¶
Tabulate bedtools intersect -wo -s output file.
For each intersecting feature from an INTERSECT file, calculate the sum of its contributions and print the result in tab-delimited format.
Contribution logic (controlled by --collapsed and --nh):
- No flags: contribution = 1
- Only
--nh: contribution = 1 / NH - Only
--collapsed: contribution = #reads / 1 - Both
--collapsedand--nh: contribution = #reads / NH
The number of collapsed reads (#reads) and the NH value are inferred from the
read name, which must follow one of these formats:
- plain: READ
- NH only: READ_NH
- collapsed only: READ-#reads
- collapsed + NH: READ-#reads_NH
EXPECTED INPUT FILE
The expected INTERSECT file must be the output of:
bedtools intersect -wo -s -a GFF3/GTF -b BAM.
The first 10 lines are validated for format. If the INTERSECT file is empty, no
output is produced.
OUTPUT TABLE FORMAT
The basic output table has no header and two columns:
1) Feature identifier (taken from the attribute named by --id,
which must match a key in the attributes column of the original
GFF3/GTF features used in the bedtools call)
2) Feature contribution (float)
Optional columns (controlled by flags):
--read-idsappends a semicolon-separated list of read IDs that overlap the feature (always added as the last column).--feat-extensionappends two columns with 5′ and 3′ extension sizes. These are inferred by splitting the FEATURE-ID value (selected via--id) on underscores, assuming the convention:NAME_5EXT_3EXT. If that pattern isn't present in the selected attribute, the extension columns will be 0.
Examples¶
Example 1 | Contribution when using --collapsed¶
Use case:
A single feature with several intersecting reads. The flag --collapsed is
used, so contribution equals the # of reads per alignment.
IN INTERSECT records:
19 . miRNA_primary_transcript 44362 44448 . + . ID=MI0003160;Alias=MI0003160;Name=hsa-mir-524_-0_+0 19 44413 44434 8-2 255 + 21
19 . miRNA_primary_transcript 44362 44448 . + . ID=MI0003160;Alias=MI0003160;Name=hsa-mir-524_-0_+0 19 44413 44434 24-1 255 + 21
Alignments:
Read ID: 8-2 Number of collapsed reads: 2 Contribution: 2
Read ID: 24-1 Number of collapsed reads: 1 Contribution: 1
OUT table:
hsa-mir-524_-0_+0 3
Example 2 | Contribution when using --nh¶
Use case:
A single feature with several intersecting reads. The flag --nh is used, so
contribution equals 1/NH.
IN INTERSECT records:
19 . miRNA_primary_transcript 44362 44448 . + . ID=MI0003160;Alias=MI0003160;Name=hsa-mir-524_-0_+0 19 44413 44434 8_1 255 + 21
19 . miRNA_primary_transcript 44362 44448 . + . ID=MI0003160;Alias=MI0003160;Name=hsa-mir-524_-0_+0 19 44413 44434 24_1 255 + 21
Alignments:
Read ID: 8_1 Number of mapped genomic loci: 1 Contribution: 1
Read ID: 24_1 Number of mapped genomic loci: 1 Contribution: 1
OUT table:
hsa-mir-524_-0_+0 2
Example 3 | Contribution when using --collapsed and --nh¶
Use case:
A single feature with several intersecting reads. The flags --nh and
--contribution are used, so contribution equals # of reads/NH.
IN INTERSECT records:
19 . miRNA_primary_transcript 44362 44448 . + . ID=MI0003160;Alias=MI0003160;Name=hsa-mir-524_-0_+0 19 44413 44434 8-2_1 255 + 21
19 . miRNA_primary_transcript 44362 44448 . + . ID=MI0003160;Alias=MI0003160;Name=hsa-mir-524_-0_+0 19 44413 44434 24-1_1 255 + 21
Alignments:
Read ID: 8-2_1 Number of collapsed reads: 2 Number of mapped genomic loci: 1 Contribution: 2/1 = 2
Read ID: 23-1_1 Number of collapsed reads: 1 Number of mapped genomic loci: 1 Contribution: 1/1 = 1
OUT table:
hsa-mir-524_-0_+0 3
Example 4 | Column with intersecting reads; using --read-ids¶
Use case:
A single feature with several intersecting reads. Each read contributes with 1. Read IDs intersecting the feature are appended as the last column.
IN INTERSECT records:
19 . miRNA_primary_transcript 44362 44448 . + . ID=MI0003160;Alias=MI0003160;Name=hsa-mir-524_-0_+0 19 44413 44434 8-2_1 255 + 21
19 . miRNA_primary_transcript 44362 44448 . + . ID=MI0003160;Alias=MI0003160;Name=hsa-mir-524_-0_+0 19 44413 44434 24-1_1 255 + 21
Alignments:
Read ID: 8-2_1 Contribution: 1
Read ID: 24-1_1 Contribution: 1
OUT table:
hsa-mir-524_-0_+0 2 8-2_1;24-1_1
Example 5 | Columns with feature shifts; using --feat-extension¶
Use case:
A single feature with several intersecting reads. Each read contributes with 1. Feature start and end coordinates shift are appended as the third and fourth column respectively.
IN INTERSECT records:
19 . miRNA_primary_transcript 44362 44448 . + . ID=MI0003160;Alias=MI0003160;Name=hsa-mir-524_-1_+3 19 44413 44434 8-2_1 255 + 21
19 . miRNA_primary_transcript 44362 44448 . + . ID=MI0003160;Alias=MI0003160;Name=hsa-mir-524_-1_+3 19 44413 44434 24-1_1 255 + 21
Alignments:
Read ID: 8-2_1 Contribution: 1
Read ID: 24-1_1 Contribution: 1
Feature:
Feature name: hsa-mir-524_-1_+3
5' shift: -1
3' shift: +3
OUT table:
hsa-mir-524_-1_+3 2 -1 +3
Global Variables¶
- TYPE_CHECKING
function parse_arguments¶
parse_arguments()
Command-line arguments parser.
function get_contribution¶
get_contribution(
query_id: str,
collapsed: bool = False,
nh: bool = False
) → float
Return the contribution of a single alignment.
The read name format depends on the flags:
- collapsed + NH:
READ-#reads_NH - collapsed only:
READ-#reads - NH only:
READ_NH - neither:
READ
Arguments:
query_id: Read/query name from the INTERSECT record.collapsed: IfTrue, parse #reads from the read name.nh: IfTrue, parse NH from the read name.
Returns: Contribution as (#reads / NH) following the rules above.
function get_initial_data¶
get_initial_data(name: str, feat_extension: bool) → list[str]
Get the feature name and its extension.
Arguments:
name: string with the feature name that can or not include its annotation extension in the formatname_5-extension_3-extensionfeat_extension: specify whether the feature annotation extension has to be a field on the final output
Returns: list with the feature name to be found in the final table and the number of extended positions (if asked for)
function main¶
main(args) → None
Tabulate bedtools intersect -wo -s output file.