Usage
How does the ZARP-cli work?
The `zarp` command accepts two kinds of arguments:
- Positional arguments: All positional arguments are interpreted as sample references, which can be paths to local RNA-Seq library files, paths to sample tables, or "run identifiers" assigned by the Sequence Read Archive (SRA), the DNA Data Bank of Japan (DDBJ), or the European Nucleotide Archive (ENA). See below for a detailed description of the sample reference syntax.
- Command-line options: Optional arguments of the form `--optional-arg`, which either modify ZARP-cli's behavior or assign sample-, run-, or user-specific metadata globally to all samples of a given ZARP-cli run. See below for a more detailed description of command-line options.
Sample references
The table below gives an overview of the supported basic sample reference types:
Type | Note | Examples |
---|---|---|
Path to local RNA-Seq library | Both absolute and relative paths are supported | `/path/to/library_1.fq.gz`, `library_2.fq.gz` |
Path to local ZARP sample table | Both absolute and relative paths are supported | `table:/path/to/sample/table_1.tsv`, `table:table_2.tsv` |
SRA/DDBJ/ENA identifier | Valid identifiers need to be matched by the following regular expression: `(E|D|S)RR[0-9]{6,}` | `SRR123456`, `DRR7654321` |
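The identifier rule in the last table row can be checked with a few lines of Python. This is only a minimal sketch and not ZARP-cli's own validation code: the helper name `is_run_identifier` is hypothetical, and requiring the pattern to match the whole string (rather than a substring) is an assumption.

```python
import re

# Pattern quoted from the table above. Matching the *entire* string via
# fullmatch() is an assumption made for this sketch.
RUN_ID_PATTERN = re.compile(r"(E|D|S)RR[0-9]{6,}")


def is_run_identifier(reference: str) -> bool:
    """Return True if the whole string looks like an SRA/DDBJ/ENA run identifier."""
    return RUN_ID_PATTERN.fullmatch(reference) is not None


for candidate in ("SRR123456", "DRR7654321", "SRR12345", "library_2.fq.gz"):
    print(f"{candidate}: {is_run_identifier(candidate)}")
```

In this sketch, `SRR12345` is rejected because it has only five digits, while the pattern asks for at least six.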
The basic types above can be extended with the following syntax fragments to add further annotation:
Syntax | Description | Examples |
---|---|---|
`PATH,PATH` | Exactly two paths, separated by a comma and no white space, signify the two separate files of a paired-end sequencing library; absolute paths, relative paths, and mixes thereof are supported | `/tmp/m1.fq.gz,/tmp/m2.fq.gz`, `mate_1.fq.gz,mate_2.fq.gz`, `mate_1.fq.gz,/tmp/m2.fq.gz` |
`NAME@REF` | A string separated from a non-table sample reference by `@` specifies a sample name; if no name is provided, a sanitized form of the base name of the file path is used | `se_sample@lib.fq.gz`, `pe_sample@m1.fq.gz,m2.fq.gz`, `remote_sample@ERR11223344` |
Different sample references can of course be mixed and matched to your heart's content!
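To make the syntax rules above concrete, the following sketch splits a sample reference into its parts. It is an illustration only and does not reproduce ZARP-cli's actual parser: `ParsedReference`, `parse_reference`, and the `kind` labels are hypothetical names, the order in which the rules are applied is an assumption, and the default sample name (the sanitized base name mentioned above) is not derived here, so `name` stays `None` when no `NAME@` prefix is given.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

RUN_ID_PATTERN = re.compile(r"(E|D|S)RR[0-9]{6,}")  # quoted from the first table
TABLE_PREFIX = "table:"


@dataclass(frozen=True)
class ParsedReference:
    """Hypothetical container for the parts of one sample reference."""

    kind: str  # "table", "remote", "single_end" or "paired_end"
    values: Tuple[str, ...]  # table path, run identifier, or library path(s)
    name: Optional[str] = None  # only set if given via NAME@REF


def parse_reference(reference: str) -> ParsedReference:
    """Split a reference according to the documented syntax (sketch only)."""
    # Sample tables carry a "table:" prefix and take no sample name.
    if reference.startswith(TABLE_PREFIX):
        return ParsedReference("table", (reference[len(TABLE_PREFIX):],))

    # NAME@REF: everything before the first "@" is treated as the sample name.
    name: Optional[str] = None
    target = reference
    if "@" in reference:
        name, target = reference.split("@", 1)

    # Run identifiers are recognized by the documented regular expression.
    if RUN_ID_PATTERN.fullmatch(target):
        return ParsedReference("remote", (target,), name)

    # PATH or PATH,PATH: one path is single-end, exactly two are paired-end.
    paths = tuple(target.split(","))
    if len(paths) == 1:
        return ParsedReference("single_end", paths, name)
    if len(paths) == 2:
        return ParsedReference("paired_end", paths, name)
    raise ValueError(f"Expected one or two paths, got {len(paths)}: {target!r}")


for ref in ("pe_sample@m1.fq.gz,m2.fq.gz", "table:table_2.tsv", "remote_sample@ERR11223344"):
    print(parse_reference(ref))
```

One consequence of this rule order is that a local file whose name happens to look like a run identifier would be classified as remote; how ZARP-cli itself resolves such a case is not covered here.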
Command-line options
Available command-line parameters are grouped into the following sections:
Section | Description |
---|---|
General | Besides sample references (the only required arguments), these currently include the verbosity level and an option to provide a custom configuration file |
Run modes | These parameters execute ZARP-cli in special modes, e.g., for initialization or to display the help screen |
Sample-specific | These parameters set metadata globally for all samples of a run, unless overridden inside provided sample tables |
Run-specific | These parameters modify the behavior of ZARP-cli or set metadata to describe runs |
User-specific | These parameters will be included in the ZARP report, if available |
A complete listing of all available CLI options, together with detailed descriptions, can be printed to the screen with the following command; the options are therefore not repeated here:
```
zarp --help
```
Using the API
Besides using ZARP-cli's command-line interface, you can also integrate ZARP-cli's functionality into your Python projects via its API.
The main entry point for ZARP-cli's high-level functionalities is the `zarp.zarp.ZARP` class.
A basic code snippet to trigger ZARP runs in your code might look like this:
```python
from zarp.zarp import ZARP

# Set up the ZARP-cli configuration and attach sample references (not shown);
# `config` must be defined before this point.
zarp = ZARP(config=config)
zarp.set_up_run()
samples = zarp.process_samples()
zarp.execute_run(samples=samples)
```
Configuring `zarp.zarp.ZARP`
To configure `zarp.zarp.ZARP`, have a look at the `zarp.config.parser.ConfigParser` class and the `zarp.config.models.Config` model.
A reference to the entire ZARP-cli API is provided in the API overview.