Initialization
You will need to configure ZARP-cli once to set some defaults. On this page, you will find out everything about the initialization process.
Configuring ZARP-cli
The following simple command triggers the ZARP-cli initialization mode:
zarp --init
An interactive screen will guide you through the process. Read on to find out more about what each of the available options and suggested defaults mean.
Where is the configuration stored?
The initialization process creates a .zarp/ directory in your home directory and populates a configuration file with user defaults in ~/.zarp/user.yaml.
I did not specify the --init option - why am I in init mode?
You may have inadvertently deleted or renamed the ~/.zarp/ directory or the ZARP-cli configuration file expected at ~/.zarp/user.yaml. If this file is absent or inaccessible, ZARP-cli will trigger the initialization mode, even if it was started in normal mode.
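The fallback described above can be sketched as follows. This is a minimal illustration, not ZARP-cli's actual code: the function name needs_initialization and its parameters are hypothetical, and "inaccessible" is assumed here to mean "not a readable regular file".

```python
import os
import tempfile
from pathlib import Path


def needs_initialization(config_file: Path, init_requested: bool = False) -> bool:
    """Decide whether the interactive init mode should be entered.

    Hypothetical helper: init mode is entered if it was requested
    explicitly, or if the user configuration file is absent or unreadable.
    """
    if init_requested:
        return True
    is_readable_file = os.path.isfile(config_file) and os.access(config_file, os.R_OK)
    return not is_readable_file


# Demonstrate with a throwaway directory instead of the real home directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    config_file = Path(tmp_dir) / ".zarp" / "user.yaml"
    missing = needs_initialization(config_file)  # file does not exist yet
    config_file.parent.mkdir()
    config_file.write_text("cores: 1\n")  # placeholder content for the demo
    present = needs_initialization(config_file)
    forced = needs_initialization(config_file, init_requested=True)

print(missing, present, forced)  # True False True
```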
Configuration options
The following configuration options are available.
Press Enter to keep the suggested default
Option | Description | Default |
---|---|---|
working_directory | Root directory for ZARP-cli runs; needs to be writable | $HOME/.zarp |
zarp_directory | Path to the local copy of the ZARP workflow repository | ../zarp relative to the location of the ZARP-cli repository |
execution_mode | Trigger a full ZARP-cli run (RUN), a dry run (DRY_RUN; external tools are not actually run, ZARP-cli only logs what would be run; useful for testing) or prepare a ZARP run (PREPARE_RUN; ZARP-cli is run normally, including all external tools, up until the point of the execution of the actual ZARP workflow; use this to manually check the metadata table before ZARP execution) | RUN |
cores | Number of CPU cores that Snakemake is run with when executing ZARP and the auxiliary workflows (fetching libraries from SRA, inferring metadata) | 1 |
dependency_embedding | Whether Snakemake should use Conda (CONDA) or containers (SINGULARITY) to manage dependencies of each workflow step/rule | CONDA |
genome_assemblies_map | A headerless 3-column semicolon-separated mapping table of organism/source trivial names (e.g., homo_sapiens), optional comma-separated aliases such as NCBI taxon IDs and/or organism/source short names (e.g., 7227,dmelanogaster) and a corresponding genome assembly name (e.g., GRCm39); a table in the required format is shipped with ZARP-cli in the default location; it can be amended with additional aliases; note that for genomepy to be able to pull genome annotations for organisms/sources that HTSinfer inferred, NCBI taxon ID aliases are required | ./data/genome_assemblies.csv relative to the location of the ZARP-cli repository |
resources_version | Whether to always download the latest available version of genome annotations for a given organism/source from Ensembl (enter None; default) or whether to use a specific version of the corresponding Ensembl database (e.g., 100); note that the different Ensembl databases (e.g., for fungi, plants) use different versioning schemes, so pinning a particular database version may lead to unexpected outcomes | None |
rule_config | A configuration file for the ZARP workflow that sets specific parameters for each workflow step ("rule"); see ZARP documentation for details | None |
profile | Path to Snakemake profile to be used for the ZARP workflow; use this to optimize ZARP for your specific compute environment | |
fragment_length_distribution_mean | HTSinfer is currently unable to infer the mean of the fragment length distribution of RNA-seq libraries; however, the tools kallisto and salmon, which are executed as part of ZARP, require this value when run on single-ended libraries (for paired-ended libraries, the tools are able to infer this parameter from the data); the value provided here is used as a fallback if the value was not determined experimentally (e.g., with Bioanalyzer instruments) and provided via a sample table | 300 |
fragment_length_distribution_sd | Analogous to fragment_length_distribution_mean above, but this parameter is for the standard deviation of the fragment length distribution | 100 |
author | Name of the person or organization executing the ZARP-cli runs; will be added to the ZARP report | None |
email | Email of the person or organization executing the ZARP-cli runs; will be added to the ZARP report | None |
url | URL of the person or organization executing the ZARP-cli runs; will be added to the ZARP report | None |
logo | Logo (file path or URL) of the person or organization executing the ZARP-cli runs; will be added to the ZARP report | None |
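To make the genome_assemblies_map format more concrete, here is a minimal sketch of how such a headerless, semicolon-separated table could be read into a lookup from names and aliases to assembly names. The function load_assembly_map is hypothetical and not part of ZARP-cli, and the example rows are illustrative only; they are not necessarily the contents of the file shipped with ZARP-cli.

```python
import csv
import io
from typing import Dict, TextIO

# Illustrative rows only (assumption): trivial name; optional aliases; assembly.
EXAMPLE_TABLE = """\
homo_sapiens;9606,hsapiens;GRCh38
mus_musculus;10090,mmusculus;GRCm39
saccharomyces_cerevisiae;;R64-1-1
"""


def load_assembly_map(handle: TextIO) -> Dict[str, str]:
    """Map every trivial name and alias to its genome assembly name."""
    lookup: Dict[str, str] = {}
    for row in csv.reader(handle, delimiter=";"):
        if not row:
            continue  # skip blank lines
        if len(row) != 3:
            raise ValueError(f"expected 3 columns, got {len(row)}: {row}")
        name, aliases, assembly = (field.strip() for field in row)
        lookup[name] = assembly
        # The alias column is optional and comma-separated.
        for alias in aliases.split(","):
            alias = alias.strip()
            if alias:
                lookup[alias] = assembly
    return lookup


lookup = load_assembly_map(io.StringIO(EXAMPLE_TABLE))
print(lookup["10090"])  # GRCm39
```

Keying the lookup by alias as well as by trivial name mirrors the note above: a taxon ID alias is what allows an inferred organism/source to be matched to an assembly.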
Modifying configuration settings
There are two ways in which you can permanently change the default configuration settings:
- Re-run zarp --init
  Suggested defaults are now taken from the current contents of ~/.zarp/user.yaml, which will then be overridden with the values supplied during the interactive initialization mode
- Edit configuration file in a text editor
  Simply edit the ~/.zarp/user.yaml file in a text editor; however, make sure that only valid values are provided, as inputs are not checked
Additionally, there are ways in which you can modify configuration settings dynamically:
- Providing a custom configuration file
  It is possible to specify a custom configuration file via the --config-file CLI parameter; this could be a copy of an old/alternative ~/.zarp/user.yaml file or a subset with only some of the parameters; however, the format has to strictly follow that of the default configuration file in order for the custom configuration file contents to take effect
- Setting individual CLI arguments
  ZARP-cli provides a range of run-specific CLI parameters that, when specified, will override the default configuration settings for a given run
- Setting sample-specific parameters in sample tables
  ZARP-cli's ability to process sample tables allows most sample-specific parameters to be set via ZARP sample tables
Configuration setting precedence
The ability to provide configuration settings in various ways requires us to resolve conflicting settings in a predictable and user-friendly manner. In ZARP-cli, configuration settings are applied iteratively, with values from each source overriding those from all previously applied sources. The following configuration sources are applied successively:
- Defaults hardwired in the code (lowest precedence!)
- Contents of default configuration file at ~/.zarp/user.yaml
- Contents of custom configuration file supplied via --config-file, if provided
- CLI arguments for individual run- and sample-specific parameters, if provided
- Sample-specific parameters specified in sample tables (highest precedence!)
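The successive application of sources listed above can be sketched as a simple layered merge. This is an illustration under stated assumptions, not ZARP-cli's implementation: the function resolve_settings is hypothetical, each source is modeled as a flat dictionary, a key missing from a source is taken to mean "not set there", and the example values are made up.

```python
from typing import Any, Dict, Mapping, Optional, Sequence


def resolve_settings(sources: Sequence[Optional[Mapping[str, Any]]]) -> Dict[str, Any]:
    """Apply sources in order; later sources override earlier ones."""
    resolved: Dict[str, Any] = {}
    for source in sources:
        if source:  # sources that were not provided are skipped
            resolved.update(source)
    return resolved


# Made-up values for each source, ordered from lowest to highest precedence.
hardwired_defaults = {
    "cores": 1,
    "execution_mode": "RUN",
    "fragment_length_distribution_mean": 300,
}
user_yaml = {"cores": 4}
custom_config_file = {"execution_mode": "DRY_RUN"}
cli_arguments = {"cores": 8}
sample_table = {"fragment_length_distribution_mean": 250}

settings = resolve_settings(
    [hardwired_defaults, user_yaml, custom_config_file, cli_arguments, sample_table]
)
print(settings)
# {'cores': 8, 'execution_mode': 'DRY_RUN', 'fragment_length_distribution_mean': 250}
```

Because every source only overrides the keys it actually sets, a custom configuration file containing a subset of the parameters leaves all other settings at the values established by the sources applied before it.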