script merge_tables.R
Merge miRNA quantification tables.
Usage
Rscript merge_tables.R [--help] [--verbose] [OPTIONS] --input_dir <PATH>
Arguments
- --input_dir=DIRECTORY (required): Absolute path to the directory from which the input files are read.
Options
- --output_file=FILE: Path to the output file (default: working-directory/counts.tab).
- --prefix: Prefix for reading input files (default: NULL).
- -h | --help: Show this information and die.
- -u | --usage: Show this information and die.
- -v | --verbose: Print log messages to STDOUT.
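The options above could be declared with optparse roughly as follows. This is a minimal sketch, not the script's actual source: the help strings and the names `option_list`, `parser` and `opt` are assumptions, `-h`/`--help` is left to optparse's built-in help option, and `-u`/`--usage` is omitted.

```r
library(optparse)

# Hypothetical option declarations mirroring the documented command line.
option_list <- list(
  make_option("--input_dir", type = "character", default = NULL,
              help = "Absolute path from where input files shall be read."),
  make_option("--output_file", type = "character",
              default = file.path(getwd(), "counts.tab"),
              help = "Path to the output file."),
  make_option("--prefix", type = "character", default = NULL,
              help = "Prefix for reading input files."),
  make_option(c("-v", "--verbose"), action = "store_true", default = FALSE,
              help = "Print log messages to STDOUT.")
)

parser <- OptionParser(option_list = option_list)

# Parse an example command line instead of commandArgs() so that the
# sketch can be run interactively.
opt <- parse_args(parser, args = c("--input_dir", "/path/to/tables", "--verbose"))

# --input_dir is required, so stop early when it is missing.
if (is.null(opt$input_dir)) {
  print_help(parser)
  stop("Required argument --input_dir is missing.")
}
```

Options whose default is NULL, such as `--prefix`, are simply absent from `opt` when not given, so `opt$prefix` evaluates to NULL.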
Dependencies
- R version: >= 3.6.0
- R packages:
  - optparse: >= 1.6.2
  - dplyr: >= 1.1.4
function get_table
get_table <- function(tbl_pth, prefix)
Read and process input table
get_table() uses tryCatch() to read the file in tbl_pth. If the table
is empty and an error is raised, the returned data frame consists of one row
with an NA in both fields.
Arguments:
- tbl_pth: Path to the input table.
- prefix: String to be removed from the input file name. It must be present in all the tables to be merged.
Returns:
A data frame containing the miRNA species to be counted in the first column,
named ID, and their counts in that file in the second one. The name of the
second column is obtained by removing the prefix from the input file name.
If no prefix is given, the whole file name is used.
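The behaviour described above can be sketched in base R as follows. This is a hypothetical re-implementation, not the script's source: it assumes headerless, tab-separated input with exactly two columns (miRNA ID and count), and the file names in the usage part are made up for illustration.

```r
# Hypothetical sketch of get_table(); the input format is an assumption.
get_table <- function(tbl_pth, prefix = NULL) {
  # Column name: file name with the prefix removed (whole name if no prefix).
  sample_name <- basename(tbl_pth)
  if (!is.null(prefix)) {
    sample_name <- sub(prefix, "", sample_name, fixed = TRUE)
  }
  tryCatch(
    {
      # read.table() raises an error when the file contains no lines.
      tbl <- read.table(tbl_pth, sep = "\t", header = FALSE,
                        stringsAsFactors = FALSE)
      names(tbl) <- c("ID", sample_name)
      tbl
    },
    error = function(e) {
      # Empty input: return a single placeholder row with NA in both fields.
      placeholder <- data.frame(ID = NA_character_, count = NA_real_,
                                stringsAsFactors = FALSE)
      names(placeholder)[2] <- sample_name
      placeholder
    }
  )
}

# Usage with one populated and one empty table in a temporary directory.
tmp <- tempfile("tables_")
dir.create(tmp)
full_file <- file.path(tmp, "mirna_sampleA")
writeLines(c("hsa-miR-21-5p\t10", "hsa-let-7a-5p\t3"), full_file)
empty_file <- file.path(tmp, "mirna_sampleB")
file.create(empty_file)

tbl_a <- get_table(full_file, prefix = "mirna_")
tbl_b <- get_table(empty_file, prefix = "mirna_")
```

Here `tbl_a` has the columns ID and sampleA, while `tbl_b` is the one-row placeholder with columns ID and sampleB.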
function merge_tables
merge_tables <- function(cwd, prefix)
Merge tables with the same prefix
merge_tables() takes all the files in cwd that start with prefix and
merges them, keeping all the miRNA species present in each of the tables.
The function get_table() is used to make sure that, even if an empty input
file is given, the merge can still be done by creating a data frame with a
single row made of NAs. Therefore, before the merged table is returned,
any row with an NA in the ID field is removed.
The function dplyr::full_join() is used for the merge. This implies that if
a miRNA species in ID is missing in any of the tables being joined, its value
is set to NA in that column.
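The effect of the full join and of the placeholder row can be seen on three toy tables. This is a minimal sketch with made-up counts and sample names; folding the join with Reduce() is an assumption about how several tables might be combined, not necessarily what the script does.

```r
library(dplyr)

# Two toy count tables plus the placeholder that an empty input would yield.
sample_a <- data.frame(ID = c("hsa-miR-21-5p", "hsa-let-7a-5p"),
                       sampleA = c(10, 3), stringsAsFactors = FALSE)
sample_b <- data.frame(ID = c("hsa-miR-21-5p", "hsa-miR-16-5p"),
                       sampleB = c(7, 5), stringsAsFactors = FALSE)
sample_c <- data.frame(ID = NA_character_, sampleC = NA_real_,
                       stringsAsFactors = FALSE)

# Fold full_join() over the list so every ID from every table is kept.
mat <- Reduce(function(x, y) dplyr::full_join(x, y, by = "ID"),
              list(sample_a, sample_b, sample_c))

# Drop the placeholder row contributed by the empty table.
mat <- dplyr::filter(mat, !is.na(ID))
```

After the filter, `mat` holds three miRNA species. hsa-let-7a-5p has NA in sampleB, hsa-miR-16-5p has NA in sampleA, and the sampleC column is NA throughout because its input was empty.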
Arguments:
- cwd: Path to the directory containing the input tables.
- prefix: String used in all the tables to be selected for the merge. If not provided, all the files in cwd are used.
Returns:
A single data frame, mat, with all the miRNA species present in the input
tables in the first column, ID, and their counts. Each input file has its own
column.
If all the input tables are empty, the output consists only of the table's
header, and if no files starting with prefix are found, nothing is returned.
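The file selection step and the "no files found" case might look like the sketch below. `select_tables()` is a hypothetical helper introduced only for illustration, and the file names are made up. It matches the prefix literally with startsWith() rather than as a regular expression, and does not filter out subdirectories.

```r
# Hypothetical helper: list the entries in `cwd` whose name starts with `prefix`.
select_tables <- function(cwd, prefix = NULL) {
  files <- list.files(cwd, full.names = TRUE)
  if (!is.null(prefix)) {
    files <- files[startsWith(basename(files), prefix)]
  }
  files
}

# Usage on a temporary directory holding two matching files and one other file.
tmp <- tempfile("tables_")
dir.create(tmp)
file.create(file.path(tmp, c("mirna_sampleA", "mirna_sampleB", "other_file")))

matching <- select_tables(tmp, prefix = "mirna_")
everything <- select_tables(tmp)
nothing <- select_tables(tmp, prefix = "absent_")

# A caller can test for the empty case before attempting any merge.
result <- if (length(nothing) == 0) NULL else nothing
```

With this layout, `matching` contains the two mirna_ files, `everything` contains all three, and `result` is NULL because no file starts with the given prefix.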