script merge_tables.R

Merge miRNA quantification tables.

Usage

Rscript merge_tables.R [--help] [--verbose] [OPTIONS] --input_dir <PATH>

Arguments

  • --input_dir=DIRECTORY (required): Absolute path from where input files shall be read.

Options

  • --output_file=FILE: Path to the output file (default: working-directory/counts.tab).
  • --prefix: Prefix for reading input files (default: NULL).
  • -h | --help: Show this information and die.
  • -u | --usage: Show this information and die.
  • -v | --verbose: Print log messages to STDOUT.
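
As an illustration of how the documented interface could be declared with optparse, here is a minimal sketch. The option definitions are assumptions reconstructed from the lists above, not the script's actual code; the -u | --usage flag is left out, and the parsed command line is hypothetical.

```r
library(optparse)

# Assumed option definitions mirroring the documented interface;
# the script's real definitions may differ.
option_list <- list(
  make_option("--input_dir", action = "store", type = "character",
              default = NULL, metavar = "DIRECTORY",
              help = "Absolute path from where input files shall be read."),
  make_option("--output_file", action = "store", type = "character",
              default = file.path(getwd(), "counts.tab"), metavar = "FILE",
              help = "Path to the output file."),
  make_option("--prefix", action = "store", type = "character",
              default = NULL,
              help = "Prefix for reading input files."),
  make_option(c("-v", "--verbose"), action = "store_true", default = FALSE,
              help = "Print log messages to STDOUT.")
)

parser <- OptionParser(option_list = option_list)

# Parse a hypothetical command line instead of the real one.
opt <- parse_args(parser,
                  args = c("--input_dir", "/path/to/tables",
                           "--prefix", "counts_"))

print(opt$input_dir)
print(opt$output_file)
```

Options that are not given on the command line fall back to their defaults, so opt$output_file points to counts.tab in the working directory in this sketch.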

Dependencies

  • R version: >= 3.6.0
  • R packages:
    • optparse: >= 1.6.2
    • dplyr: >= 1.1.4

function get_table

get_table <- function(tbl_pth, prefix)

Read and process input table

get_table() uses tryCatch() to read the file at tbl_pth. If the table is empty and an error is raised, the returned data frame consists of a single row with NA in both fields.

Arguments:

  • tbl_pth: Path to the input table.
  • prefix: String to be removed from the input file name. It must be present in all the tables to be merged.

Returns:

A data frame containing the miRNA species to be counted in the first column, named ID, and their counts in that file in the second one. The name of the second column is obtained by removing the prefix from the input file name. If no prefix is given, the whole file name is used.
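
The behaviour described for get_table() can be sketched as follows. This is a hypothetical re-implementation for illustration only: the name get_table_sketch, the tab-separated and header-less input layout, and the file names are assumptions, not taken from the script.

```r
# Hypothetical re-implementation for illustration; not the script's actual code.
get_table_sketch <- function(tbl_pth, prefix = NULL) {
  # Column name: file name with the prefix removed (whole name if no prefix).
  sample <- basename(tbl_pth)
  if (!is.null(prefix)) {
    sample <- sub(prefix, "", sample, fixed = TRUE)
  }
  tryCatch(
    # Assumption: tab-separated input, no header, two columns (ID, count).
    read.table(tbl_pth, sep = "\t", header = FALSE,
               col.names = c("ID", sample), check.names = FALSE,
               stringsAsFactors = FALSE),
    error = function(e) {
      # Reading failed (e.g. empty file): return one placeholder row of NAs.
      stats::setNames(
        data.frame(NA_character_, NA_integer_, stringsAsFactors = FALSE),
        c("ID", sample)
      )
    }
  )
}

# Two hypothetical input files in a temporary directory: one filled, one empty.
tmp <- tempfile("demo_")
dir.create(tmp)
filled <- file.path(tmp, "counts_sampleA")
empty <- file.path(tmp, "counts_sampleB")
writeLines(c("hsa-miR-1\t10", "hsa-miR-2\t5"), filled)
file.create(empty)

tbl_a <- get_table_sketch(filled, prefix = "counts_")
tbl_b <- get_table_sketch(empty, prefix = "counts_")
print(tbl_a)
print(tbl_b)
```

In this sketch the error handler catches any read failure, not only the one caused by an empty file; a stricter version could inspect the error message before falling back to the placeholder row.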


function merge_tables

merge_tables <- function(cwd, prefix)

Merge tables with the same prefix

merge_tables() takes all the files in cwd that start with prefix and merges them, keeping every miRNA species present in any of the tables.

The function get_table() is used to make sure that the merge can still be done even if an empty input file is given, by creating a data frame with a single row made of NAs. Therefore, before the merged table is returned, any row with an NA in the ID field is removed.

The function dplyr::full_join() is used for the merge. This implies that if a miRNA species in ID is missing in any of the tables being joined, its value is set to NA in that column.
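
To make the effect of the full join concrete, the following minimal sketch merges three hypothetical tables, one of which stands in for an empty input file. The table contents and the use of Reduce() to chain the joins are assumptions made for illustration; the script may combine its tables differently.

```r
# Hypothetical per-sample tables, shaped like the get_table() output.
sample_a <- data.frame(ID = c("hsa-miR-1", "hsa-miR-2"),
                       sampleA = c(10L, 5L),
                       stringsAsFactors = FALSE)
sample_b <- data.frame(ID = c("hsa-miR-2", "hsa-miR-3"),
                       sampleB = c(7L, 2L),
                       stringsAsFactors = FALSE)
# Placeholder row standing in for an empty input file.
sample_c <- data.frame(ID = NA_character_, sampleC = NA_integer_,
                       stringsAsFactors = FALSE)

# Chain the full joins over all tables, matching rows on the ID column.
merged <- Reduce(function(x, y) dplyr::full_join(x, y, by = "ID"),
                 list(sample_a, sample_b, sample_c))

# Drop the placeholder row introduced by the empty table.
merged <- merged[!is.na(merged$ID), , drop = FALSE]
print(merged)
```

Here hsa-miR-1 has no count in sampleB and hsa-miR-3 has none in sampleA, so those cells are NA, and the sampleC column holds only NAs because its source table was empty.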

Arguments:

  • cwd: Path to the directory containing the input tables.
  • prefix: String used in all the tables to be selected for the merge. If not provided, all the files in cwd are used.

Returns:

A single data frame, mat, with all the miRNA species present in the input tables in the first column, ID, and their counts. Each input file has its own column.

If all the input tables are empty, the output consists only of the table's header, and if no files starting with prefix are found, nothing is returned.
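
The two edge cases can be illustrated with a short sketch. The helper select_inputs and the file names are hypothetical, and matching the prefix literally against the start of the file name is an assumption about how the script selects its inputs.

```r
# Hypothetical helper: keep the files in `cwd` whose names start with `prefix`.
select_inputs <- function(cwd, prefix = NULL) {
  files <- list.files(cwd, full.names = TRUE)
  if (!is.null(prefix)) {
    files <- files[startsWith(basename(files), prefix)]
  }
  files
}

# Temporary directory with two matching (empty) files and one unrelated file.
tmp <- tempfile("demo_")
dir.create(tmp)
file.create(file.path(tmp, c("counts_sampleA", "counts_sampleB", "notes.txt")))

selected <- select_inputs(tmp, prefix = "counts_")
none <- select_inputs(tmp, prefix = "missing_")
print(basename(selected))
print(length(none))

# All inputs empty: only the placeholder row exists, so removing rows
# with an NA in ID leaves a data frame with a header and no rows.
placeholder <- data.frame(ID = NA_character_, sampleA = NA_integer_,
                          sampleB = NA_integer_, stringsAsFactors = FALSE)
header_only <- placeholder[!is.na(placeholder$ID), , drop = FALSE]
print(header_only)
```

An empty selection, as in none above, corresponds to the case in which no merged table is produced.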