Workflow: SoupX (workflow) - an R package for the estimation and removal of cell free mRNA contamination

Fetched 2023-01-04 16:35:31 GMT

The SoupX tool wrapped in a workflow, for easy access to the compressed outputs of the Cell Ranger pipeline.


Inputs

ID Type Title Doc
fdr Float (Optional)

FDR cutoff for expression ratio plots

round_counts Boolean (Optional)

Round adjusted counts to integers

genelist_file File (Optional)

Target genes list. Headerless text file with 1 gene per line

output_prefix String (Optional)

Output prefix

expression_threshold Float (Optional)

Expression threshold for displaying target genes on a plot (expression > threshold)

matrix_format_version Enum (Optional)

Output matrix format version. Corresponds to the latest Cell Ranger matrix format

raw_feature_bc_matrices_folder File

Compressed folder with unfiltered feature-barcode matrices

secondary_analysis_report_folder File

Compressed folder with secondary analysis results

filtered_feature_bc_matrix_folder File

Compressed folder with filtered feature-barcode matrices
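
The inputs above could be collected into a CWL job file along the lines of the following sketch. Only the input IDs are taken from the table. The helper names (`write_gene_list`, `cwl_file`, `build_job`), the archive file names, and the example values for `fdr`, `round_counts` and `output_prefix` are hypothetical and chosen for illustration. The `class`/`path` layout is the usual way a File is described in a CWL input object.

```python
import json
from pathlib import Path


def write_gene_list(genes, path):
    """Write a headerless text file with one gene per line (genelist_file)."""
    Path(path).write_text("\n".join(genes) + "\n")
    return str(path)


def cwl_file(path):
    """Describe a file the way a CWL input object expects it."""
    return {"class": "File", "path": str(path)}


def build_job(raw, filtered, secondary, genelist=None, **optional):
    """Assemble the workflow inputs; optional inputs are added only if given."""
    job = {
        "raw_feature_bc_matrices_folder": cwl_file(raw),
        "filtered_feature_bc_matrix_folder": cwl_file(filtered),
        "secondary_analysis_report_folder": cwl_file(secondary),
    }
    if genelist is not None:
        job["genelist_file"] = cwl_file(genelist)
    job.update(optional)
    return job


# Hypothetical file names and values, for illustration only.
job = build_job(
    raw="raw_feature_bc_matrix.tar.gz",
    filtered="filtered_feature_bc_matrix.tar.gz",
    secondary="analysis.tar.gz",
    fdr=0.05,
    round_counts=True,
    output_prefix="sample1",
)
print(json.dumps(job, indent=2))
```

The printed JSON can be saved and passed to a CWL runner as the job file. `matrix_format_version` is left out here because the page does not list its allowed values.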

Steps

ID Runs Label Doc
estimate_contamination
soupx.cwl (CommandLineTool)
SoupX - an R package for the estimation and removal of cell free mRNA contamination

In droplet-based single-cell RNA-seq experiments, some background mRNA is always present in the solution; it is distributed into the droplets along with the cells and sequenced with them. The net effect is a background contamination that represents expression not from the cell contained within a droplet, but from the solution that contained the cells.

This collection of cell-free mRNAs floating in the input solution (henceforth referred to as “the soup”) comes from cells in the input solution that have lysed. Because of this, the soup looks different for each input solution and strongly resembles the expression pattern obtained by summing all the individual cells.

The aim of this package is to provide a way to estimate the composition of this soup and the fraction of UMIs in each droplet that are derived from it, and to produce a corrected count table with the soup-based expression removed.

The method to do this consists of three parts:

- Calculate the profile of the soup.
- Estimate the cell-specific contamination fraction.
- Infer a corrected expression matrix.
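
The three parts can be illustrated with a deliberately simplified sketch. This is not the SoupX implementation. It assumes that droplets with very few UMIs are empty and contain only soup, that a hand-picked marker gene is not expressed in the cells used for estimation, and that one contamination fraction is shared by all cells. The function names, the `max_umis` cutoff and the toy counts are all hypothetical.

```python
def soup_profile(droplets, max_umis=10):
    """Part 1: pool droplets with few UMIs (assumed empty) into gene fractions."""
    totals = {}
    for droplet in droplets:
        if sum(droplet.values()) > max_umis:
            continue  # looks like a real cell, not pure soup
        for gene, count in droplet.items():
            totals[gene] = totals.get(gene, 0) + count
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("no empty droplets to estimate the soup from")
    return {gene: count / grand_total for gene, count in totals.items()}


def contamination_fraction(cells, profile, marker_genes):
    """Part 2: counts of genes assumed unexpressed are attributed to the soup."""
    observed = sum(cell.get(g, 0) for cell in cells for g in marker_genes)
    expected = sum(sum(cell.values()) * profile.get(g, 0.0)
                   for cell in cells for g in marker_genes)
    return observed / expected


def correct(cell, profile, rho):
    """Part 3: subtract the expected soup counts, never going below zero."""
    n_umis = sum(cell.values())
    return {gene: max(0.0, count - rho * n_umis * profile.get(gene, 0.0))
            for gene, count in cell.items()}


# Toy data: two near-empty droplets and one cell (hypothetical counts).
empty_droplets = [{"HBB": 3, "CD3E": 1}, {"HBB": 3, "CD3E": 1}]
cell = {"CD3E": 90, "HBB": 10}

profile = soup_profile(empty_droplets + [cell])
rho = contamination_fraction([cell], profile, marker_genes=["HBB"])
fixed = correct(cell, profile, rho)
print(profile, rho, fixed)
```

In this toy case all HBB counts in the cell are attributed to the soup, so the correction removes them and takes a proportional share from CD3E as well.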

extract_count_matrices_to_folder
soupx-subworkflow.cwl#extract_count_matrices_to_folder/61d3a72d-2fba-4aae-a957-84ae7bb49bc5 (CommandLineTool)
compress_adjusted_feature_bc_matrices_folder
tar-compress.cwl (CommandLineTool)

Compresses input directory to tar.gz
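
What the compression step does can be approximated with Python's standard `tarfile` module. This is a sketch of the same operation, not the contents of tar-compress.cwl; the function name `compress_folder` and the folder and file names in the demo are hypothetical.

```python
import os
import tarfile
import tempfile
from pathlib import Path


def compress_folder(folder, archive=None):
    """Pack `folder` into a gzip-compressed tar archive and return its path."""
    folder = Path(folder)
    if archive is None:
        archive = folder.parent / (folder.name + ".tar.gz")
    with tarfile.open(str(archive), "w:gz") as tar:
        # arcname stores only the folder name, not its full path on disk
        tar.add(str(folder), arcname=folder.name)
    return Path(archive)


# Demo on a throwaway directory (hypothetical names).
with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "adjusted_feature_bc_matrix")
    os.mkdir(source)
    with open(os.path.join(source, "barcodes.tsv"), "w") as handle:
        handle.write("AAACCTGAGAAACCAT-1\n")
    result = compress_folder(source)
    with tarfile.open(str(result)) as tar:
        print(tar.getnames())
```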

Outputs

ID Type Label Doc
soupx_stderr_log File

SoupX stderr log

soupx_stdout_log File

SoupX stdout log

raw_gene_expression_plots File (Optional)

Raw gene expression plots

contamination_estimation_plot File

Contamination estimation plot

adjusted_gene_expression_plots File (Optional)

Adjusted gene expression plots

adjusted_feature_bc_matrices_h5 File

Adjusted feature-barcode matrices in HDF5 format

adjusted_feature_bc_matrices_folder File

Compressed folder with adjusted feature-barcode matrices in MEX format

raw_to_adjusted_gene_expression_ratio_plots File (Optional)

Raw to adjusted gene expression ratio plots

raw_gene_expression_to_pure_soup_ratio_plots File (Optional)

Raw gene expression to pure soup ratio plots
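
To inspect the MEX output listed above (`adjusted_feature_bc_matrices_folder`), the count matrix can be parsed with a few lines of Python. The sketch below reads Matrix Market coordinate text, the format used for the matrix file in Cell Ranger MEX folders, where rows are features and columns are barcodes. The function name `read_mtx` and the example matrix are hypothetical, and the layout inside this workflow's archive is an assumption to check against the actual output.

```python
import io


def read_mtx(lines):
    """Parse Matrix Market coordinate text into (n_rows, n_cols, entries)."""
    # Lines starting with '%' are the header or comments.
    body = [ln for ln in lines if ln.strip() and not ln.startswith("%")]
    n_rows, n_cols, n_entries = (int(x) for x in body[0].split())
    entries = {}
    for ln in body[1:]:
        row, col, value = ln.split()
        # Matrix Market indices are 1-based; store them 0-based.
        entries[(int(row) - 1, int(col) - 1)] = float(value)
    if len(entries) != n_entries:
        raise ValueError("entry count does not match the size line")
    return n_rows, n_cols, entries


# Hypothetical 3-gene by 2-cell matrix.
example = """%%MatrixMarket matrix coordinate integer general
%
3 2 3
1 1 5
3 1 2
2 2 7
"""
n_genes, n_cells, counts = read_mtx(io.StringIO(example))
print(n_genes, n_cells, counts)
```

Values are read as floats because adjusted counts are integers only when `round_counts` is enabled. For a gzip-compressed matrix file, the handle returned by `gzip.open(path, "rt")` can be passed to `read_mtx` directly.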

Permalink: https://w3id.org/cwl/view/git/cbefc215d8286447620664fb47076ba5d81aa47f/tools/soupx-subworkflow.cwl