Workflow: Cellranger aggr - aggregates data from multiple Cellranger runs

Fetched 2023-01-10 07:43:06 GMT

Devel version of Single-Cell Cell Ranger Aggregate
==================================================

Workflow calls the "cellranger aggr" command to combine output files from "cellranger count" (the molecule_info.h5 file from each run) into a single feature-barcode matrix containing all the data.

When combining multiple GEM wells, the barcode sequences for each channel are distinguished by a GEM well suffix appended to the barcode sequence. Each GEM well is a physically distinct set of GEM partitions, but draws barcode sequences randomly from the pool of valid barcodes, known as the barcode whitelist. To keep the barcodes unique when aggregating multiple libraries, we append a small integer identifying the GEM well to the barcode nucleotide sequence, and use that nucleotide sequence plus ID as the unique identifier in the feature-barcode matrix. For example, AGACCATTGAGACTTA-1 and AGACCATTGAGACTTA-2 are distinct cell barcodes from different GEM wells, despite having the same barcode nucleotide sequence. This number, which tells us which GEM well this barcode sequence came from, is called the GEM well suffix. The numbering of the GEM wells reflects the order in which the GEM wells were provided in the "molecule_info_h5" and "gem_well_labels" inputs.

When combining data from multiple GEM wells, the "cellranger aggr" pipeline automatically equalizes the average read depth per cell between groups before merging. This approach avoids artifacts that may be introduced due to differences in sequencing depth. It is possible to turn off normalization or change the way normalization is done through the "normalization_mode" input. The "none" value may be appropriate if you want to maximize sensitivity and plan to deal with depth normalization in a downstream step.
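The GEM well suffix scheme described above can be sketched in a few lines of Python. This is an illustration only, not Cell Ranger's own code: the function names are hypothetical, and the sketch assumes the input barcodes are bare nucleotide sequences without any existing suffix.

```python
def add_gem_well_suffix(barcodes_per_well):
    """Return barcodes made unique by appending a 1-based GEM well suffix.

    barcodes_per_well holds one inner list per GEM well, in the same order
    in which the wells were supplied to the aggregation.
    """
    unique = []
    for well_index, barcodes in enumerate(barcodes_per_well, start=1):
        for seq in barcodes:
            unique.append(f"{seq}-{well_index}")
    return unique


def split_barcode(barcode):
    """Split 'AGACCATTGAGACTTA-2' into ('AGACCATTGAGACTTA', 2)."""
    seq, suffix = barcode.rsplit("-", 1)
    return seq, int(suffix)


# The same nucleotide sequence observed in two different GEM wells.
wells = [["AGACCATTGAGACTTA"], ["AGACCATTGAGACTTA"]]
print(add_gem_well_suffix(wells))
# ['AGACCATTGAGACTTA-1', 'AGACCATTGAGACTTA-2']
```

Splitting on the last hyphen recovers the GEM well index, which can then be mapped back to the corresponding entry of "gem_well_labels" by position.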

Inputs

ID Type Title Doc
alias String Experiment short name/Alias
threads Integer (Optional) Number of threads

Number of threads for those steps that support multithreading

memory_limit Integer (Optional) Maximum memory used (GB)

Maximum memory used (GB). The same will be applied to virtual memory

gem_well_labels String[] scRNA-Seq Cell Ranger Experiment

Array of GEM well identifiers to be used for labeling purposes only

molecule_info_h5 File[] scRNA-Seq Cell Ranger Experiment

Molecule-level information from individual runs of cellranger count

normalization_mode Library depth normalization mode
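A job file for these inputs could be assembled as shown below. This is a minimal sketch: the helper name, the file paths, the labels and the numeric values are hypothetical, and the check that both arrays have the same length is an assumption based on the two inputs being described as parallel, ordered lists. CWL runners generally accept job files in JSON as well as YAML.

```python
import json


def build_job(alias, labels, molecule_info_paths, normalization_mode,
              threads=None, memory_limit=None):
    """Build a CWL job object for the inputs listed above."""
    if len(labels) != len(molecule_info_paths):
        # Assumption: one label per molecule_info.h5 file, in the same order.
        raise ValueError("gem_well_labels and molecule_info_h5 differ in length")
    job = {
        "alias": alias,
        "gem_well_labels": list(labels),
        "molecule_info_h5": [
            {"class": "File", "path": path} for path in molecule_info_paths
        ],
        "normalization_mode": normalization_mode,
    }
    # Optional inputs are only written when a value was given.
    if threads is not None:
        job["threads"] = threads
    if memory_limit is not None:
        job["memory_limit"] = memory_limit
    return job


job = build_job(
    alias="pbmc_aggregated",                      # hypothetical alias
    labels=["donor_1", "donor_2"],                # hypothetical labels
    molecule_info_paths=[
        "/data/donor_1/molecule_info.h5",         # hypothetical paths
        "/data/donor_2/molecule_info.h5",
    ],
    normalization_mode="none",
    threads=4,
    memory_limit=32,
)
print(json.dumps(job, indent=2))
```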

Steps

ID Runs Label Doc
aggregate_counts
../tools/cellranger-aggr.cwl (CommandLineTool)
Cellranger aggr - aggregates data from multiple Cellranger runs

Tool calls the "cellranger aggr" command to combine output files from "cellranger count" (the molecule_info.h5 file from each run) into a single feature-barcode matrix containing all the data. When combining multiple GEM wells, the barcode sequences for each channel are distinguished by a GEM well suffix appended to the barcode sequence. Each GEM well is a physically distinct set of GEM partitions, but draws barcode sequences randomly from the pool of valid barcodes, known as the barcode whitelist. To keep the barcodes unique when aggregating multiple libraries, we append a small integer identifying the GEM well to the barcode nucleotide sequence, and use that nucleotide sequence plus ID as the unique identifier in the feature-barcode matrix. For example, AGACCATTGAGACTTA-1 and AGACCATTGAGACTTA-2 are distinct cell barcodes from different GEM wells, despite having the same barcode nucleotide sequence. This number, which tells us which GEM well this barcode sequence came from, is called the GEM well suffix. The numbering of the GEM wells reflects the order in which the GEM wells were provided in the "molecule_info_h5" and "gem_well_labels" inputs.

When combining data from multiple GEM wells, the "cellranger aggr" pipeline automatically equalizes the average read depth per cell between groups before merging. This approach avoids artifacts that may be introduced due to differences in sequencing depth. It is possible to turn off normalization or change the way normalization is done through the "normalization_mode" input. The "none" value may be appropriate if you want to maximize sensitivity and plan to deal with depth normalization in a downstream step.
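To make the idea of equalizing read depth concrete, the sketch below computes, for each GEM well, the fraction of reads that would be kept if every well were subsampled down to the well with the lowest mean reads per cell. This is a simplified illustration, not Cell Ranger's implementation: the function name, labels and depth figures are hypothetical, and the real pipeline works on molecule-level data rather than on a single summary number per well.

```python
def downsampling_fractions(reads_per_cell):
    """Map each GEM well label to the fraction of its reads to keep.

    reads_per_cell maps a label to the mean number of reads per cell.
    The target depth is the lowest value among all wells, so the
    shallowest well keeps everything (fraction 1.0).
    """
    if not reads_per_cell:
        raise ValueError("at least one GEM well is required")
    target = min(reads_per_cell.values())
    if target <= 0:
        raise ValueError("mean reads per cell must be positive")
    return {label: target / depth for label, depth in reads_per_cell.items()}


depths = {"well_A": 50000, "well_B": 25000}   # hypothetical depths
print(downsampling_fractions(depths))
# {'well_A': 0.5, 'well_B': 1.0}
```

With normalization turned off, the equivalent of this step is skipped and every well keeps all of its reads, which is why depth differences then have to be handled downstream.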

Parameters set by default:
- --disable-ui: no UI is needed when running in a Docker container
- --id: hardcoded to `aggregated`, because the contents of the outputs folder are returned as separate outputs

Skipped parameters: --nosecondary --dry --noexit --nopreflight --description --jobmode --mempercore --maxjobs --jobinterval --overrides --uiport

Not supported features:
- Correction of batch effects caused by different versions of the Single Cell Gene Expression chemistry is not supported, because the generated metadata file does not include a "batch" field.
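For orientation, a two-column metadata file of the kind referred to above, with no "batch" column, could be generated as follows. This is a sketch under assumptions: the helper name and paths are hypothetical, and the header names ("sample_id", "molecule_h5") are assumed here; the identifier column name depends on the Cell Ranger version, so it is exposed as a parameter rather than fixed.

```python
import csv
import io


def aggregation_csv(labels, molecule_info_paths, id_column="sample_id"):
    """Return aggregation metadata as CSV text, one row per GEM well."""
    if len(labels) != len(molecule_info_paths):
        raise ValueError("labels and molecule_info paths differ in length")
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Only two columns are written; there is no "batch" column.
    writer.writerow([id_column, "molecule_h5"])
    for label, path in zip(labels, molecule_info_paths):
        writer.writerow([label, path])
    return buffer.getvalue()


print(aggregation_csv(
    ["donor_1", "donor_2"],                                   # hypothetical labels
    ["/data/donor_1/molecule_info.h5",                        # hypothetical paths
     "/data/donor_2/molecule_info.h5"],
))
```

Row order matters: the first row corresponds to GEM well suffix 1, the second to suffix 2, and so on.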

compress_raw_feature_bc_matrices_folder
../tools/tar-compress.cwl (CommandLineTool)

Compresses input directory to tar.gz

compress_secondary_analysis_report_folder
../tools/tar-compress.cwl (CommandLineTool)

Compresses input directory to tar.gz

compress_filtered_feature_bc_matrix_folder
../tools/tar-compress.cwl (CommandLineTool)

Compresses input directory to tar.gz
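The three compression steps above each turn a directory into a tar.gz archive. The contents of tar-compress.cwl are not shown here, so the following is only an equivalent sketch using Python's standard tarfile module; the function name and the default archive naming are assumptions.

```python
import os
import tarfile


def compress_directory(directory, archive_path=None):
    """Compress a directory into a tar.gz archive and return the archive path."""
    directory = os.path.abspath(directory)
    if not os.path.isdir(directory):
        raise NotADirectoryError(directory)
    if archive_path is None:
        # Assumption: the archive is placed next to the directory.
        archive_path = directory + ".tar.gz"
    with tarfile.open(archive_path, "w:gz") as archive:
        # Store entries under the directory's own name, not its full path.
        archive.add(directory, arcname=os.path.basename(directory))
    return archive_path
```

Calling `compress_directory("filtered_feature_bc_matrix")` would, under these assumptions, produce `filtered_feature_bc_matrix.tar.gz` alongside the original folder.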

Outputs

ID Type Label Doc
web_summary_report File Aggregated run summary metrics and charts in HTML format

loupe_browser_track File Loupe Browser visualization and analysis file for aggregated results

aggregation_metadata File Aggregation metadata in CSV format

raw_feature_bc_matrices_h5 File Aggregated unfiltered feature-barcode matrices in HDF5 format

Aggregated unfiltered feature-barcode matrices containing all barcodes in HDF5 format

aggregate_counts_stderr_log File stderr log generated by cellranger aggr

aggregate_counts_stdout_log File stdout log generated by cellranger aggr

metrics_summary_report_json File Aggregated run summary metrics in JSON format

filtered_feature_bc_matrix_h5 File Aggregated filtered feature-barcode matrices in HDF5 format

Aggregated filtered feature-barcode matrices containing only cellular barcodes in HDF5 format

raw_feature_bc_matrices_folder File Compressed folder with aggregated unfiltered feature-barcode matrices

Compressed folder with aggregated unfiltered feature-barcode matrices containing all barcodes in MEX format

secondary_analysis_report_folder File Compressed folder with aggregated secondary analysis results

Compressed folder with secondary analysis results including dimensionality reduction, cell clustering, and differential expression of aggregated results

filtered_feature_bc_matrix_folder File Compressed folder with aggregated filtered feature-barcode matrices

Compressed folder with aggregated filtered feature-barcode matrices containing only cellular barcodes in MEX format

Permalink: https://w3id.org/cwl/view/git/09267e79fd867aa68a219c69e6db7d8e2e877be2/workflows/cellranger-aggr.cwl