Workflow: Cell Ranger ARC Count Gene Expression + ATAC

Fetched 2023-01-04 15:19:52 GMT

Inputs

ID Type Title Doc
alias String Experiment short name/Alias
threads Integer (Optional) Number of threads

Number of threads for those steps that support multithreading

memory_limit Integer (Optional) Maximum memory used (GB)

Maximum memory to use, in GB. Use the same value as when the indices were generated. The same limit is applied to virtual memory.

indices_folder Directory Genome Type

Cell Ranger ARC generated genome indices folder

exclude_introns Boolean (Optional) Disable counting of intronic reads

Disable counting of intronic reads. In this mode, only reads that are exonic and compatible with annotated splice junctions in the reference are counted. Note: using this mode will reduce the UMI counts in the feature-barcode matrix

gex_fastq_file_r1 File [FASTQ] GEX FASTQ file R1 (optionally compressed)

GEX FASTQ file R1 (optionally compressed)

gex_fastq_file_r2 File [FASTQ] GEX FASTQ file R2 (optionally compressed)

GEX FASTQ file R2 (optionally compressed)

atac_fastq_file_r1 File [FASTQ] ATAC FASTQ file R1 (optionally compressed)

ATAC FASTQ file R1 (optionally compressed)

atac_fastq_file_r2 File [FASTQ] ATAC FASTQ file R2 (optionally compressed)

ATAC FASTQ file R2 (optionally compressed)

atac_fastq_file_r3 File [FASTQ] ATAC FASTQ file R3 (optionally compressed)

ATAC FASTQ file R3 (optionally compressed)
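The inputs listed above can be collected into a CWL job file. The following is a minimal sketch, assuming each FASTQ input takes a single File (as the Type column suggests) and that the job is written as JSON. The helper names `file_obj` and `build_job` and all example paths are hypothetical.

```python
import json


def file_obj(path):
    """Return a CWL File object for a local path."""
    return {"class": "File", "path": path}


def build_job(alias, indices_folder, gex_r1, gex_r2, atac_r1, atac_r2, atac_r3,
              threads=None, memory_limit=None, exclude_introns=None):
    """Build a job dictionary keyed by the workflow's input IDs."""
    job = {
        "alias": alias,
        "indices_folder": {"class": "Directory", "path": indices_folder},
        "gex_fastq_file_r1": file_obj(gex_r1),
        "gex_fastq_file_r2": file_obj(gex_r2),
        "atac_fastq_file_r1": file_obj(atac_r1),
        "atac_fastq_file_r2": file_obj(atac_r2),
        "atac_fastq_file_r3": file_obj(atac_r3),
    }
    # Optional inputs are written only when a value was supplied.
    optional = {
        "threads": threads,
        "memory_limit": memory_limit,
        "exclude_introns": exclude_introns,
    }
    job.update({key: value for key, value in optional.items() if value is not None})
    return job


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    job = build_job("sample_a", "/data/arc_indices",
                    "gex_R1.fastq.gz", "gex_R2.fastq.gz",
                    "atac_R1.fastq.gz", "atac_R2.fastq.gz", "atac_R3.fastq.gz",
                    threads=8)
    print(json.dumps(job, indent=2))
```

The printed JSON could then be passed to a CWL runner together with the workflow file.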

Steps

ID Runs Label Doc
collect_statistics
cellranger-arc-count.cwl#collect_statistics/717576b5-5e4a-413c-a7d5-8ac0e05c9dbe (CommandLineTool)
extract_gex_fastq_r1
../tools/extract-fastq.cwl (CommandLineTool)

Tool to decompress input FASTQ file(s). If several FASTQ files are provided, they are concatenated in the order in which they appear in the input.

Bash script's logic:
- disable the case-sensitive glob check
- check whether the root name of the input file already includes the '.fastq' or '.fq' extension; if yes, set DEFAULT_EXT to "", otherwise use '.fastq'
- check the file type and decompress if needed
- return 1 if the file type is not recognized

This script also works if the input file has no extension at all.
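The script logic described above can be sketched in Python. This is an illustration and not the tool's actual Bash code. It assumes that only gzip, bzip2 and plain-text FASTQ inputs need to be handled, that file types are recognized by their leading bytes, and that the output is named after the first input. Raising an error stands in for the script's return code of 1. The function names are hypothetical.

```python
import bz2
import gzip
import os
import shutil

GZIP_MAGIC = b"\x1f\x8b"
BZIP2_MAGIC = b"BZh"


def output_name(path):
    """Drop the last extension, then add '.fastq' unless the root already ends in .fastq/.fq."""
    root, _ = os.path.splitext(os.path.basename(path))
    # Case-insensitive check, mirroring the disabled case-sensitive glob.
    default_ext = "" if root.lower().endswith((".fastq", ".fq")) else ".fastq"
    return root + default_ext


def detect_type(path):
    """Guess the file type from its first bytes; None means 'not recognized'."""
    with open(path, "rb") as handle:
        head = handle.read(3)
    if head.startswith(GZIP_MAGIC):
        return "gzip"
    if head.startswith(BZIP2_MAGIC):
        return "bzip2"
    if head.startswith(b"@"):  # FASTQ records start with '@'
        return "text"
    return None


def extract(paths, out_dir):
    """Decompress the inputs if needed and concatenate them in the given order."""
    openers = {"gzip": gzip.open, "bzip2": bz2.open, "text": open}
    out_path = os.path.join(out_dir, output_name(paths[0]))
    with open(out_path, "wb") as out:
        for path in paths:
            kind = detect_type(path)
            if kind is None:
                raise ValueError(f"unrecognized file type: {path}")
            with openers[kind](path, "rb") as src:
                shutil.copyfileobj(src, out)
    return out_path
```

With these rules, `reads.fastq.gz` becomes `reads.fastq`, and an input named `reads` with no extension also becomes `reads.fastq`.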

extract_gex_fastq_r2
../tools/extract-fastq.cwl (CommandLineTool)

Tool to decompress input FASTQ file(s). If several FASTQ files are provided, they are concatenated in the order in which they appear in the input.

Bash script's logic:
- disable the case-sensitive glob check
- check whether the root name of the input file already includes the '.fastq' or '.fq' extension; if yes, set DEFAULT_EXT to "", otherwise use '.fastq'
- check the file type and decompress if needed
- return 1 if the file type is not recognized

This script also works if the input file has no extension at all.

extract_atac_fastq_r1
../tools/extract-fastq.cwl (CommandLineTool)

Tool to decompress input FASTQ file(s). If several FASTQ files are provided, they are concatenated in the order in which they appear in the input.

Bash script's logic:
- disable the case-sensitive glob check
- check whether the root name of the input file already includes the '.fastq' or '.fq' extension; if yes, set DEFAULT_EXT to "", otherwise use '.fastq'
- check the file type and decompress if needed
- return 1 if the file type is not recognized

This script also works if the input file has no extension at all.

extract_atac_fastq_r2
../tools/extract-fastq.cwl (CommandLineTool)

Tool to decompress input FASTQ file(s). If several FASTQ files are provided, they are concatenated in the order in which they appear in the input.

Bash script's logic:
- disable the case-sensitive glob check
- check whether the root name of the input file already includes the '.fastq' or '.fq' extension; if yes, set DEFAULT_EXT to "", otherwise use '.fastq'
- check the file type and decompress if needed
- return 1 if the file type is not recognized

This script also works if the input file has no extension at all.

extract_atac_fastq_r3
../tools/extract-fastq.cwl (CommandLineTool)

Tool to decompress input FASTQ file(s). If several FASTQ files are provided, they are concatenated in the order in which they appear in the input.

Bash script's logic:
- disable the case-sensitive glob check
- check whether the root name of the input file already includes the '.fastq' or '.fq' extension; if yes, set DEFAULT_EXT to "", otherwise use '.fastq'
- check the file type and decompress if needed
- return 1 if the file type is not recognized

This script also works if the input file has no extension at all.

generate_counts_matrix
../tools/cellranger-arc-count.cwl (CommandLineTool)
Cell Ranger ARC count - generates single cell feature counts for a single multiome library

Count ATAC and gene expression reads from a single library.

Cell Ranger ARC count performs alignment, filtering, barcode counting, peak calling and counting of both ATAC and GEX molecules. Furthermore, it uses the Chromium cellular barcodes to generate feature-barcode matrices, perform dimensionality reduction, determine clusters, perform differential analysis on clusters and identify linkages between peaks and genes. The count pipeline can take input from multiple sequencing runs on the same GEM well.

Parameters set by default:
- --disable-ui - no UI is needed when running in a Docker container
- --id - hardcoded to `sample` to simplify the location of output files
- --libraries - points to the file libraries.csv, generated from the input FASTQ files

Parameters not implemented:
- --no-bam - we always want to generate BAM files
- --dry
- --noexit
- --nopreflight
- --description
- --uiport
- --overrides
- --jobinterval
- --maxjobs
- --mempercore
- --jobmode (we use local by default)
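The parameter handling described above can be sketched as a small command builder. The fixed arguments `--disable-ui`, `--id` and `--libraries` come from the description. The mapping of the workflow inputs `indices_folder`, `threads`, `memory_limit` and `exclude_introns` to `--reference`, `--localcores`, `--localmem` and `--gex-exclude-introns` is an assumption and should be checked against the tool's CWL. The function name `build_command` is hypothetical.

```python
def build_command(reference, libraries_csv="libraries.csv",
                  threads=None, memory_limit=None, exclude_introns=False):
    """Assemble an argument list for `cellranger-arc count`."""
    cmd = [
        "cellranger-arc", "count",
        "--disable-ui",                  # no UI inside a container
        "--id=sample",                   # fixed ID keeps output paths predictable
        f"--libraries={libraries_csv}",
        f"--reference={reference}",      # assumed mapping from indices_folder
    ]
    if threads is not None:
        cmd.append(f"--localcores={threads}")     # assumed mapping from threads
    if memory_limit is not None:
        cmd.append(f"--localmem={memory_limit}")  # assumed mapping from memory_limit (GB)
    if exclude_introns:
        cmd.append("--gex-exclude-introns")       # assumed mapping from exclude_introns
    return cmd


if __name__ == "__main__":
    # Hypothetical reference path, for illustration only.
    print(" ".join(build_command("/data/arc_indices", threads=8, memory_limit=64)))
```

Returning a list rather than a single string makes it possible to pass the result to `subprocess.run` without shell quoting.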

Why do we need to rename input files? https://support.10xgenomics.com/single-cell-multiome-atac-gex/software/pipelines/latest/using/using/fastq-input
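In short, Cell Ranger ARC locates FASTQ files by a fixed naming pattern, and libraries.csv tells it which folder and sample prefix belong to which library type. The sketch below shows both pieces. The sample prefixes `gex` and `atac`, the folder paths, and the choice of lane 1 and sample index 1 are illustrative assumptions, not values taken from the workflow.

```python
import csv
import io


def tenx_fastq_name(sample, read, lane=1, sample_index=1):
    """Build a file name of the form <sample>_S<n>_L00<lane>_<read>_001.fastq."""
    return f"{sample}_S{sample_index}_L{lane:03d}_{read}_001.fastq"


def libraries_csv(gex_dir, atac_dir, gex_sample="gex", atac_sample="atac"):
    """Return the text of a libraries.csv with one GEX and one ATAC library."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["fastqs", "sample", "library_type"])
    writer.writerow([gex_dir, gex_sample, "Gene Expression"])
    writer.writerow([atac_dir, atac_sample, "Chromatin Accessibility"])
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical folders, for illustration only.
    print(tenx_fastq_name("gex", "R1"))
    print(libraries_csv("/work/gex_fastqs", "/work/atac_fastqs"))
```

The `sample` column must match the prefix used when renaming the FASTQ files, which is why the two helpers share the same sample names.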

run_fastqc_for_gex_fastq_r1
../tools/fastqc.cwl (CommandLineTool)

Tool runs FastQC from Babraham Bioinformatics

run_fastqc_for_gex_fastq_r2
../tools/fastqc.cwl (CommandLineTool)

Tool runs FastQC from Babraham Bioinformatics

run_fastqc_for_atac_fastq_r1
../tools/fastqc.cwl (CommandLineTool)

Tool runs FastQC from Babraham Bioinformatics

run_fastqc_for_atac_fastq_r2
../tools/fastqc.cwl (CommandLineTool)

Tool runs FastQC from Babraham Bioinformatics

run_fastqc_for_atac_fastq_r3
../tools/fastqc.cwl (CommandLineTool)

Tool runs FastQC from Babraham Bioinformatics

compress_raw_feature_bc_matrices_folder
../tools/tar-compress.cwl (CommandLineTool)

Compresses input directory to tar.gz
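As an illustration of this step, a directory can be packed into a tar.gz archive with Python's standard library. This is a sketch of the idea and not the tool's own implementation. The function name and the default output name are assumptions.

```python
import os
import tarfile


def compress_dir(src_dir, out_path=None):
    """Pack src_dir into a gzip-compressed tar archive and return the archive path."""
    src_dir = os.path.normpath(src_dir)
    folder_name = os.path.basename(src_dir)
    if out_path is None:
        out_path = folder_name + ".tar.gz"
    with tarfile.open(out_path, "w:gz") as archive:
        # arcname keeps only the folder name, not the full source path.
        archive.add(src_dir, arcname=folder_name)
    return out_path
```

Storing entries under the folder name alone means the archive unpacks to a single top-level directory regardless of where the source lived.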

compress_secondary_analysis_report_folder
../tools/tar-compress.cwl (CommandLineTool)

Compresses input directory to tar.gz

compress_filtered_feature_bc_matrix_folder
../tools/tar-compress.cwl (CommandLineTool)

Compresses input directory to tar.gz

Outputs

ID Type Label Doc
web_summary_report File Cell Ranger summary

Cell Ranger summary

atac_fragments_file File Count and barcode information for every ATAC fragment in TSV format

Count and barcode information for every ATAC fragment observed in the experiment in TSV format.
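A fragments file of this kind can be summarized per barcode with a few lines of Python. The sketch assumes the usual 10x layout of five tab-separated columns (chromosome, start, end, barcode, supporting read count) and header lines that begin with `#`. The rows in the usage example are made-up toy data.

```python
from collections import Counter


def fragments_per_barcode(lines):
    """Count fragments per barcode from the lines of a fragments TSV."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        chrom, start, end, barcode, n_reads = line.rstrip("\n").split("\t")[:5]
        counts[barcode] += 1
    return counts


if __name__ == "__main__":
    toy = [
        "# toy header\n",
        "chr1\t100\t250\tAAAC-1\t2\n",
        "chr1\t300\t480\tAAAC-1\t1\n",
        "chr2\t50\t190\tGGGT-1\t1\n",
    ]
    print(fragments_per_barcode(toy))
```

For a real, compressed file the same function could be fed from `gzip.open(path, "rt")`.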

atac_peaks_bed_file File Identified peaks in BED format

Locations of open-chromatin regions identified in this sample. These regions are referred to as "peaks".

loupe_browser_track File Loupe Browser visualization file with all the analysis outputs

Loupe Browser visualization file with all the analysis outputs

collected_statistics File Collected statistics in Markdown format

Collected statistics in Markdown format

gex_molecule_info_h5 File GEX molecule-level information for aggregating samples into larger datasets

Count and barcode information for every GEX molecule observed in the experiment in HDF5 format

barcode_metrics_report File ATAC and GEX barcode metrics in CSV format

ATAC and GEX read count summaries generated for every barcode observed in the experiment. The columns contain the paired ATAC and Gene Expression barcode sequences, ATAC and Gene Expression QC metrics for that barcode, as well as whether this barcode was identified as a cell-associated partition by the pipeline.
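The cell-associated barcodes can be pulled out of this report with the standard csv module. The column names `barcode` and `is_cell`, and the convention that `1` marks a cell, are assumptions here and should be checked against the header of the actual file. The rows in the usage example are toy data.

```python
import csv
import io


def cell_barcodes(csv_text, flag_column="is_cell", barcode_column="barcode"):
    """Return the barcodes whose flag column equals '1'."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row[barcode_column] for row in reader if row[flag_column] == "1"]


if __name__ == "__main__":
    toy = "barcode,is_cell\nAAAC-1,1\nAAAG-1,0\nAAAT-1,1\n"
    print(cell_barcodes(toy))
```

Both column names are parameters so the function can be pointed at a differently named flag without editing its body.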

metrics_summary_report File Run summary metrics in CSV format

Run summary metrics in CSV format

atac_peak_annotation_file File Annotations of peaks based on genomic proximity in TSV format

Annotations of peaks based on genomic proximity alone. Note that these are not functional annotations and they do not make use of linkage with GEX data.

atac_cut_sites_bigwig_file File Observed transposition sites in bigWig format

Genome track of observed transposition sites in the experiment, smoothed at a resolution of 400 bases, in bigWig format.

fastqc_report_gex_fastq_r1 File FastQC report for GEX FASTQ file R1

FastQC report for GEX FASTQ file R1

fastqc_report_gex_fastq_r2 File FastQC report for GEX FASTQ file R2

FastQC report for GEX FASTQ file R2

raw_feature_bc_matrices_h5 File Unfiltered feature-barcode matrices in HDF5 format

Raw feature barcode matrix stored as a CSC sparse matrix in hdf5 format. The rows consist of all the gene and peak features concatenated together and the columns consist of all observed barcodes with non-zero signal for either ATAC or gene expression.
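The CSC (compressed sparse column) layout stores three arrays: `data` holds the non-zero values, `indices` holds the row (feature) index of each value, and `indptr` marks where each column (barcode) starts in the other two. The pure-Python sketch below uses a toy 3 x 2 matrix rather than a real HDF5 file. The function names are hypothetical.

```python
def column_entries(data, indices, indptr, col):
    """Return (row, value) pairs for the non-zero entries of one column."""
    start, end = indptr[col], indptr[col + 1]
    return list(zip(indices[start:end], data[start:end]))


def to_dense(data, indices, indptr, shape):
    """Expand a CSC matrix into a list of rows (only sensible for small matrices)."""
    n_rows, n_cols = shape
    dense = [[0] * n_cols for _ in range(n_rows)]
    for col in range(n_cols):
        for row, value in column_entries(data, indices, indptr, col):
            dense[row][col] = value
    return dense


if __name__ == "__main__":
    # Toy matrix with 3 features (rows) and 2 barcodes (columns):
    # [[1, 0],
    #  [0, 2],
    #  [3, 0]]
    data, indices, indptr = [1, 3, 2], [0, 2, 1], [0, 2, 3]
    print(column_entries(data, indices, indptr, 0))
    print(to_dense(data, indices, indptr, (3, 2)))
```

Because columns are barcodes, reading all counts for one barcode touches a single contiguous slice, which is why this layout suits per-cell access.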

fastqc_report_atac_fastq_r1 File FastQC report for ATAC FASTQ file R1

FastQC report for ATAC FASTQ file R1

fastqc_report_atac_fastq_r2 File FastQC report for ATAC FASTQ file R2

FastQC report for ATAC FASTQ file R2

fastqc_report_atac_fastq_r3 File FastQC report for ATAC FASTQ file R3

FastQC report for ATAC FASTQ file R3

gex_possorted_genome_bam_bai File GEX reads aligned to the genome, as indexed BAM+BAI files

GEX position-sorted reads aligned to the genome and transcriptome annotated with barcode information in BAM format

atac_possorted_genome_bam_bai File ATAC reads aligned to the genome, as indexed BAM+BAI files

ATAC position-sorted reads aligned to the genome annotated with barcode information in BAM format

filtered_feature_bc_matrix_h5 File Filtered feature-barcode matrices in HDF5 format

Filtered feature barcode matrix stored as a CSC sparse matrix in hdf5 format. The rows consist of all the gene and peak features concatenated together (identical to raw feature barcode matrix) and the columns are restricted to those barcodes that are identified as cells.

raw_feature_bc_matrices_folder File Compressed folder with unfiltered feature-barcode matrices

Raw feature barcode matrix stored as a CSC sparse matrix in MEX format. The rows consist of all the gene and peak features concatenated together and the columns consist of all observed barcodes with non-zero signal for either ATAC or gene expression.
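The matrix file inside a MEX folder is plain text in Matrix Market coordinate format: lines starting with `%` are header or comment lines, the first remaining line gives the number of rows, columns and non-zero entries, and each following line holds a 1-based row index, a 1-based column index and a value. A minimal parser, assuming integer counts and using toy input, could look like this (the function name is hypothetical):

```python
def parse_mex(lines):
    """Parse Matrix Market coordinate text into (shape, {(row, col): value})."""
    shape = None
    entries = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("%"):
            continue  # skip the banner and comment lines
        parts = line.split()
        if shape is None:
            # First data line: rows, columns, number of non-zero entries.
            shape = (int(parts[0]), int(parts[1]))
            continue
        row, col, value = int(parts[0]), int(parts[1]), int(parts[2])
        entries[(row - 1, col - 1)] = value  # convert to 0-based indices
    return shape, entries


if __name__ == "__main__":
    toy = [
        "%%MatrixMarket matrix coordinate integer general",
        "3 2 3",
        "1 1 1",
        "3 1 3",
        "2 2 2",
    ]
    print(parse_mex(toy))
```

Rows correspond to features and columns to barcodes, so the accompanying features and barcodes files are needed to label the indices.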

secondary_analysis_report_folder File Compressed folder with secondary analysis results

Various secondary analyses that utilize the ATAC data, the GEX data, and their linkage: dimensionality reduction and clustering results for the ATAC and GEX data, differential expression, and differential accessibility for all clustering results above and linkage between ATAC and GEX data.

filtered_feature_bc_matrix_folder File Compressed folder with filtered feature-barcode matrices

Filtered feature barcode matrix stored as a CSC sparse matrix in MEX format. The rows consist of all the gene and peak features concatenated together (identical to raw feature barcode matrix) and the columns are restricted to those barcodes that are identified as cells.

generate_counts_matrix_stderr_log File stderr log generated by cellranger-arc count

stderr log generated by cellranger-arc count

generate_counts_matrix_stdout_log File stdout log generated by cellranger-arc count

stdout log generated by cellranger-arc count

Permalink: https://w3id.org/cwl/view/git/b1a5dabeeeb9079b30b2871edd9c9034a1e00c1c/workflows/cellranger-arc-count.cwl