Workflow: Single-Cell Preprocessing Pipeline

Fetched 2023-01-04 15:34:09 GMT

Devel version of Single-Cell Preprocessing Pipeline


Inputs

ID | Type | Title | Doc
alias | String | Experiment short name/Alias |
threads | Integer (Optional) | Number of threads | Number of threads for those steps that support multithreading
fastq_file_1 | File [FASTQ] | FASTQ file 1 (optionally compressed) | FASTQ file 1 (optionally compressed)
fastq_file_2 | File [FASTQ] | FASTQ file 2 (optionally compressed) | FASTQ file 2 (optionally compressed)
memory_limit | String (Optional) | Maximum memory used | Maximum memory used
sc_technology | https://w3id.org/cwl/view/git/4ab9399a4777610a579ea2c259b9356f27641dcc/workflows/single-cell-preprocess.cwl#sc_technology/sc_technology | Single-cell technology used | Single-cell technology used
workflow_type | https://w3id.org/cwl/view/git/4ab9399a4777610a579ea2c259b9356f27641dcc/workflows/single-cell-preprocess.cwl#workflow_type/workflow_type (Optional) | Workflow type | Type of workflow. Use lamanno to calculate RNA velocity based on La Manno et al. 2018 logic. Use nucleus to calculate RNA velocity on single-nucleus RNA-seq reads. Default: standard
genome_fasta_file | File [FASTA] | Reference genome FASTA file | Reference genome FASTA file that includes all chromosomes
annotation_gtf_file | File [GTF] | GTF annotation file | GTF annotation file that includes refGene and mitochondrial DNA annotations
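
As an illustration of how these inputs fit together, the sketch below assembles a CWL job object in Python and prints it as JSON. The alias, the file paths, the memory value and the `sc_technology` value `10XV3` are placeholders assumed for the example. The accepted `sc_technology` values are defined in the workflow file and are not listed on this page.

```python
import json

# Workflow types named in the workflow_type documentation.
WORKFLOW_TYPES = {"standard", "lamanno", "nucleus"}


def cwl_file(path):
    """Return a CWL File object for a local path."""
    return {"class": "File", "path": path}


def build_job(alias, fastq_1, fastq_2, genome_fasta, annotation_gtf,
              sc_technology, workflow_type="standard",
              threads=None, memory_limit=None):
    """Assemble a job object; optional inputs are omitted when not set."""
    if workflow_type not in WORKFLOW_TYPES:
        raise ValueError(f"unknown workflow_type: {workflow_type!r}")
    job = {
        "alias": alias,
        "fastq_file_1": cwl_file(fastq_1),
        "fastq_file_2": cwl_file(fastq_2),
        "genome_fasta_file": cwl_file(genome_fasta),
        "annotation_gtf_file": cwl_file(annotation_gtf),
        "sc_technology": sc_technology,  # placeholder value, see lead-in
        "workflow_type": workflow_type,
    }
    if threads is not None:
        job["threads"] = threads
    if memory_limit is not None:
        job["memory_limit"] = memory_limit  # a String input, e.g. "4G"
    return job


if __name__ == "__main__":
    job = build_job("sample1", "reads_1.fastq.gz", "reads_2.fastq.gz",
                    "genome.fa", "refgene.gtf", "10XV3",
                    threads=4, memory_limit="4G")
    print(json.dumps(job, indent=2))
```

The printed JSON could be saved to a file and passed to a CWL runner together with the workflow, assuming the runner accepts JSON job files.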

Steps

ID Runs Label Doc
extract_fastq_1
../tools/extract-fastq.cwl (CommandLineTool)

Tool to decompress input FASTQ file(s). If several FASTQ files are provided, they are concatenated in the order in which they appear in the input.

Bash script's logic:
- disable the case-sensitive glob check
- check whether the root name of the input file already includes the '.fastq' or '.fq' extension; if yes, set DEFAULT_EXT to "", otherwise use '.fastq'
- check the file type and decompress if needed
- return 1 if the file type is not recognized

The script also works if the input file has no extension at all.
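
The naming and decompression logic described above can be sketched in Python. This is not the tool's Bash script. Detecting the file type from magic bytes and supporting gzip and bzip2 are assumptions made for the example, and `output_name` and `extract` are hypothetical helper names.

```python
import bz2
import gzip
import os
import shutil


def output_name(path):
    """Drop the last extension, then append '.fastq' unless the remaining
    root already ends in '.fastq' or '.fq' (case-insensitive)."""
    root, _ext = os.path.splitext(os.path.basename(path))
    default_ext = "" if root.lower().endswith((".fastq", ".fq")) else ".fastq"
    return root + default_ext


def extract(path, out_dir="."):
    """Write a decompressed copy of `path` into `out_dir`, return its path."""
    with open(path, "rb") as handle:
        magic = handle.read(3)
    if magic[:2] == b"\x1f\x8b":      # gzip
        opener = gzip.open
    elif magic == b"BZh":             # bzip2
        opener = bz2.open
    elif magic[:1] == b"@":           # plain FASTQ starts with '@'
        opener = open
    else:
        # The Bash script returns 1 here; the sketch raises instead.
        raise ValueError(f"unrecognized file type: {path}")
    target = os.path.join(out_dir, output_name(path))
    if os.path.abspath(target) == os.path.abspath(path):
        return target                 # already a plain FASTQ in place
    with opener(path, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target


if __name__ == "__main__":
    for name in ("sample.fq.gz", "sample.gz", "sample"):
        print(name, "->", output_name(name))
```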

extract_fastq_2
../tools/extract-fastq.cwl (CommandLineTool)

Tool to decompress input FASTQ file(s). If several FASTQ files are provided, they are concatenated in the order in which they appear in the input.

Bash script's logic:
- disable the case-sensitive glob check
- check whether the root name of the input file already includes the '.fastq' or '.fq' extension; if yes, set DEFAULT_EXT to "", otherwise use '.fastq'
- check the file type and decompress if needed
- return 1 if the file type is not recognized

The script also works if the input file has no extension at all.

prepare_indices
../tools/kb-ref.cwl (CommandLineTool)

Builds a kallisto index, transcript-to-gene mapping and cDNA/mismatch FASTA files. If workflow_type is lamanno or nucleus, the tool produces three additional outputs:
- intron FASTA file
- cDNA transcripts-to-capture TSV file
- intron transcripts-to-capture TSV file

Otherwise the corresponding outputs will be null.

Notes:
- --verbose is hardcoded
- --keep-tmp, -d and --overwrite don't make sense when running from a container
- -f2, -c1 and -c2 are always appended to the base command regardless of workflow_type (this keeps the CWL less complicated)
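
To make the notes above concrete, the sketch below builds a `kb ref` argument list in which `-f2`, `-c1` and `-c2` are always present, and separately reports which outputs would be null for a given workflow type. The flag spellings follow my understanding of the kb-python versions that accept `--workflow lamanno` and `--workflow nucleus`. The output file names, the dictionary keys and the helper names are hypothetical and not taken from the tool.

```python
WORKFLOW_TYPES = ("standard", "lamanno", "nucleus")

# Hypothetical output file names used only for this example.
OUTPUTS = {
    "index": "kallisto_index.idx",
    "t2g": "transcript_to_gene.tsv",
    "cdna_fasta": "cdna.fasta",
    "intron_fasta": "intron.fasta",
    "cdna_t2c": "cdna_t2c.tsv",
    "intron_t2c": "intron_t2c.tsv",
}
# Outputs described as produced only for lamanno and nucleus.
VELOCITY_ONLY = ("intron_fasta", "cdna_t2c", "intron_t2c")


def kb_ref_command(genome_fasta, annotation_gtf, workflow_type="standard"):
    """Build the argument list; -f2, -c1, -c2 are appended unconditionally."""
    if workflow_type not in WORKFLOW_TYPES:
        raise ValueError(f"unknown workflow_type: {workflow_type!r}")
    return [
        "kb", "ref", "--verbose",
        "--workflow", workflow_type,
        "-i", OUTPUTS["index"],
        "-g", OUTPUTS["t2g"],
        "-f1", OUTPUTS["cdna_fasta"],
        "-f2", OUTPUTS["intron_fasta"],
        "-c1", OUTPUTS["cdna_t2c"],
        "-c2", OUTPUTS["intron_t2c"],
        genome_fasta, annotation_gtf,
    ]


def expected_outputs(workflow_type):
    """Map output names to file names, with None for outputs not produced."""
    produced = dict(OUTPUTS)
    if workflow_type == "standard":
        for key in VELOCITY_ONLY:
            produced[key] = None
    return produced


if __name__ == "__main__":
    print(" ".join(kb_ref_command("genome.fa", "refgene.gtf", "lamanno")))
    print(expected_outputs("standard"))
```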

The `annotation_gtf_file` input should have a correct "gene_id" field.

To generate a correct GTF file from the refGene annotations, download `refGene.txt` from http://hgdownload.cse.ucsc.edu/goldenPath/${GEN}/database/refGene.txt.gz and run:

    docker run --rm -ti -v `pwd`:/tmp/ biowardrobe2/ucscuserapps:v358 /bin/bash -c "cut -f 2- refGene.txt | genePredToGtf file stdin refgene.gtf"
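
A quick sanity check of the "gene_id" field might look like the sketch below. The page does not spell out what counts as a correct field, so the check assumes it means a non-empty `gene_id "..."` attribute in the ninth column of every feature line. `lines_missing_gene_id` is a hypothetical helper name.

```python
import re

# Matches gene_id "VALUE" as a whole attribute name with a non-empty value.
GENE_ID = re.compile(r'\bgene_id "([^"]+)"')


def lines_missing_gene_id(gtf_lines):
    """Return 1-based numbers of feature lines lacking a non-empty gene_id."""
    missing = []
    for number, line in enumerate(gtf_lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) < 9 or not GENE_ID.search(fields[8]):
            missing.append(number)
    return missing


if __name__ == "__main__":
    sample = [
        "# example annotation\n",
        'chr1\trefGene\texon\t1\t100\t.\t+\t.\tgene_id "GeneA"; transcript_id "NM_1";\n',
        'chr1\trefGene\texon\t200\t300\t.\t+\t.\ttranscript_id "NM_2";\n',
    ]
    print(lines_missing_gene_id(sample))
```

For a real file, the function can be fed an open file handle, for example `lines_missing_gene_id(open("refgene.gtf"))`.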

collect_statistics
single-cell-preprocess.cwl#collect_statistics/bed57bfa-4358-48f4-9dd3-188aed86e59a (CommandLineTool)

compress_counts_folder
../tools/tar-compress.cwl (CommandLineTool)

Compresses input directory to tar.gz
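
For readers who want the equivalent operation outside the workflow, here is a minimal Python sketch using the standard `tarfile` module. The default archive name (folder name plus `.tar.gz`) is an assumption, and `compress_folder` is a hypothetical helper, not part of the tool.

```python
import os
import tarfile


def compress_folder(folder, archive=None):
    """Pack `folder` into a gzip-compressed tar archive and return its path."""
    folder = os.path.normpath(folder)
    name = os.path.basename(folder)
    if archive is None:
        archive = name + ".tar.gz"  # assumed naming, written to the cwd
    with tarfile.open(archive, "w:gz") as tar:
        # Store paths relative to the folder name, not the absolute path.
        tar.add(folder, arcname=name)
    return archive
```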

generate_counts_matrix
../tools/kb-count.cwl (CommandLineTool)

Uses kallisto to pseudoalign reads and bustools to quantify the data.

1. Generates BUS file from input fastq files
2. Sorts generated BUS file
3. Inspects sorted BUS file
4. Corrects barcodes in sorted BUS file
5. Sorts corrected BUS file
6. Generates count matrix from sorted barcode corrected BUS file
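
The six stages above correspond roughly to the following kallisto and bustools invocations, written here as Python argument lists. This is a sketch of what `kb count` does as I understand it, not the tool's actual command lines. Intermediate file names such as `output.s.bus` and the `counts_unfiltered/cells_x_genes` prefix are assumptions, and `kb_count_stages` is a hypothetical helper.

```python
def kb_count_stages(index, t2g, technology, fastqs, whitelist,
                    threads=1, memory="4G"):
    """Return one argument list per stage, in execution order."""
    t, m = str(threads), memory
    return [
        # 1. Pseudoalign reads and write output.bus, matrix.ec, transcripts.txt
        ["kallisto", "bus", "-i", index, "-o", ".", "-x", technology,
         "-t", t, *fastqs],
        # 2. Sort the generated BUS file
        ["bustools", "sort", "-o", "output.s.bus", "-t", t, "-m", m,
         "output.bus"],
        # 3. Inspect the sorted BUS file
        ["bustools", "inspect", "-o", "inspect.json", "-w", whitelist,
         "-e", "matrix.ec", "output.s.bus"],
        # 4. Correct barcodes against the whitelist
        ["bustools", "correct", "-o", "output.s.c.bus", "-w", whitelist,
         "output.s.bus"],
        # 5. Sort the corrected BUS file
        ["bustools", "sort", "-o", "output.unfiltered.bus", "-t", t, "-m", m,
         "output.s.c.bus"],
        # 6. Build the gene count matrix
        ["bustools", "count", "-o", "counts_unfiltered/cells_x_genes",
         "-g", t2g, "-e", "matrix.ec", "-t", "transcripts.txt",
         "--genecounts", "output.unfiltered.bus"],
    ]


if __name__ == "__main__":
    stages = kb_count_stages("kallisto_index.idx", "transcript_to_gene.tsv",
                             "10XV3", ["r1.fastq", "r2.fastq"],
                             "whitelist.txt", threads=4)
    for argv in stages:
        print(" ".join(argv))
```

Each list could be passed to `subprocess.run(argv, check=True)` on a machine where kallisto and bustools are installed.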

Notes:
- --verbose is hardcoded
- --keep-tmp and --overwrite don't make sense when running from a container
- -o is used by default, so all outputs go to the current folder

Not implemented parameters: -w, --tcc, --dry-run, --filter
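
The barcode correction stage (step 4 in the list for this tool) can be illustrated with a simplified whitelist lookup. The sketch assumes that a barcode is rescued only when exactly one whitelist entry lies within Hamming distance 1, and that ambiguous or distant barcodes are discarded. It illustrates the idea and is not bustools code.

```python
def correct_barcode(barcode, whitelist):
    """Return the whitelist barcode for `barcode`, or None if it cannot be
    assigned unambiguously within one substitution."""
    if barcode in whitelist:
        return barcode
    candidates = set()
    for position, base in enumerate(barcode):
        for alternative in "ACGT":
            if alternative == base:
                continue
            candidate = barcode[:position] + alternative + barcode[position + 1:]
            if candidate in whitelist:
                candidates.add(candidate)
    # Keep only unambiguous corrections.
    return candidates.pop() if len(candidates) == 1 else None


if __name__ == "__main__":
    whitelist = {"AAAA", "CCCC"}
    for observed in ("AAAA", "AAAT", "AACC"):
        print(observed, "->", correct_barcode(observed, whitelist))
```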

run_fastqc_for_fastq_1
../tools/fastqc.cwl (CommandLineTool)

Tool runs FastQC from Babraham Bioinformatics

run_fastqc_for_fastq_2
../tools/fastqc.cwl (CommandLineTool)

Tool runs FastQC from Babraham Bioinformatics

Outputs

ID | Type | Label | Doc
whitelist_file | File | Whitelisted barcodes | Whitelisted barcodes that correspond to the used single-cell technology
ec_mapping_file | File | Mapping equivalence classes to transcripts | Mapping equivalence classes to transcripts generated by kallisto bus
transcripts_file | File | Transcript names | Transcript names file generated by kallisto bus
kallisto_bus_report | File | Pseudoalignment report | Pseudoalignment report generated by kallisto bus
not_sorted_bus_file | File | Not sorted BUS file | Not sorted BUS file generated by kallisto bus
collected_statistics | File | Collected statistics in Markdown format | Collected statistics in Markdown format
fastqc_report_fastq_1 | File | FastQC report for FASTQ file 1 | FastQC report for FASTQ file 1
fastqc_report_fastq_2 | File | FastQC report for FASTQ file 2 | FastQC report for FASTQ file 2
bustools_inspect_report | File | Report summarizing BUS file content | Report summarizing BUS file content generated by bustools inspect
counts_unfiltered_folder | File | Compressed folder with count matrix files | Compressed folder with count matrix files generated by bustools count
corrected_sorted_bus_file | File | Sorted BUS file with corrected barcodes | Sorted BUS file with corrected barcodes generated by bustools correct
prepare_indices_stderr_log | File | stderr log generated by kb ref | stderr log generated by kb ref
prepare_indices_stdout_log | File | stdout log generated by kb ref | stdout log generated by kb ref
generate_counts_matrix_stderr_log | File | stderr log generated by kb count | stderr log generated by kb count
generate_counts_matrix_stdout_log | File | stdout log generated by kb count | stdout log generated by kb count

Permalink: https://w3id.org/cwl/view/git/4ab9399a4777610a579ea2c259b9356f27641dcc/workflows/single-cell-preprocess.cwl