CWL Workflow: TOPMed_RNA-seq

Fetched 2023-01-12 02:05:51 GMT

Verified with cwltool version 3.1.20221201130942

TOPMed RNA-seq CWL workflow. Documentation on the workflow can be found [here](https://github.com/heliumdatacommons/cwl_workflows/blob/master/topmed-workflows/TOPMed_RNAseq_pipeline/README.md). Example input files: [Dockstore.json](https://github.com/heliumdatacommons/cwl_workflows/blob/master/topmed-workflows/TOPMed_RNAseq_pipeline/input-examples/Dockstore.json) and [rnaseq_pipeline_fastq-example.yml](https://github.com/heliumdatacommons/cwl_workflows/blob/master/topmed-workflows/TOPMed_RNAseq_pipeline/input-examples/rnaseq_pipeline_fastq-example.yml). Quickstart instructions are [here](https://github.com/heliumdatacommons/cwl_workflows/blob/master/topmed-workflows/TOPMed_RNAseq_pipeline/README.md#Quick Start). [GitHub Repo](https://github.com/heliumdatacommons/cwl_workflows)

Pipeline steps:

1. Align RNA-seq reads with [STAR v2.5.3a](https://github.com/alexdobin/STAR).
2. Run [Picard](https://github.com/broadinstitute/picard) [MarkDuplicates](https://broadinstitute.github.io/picard/command-line-overview.html#MarkDuplicates).
2a. Create a BAM index for the MarkDuplicates BAM with [Samtools 1.6](https://github.com/samtools/samtools/releases) index.
3. Transcript quantification with [RSEM 1.3.0](https://deweylab.github.io/RSEM/).
4. Gene quantification and quality control with [RNA-SeQC 1.1.9](https://github.com/francois-a/rnaseqc).

This workflow is Open Source and may be reused according to the terms of: BSD 3-clause "New" or "Revised" License

Note that the tools invoked by the workflow may have separate licenses.

Inputs

ID	Type	Title	Doc
fastqs	File[]
genes_gtf	File
paired_end	Boolean
prefix_str	String
star_index	Directory
is_stranded	Boolean
genome_fasta	File
max_frag_len	Integer
rsem_ref_dir	Directory
estimate_rspd	Boolean
rnaseqc_flags	String[]
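
As an illustration of how the input table above maps onto a job file, the following is a minimal sketch that builds a JSON job object with one entry per input ID. The input IDs and their types come from the table; every path and value is a hypothetical placeholder, and the helper names `cwl_file` and `cwl_dir` are invented for this example. The real example inputs are the Dockstore.json and rnaseq_pipeline_fastq-example.yml files linked in the description.

```python
import json


def cwl_file(path):
    """Build a CWL File object (hypothetical helper for this sketch)."""
    return {"class": "File", "path": path}


def cwl_dir(path):
    """Build a CWL Directory object (hypothetical helper for this sketch)."""
    return {"class": "Directory", "path": path}


# Keys are the input IDs from the table; all values are placeholders,
# not recommended settings.
job = {
    "fastqs": [cwl_file("sample_1.fastq.gz"), cwl_file("sample_2.fastq.gz")],  # File[]
    "genes_gtf": cwl_file("genes.gtf"),        # File
    "paired_end": True,                        # Boolean
    "prefix_str": "sample",                    # String
    "star_index": cwl_dir("star_index"),       # Directory
    "is_stranded": False,                      # Boolean
    "genome_fasta": cwl_file("genome.fa"),     # File
    "max_frag_len": 1000,                      # Integer (placeholder value)
    "rsem_ref_dir": cwl_dir("rsem_reference"), # Directory
    "estimate_rspd": True,                     # Boolean
    "rnaseqc_flags": [],                       # String[] (left empty here)
}

job_json = json.dumps(job, indent=2)
print(job_json)
```

Saved to a file such as `job.json` (a hypothetical name), an object like this can be given to a CWL runner as the argument after the workflow file, for example `cwltool rnaseq_pipeline_fastq.cwl job.json`.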

Steps

ID	Runs	Label	Doc
run_rsem	rsem.cwl (CommandLineTool)	run-rsem	A CWL wrapper for [run_RSEM.py](https://github.com/broadinstitute/gtex-pipeline/blob/master/rnaseq/src/run_RSEM.py). Runs [RSEM 1.3.0](https://deweylab.github.io/RSEM/). This CWL Tool was developed as step 3 of the TOPMed RNA-seq workflow.
run_star	star.cwl (CommandLineTool)	run-star	A CWL wrapper for [run_STAR.py](https://github.com/broadinstitute/gtex-pipeline/blob/master/rnaseq/src/run_STAR.py). Runs [STAR v2.5.3a](https://github.com/alexdobin/STAR). This CWL Tool was developed as step 1 of the TOPMed RNA-seq workflow. [GitHub Repo](https://github.com/heliumdatacommons/cwl_workflows)
sort_bam	samtools-sort.cwl (CommandLineTool)		Sort alignments by leftmost coordinates, or by read name when -n is used. An appropriate @HD-SO sort order header tag will be added or an existing one updated if necessary. Usage: samtools sort [-l level] [-m maxMem] [-o out.bam] [-O format] [-n] -T out.prefix [-@ threads] [in.bam] Options: -l INT Set the desired compression level for the final output file, ranging from 0 (uncompressed) or 1 (fastest but minimal compression) to 9 (best compression but slowest to write), similarly to gzip(1)'s compression level setting. If -l is not used, the default compression level will apply. -n Sort by read names (i.e., the QNAME field) rather than by chromosomal coordinates. -o FILE Write the final sorted output to FILE, rather than to standard output. -O FORMAT Write the final output as sam, bam, or cram. By default, samtools tries to select a format based on the -o filename extension; if output is to standard output or no format can be deduced, -O must be used. -T PREFIX Write temporary files to PREFIX.nnnn.bam. This option is required.
index_bam	indexbam.cwl (CommandLineTool)	run-index-bam	A wrapper for running `samtools index <bam>`.
run_rna-seqc	rna_seqc.cwl (CommandLineTool)	run-seqc	A CWL wrapper for [run_rnaseqc.py](https://github.com/heliumdatacommons/cwl_workflows/blob/master/topmed-workflows/TOPMed_RNAseq_pipeline/src/run_rnaseqc.py), duplicated from [run_rnaseqc.py](https://github.com/broadinstitute/gtex-pipeline/blob/master/rnaseq/src/run_rnaseqc.py) with minor modifications. Runs [RNA-SeQC 1.1.9](https://github.com/francois-a/rnaseqc). This CWL Tool was developed as step 4 of the TOPMed RNA-seq workflow. [GitHub Repo](https://github.com/heliumdatacommons/cwl_workflows)
run_markduplicates	markduplicates.cwl (CommandLineTool)	run-MarkDuplicates	A CWL wrapper for [run_MarkDuplicates.py](https://github.com/broadinstitute/gtex-pipeline/blob/master/rnaseq/src/run_MarkDuplicates.py). Runs [Picard](https://github.com/broadinstitute/picard) [MarkDuplicates](https://broadinstitute.github.io/picard/command-line-overview.html#MarkDuplicates). This CWL Tool was developed as step 2 of the TOPMed RNA-seq workflow. [GitHub Repo](https://github.com/heliumdatacommons/cwl_workflows)
run_index_markduplicates_bam	indexbam.cwl (CommandLineTool)	run-index-bam	A wrapper for running `samtools index <bam>`.
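
The table lists the steps without their wiring. The sketch below shows one way to derive a valid execution order from a dependency map using Python's standard `graphlib`. The step IDs are taken from the table, but the dependencies in `assumed_deps` are assumptions inferred from the numbered pipeline steps and the output names; they were not read from the CWL file and may not match the actual workflow.

```python
from graphlib import TopologicalSorter

# step ID -> set of step IDs it is ASSUMED to depend on.
# These edges are guesses for illustration only; check the CWL file
# for the real connections between steps.
assumed_deps = {
    "run_star": set(),
    "sort_bam": {"run_star"},
    "index_bam": {"sort_bam"},
    "run_markduplicates": {"sort_bam"},
    "run_index_markduplicates_bam": {"run_markduplicates"},
    "run_rsem": {"run_star"},
    "run_rna-seqc": {"run_markduplicates"},
}

# static_order() yields each step only after all of its dependencies.
order = list(TopologicalSorter(assumed_deps).static_order())
print(order)
```

Steps with no path between them (here, for instance, `run_rsem` and `run_markduplicates`) may appear in either relative order, and a CWL runner is free to execute such steps in parallel.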

Outputs

ID	Type	Label	Doc
star_output_bam	File
star_output_logs	File[]
star_output_bam_index	File
star_output_junctions	File
star_output_read_counts	File
markduplicates_bam_index	File
rsem_output_gene_results	File
markduplicates_output_bam	File
rna-seqc_output_gene_rpkm	File
rna-seqc_output_exon_counts	File
rna-seqc_output_gene_counts	File
star_output_junctions_pass1	File
rsem_output_isoforms_results	File
markduplicates_output_metrics	File
rna-seqc_output_count_metrics	File
rna-seqc_output_count_outputs	File
star_output_transcriptome_bam	File
star_output_chimeric_junctions	File
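
A CWL runner such as cwltool reports the final outputs as a JSON object keyed by output ID. The following minimal sketch checks such an object against the output IDs listed above and groups the IDs by the tool named in their prefix. The function names and the sample `result` are hypothetical; only the output IDs come from the table.

```python
import json
from collections import defaultdict

# Output IDs as listed in the Outputs table.
EXPECTED_OUTPUTS = [
    "star_output_bam", "star_output_logs", "star_output_bam_index",
    "star_output_junctions", "star_output_read_counts",
    "markduplicates_bam_index", "rsem_output_gene_results",
    "markduplicates_output_bam", "rna-seqc_output_gene_rpkm",
    "rna-seqc_output_exon_counts", "rna-seqc_output_gene_counts",
    "star_output_junctions_pass1", "rsem_output_isoforms_results",
    "markduplicates_output_metrics", "rna-seqc_output_count_metrics",
    "rna-seqc_output_count_outputs", "star_output_transcriptome_bam",
    "star_output_chimeric_junctions",
]


def missing_outputs(result):
    """Return expected output IDs that are absent or null in a result object."""
    return [oid for oid in EXPECTED_OUTPUTS if result.get(oid) is None]


def group_by_tool(output_ids):
    """Group output IDs by the text before the first underscore."""
    groups = defaultdict(list)
    for oid in output_ids:
        groups[oid.split("_", 1)[0]].append(oid)
    return dict(groups)


# Hypothetical, deliberately incomplete result object for demonstration.
result = json.loads('{"star_output_bam": {"class": "File", "path": "sample.bam"}}')

missing = missing_outputs(result)
groups = group_by_tool(EXPECTED_OUTPUTS)
print(len(missing), {tool: len(ids) for tool, ids in groups.items()})
```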

Permalink:

https://w3id.org/cwl/view/git/018d344b12e9e1b888e21e0819096f9b337d371d/topmed-workflows/TOPMed_RNAseq_pipeline/rnaseq_pipeline_fastq.cwl