STAR-RNA-Seq alignment and transcript/gene abundance workflow

Fetched 2025-05-12 19:26:03 GMT

Verified with cwltool version 3.1.20230201224320

This workflow is Open Source and may be reused according to the terms of: MIT License

Note that the tools invoked by the workflow may have separate licenses.

Inputs

ID	Type	Doc
strand
refFlat	File
reference	File
unaligned	https://w3id.org/cwl/view/git/bfcb5ffbea3d00a38cc03595d41e53ea976d599d/definitions/types/sequence_data.yml#sequence_data[]	Raw data from RNA sequencing; this custom type holds both the data file(s) and readgroup information. Data file(s) may be either a BAM file or paired FASTQs. Readgroup information should be given as a series of key:value pairs, each separated by a space. This means that any value containing spaces must be double quoted. The first key must be ID; consult the read group description in the header section of the SAM file specification for other, optional keys. Below is an example of an element of the input array: readgroup: "ID:xxx PU:xxx SM:xxx LB:xxx PL:ILLUMINA CN:WUGSC" sequence: fastq1: class: File path: /path/to/reads1.fastq fastq2: class: File path: /path/to/reads2.fastq OR bam: class: File path: /path/to/reads.bam
cdna_fasta	File
sample_name	String
unzip_fastqs	Boolean (Optional)
kallisto_index	File
star_genome_dir	Directory
agfusion_database	File
trimming_adapters	File
ribosomal_intervals	File
fusioninspector_mode
reference_annotation	File
examine_coding_effect	Boolean (Optional)
trimming_max_uncalled	Integer
star_fusion_genome_dir	Directory
trimming_min_readlength	Integer
trimming_adapter_trim_end	String
gene_transcript_lookup_table	File
trimming_adapter_min_overlap	Integer
agfusion_annotate_noncanonical	Boolean (Optional)
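
The readgroup rules in the unaligned doc (space-separated key:value pairs, double quotes around values that contain spaces, ID as the first key) can be checked before a job file is written. The following is a minimal sketch, not the workflow's own parser: it assumes shell-style quoting as implemented by Python's shlex, and the helper names parse_readgroup and fastq_element are hypothetical. The resulting element follows the shape of the example in the doc and is printed as JSON, which cwltool accepts for job files alongside YAML.

```python
import json
import shlex


def parse_readgroup(readgroup: str) -> dict:
    """Split a readgroup string into a dict of key -> value (insertion ordered).

    Assumption: quoting follows shell rules, so SM:"my sample" is one token.
    """
    fields = {}
    for token in shlex.split(readgroup):
        # Split on the first colon only, so values may themselves contain colons.
        key, sep, value = token.partition(":")
        if not sep or not key:
            raise ValueError(f"not a key:value pair: {token!r}")
        fields[key] = value
    # The doc requires ID to be the first key.
    if next(iter(fields), None) != "ID":
        raise ValueError("the first readgroup key must be ID")
    return fields


def fastq_element(readgroup: str, fastq1: str, fastq2: str) -> dict:
    """Build one element of the unaligned array for paired FASTQ input."""
    parse_readgroup(readgroup)  # validate only; the string itself is passed through
    return {
        "readgroup": readgroup,
        "sequence": {
            "fastq1": {"class": "File", "path": fastq1},
            "fastq2": {"class": "File", "path": fastq2},
        },
    }


# Placeholder values, mirroring the example in the doc.
rg = 'ID:xxx PU:xxx SM:"tumor sample" LB:xxx PL:ILLUMINA CN:WUGSC'
element = fastq_element(rg, "/path/to/reads1.fastq", "/path/to/reads2.fastq")
print(json.dumps({"unaligned": [element]}, indent=2))
```

For BAM input, the sequence mapping would instead hold a single bam entry with the same class and path fields.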

Steps

ID	Runs	Label	Doc
agfusion	../tools/agfusion.cwl (CommandLineTool)	A tool that annotates STAR gene fusion predictions
kallisto	../tools/kallisto.cwl (CommandLineTool)	Kallisto: Quant
mark_dup	../tools/mark_duplicates_and_sort.cwl (CommandLineTool)	Mark duplicates and Sort
sort_bam	../tools/samtools_sort.cwl (CommandLineTool)	samtools sort
index_bam	../tools/index_bam.cwl (CommandLineTool)	samtools index
stringtie	../tools/stringtie.cwl (CommandLineTool)	StringTie
index_cram	../tools/index_cram.cwl (CommandLineTool)	samtools index cram
bam_to_cram	../tools/bam_to_cram.cwl (CommandLineTool)	BAM to CRAM conversion
star_align_fusion	../tools/star_align_fusion.cwl (CommandLineTool)	STAR: align reads to transcriptome
star_fusion_detect	../tools/star_fusion_detect.cwl (CommandLineTool)	STAR-Fusion identify candidate fusion transcript
strandedness_check	../tools/strandedness_check.cwl (CommandLineTool)	runs how_are_we_stranded_here to determine RNAseq data strandedness	Uses how_are_we_stranded_here, a Python package for testing strandedness. Runs Kallisto and RSeQC (infer_experiment.py) to check which direction reads align once mapped to transcripts. It first creates a Kallisto index (or uses a pre-made index) of your organism's transcriptome. It then maps a small subset of reads (default 200000) to the transcriptome and uses Kallisto's --genomebam argument to project pseudoalignments to a genome-sorted BAM file. (Currently only Kallisto version 0.44.0 works well with how_are_we_stranded_here.) It finally runs RSeQC's infer_experiment.py to check which direction reads from the first and second pairs are aligned in relation to the transcript strand, and provides output with the likely strandedness of your data.
transcript_to_gene	../tools/transcript_to_gene.cwl (CommandLineTool)	Kallisto: TranscriptToGene
generate_qc_metrics	../tools/generate_qc_metrics.cwl (CommandLineTool)	Picard: RNA Seq Metrics
cgpbigwig_bamcoverage	../tools/bam_to_bigwig.cwl (CommandLineTool)	cgpBigWig Converting BAM to BigWig
sequence_to_trimmed_fastq	../subworkflows/sequence_to_trimmed_fastq.cwl (Workflow)	sequence (bam or fastqs) to trimmed fastqs
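
The last stage of the strandedness_check step described above, turning the fractions of reads aligned in each orientation into a likely strandedness, can be sketched as follows. This is an illustration only: the function name call_strandedness, the labels it returns, and both cutoff values are assumptions made for this example, not values taken from how_are_we_stranded_here or RSeQC.

```python
def call_strandedness(
    frac_forward: float,
    frac_reverse: float,
    stranded_cutoff: float = 0.9,    # assumed cutoff, for illustration only
    unstranded_cutoff: float = 0.6,  # assumed cutoff, for illustration only
) -> str:
    """Classify a library from the fractions of reads explained by each orientation.

    frac_forward: fraction of reads on the same strand as the transcript.
    frac_reverse: fraction of reads on the opposite strand.
    """
    for frac in (frac_forward, frac_reverse):
        if not 0.0 <= frac <= 1.0:
            raise ValueError("fractions must lie between 0 and 1")
    if frac_forward >= stranded_cutoff:
        return "forward"
    if frac_reverse >= stranded_cutoff:
        return "reverse"
    # Neither orientation dominates: reads are split roughly evenly.
    if max(frac_forward, frac_reverse) <= unstranded_cutoff:
        return "unstranded"
    # In between the two cutoffs the data do not support a confident call.
    return "undetermined"


for fwd, rev in [(0.95, 0.03), (0.02, 0.96), (0.49, 0.50), (0.75, 0.20)]:
    print(fwd, rev, call_strandedness(fwd, rev))
```

Keeping an explicit "undetermined" outcome avoids forcing a call on ambiguous libraries, which matters because the result feeds strand-aware steps downstream.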

Outputs

ID	Type	Label	Doc
cram	File
chart	File
metrics	File
final_bam	File
strand_info	File[]
gene_abundance	File
fusion_evidence	File
star_fusion_log	File
star_fusion_out	File
star_junction_out	File
bamcoverage_bigwig	File
star_fusion_abridge	File
star_fusion_predict	File
coding_region_effects	File (Optional)
transcript_abundance_h5	File
fusioninspector_evidence	File[] (Optional)
stringtie_transcript_gtf	File
transcript_abundance_tsv	File
annotated_fusion_predictions	Directory
stringtie_gene_expression_tsv	File

Permalink: https://w3id.org/cwl/view/git/bfcb5ffbea3d00a38cc03595d41e53ea976d599d/definitions/pipelines/rnaseq_star_fusion.cwl