Workflow: STAR-RNA-Seq alignment and transcript/gene abundance workflow

Fetched 2025-05-12 19:26:03 GMT

Inputs

ID Type Title Doc
strand
refFlat File
reference File
unaligned https://w3id.org/cwl/view/git/bfcb5ffbea3d00a38cc03595d41e53ea976d599d/definitions/types/sequence_data.yml#sequence_data[]

Raw data from RNA sequencing; this custom type holds both the data file(s) and readgroup information. Data file(s) may be either a BAM file or paired FASTQs. Readgroup information should be given as a series of key:value pairs, each separated by a space. This means that a value containing spaces must be double quoted. The first key must be ID; consult the read group description in the header section of the SAM file specification for other, optional keys. Below is an example of an element of the input array:

    readgroup: "ID:xxx PU:xxx SM:xxx LB:xxx PL:ILLUMINA CN:WUGSC"
    sequence:
      fastq1:
        class: File
        path: /path/to/reads1.fastq
      fastq2:
        class: File
        path: /path/to/reads2.fastq
    OR
      bam:
        class: File
        path: /path/to/reads.bam

cdna_fasta File
sample_name String
unzip_fastqs Boolean (Optional)
kallisto_index File
star_genome_dir Directory
agfusion_database File
trimming_adapters File
ribosomal_intervals File
fusioninspector_mode
reference_annotation File
examine_coding_effect Boolean (Optional)
trimming_max_uncalled Integer
star_fusion_genome_dir Directory
trimming_min_readlength Integer
trimming_adapter_trim_end String
gene_transcript_lookup_table File
trimming_adapter_min_overlap Integer
agfusion_annotate_noncanonical Boolean (Optional)
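
The input listing above can be turned into a small pre-flight check on a job object before it is handed to a CWL runner. The following is a minimal sketch, not part of the workflow itself: the names `INPUT_TYPES`, `parse_readgroup`, and `check_job` are hypothetical, the types are copied from the listing, and `strand` and `fusioninspector_mode` are skipped because the listing shows no type for them. It assumes File and Directory values are given as CWL objects with a `class` and a `path` or `location`.

```python
import shlex

# (type, optional) per input, copied from the Inputs listing.
# strand and fusioninspector_mode show no type there, so they are not checked.
INPUT_TYPES = {
    "refFlat": ("File", False),
    "reference": ("File", False),
    "unaligned": ("sequence_data[]", False),
    "cdna_fasta": ("File", False),
    "sample_name": ("String", False),
    "unzip_fastqs": ("Boolean", True),
    "kallisto_index": ("File", False),
    "star_genome_dir": ("Directory", False),
    "agfusion_database": ("File", False),
    "trimming_adapters": ("File", False),
    "ribosomal_intervals": ("File", False),
    "reference_annotation": ("File", False),
    "examine_coding_effect": ("Boolean", True),
    "trimming_max_uncalled": ("Integer", False),
    "star_fusion_genome_dir": ("Directory", False),
    "trimming_min_readlength": ("Integer", False),
    "trimming_adapter_trim_end": ("String", False),
    "gene_transcript_lookup_table": ("File", False),
    "trimming_adapter_min_overlap": ("Integer", False),
    "agfusion_annotate_noncanonical": ("Boolean", True),
}


def parse_readgroup(readgroup):
    """Split 'ID:xxx PU:xxx ...' into a dict; double quotes protect spaces."""
    if not isinstance(readgroup, str):
        raise ValueError("readgroup must be a string")
    fields = {}
    for token in shlex.split(readgroup):
        key, sep, value = token.partition(":")
        if not key or not sep:
            raise ValueError(f"not a key:value pair: {token!r}")
        fields[key] = value
    if next(iter(fields), None) != "ID":
        raise ValueError("the first readgroup key must be ID")
    return fields


def _is_path_object(value, cwl_class):
    return (isinstance(value, dict) and value.get("class") == cwl_class
            and ("path" in value or "location" in value))


def _check_sequence_data(items):
    """Each element needs a valid readgroup plus either a bam or paired fastqs."""
    if not isinstance(items, list) or not items:
        return False
    for item in items:
        try:
            parse_readgroup(item["readgroup"])
            seq = item["sequence"]
            has_bam = _is_path_object(seq.get("bam"), "File")
            has_fastqs = all(_is_path_object(seq.get(k), "File")
                             for k in ("fastq1", "fastq2"))
        except (KeyError, TypeError, ValueError, AttributeError):
            return False
        if not (has_bam or has_fastqs):
            return False
    return True


def _matches(value, cwl_type):
    if cwl_type in ("File", "Directory"):
        return _is_path_object(value, cwl_type)
    if cwl_type == "sequence_data[]":
        return _check_sequence_data(value)
    if cwl_type == "Integer":
        # bool is a subclass of int in Python, so exclude it explicitly
        return isinstance(value, int) and not isinstance(value, bool)
    return isinstance(value, {"String": str, "Boolean": bool}[cwl_type])


def check_job(job):
    """Return a list of problems; an empty list means the job looks plausible."""
    problems = []
    for name, (cwl_type, optional) in INPUT_TYPES.items():
        if job.get(name) is None:
            if not optional:
                problems.append(f"{name}: missing required {cwl_type}")
        elif not _matches(job[name], cwl_type):
            problems.append(f"{name}: expected {cwl_type}")
    return problems
```

A check like this only catches shape errors (a missing input, a readgroup that does not start with ID); it does not confirm that the files exist or that the runner will accept the job.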

Steps

ID Runs Label Doc
agfusion
../tools/agfusion.cwl (CommandLineTool)
A tool that annotates STAR gene fusion predictions
kallisto
../tools/kallisto.cwl (CommandLineTool)
Kallisto: Quant
mark_dup
../tools/mark_duplicates_and_sort.cwl (CommandLineTool)
Mark duplicates and Sort
sort_bam
../tools/samtools_sort.cwl (CommandLineTool)
samtools sort
index_bam
../tools/index_bam.cwl (CommandLineTool)
samtools index
stringtie
../tools/stringtie.cwl (CommandLineTool)
StringTie
index_cram
../tools/index_cram.cwl (CommandLineTool)
samtools index cram
bam_to_cram
../tools/bam_to_cram.cwl (CommandLineTool)
BAM to CRAM conversion
star_align_fusion
../tools/star_align_fusion.cwl (CommandLineTool)
STAR: align reads to transcriptome
star_fusion_detect
../tools/star_fusion_detect.cwl (CommandLineTool)
STAR-Fusion identify candidate fusion transcript
strandedness_check
../tools/strandedness_check.cwl (CommandLineTool)
runs how_are_we_stranded_here to determine RNAseq data strandedness

Uses how_are_we_stranded_here, a Python package for testing strandedness. It runs Kallisto and RSeQC (infer_experiment.py) to check which direction reads align once mapped in transcripts. It first creates a Kallisto index (or uses a pre-made index) of your organism's transcriptome. It then maps a small subset of reads (default 200000) to the transcriptome and uses Kallisto's --genomebam argument to project the pseudoalignments onto the genome as a sorted BAM file. (Currently only Kallisto version 0.44.0 works well with how_are_we_stranded_here.) Finally, it runs RSeQC's infer_experiment.py to check which direction reads from the first and second pairs are aligned in relation to the transcript strand, and reports the likely strandedness of your data.

transcript_to_gene
../tools/transcript_to_gene.cwl (CommandLineTool)
Kallisto: TranscriptToGene
generate_qc_metrics
../tools/generate_qc_metrics.cwl (CommandLineTool)
Picard: RNA Seq Metrics
cgpbigwig_bamcoverage
../tools/bam_to_bigwig.cwl (CommandLineTool)
cgpBigWig Converting BAM to BigWig
sequence_to_trimmed_fastq
sequence (bam or fastqs) to trimmed fastqs
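
The last stage of the strandedness_check step, turning the two orientation fractions reported by infer_experiment.py into a call, can be sketched as follows. This is an illustration only: the function name `call_strandedness` and the cutoffs 0.9 and 0.6 are assumptions made for the example, not values confirmed by this page. Here "FR" stands for the fraction of reads explained by RSeQC's "1++,1--,2+-,2-+" pattern and "RF" for the "1+-,1-+,2++,2--" pattern.

```python
def call_strandedness(fr_fraction, rf_fraction,
                      stranded_cutoff=0.9, unstranded_cutoff=0.6):
    """Classify a library from the two orientation fractions.

    The cutoffs are illustrative assumptions, not the tool's documented values.
    """
    total = fr_fraction + rf_fraction
    if total <= 0:
        raise ValueError("no reads with a determinable orientation")
    # Renormalise so reads whose orientation could not be determined are ignored
    fr = fr_fraction / total
    rf = rf_fraction / total
    if fr > stranded_cutoff:
        return "FR"
    if rf > stranded_cutoff:
        return "RF"
    if max(fr, rf) < unstranded_cutoff:
        return "unstranded"
    # One orientation dominates, but not strongly enough to make a call
    return "unclear"


if __name__ == "__main__":
    print(call_strandedness(0.02, 0.96))
```

Returning an explicit "unclear" value rather than forcing a call keeps borderline libraries visible, so that a downstream choice of the `strand` input can be reviewed by hand.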

Outputs

ID Type Label Doc
cram File
chart File
metrics File
final_bam File
strand_info File[]
gene_abundance File
fusion_evidence File
star_fusion_log File
star_fusion_out File
star_junction_out File
bamcoverage_bigwig File
star_fusion_abridge File
star_fusion_predict File
coding_region_effects File (Optional)
transcript_abundance_h5 File
fusioninspector_evidence File[] (Optional)
stringtie_transcript_gtf File
transcript_abundance_tsv File
annotated_fusion_predictions Directory
stringtie_gene_expression_tsv File
Permalink: https://w3id.org/cwl/view/git/bfcb5ffbea3d00a38cc03595d41e53ea976d599d/definitions/pipelines/rnaseq_star_fusion.cwl