Workflow: STAR-RNA-Seq alignment and transcript/gene abundance workflow

Fetched 2023-01-09 19:03:16 GMT

Inputs

ID Type Title Doc
strand
refFlat File
gtf_file File
reference File
unaligned https://w3id.org/cwl/view/git/a08de598edc04f340fdbff76c9a92336a7702022/definitions/types/sequence_data.yml#sequence_data[]
cdna_fasta File
sample_name String
unzip_fastqs Boolean (Optional)
kallisto_index File
star_genome_dir Directory
trimming_adapters File
outsam_attrrg_line String[]
ribosomal_intervals File
trimming_max_uncalled Integer
star_fusion_genome_dir Directory
trimming_min_readlength Integer
trimming_adapter_trim_end String
gene_transcript_lookup_table File
trimming_adapter_min_overlap Integer
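
The Inputs table works as the workflow's configuration interface. The Python sketch below checks a job dictionary against the IDs and types listed above; a job dictionary has the same structure a CWL runner reads from a YAML or JSON job file. It is an illustration only and is not part of the workflow. `check_job` and `matches` are hypothetical helpers. `strand` is left out because this page shows no type for it, and `unaligned` is left out because its `sequence_data` record fields are defined in a separate file. Only `unzip_fastqs` is treated as optional, because it is the only input the table marks that way.

```python
# Input IDs and types copied from the Inputs table. "strand" (no type shown)
# and "unaligned" (sequence_data[] records) are deliberately left out.
INPUT_TYPES = {
    "refFlat": "File",
    "gtf_file": "File",
    "reference": "File",
    "cdna_fasta": "File",
    "sample_name": "String",
    "unzip_fastqs": "Boolean",
    "kallisto_index": "File",
    "star_genome_dir": "Directory",
    "trimming_adapters": "File",
    "outsam_attrrg_line": "String[]",
    "ribosomal_intervals": "File",
    "trimming_max_uncalled": "Integer",
    "star_fusion_genome_dir": "Directory",
    "trimming_min_readlength": "Integer",
    "trimming_adapter_trim_end": "String",
    "gene_transcript_lookup_table": "File",
    "trimming_adapter_min_overlap": "Integer",
}
# Only unzip_fastqs is marked (Optional) in the table.
OPTIONAL = {"unzip_fastqs"}


def matches(value, table_type):
    """Check that a job-file value has the shape implied by the table's type."""
    if table_type in ("File", "Directory"):
        # CWL job files describe files and directories as {"class": ..., "path": ...}.
        return isinstance(value, dict) and value.get("class") == table_type
    if table_type == "String":
        return isinstance(value, str)
    if table_type == "String[]":
        return isinstance(value, list) and all(isinstance(v, str) for v in value)
    if table_type == "Integer":
        # bool is a subclass of int in Python, so rule it out explicitly.
        return isinstance(value, int) and not isinstance(value, bool)
    if table_type == "Boolean":
        return isinstance(value, bool)
    return False


def check_job(job):
    """Return a list of problems; an empty list means the job looks complete."""
    problems = []
    for name, table_type in INPUT_TYPES.items():
        if name not in job:
            if name not in OPTIONAL:
                problems.append(f"missing input: {name}")
        elif not matches(job[name], table_type):
            problems.append(f"{name}: expected {table_type}")
    return problems


if __name__ == "__main__":
    # Hypothetical, deliberately incomplete job.
    job = {
        "sample_name": "sample1",
        "star_genome_dir": {"class": "Directory", "path": "star_index"},
        "trimming_max_uncalled": "100",  # wrong: should be an integer
    }
    for problem in check_job(job):
        print(problem)
```

With the example job, `check_job` reports the remaining required inputs as missing. It also flags `trimming_max_uncalled`, whose value is a string rather than an integer. A job that passes can be saved with `json.dump` and handed to a CWL runner, because JSON is also valid YAML.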

Steps

ID Runs Label Doc
kallisto
../tools/kallisto.cwl (CommandLineTool)
Kallisto: Quant
mark_dup
../tools/mark_duplicates_and_sort.cwl (CommandLineTool)
Mark duplicates and Sort
sort_bam
../tools/samtools_sort.cwl (CommandLineTool)
samtools sort
index_bam
../tools/index_bam.cwl (CommandLineTool)
samtools index
stringtie
../tools/stringtie.cwl (CommandLineTool)
StringTie
index_cram
../tools/index_cram.cwl (CommandLineTool)
samtools index cram
bam_to_cram
../tools/bam_to_cram.cwl (CommandLineTool)
BAM to CRAM conversion
star_align_fusion
../tools/star_align_fusion.cwl (CommandLineTool)
STAR: align reads to transcriptome
star_fusion_detect
../tools/star_fusion_detect.cwl (CommandLineTool)
STAR-Fusion identify candidate fusion transcript
strandedness_check
../tools/strandedness_check.cwl (CommandLineTool)
runs how_are_we_stranded_here to determine RNAseq data strandedness

Uses how_are_we_stranded_here, a Python package for testing strandedness. It runs Kallisto and RSeQC (infer_experiment.py) to check which direction reads align once mapped to transcripts. It first creates a Kallisto index of your organism's transcriptome, or uses a pre-made index. It then maps a small subset of reads (default 200,000) to the transcriptome and uses Kallisto's --genomebam argument to project the pseudoalignments to a genome-sorted BAM file. (Currently only Kallisto version 0.44.0 works well with how_are_we_stranded_here.) Finally, it runs RSeQC's infer_experiment.py to check how read 1 and read 2 of each pair align relative to the transcript strand, and reports the likely strandedness of your data.

transcript_to_gene
../tools/transcript_to_gene.cwl (CommandLineTool)
Kallisto: TranscriptToGene
generate_qc_metrics
../tools/generate_qc_metrics.cwl (CommandLineTool)
Picard: RNA Seq Metrics
cgpbigwig_bamcoverage
../tools/bam_to_bigwig.cwl (CommandLineTool)
cgpBigWig Converting BAM to BigWig
sequence_to_trimmed_fastq
sequence (bam or fastqs) to trimmed fastqs
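
The final decision in the strandedness_check step can be sketched as follows. This is an illustration and not how_are_we_stranded_here's own code. It assumes the paired-end text report of RSeQC's infer_experiment.py. In that report, `1++,1--,2+-,2-+` is the fraction of reads whose read 1 lies on the transcript strand (FR, fr-secondstrand), and `1+-,1-+,2++,2--` is the fraction whose read 1 lies on the opposite strand (RF, fr-firststrand). The 0.9 and 0.6 cutoffs and the helper names are hypothetical choices for this example.

```python
import re

# Hypothetical cutoffs for this example; not read from the workflow.
STRANDED_CUTOFF = 0.9
UNSTRANDED_CUTOFF = 0.6

# Paired-end patterns as printed by RSeQC infer_experiment.py.
FR_PATTERN = "1++,1--,2+-,2-+"  # read 1 on the transcript strand
RF_PATTERN = "1+-,1-+,2++,2--"  # read 1 on the opposite strand


def parse_infer_experiment(text):
    """Return {pattern: fraction} for each 'explained by' line of a report."""
    return {
        m.group(1): float(m.group(2))
        for m in re.finditer(r'explained by "([^"]+)":\s*([0-9.]+)', text)
    }


def classify_strandedness(fractions):
    """Call FR, RF, unstranded, or ambiguous from the two pattern fractions."""
    fr = fractions.get(FR_PATTERN, 0.0)
    rf = fractions.get(RF_PATTERN, 0.0)
    explained = fr + rf
    if explained == 0:
        return "undetermined"
    # Share of strand-informative reads supporting each orientation.
    fr_share, rf_share = fr / explained, rf / explained
    if fr_share > STRANDED_CUTOFF:
        return "FR (fr-secondstrand)"
    if rf_share > STRANDED_CUTOFF:
        return "RF (fr-firststrand)"
    if max(fr_share, rf_share) < UNSTRANDED_CUTOFF:
        return "unstranded"
    return "ambiguous"


if __name__ == "__main__":
    # Made-up report in the paired-end infer_experiment.py layout.
    report = """This is PairEnd Data
Fraction of reads failed to determine: 0.0172
Fraction of reads explained by "1++,1--,2+-,2-+": 0.0072
Fraction of reads explained by "1+-,1-+,2++,2--": 0.9756"""
    print(classify_strandedness(parse_infer_experiment(report)))  # RF (fr-firststrand)
```

The sketch divides each fraction by the total of the two strand-informative fractions. This keeps the call independent of how many reads infer_experiment.py could not assign. The report values in the example are made up.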

Outputs

ID Type Label Doc
cram File
chart File
metrics File
strand_info File[]
gene_abundance File
fusion_evidence File
star_fusion_log File
star_fusion_out File
star_junction_out File
bamcoverage_bigwig File
star_fusion_abridge File
star_fusion_predict File
transcript_abundance_h5 File
stringtie_transcript_gtf File
transcript_abundance_tsv File
stringtie_gene_expression_tsv File
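
This page does not document how the transcript_to_gene step turns Kallisto's transcript-level abundances into gene_abundance using gene_transcript_lookup_table. One common approach, assumed here, is to sum the est_counts and tpm of each gene's transcripts. The sketch reads the columns of Kallisto's abundance.tsv (target_id, length, eff_length, est_counts, tpm). The two-column, tab-separated lookup format and the function names are hypothetical.

```python
import csv
import io
from collections import defaultdict


def read_lookup(handle):
    """Map transcript ID -> gene ID from a two-column TSV (assumed format)."""
    return {row[0]: row[1] for row in csv.reader(handle, delimiter="\t") if len(row) >= 2}


def transcripts_to_genes(abundance_handle, lookup):
    """Sum Kallisto est_counts and tpm per gene; unmapped transcripts are skipped."""
    totals = defaultdict(lambda: {"est_counts": 0.0, "tpm": 0.0})
    for row in csv.DictReader(abundance_handle, delimiter="\t"):
        gene = lookup.get(row["target_id"])
        if gene is None:
            continue
        totals[gene]["est_counts"] += float(row["est_counts"])
        totals[gene]["tpm"] += float(row["tpm"])
    return dict(totals)


if __name__ == "__main__":
    # Made-up inputs in Kallisto abundance.tsv layout and the assumed lookup layout.
    abundance = io.StringIO(
        "target_id\tlength\teff_length\test_counts\ttpm\n"
        "tx1\t1000\t850\t10\t5.5\n"
        "tx2\t800\t650\t30\t4.5\n"
        "tx3\t500\t350\t7\t2\n"
    )
    lookup = read_lookup(io.StringIO("tx1\tgeneA\ntx2\tgeneA\ntx3\tgeneB\n"))
    print(transcripts_to_genes(abundance, lookup))
    # {'geneA': {'est_counts': 40.0, 'tpm': 10.0}, 'geneB': {'est_counts': 7.0, 'tpm': 2.0}}
```

Transcripts that are missing from the lookup table are skipped silently. A production version might log them instead.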
Permalink: https://w3id.org/cwl/view/git/a08de598edc04f340fdbff76c9a92336a7702022/definitions/pipelines/rnaseq_star_fusion.cwl