Workflow: Transcriptome assembly workflow (single-end version)

Fetched 2024-11-27 18:35:26 GMT

Inputs

ID Type Title Doc
end_mode https://w3id.org/cwl/view/git/72f702591368397f56d455128f60916902104dd2/tools/Trimmomatic/trimmomatic-end_mode.yaml#end_mode read-end mode format

Read-end mode format to be specified to Trimmomatic

read_files File[] [FASTQ] FASTQ read file(s)

FASTQ file of reverse reads in Paired End mode

trinity_cpu Integer (Optional) number of CPUs allocated

Number of CPUs to use. Default: 2

forward_reads File [FASTQ] Paired-end read file 1

Read file 1 in FASTQ format

trinity_max_mem String maximum memory allocated to Trinity

Suggested maximum memory for Trinity to use in the stages where limiting can be enabled (jellyfish, sorting, etc.), given in GB of RAM, e.g. --max_memory 10G

trinity_seq_type String read file(s) format

Type of reads: fa or fq

trimmomatic_phred https://w3id.org/cwl/view/git/72f702591368397f56d455128f60916902104dd2/tools/Trimmomatic/trimmomatic-phred.yaml#phred quality score format

Either PHRED "33" or "64"; specifies the base quality encoding. Default: 64
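To illustrate what the two encodings mean, here is a minimal Python sketch that converts a FASTQ quality string into integer Phred scores. The function name `decode_phred` is hypothetical and is not part of Trimmomatic or this workflow; it only shows how the offset (33 or 64) is subtracted from each character code.

```python
from typing import List


def decode_phred(quality: str, offset: int = 33) -> List[int]:
    """Convert a FASTQ quality string into integer Phred scores.

    `offset` is the encoding base: 33 or 64 (hypothetical helper, for illustration).
    """
    if offset not in (33, 64):
        raise ValueError("offset must be 33 or 64")
    scores = [ord(ch) - offset for ch in quality]
    # A negative score means a character lies below the chosen offset,
    # which usually indicates the wrong encoding was selected.
    if any(score < 0 for score in scores):
        raise ValueError("quality character below the chosen offset")
    return scores
```

Decoding the same string with the wrong offset shifts every score by 31, which is why the encoding has to match the pipeline that produced the reads.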

trinity_ss_lib_type String Strand-specific RNA-Seq read orientation

Strand-specific RNA-Seq read orientation. If paired: RF or FR; if single: F or R (dUTP method = RF). See web documentation
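The rule above can be expressed as a small check. This is a sketch only: `valid_ss_lib_type` is a hypothetical helper, not something Trinity or the workflow provides, and it encodes just the values listed in the description.

```python
def valid_ss_lib_type(value: str, paired: bool) -> bool:
    """Return True if `value` is an allowed orientation for the library type."""
    # Paired-end libraries use RF or FR; single-end libraries use F or R.
    allowed = {"RF", "FR"} if paired else {"F", "R"}
    return value in allowed
```

For this single-end workflow, the check would be called with `paired=False`, so only `F` and `R` pass.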

trimmomatic_slidingWindow https://w3id.org/cwl/view/git/72f702591368397f56d455128f60916902104dd2/tools/Trimmomatic/trimmomatic-sliding_window.yaml#slidingWindow read filtering sliding window

Perform a sliding window trimming, cutting once the average quality within the window falls below a threshold. By considering multiple bases, a single poor quality base will not cause the removal of high quality data later in the read. <windowSize> specifies the number of bases to average across; <requiredQuality> specifies the average quality required.
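The idea can be sketched in a few lines of Python. This is a simplified illustration, not Trimmomatic's implementation: the function name is hypothetical, the read is cut at the start of the first failing window, and reads shorter than the window are left untouched. The real tool may treat the failing window and short reads differently.

```python
def sliding_window_keep_length(scores, window_size, required_quality):
    """Return how many leading bases to keep after sliding-window trimming.

    `scores` is a list of per-base Phred scores (simplified sketch).
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    # Slide the window from the 5' end, one base at a time.
    for start in range(max(len(scores) - window_size + 1, 0)):
        window = scores[start:start + window_size]
        if sum(window) / window_size < required_quality:
            # Average quality dropped below the threshold: cut here.
            return start
    return len(scores)
```

With a window of 4 and a required quality of 20, the read `[30, 30, 2, 30, 30, 30]` is kept whole because no window average drops below 20, while `[30, 30, 30, 30, 10, 10, 10, 10]` is cut once the low-quality tail dominates a window.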

Steps

ID Runs Label Doc
filter_reads
../tools/Trimmomatic/Trimmomatic-v0.36.cwl (CommandLineTool)
Trimmomatic - A flexible read trimming tool for Illumina NGS data

Trimmomatic is a fast, multithreaded command line tool that can be used to trim and crop Illumina (FASTQ) data as well as to remove adapters. These adapters can pose a real problem depending on the library preparation and downstream application. There are two major modes of the program: Paired end mode and Single end mode. The paired end mode will maintain correspondence of read pairs and also use the additional information contained in paired reads to better find adapter or PCR primer fragments introduced by the library preparation process. Trimmomatic works with FASTQ files (using phred + 33 or phred + 64 quality scores, depending on the Illumina pipeline used).

run_assembly
../tools/Trinity/Trinity-V2.6.5.single-end.cwl (CommandLineTool)
Trinity assembles transcript sequences from Illumina RNA-Seq data.

Trinity, developed at the Broad Institute and the Hebrew University of Jerusalem, represents a novel method for the efficient and robust de novo reconstruction of transcriptomes from RNA-seq data. Trinity combines three independent software modules: Inchworm, Chrysalis, and Butterfly, applied sequentially to process large volumes of RNA-seq reads. Documentation at https://github.com/trinityrnaseq/trinityrnaseq/wiki

evaluate_contigs
../tools/Transrate/Transrate-V1.0.3.cwl (CommandLineTool)
Transrate - A de-novo transcriptome assembly evaluation facility.

Analyse a de-novo transcriptome assembly using three kinds of metrics:
1. sequence based (if --assembly is given)
2. read mapping based (if --left and --right are given)
3. reference based (if --reference is given)
Documentation at http://hibberdlab.com/transrate
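As an example of a sequence-based metric, the sketch below computes N50 from a list of contig lengths. N50 is a commonly reported assembly statistic; the description above does not list which metrics Transrate reports, so treat this as an illustration of the category rather than of Transrate's own code. The function name `n50` is hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (0 if empty).

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembled bases.
    """
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0
```

Sequence-based metrics like this need only the assembly itself, which is why they are available whenever --assembly is given.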

generate_raw_stats
../tools/FastQC/FastQC-v0.11.7.cwl (CommandLineTool)
FastQC - A high-throughput sequence analysis QC tool.

FastQC reads a set of sequence files and produces from each one a quality control report consisting of a number of different modules, each one of which will help to identify a different potential type of problem in your data. If no files to process are specified on the command line then the program will start as an interactive graphical application. If files are provided on the command line then the program will run with no user interaction required. In this mode it is suitable for inclusion into a standardised analysis pipeline. Please visit https://www.bioinformatics.babraham.ac.uk/projects/fastqc/ for full documentation.

generate_filtered_stats
../tools/FastQC/FastQC-v0.11.7.cwl (CommandLineTool)
FastQC - A high-throughput sequence analysis QC tool.

FastQC reads a set of sequence files and produces from each one a quality control report consisting of a number of different modules, each one of which will help to identify a different potential type of problem in your data. If no files to process are specified on the command line then the program will start as an interactive graphical application. If files are provided on the command line then the program will run with no user interaction required. In this mode it is suitable for inclusion into a standardised analysis pipeline. Please visit https://www.bioinformatics.babraham.ac.uk/projects/fastqc/ for full documentation.
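The five steps above can be arranged into a plausible execution order. The dependency map below is an assumption inferred from the step names and the workflow outputs; it is not read from the workflow source, which is the authority on how the steps are actually wired. The sketch uses Python's standard `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Assumed wiring: each key depends on the steps in its set.
# Steps with an empty set read only the workflow-level inputs.
ASSUMED_DEPENDENCIES = {
    "generate_raw_stats": set(),
    "filter_reads": set(),
    "generate_filtered_stats": {"filter_reads"},
    "run_assembly": {"filter_reads"},
    "evaluate_contigs": {"run_assembly"},
}


def execution_order(dependencies):
    """Return one valid order in which the steps could run."""
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(ASSUMED_DEPENDENCIES)
```

Any order that `static_order()` yields runs trimming before assembly and assembly before evaluation; steps with no mutual dependency, such as the two FastQC runs, could execute in parallel.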

Outputs

ID Type Label Doc
raw_qc_report File[]
raw_html_report File[] [HTML]
assembled_contigs File [FASTA]
filtered_qc_report File[]
assembly_output_dir Directory
filtered_html_report File[] [HTML]
forward_reads_paired File [FASTQ]
transrate_output_dir Directory
trimmomatic_log_file File (Optional)
forward_reads_unpaired File (Optional) [FASTQ]
Permalink: https://w3id.org/cwl/view/git/72f702591368397f56d455128f60916902104dd2/workflows/TranscriptomeAssembly-wf.single-end.cwl