Workflow: EMG pipeline v3.0 (draft CWL version)

Fetched 2024-05-09 21:59:36 GMT

Inputs

ID Type Title Doc
5S_model File [HMMER format]
16S_model File [HMMER format]
23S_model File [HMMER format]
tRNA_model File [HMMER format]
forward_reads File [FASTQ]
reverse_reads File [FASTQ]
fraggenescan_model File
fraggenescan_pwm_dist File
fraggenescan_prob_stop File
fraggenescan_prob_start File
fraggenescan_prob_stop1 File
fraggenescan_prob_start1 File
fraggenescan_prob_forward File
fraggenescan_prob_backward File
fraggenescan_prob_noncoding File
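
A CWL runner takes these inputs as a job object that maps each input ID to a File entry. The sketch below builds such an object in Python and prints it as JSON. The input IDs come from the table above; the `data/...` paths and the `build_job` helper are hypothetical placeholders, not part of the pipeline.

```python
import json

# Input IDs copied from the Inputs table above.
INPUT_IDS = [
    "5S_model", "16S_model", "23S_model", "tRNA_model",
    "forward_reads", "reverse_reads",
    "fraggenescan_model", "fraggenescan_pwm_dist",
    "fraggenescan_prob_stop", "fraggenescan_prob_start",
    "fraggenescan_prob_stop1", "fraggenescan_prob_start1",
    "fraggenescan_prob_forward", "fraggenescan_prob_backward",
    "fraggenescan_prob_noncoding",
]


def build_job(paths):
    """Map every workflow input ID to a CWL File object (hypothetical helper)."""
    missing = [input_id for input_id in INPUT_IDS if input_id not in paths]
    if missing:
        raise KeyError("missing paths for inputs: " + ", ".join(missing))
    return {
        input_id: {"class": "File", "path": paths[input_id]}
        for input_id in INPUT_IDS
    }


# Placeholder paths for illustration only; replace with real file locations.
example_paths = {input_id: "data/" + input_id for input_id in INPUT_IDS}
job = build_job(example_paths)
print(json.dumps(job, indent=2))
```

Saved to a file such as `job.json`, the output can be passed to a CWL runner after the workflow file, for example `cwltool emg-pipeline-v3.cwl job.json`, assuming cwltool is installed and the workflow has been checked out locally.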

Steps

index_reads
  Runs: ../tools/esl-sfetch-index.cwl (CommandLineTool)
  Label: index a sequence file for use by esl-sfetch
  Doc: https://github.com/EddyRivasLab/easel

fraggenescan
  Runs: ../tools/FragGeneScan1_20.cwl (CommandLineTool)
  Label: FragGeneScan: find (fragmented) genes in short reads
  Doc: FragGeneScan is an application for finding (fragmented) genes in short reads. It can also be applied to predict prokaryotic genes in incomplete assemblies or complete genomes.

  FragGeneScan was first released through the omics website (http://omics.informatics.indiana.edu/FragGeneScan/) in March 2010, where you can find its old releases. FragGeneScan migrated to SourceForge in October 2013 (https://sourceforge.net/projects/fraggenescan/).

  Version 1.20 can be downloaded here: https://sourceforge.net/projects/fraggenescan/files/

interproscan
  Runs: ../tools/InterProScan5.21-60.cwl (CommandLineTool)
  Label: InterProScan: protein sequence classifier
  Doc: Version 5.21-60 can be downloaded here: https://github.com/ebi-pf-team/interproscan/wiki/HowToDownload

  Documentation on how to run InterProScan 5 can be found here: https://github.com/ebi-pf-team/interproscan/wiki/HowToRun

overlap_reads
  Runs: ../tools/seqprep.cwl (CommandLineTool)

combine_seqprep
  Runs: ../tools/seqprep-merge.cwl (CommandLineTool)

find_5S_matches

find_16S_matches

find_23S_matches

find_tRNA_matches

mask_rRNA_and_tRNA
  Runs: ../tools/mask_RNA.cwl (CommandLineTool)

trim_quality_control
  Runs: ../tools/trimmomatic.cwl (CommandLineTool)
  Doc: Trimmomatic is a fast, multithreaded command line tool that can be used to trim and crop Illumina (FASTQ) data as well as to remove adapters. These adapters can pose a real problem depending on the library preparation and downstream application. There are two major modes of the program: Paired end mode and Single end mode. The paired end mode will maintain correspondence of read pairs and also use the additional information contained in paired reads to better find adapter or PCR primer fragments introduced by the library preparation process. Trimmomatic works with FASTQ files (using phred + 33 or phred + 64 quality scores, depending on the Illumina pipeline used).

collate_unique_rRNA_hmmer_hits

collate_unique_tRNA_hmmer_hits

convert_trimmed-reads_to_fasta
  Runs: ../tools/fastq_to_fasta.cwl (CommandLineTool)
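
The Trimmomatic description above mentions phred + 33 and phred + 64 quality scores. The sketch below shows what the two offsets mean for a FASTQ quality string: each character's ASCII code minus the offset gives the Phred score Q, and Q corresponds to an error probability of 10^(-Q/10). The function names are hypothetical and the code is only an illustration of the encoding, not Trimmomatic's implementation.

```python
def decode_quality(quality, offset=33):
    """Convert a FASTQ quality string into a list of Phred scores."""
    if offset not in (33, 64):
        raise ValueError("offset must be 33 or 64")
    scores = [ord(ch) - offset for ch in quality]
    if any(score < 0 for score in scores):
        raise ValueError("quality string has characters below the chosen offset")
    return scores


def error_probability(score):
    """Probability that a base call with this Phred score is wrong."""
    return 10 ** (-score / 10)


# The same score of 40 is written as 'I' under phred + 33 and 'h' under phred + 64.
print(decode_quality("II", offset=33))
print(decode_quality("hh", offset=64))
print(error_probability(20))
```

Decoding phred + 33 data with an offset of 64 usually yields negative values, which is why the function rejects them instead of returning misleading scores.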

Outputs

ID Type Label Doc
pCDS File
annotations File
processed_sequences File

Permalink: https://w3id.org/cwl/view/git/316831663e84623eb0e3a260af252fef441924d4/workflows/emg-pipeline-v3.cwl