CWL Workflow: Transcripts annotation workflow

Fetched 2023-01-12 19:42:19 GMT

Verified with cwltool version 3.1.20221201130942

This workflow is Open Source and may be reused according to the terms of: Apache License 2.0

Note that the tools invoked by the workflow may have separate licenses.

Inputs

ID	Type	Title	Doc
replace	https://w3id.org/cwl/view/git/26dad276bac124f89086268bcbca962a5c0caca6/utils/esl-reformat-replace.yaml#replace (Optional)
blockSize	Float (Optional)
buscoMode	https://w3id.org/cwl/view/git/26dad276bac124f89086268bcbca962a5c0caca6/tools/BUSCO/BUSCO-assessment_modes.yaml#assessment_modes
i5Databases	Directory
buscoLineage	Directory
clanInfoFile	File
diamondSeqdb	File
cmsearchCores	Integer
i5_chunk_size	Integer (Optional)
i5Applications	https://w3id.org/cwl/view/git/26dad276bac124f89086268bcbca962a5c0caca6/tools/InterProScan/InterProScan-apps.yaml#apps[] (Optional)
i5OutputFormat	https://w3id.org/cwl/view/git/26dad276bac124f89086268bcbca962a5c0caca6/tools/InterProScan/InterProScan-protein_formats.yaml#protein_formats[] (Optional)
singleBestOnly	Boolean (Optional)
buscoOutputName	String
transcriptsFile	File [FASTA]
covariance_models	File[]
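
The required inputs in the table above (the rows not marked Optional) can be collected into a job file for a CWL runner. The following is a minimal sketch, not the authors' own job file: every path, the cmsearchCores and buscoOutputName values, and the buscoMode value are hypothetical placeholders, and the helper names cwl_path and missing_inputs are invented for illustration. The accepted buscoMode symbols are defined in BUSCO-assessment_modes.yaml and are not shown on this page, so the value used here is an assumption. The table marks transcriptsFile as FASTA, so a real job may also need a matching format field on that file; it is omitted here.

```python
import json

# Required input IDs and their types, copied from the Inputs table
# (rows that are not marked "Optional").
REQUIRED_INPUTS = {
    "buscoMode": "enum",
    "i5Databases": "Directory",
    "buscoLineage": "Directory",
    "clanInfoFile": "File",
    "diamondSeqdb": "File",
    "cmsearchCores": "Integer",
    "buscoOutputName": "String",
    "transcriptsFile": "File",
    "covariance_models": "File[]",
}


def cwl_path(cls, path):
    """Build a CWL File or Directory object as used in job files."""
    return {"class": cls, "path": path}


def missing_inputs(job):
    """Return the required input IDs that the job does not set."""
    return sorted(name for name in REQUIRED_INPUTS if name not in job)


# All values below are hypothetical placeholders.
job = {
    "buscoMode": "tran",  # ASSUMPTION: check BUSCO-assessment_modes.yaml
    "i5Databases": cwl_path("Directory", "db/interproscan/data"),
    "buscoLineage": cwl_path("Directory", "db/busco/lineage"),
    "clanInfoFile": cwl_path("File", "db/rfam/clan_info.txt"),
    "diamondSeqdb": cwl_path("File", "db/diamond/reference.dmnd"),
    "cmsearchCores": 4,
    "buscoOutputName": "busco_run",
    "transcriptsFile": cwl_path("File", "data/transcripts.fasta"),
    "covariance_models": [cwl_path("File", "db/rfam/models.cm")],
}

if __name__ == "__main__":
    # Report anything still missing, then print the job as JSON.
    print("missing:", missing_inputs(job))
    print(json.dumps(job, indent=2))
```

Saved as JSON, such a job could then be passed to a runner as the second positional argument, for example `cwltool TranscriptsAnnotation-i5only-wf.cwl job.json` from a local checkout of the workflow repository.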

Steps

ID	Runs	Label	Doc
identify_nc_rna	cmsearch-multimodel-wf.cwl (Workflow)	Identifies non-coding RNAs using Rfam covariance models
cut_fasta_header	../utils/cut_fasta_headers.cwl (CommandLineTool)	Cuts FASTA headers which are too long	Cuts away everything after the first whitespace character.
clean_fasta_header	../utils/clean_fasta_headers.cwl (CommandLineTool)	Replaces problematic characters in FASTA headers with dashes
functional_analysis	InterProScan-v5-chunked-wf.cwl (Workflow)	Runs InterProScan on batches of sequences to retrieve functional annotations.
identify_coding_regions	TransDecoder-v5-wf-2steps.cwl (Workflow)	TransDecoder two-step workflow, running TransDecoder.LongOrfs (step 1) followed by TransDecoder.Predict (step 2)
calculate_diamond_matches	../tools/Diamond/Diamon.blastx-v0.9.21.cwl (CommandLineTool)	Aligns DNA query sequences against a protein reference database	DIAMOND is a sequence aligner for protein and translated DNA searches, designed for high-performance analysis of big sequence data. The key features are: pairwise alignment of proteins and translated DNA at 500x-20,000x speed of BLAST; frameshift alignments for long read analysis; low resource requirements, suitable for running on standard desktops or laptops; various output formats, including BLAST pairwise, tabular and XML, as well as taxonomic classification. Please visit https://github.com/bbuchfink/diamond for full documentation. Releases can be downloaded from https://github.com/bbuchfink/diamond/releases
run_transcriptome_assessment	../tools/BUSCO/BUSCO-v3.cwl (CommandLineTool)	Assesses genome assembly and annotation completeness with single-copy orthologs	BUSCO v3 provides quantitative measures for the assessment of genome assembly, gene set, and transcriptome completeness, based on evolutionarily-informed expectations of gene content from near-universal single-copy orthologs selected from OrthoDB v9. BUSCO assessments are implemented in open-source software, with a large selection of lineage-specific sets of Benchmarking Universal Single-Copy Orthologs. These conserved orthologs are ideal candidates for large-scale phylogenomics studies, and the annotated BUSCO gene models built during genome assessments provide a comprehensive gene predictor training set for use as part of genome annotation pipelines. Please visit http://busco.ezlab.org/ for full documentation. The BUSCO assessment software distribution is available from the public GitLab project: https://gitlab.com/ezlab/busco where it can be downloaded or cloned using a git client (git clone https://gitlab.com/ezlab/busco.git). We encourage users to opt for the git client option in order to facilitate future updates. BUSCO is written for Python 3.x and Python 2.7+. It runs with the standard packages. We recommend using Python3 when available.
remove_asterisks_and_reformat	../utils/esl-reformat.cwl (CommandLineTool)	Normalizes input sequences to FASTA using esl-reformat	Normalizes input sequences to FASTA with fixed number of sequence characters per line using esl-reformat from https://github.com/EddyRivasLab/easel

Outputs

ID	Type	Label	Doc
bed_output	File
gff3_output	File
i5Annotations	File
coding_regions	File
diamond_matches	File
busco_full_table	File
peptide_sequences	File
busco_blast_output	Directory
busco_hmmer_output	Directory
busco_short_summary	File
busco_missing_buscos	File
deoverlapped_matches	File
reformatted_sequences	File
cutted_transcripts_file	File [FASTA]
cleaned_transcripts_file	File [FASTA]
busco_translated_proteins	Directory
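
After a run, the result can be checked against the Outputs table. The sketch below assumes the runner prints the CWL output object as JSON on standard output, as cwltool does, with one key per output ID and a class field on each entry. The function name check_outputs and the file name result.json are hypothetical; the output IDs and types are copied from the table above.

```python
import json

# Output IDs and their types, copied from the Outputs table.
EXPECTED_OUTPUTS = {
    "bed_output": "File",
    "gff3_output": "File",
    "i5Annotations": "File",
    "coding_regions": "File",
    "diamond_matches": "File",
    "busco_full_table": "File",
    "peptide_sequences": "File",
    "busco_blast_output": "Directory",
    "busco_hmmer_output": "Directory",
    "busco_short_summary": "File",
    "busco_missing_buscos": "File",
    "deoverlapped_matches": "File",
    "reformatted_sequences": "File",
    "cutted_transcripts_file": "File",
    "cleaned_transcripts_file": "File",
    "busco_translated_proteins": "Directory",
}


def check_outputs(result):
    """Return a list of problems found in a CWL output object."""
    problems = []
    for name, expected_class in EXPECTED_OUTPUTS.items():
        value = result.get(name)
        if not isinstance(value, dict):
            # Missing key, null value, or an unexpected shape.
            problems.append(f"{name}: missing")
        elif value.get("class") != expected_class:
            problems.append(
                f"{name}: expected {expected_class}, got {value.get('class')}"
            )
    return problems


if __name__ == "__main__":
    # ASSUMPTION: result.json holds the JSON the runner printed on stdout.
    with open("result.json") as handle:
        for problem in check_outputs(json.load(handle)):
            print(problem)
```

An empty list means every output named in the table is present with the expected class; the check does not inspect file contents.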

Permalink: https://w3id.org/cwl/view/git/26dad276bac124f89086268bcbca962a5c0caca6/workflows/TranscriptsAnnotation-i5only-wf.cwl