Workflow: Transcripts annotation workflow

Fetched 2021-10-27 03:01:07 GMT

Inputs

ID                 Type                 Title  Doc
i5Databases        Directory
singleBestOnly     Boolean (Optional)
replace            https://w3id.org/cwl/view/git/b1e88a8c2f6f07d236193d3e89dc2d724700780a/utils/esl-reformat-replace.yaml#replace (Optional)
i5OutputFormat
buscoOutputName    String
blockSize          Float (Optional)
i5Applications     String[] (Optional)
buscoMode
diamondSeqdb       File
clanInfoFile       File
cmsearchCores      Integer
transcriptsFile    File [FASTA]
covariance_models  File[]
buscoLineage       Directory
phmmerSeqdb        File [FASTA]
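
The inputs listed above are normally supplied to a CWL runner as a job document (YAML or JSON). The following is a minimal sketch of building such a document in Python, not a tested job for this workflow. Every path and value is hypothetical, and the helper names (cwl_file, cwl_directory, build_job) are invented for illustration. i5OutputFormat and buscoMode are deliberately left out because the table does not show their types; they would have to be filled in from the workflow source.

```python
import json


def cwl_file(path):
    """Return a CWL File object for a local path."""
    return {"class": "File", "path": path}


def cwl_directory(path):
    """Return a CWL Directory object for a local path."""
    return {"class": "Directory", "path": path}


def build_job(data_dir):
    """Build a partial job document; every path below is hypothetical."""
    return {
        "transcriptsFile": cwl_file(f"{data_dir}/transcripts.fasta"),
        "phmmerSeqdb": cwl_file(f"{data_dir}/proteins.fasta"),
        "diamondSeqdb": cwl_file(f"{data_dir}/proteins.dmnd"),
        "clanInfoFile": cwl_file(f"{data_dir}/Rfam.clanin"),
        "covariance_models": [cwl_file(f"{data_dir}/models/example.cm")],
        "i5Databases": cwl_directory(f"{data_dir}/interproscan-data"),
        "buscoLineage": cwl_directory(f"{data_dir}/busco-lineage"),
        "buscoOutputName": "example_run",
        "cmsearchCores": 4,
        # Optional inputs (singleBestOnly, replace, blockSize, i5Applications)
        # are omitted. i5OutputFormat and buscoMode are also omitted because
        # their types are not shown in the table; set them from the workflow
        # source before using this job.
    }


job = build_job("/data/annotation")
print(json.dumps(job, indent=2))
```

Writing the printed JSON to a file gives something a CWL runner can accept as a job file once the two missing inputs are added.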

Steps

ID Runs Label Doc
identify_nc_rna Identifies non-coding RNAs using Rfam covariance models
calculate_diamond_matches
../tools/Diamond/Diamon.blastx-v0.9.21.cwl (CommandLineTool)
Aligns DNA query sequences against a protein reference database

DIAMOND is a sequence aligner for protein and translated DNA searches, designed for high performance analysis of big sequence data.

The key features are:
  • Pairwise alignment of proteins and translated DNA at 500x-20,000x speed of BLAST.
  • Frameshift alignments for long read analysis.
  • Low resource requirements and suitable for running on standard desktops or laptops.
  • Various output formats, including BLAST pairwise, tabular and XML, as well as taxonomic classification.

Please visit https://github.com/bbuchfink/diamond for full documentation.

Releases can be downloaded from https://github.com/bbuchfink/diamond/releases
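
To make "translated DNA search" concrete: a blastx-style search compares the protein translations of a DNA query, in all six reading frames, against the protein database. The sketch below shows only that translation idea using the standard genetic code. It is not how DIAMOND is implemented, and the function names are invented for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons; unknown codons (e.g. with N) become X."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translations(dna):
    """Map frame labels (+1..+3, -1..-3) to protein translations."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


frames = six_frame_translations("ATGGCCTAA")
print(frames["+1"])  # MA*
```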

remove_asterisks_and_reformat
../utils/esl-reformat.cwl (CommandLineTool)
Normalizes input sequences to FASTA using esl-reformat

Normalizes input sequences to FASTA with fixed number of sequence characters per line using esl-reformat from https://github.com/EddyRivasLab/easel
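
As an illustration of what this normalization produces, the sketch below rewrites FASTA text so that every sequence line has a fixed width and asterisks are dropped. It is a plain-Python stand-in, not esl-reformat itself. The width of 60 is an assumption, and so is deleting asterisks rather than substituting them: the step name suggests asterisks are handled, but the actual behaviour is set by the workflow's `replace` input, whose value is not shown here.

```python
def normalize_fasta(text, width=60, remove="*"):
    """Rewrap FASTA sequences to `width` characters and drop `remove` chars."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line, []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))

    out = []
    for header, seq in records:
        for char in remove:
            seq = seq.replace(char, "")
        out.append(header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"


print(normalize_fasta(">a\nMKT*\nLLV*\n", width=4), end="")
```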

identify_coding_regions TransDecoder two-step workflow: runs TransDecoder.LongOrfs (step 1) followed by TransDecoder.Predict (step 2)
run_transcriptome_assessment
../tools/BUSCO/BUSCO-v3.cwl (CommandLineTool)
Assesses genome assembly and annotation completeness with single-copy orthologs

BUSCO v3 provides quantitative measures for the assessment of genome assembly, gene set, and transcriptome completeness, based on evolutionarily-informed expectations of gene content from near-universal single-copy orthologs selected from OrthoDB v9. BUSCO assessments are implemented in open-source software, with a large selection of lineage-specific sets of Benchmarking Universal Single-Copy Orthologs. These conserved orthologs are ideal candidates for large-scale phylogenomics studies, and the annotated BUSCO gene models built during genome assessments provide a comprehensive gene predictor training set for use as part of genome annotation pipelines.

Please visit http://busco.ezlab.org/ for full documentation.

The BUSCO assessment software distribution is available from the public GitLab project: https://gitlab.com/ezlab/busco where it can be downloaded or cloned using a git client (git clone https://gitlab.com/ezlab/busco.git). We encourage users to opt for the git client option in order to facilitate future updates.

BUSCO is written for Python 3.x and Python 2.7+. It runs with the standard packages. We recommend using Python3 when available.

calculate_phmmer_matches
../tools/HMMER/phmmer-v3.2.cwl (CommandLineTool)
Search a single protein sequence against a protein sequence database. (BLASTP-like)

The phmmer and jackhmmer programs search a single protein sequence against a protein sequence database, akin to BLASTP and PSIBLAST, respectively. (Internally, they just produce a profile HMM from the query sequence, then run HMM searches.)

Please visit https://github.com/EddyRivasLab/hmmer for full documentation.

Releases can be downloaded from https://github.com/EddyRivasLab/hmmer/releases

functional_analysis Runs InterProScan on batches of sequences to retrieve functional annotations.
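
The idea of processing sequences in batches can be sketched as below. This is only an illustration of chunking a list of records. The batch size, the record layout, and the function name are assumptions, and how the workflow actually splits its input is not shown on this page.

```python
from itertools import islice


def batches(items, batch_size):
    """Yield consecutive lists of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Hypothetical (header, sequence) records standing in for predicted peptides.
records = [(f">pep{i}", "MKT") for i in range(5)]
for number, batch in enumerate(batches(records, 2), start=1):
    print(number, [header for header, _ in batch])
```

Each batch could then be written to its own FASTA file and annotated independently, with the per-batch results concatenated afterwards.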

Outputs

ID                         Type       Label  Doc
coding_regions             File
busco_hmmer_output         Directory
deoverlapped_matches       File
busco_translated_proteins  Directory
i5Annotations              File
busco_blast_output         Directory
bed_output                 File
busco_missing_buscos       File
peptide_sequences          File
reformatted_sequences      File
busco_full_table           File
busco_short_summary        File
gff3_output                File
phmmer_matches             File
diamond_matches            File
Permalink: https://w3id.org/cwl/view/git/b1e88a8c2f6f07d236193d3e89dc2d724700780a/workflows/TranscriptsAnnotation-wf.cwl