Workflow: protein annotation

Fetched 2023-01-14 16:33:22 GMT

Proteins - predict, cluster, identify, annotate

Inputs

ID            Type
jobid         String
m5nrBDB       File
m5nrSCG       File
m5nrFull      File[]
sequences     File
protIdentity  Float (optional)
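
For orientation, here is a minimal sketch of a job (inputs) object for this workflow, printed as JSON. CWL runners such as cwltool accept JSON as a job file. Only the input IDs and their types come from the table above. The file names and the protIdentity value are hypothetical placeholders. File inputs use the standard CWL {"class": "File", "path": ...} form, and m5nrFull, which is File[], takes a list of them.

```python
import json


def file_input(path: str) -> dict:
    """Wrap a path as a CWL File object."""
    return {"class": "File", "path": path}


def make_job(jobid, bdb, scg, full_parts, sequences, prot_identity=None):
    """Build a job object keyed by this workflow's input IDs."""
    job = {
        "jobid": jobid,
        "m5nrBDB": file_input(bdb),
        "m5nrSCG": file_input(scg),
        "m5nrFull": [file_input(p) for p in full_parts],  # File[] -> list of File objects
        "sequences": file_input(sequences),
    }
    # protIdentity is optional (Float?), so include it only when a value is given
    if prot_identity is not None:
        job["protIdentity"] = float(prot_identity)
    return job


if __name__ == "__main__":
    # All paths below are hypothetical placeholders
    job = make_job("demo", "m5nr.bdb", "m5nr.scg.json",
                   ["m5nr.part1", "m5nr.part2"], "reads.fna", prot_identity=0.9)
    print(json.dumps(job, indent=2))
```

To use it, save the printed JSON as job.json and pass it to a runner, for example: cwltool protein-annotation.workflow.cwl job.json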

Steps

Each step lists its ID, the tool it runs, the tool's label, and the tool's doc string.
catSims
../Tools/cat.tool.cwl (CommandLineTool)
GNU cat

Concatenate FILE(s) to standard output
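
The catSims step wraps GNU cat. It presumably merges the similarity files produced against the separate m5nrFull parts. That is an assumption based only on m5nrFull being an array input, because this page does not show the step's wiring. The Python sketch below mirrors what cat itself does: it copies each input in order, byte for byte, into a single output file.

```python
import shutil
from pathlib import Path
from typing import Iterable


def cat_files(inputs: Iterable[Path], output: Path) -> None:
    """Concatenate inputs, in the order given, into output (like `cat a b > out`)."""
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                # Stream in chunks so large sims files are never fully loaded into memory
                shutil.copyfileobj(src, out)
```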

sortSims
../Tools/sort.tool.cwl (CommandLineTool)
GNU sort

Sort a text file based on the given field(s).
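
The sortSims step wraps GNU sort, but this page does not say which fields it sorts on. Suppose, hypothetically, that it orders tab-separated sims lines by query ID, roughly sort -t $'\t' -k 1,1. A Python equivalent that sorts on any list of 1-based field numbers looks like this:

```python
def sort_by_fields(lines, fields, sep="\t"):
    """Sort delimited lines on the given 1-based field numbers (plain text comparison).

    Missing fields compare as empty strings. Python's sort is stable, so lines with
    equal keys keep their input order.
    """
    def key(line):
        cols = line.rstrip("\n").split(sep)
        return tuple(cols[f - 1] if f - 1 < len(cols) else "" for f in fields)

    return sorted(lines, key=key)
```

This sketch differs from GNU sort in two ways. GNU sort compares using the current locale's collation, and without -s it breaks ties by comparing whole lines. This version compares Unicode code points and leaves tied lines in their input order.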

superblat
../Tools/superblat.tool.cwl (CommandLineTool)
superBLAT

Multi-threaded, fast sequence search command-line tool (protein only):
    superblat -fastMap -prot -out blast8 <database> <query> <output>
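
The blast8 output format is the 12-column, tab-separated layout used by legacy NCBI BLAST -m 8. The columns are: query ID, subject ID, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, E-value, and bit score. Below is a small parser for one line of that format. The field names are chosen for illustration, and the example IDs in the usage are hypothetical.

```python
from typing import NamedTuple


class Blast8Hit(NamedTuple):
    query: str
    subject: str
    identity: float   # percent identity
    length: int       # alignment length
    mismatches: int
    gap_opens: int
    q_start: int
    q_end: int
    s_start: int
    s_end: int
    evalue: float
    bit_score: float


def parse_blast8_line(line: str) -> Blast8Hit:
    """Parse one tab-separated blast8 line into typed fields."""
    f = line.rstrip("\n").split("\t")
    if len(f) != 12:
        raise ValueError(f"expected 12 tab-separated fields, got {len(f)}")
    return Blast8Hit(f[0], f[1], float(f[2]), int(f[3]), int(f[4]), int(f[5]),
                     int(f[6]), int(f[7]), int(f[8]), int(f[9]),
                     float(f[10]), float(f[11]))
```

For example, parse_blast8_line("read1\tabc123\t95.5\t100\t4\t1\t1\t100\t5\t104\t1e-20\t180.0") returns a hit whose evalue is 1e-20.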

bleachSims
../Tools/bleachsims.tool.cwl (CommandLineTool)
bleachsims

Filter a similarity file by E-value and number of hits:
    bleachsims -s <input> -o <output> -m 20 -r 0 -c 3
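
The doc string does not explain bleachsims's -m, -r, and -c flags, so this sketch does not try to reproduce them. It only illustrates the general idea of filtering sims by E-value and number of hits: drop hits above an E-value cutoff, then keep at most N best hits per query. Both max_evalue and max_hits are hypothetical parameters. The input is assumed to be 12-column blast8 lines.

```python
from collections import defaultdict


def filter_sims(lines, max_evalue=1e-5, max_hits=20):
    """Per query, keep the max_hits lowest-E-value hits with E-value <= max_evalue.

    Assumes blast8 lines: query ID in column 1, E-value in column 11.
    Queries are emitted in the order they first appear.
    """
    by_query = defaultdict(list)
    order = []
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        query, evalue = cols[0], float(cols[10])
        if evalue > max_evalue:
            continue  # fails the E-value cutoff
        if query not in by_query:
            order.append(query)
        by_query[query].append((evalue, line))
    kept = []
    for query in order:
        best = sorted(by_query[query], key=lambda hit: hit[0])[:max_hits]
        kept.extend(line for _, line in best)
    return kept
```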

protCluster
../Tools/cdhit.tool.cwl (CommandLineTool)
CD-HIT

Cluster protein sequences, using the maximum available CPUs and memory:
    cdhit -n 5 -d 0 -T 0 -M 0 -c 0.9 -i <input> -o <output>
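
In this CD-HIT call, -c 0.9 sets a 90% sequence identity threshold, -T 0 uses all available CPU threads, and -M 0 removes the memory limit. CD-HIT clusters greedily. It processes sequences longest first, and each one either joins an existing cluster whose representative it matches at or above the threshold or becomes a new representative. The sketch below illustrates only that greedy scheme. Identity is counted as matched residues over the shorter sequence's length, following CD-HIT's default definition. However, difflib matching blocks serve as a crude stand-in for CD-HIT's word filtering and alignment, so the identity values will not match CD-HIT's.

```python
from difflib import SequenceMatcher


def identity(a: str, b: str) -> float:
    """Rough identity: matched residues divided by the shorter sequence's length.
    difflib is only a stand-in for a real alignment here."""
    blocks = SequenceMatcher(None, a, b, autojunk=False).get_matching_blocks()
    return sum(block.size for block in blocks) / min(len(a), len(b))


def greedy_cluster(seqs: dict[str, str], threshold: float = 0.9) -> dict[str, list[str]]:
    """Greedy incremental clustering in the spirit of CD-HIT.

    Longest sequences first; each joins the first representative it matches at
    >= threshold, otherwise it starts a new cluster as its representative.
    """
    clusters: dict[str, list[str]] = {}
    for name in sorted(seqs, key=lambda n: len(seqs[n]), reverse=True):
        for rep in clusters:
            if identity(seqs[name], seqs[rep]) >= threshold:
                clusters[rep].append(name)
                break
        else:
            clusters[name] = [name]
    return clusters
```

With the hypothetical sequences {"s1": "MKTAYIAKQR", "s2": "MKTAYIAKQL", "s3": "GGGGGGGG"} and a threshold of 0.9, s2 joins s1's cluster at 9/10 identity, and s3 forms its own cluster.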

protFeature
../Tools/fraggenescan.tool.cwl (CommandLineTool)
FragGeneScan

Hidden Markov model for predicting prokaryotic coding regions:
    run_FragGeneScan.pl --genome <input> --out <output> --complete 0 --train 454_30
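
FragGeneScan decodes reads with a much richer hidden Markov model than the one shown here; that model also accounts for codon usage and sequencing errors. The toy below only illustrates the core technique, Viterbi decoding. It uses a hypothetical two-state model, "n" for non-coding and "c" for coding, where coding regions are assumed to be GC-rich. All probabilities are illustrative and are not FragGeneScan's trained parameters.

```python
import math

# Hypothetical toy model: "n" = non-coding, "c" = coding (assumed GC-rich).
STATES = ("n", "c")
START = {"n": 0.5, "c": 0.5}
TRANS = {"n": {"n": 0.9, "c": 0.1}, "c": {"n": 0.1, "c": 0.9}}
EMIT = {
    "n": {"A": 0.30, "C": 0.20, "G": 0.20, "T": 0.30},
    "c": {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},
}


def viterbi(seq, states=STATES, start=START, trans=TRANS, emit=EMIT):
    """Return the most probable state path for seq, one state letter per base."""
    # score[s]: best log-probability of any path ending in state s at the current base
    score = {s: math.log(start[s]) + math.log(emit[s][seq[0]]) for s in states}
    back = []  # back[i][s]: best predecessor of state s at base i + 1
    for base in seq[1:]:
        prev, score, ptr = score, {}, {}
        for s in states:
            best = max(states, key=lambda p: prev[p] + math.log(trans[p][s]))
            score[s] = prev[best] + math.log(trans[best][s]) + math.log(emit[s][base])
            ptr[s] = best
        back.append(ptr)
    # Trace back from the best final state
    path = [max(states, key=lambda s: score[s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return "".join(reversed(path))
```

With these numbers, viterbi("ATATATAT" + "GCGCGCGCGCGC" + "ATATATAT") labels the GC run as coding and both AT runs as non-coding. Working in log space avoids the numeric underflow that multiplying many small probabilities would cause on long reads.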

annotateSims
../Tools/sims_annotate.tool.cwl (CommandLineTool)
annotate sims

Create expanded, annotated sims files from an input md5 sims file and the M5NR database:
    sims_annotate.pl --verbose --in_sim <input> --in_scg <scgs> --ann_file <database> --format <seqFormat> --out_filter <outFilter> --out_expand <outExpand> -out_lca <outLca> --frag_num 5000
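
One of this step's outputs is an LCA file (the <outLca> argument). The usual lowest-common-ancestor idea applies when a read hits several database proteins: the read is assigned to the deepest taxon shared by all of the hits' lineages. The sketch assumes each hit has already been resolved to a root-to-leaf lineage, and the lineages in the usage example are hypothetical. It does not reproduce how sims_annotate.pl selects hits or looks up lineages in the M5NR files.

```python
from typing import Sequence


def lowest_common_ancestor(lineages: Sequence[Sequence[str]]) -> list[str]:
    """Return the longest shared prefix of root-to-leaf lineages (the hits' LCA)."""
    if not lineages:
        return []
    lca = []
    # zip stops at the shortest lineage, so the LCA can be no deeper than any hit
    for ranks in zip(*lineages):
        if all(rank == ranks[0] for rank in ranks):
            lca.append(ranks[0])
        else:
            break
    return lca
```

For example, hits resolved to Bacteria > Proteobacteria > Gammaproteobacteria and Bacteria > Proteobacteria > Alphaproteobacteria give an LCA of Bacteria > Proteobacteria.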

formatCluster
../Tools/format_cluster.tool.cwl (CommandLineTool)
cluster file reformat

Reformat a CD-HIT .clstr file into an MG-RAST .mapping file:
    format_cluster.pl --input <input> --output <output>
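
A CD-HIT .clstr file lists each cluster under a ">Cluster N" header, with one member per line; the representative is marked by a trailing "*". This page does not give the column layout of MG-RAST's .mapping file. The sketch therefore stops at a hypothetical intermediate form: a dict from each representative ID to its member IDs.

```python
import re


def parse_clstr(text: str) -> dict[str, list[str]]:
    """Map each CD-HIT cluster representative to all of its members (itself included)."""
    clusters = []
    for line in text.splitlines():
        if line.startswith(">Cluster"):
            clusters.append({"rep": None, "members": []})
            continue
        # Member lines look like: "1\t10aa, >seqid... at 90.00%"
        match = re.search(r">(.+?)\.\.\.", line)
        if not match or not clusters:
            continue
        seq_id = match.group(1)
        clusters[-1]["members"].append(seq_id)
        if line.rstrip().endswith("*"):  # representative sequence
            clusters[-1]["rep"] = seq_id
    return {c["rep"]: c["members"] for c in clusters if c["rep"] is not None}
```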

Outputs

ID               Type
protLCAOut       File
protSimsOut      File
protExpandOut    File
protFilterOut    File
protFeatureOut   File
protClustMapOut  File
protClustSeqOut  File

Permalink: https://w3id.org/cwl/view/git/7b1df2ecce5a8727f2c546c5baa45c919edd8a76/CWL/Workflows/protein-annotation.workflow.cwl