Workflow: protein annotation
Proteins - predict, cluster, identify, annotate
Inputs
| ID | Type | Title | Doc |
|---|---|---|---|
| jobid | String | | |
| m5nrBDB | File | | |
| m5nrSCG | File | | |
| m5nrFull | File[] | | |
| sequences | File | | |
| protIdentity | Float (Optional) | | |
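
A job input object for this workflow could be assembled along the following lines. This is a minimal sketch, not part of the workflow repository: the helper names `file_obj` and `build_job` and every file path are hypothetical, and only the input IDs and types are taken from the table above. CWL runners accept the job object as JSON or YAML, so the result is printed as JSON.

```python
import json


def file_obj(path):
    """Return a CWL File object for a local path."""
    return {"class": "File", "path": path}


def build_job(jobid, m5nr_bdb, m5nr_scg, m5nr_full, sequences, prot_identity=None):
    """Build a job input object keyed by the input IDs listed in the table.

    m5nr_full is a list of paths because the m5nrFull input has type File[].
    prot_identity is left out when None because protIdentity is optional.
    """
    job = {
        "jobid": jobid,
        "m5nrBDB": file_obj(m5nr_bdb),
        "m5nrSCG": file_obj(m5nr_scg),
        "m5nrFull": [file_obj(p) for p in m5nr_full],
        "sequences": file_obj(sequences),
    }
    if prot_identity is not None:
        job["protIdentity"] = float(prot_identity)
    return job


if __name__ == "__main__":
    # All paths below are placeholders, not real file names from the pipeline.
    job = build_job(
        jobid="example-job",
        m5nr_bdb="refdb/m5nr.bdb",
        m5nr_scg="refdb/m5nr.scg",
        m5nr_full=["refdb/m5nr.part1", "refdb/m5nr.part2"],
        sequences="input/sequences.fna",
    )
    print(json.dumps(job, indent=2))
```

Writing the printed object to a file gives a job document that can be passed to a CWL runner together with the workflow file.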
Steps
| ID | Runs | Label | Doc |
|---|---|---|---|
| catSims | ../Tools/cat.tool.cwl (CommandLineTool) | GNU cat | Concatenate FILE(s) to standard output |
| sortSims | ../Tools/sort.tool.cwl (CommandLineTool) | GNU sort | Sort text file based on given field(s) |
| superblat | ../Tools/superblat.tool.cwl (CommandLineTool) | superBLAT | Multi-threaded fast sequence search command line tool, protein only: `superblat -fastMap -prot -out blast8 <database> <query> <output>` |
| bleachSims | ../Tools/bleachsims.tool.cwl (CommandLineTool) | bleachsims | Filter similarity file by E-value and number of hits: `bleachsims -s <input> -o <output> -m 20 -r 0 -c 3` |
| protCluster | ../Tools/cdhit.tool.cwl (CommandLineTool) | CD-HIT | Cluster protein sequences, using the maximum available CPUs and memory: `cdhit -n 5 -d 0 -T 0 -M 0 -c 0.9 -i <input> -o <output>` |
| protFeature | ../Tools/fraggenescan.tool.cwl (CommandLineTool) | FragGeneScan | Hidden Markov model for predicting prokaryotic coding regions: `run_FragGeneScan.pl --genome <input> --out <output> --complete 0 --train 454_30` |
| annotateSims | ../Tools/sims_annotate.tool.cwl (CommandLineTool) | annotate sims | Create expanded annotated sims files from input md5 sim file and m5nr db: `sims_annotate.pl --verbose --in_sim <input> --in_scg <scgs> --ann_file <database> --format <seqFormat> --out_filter <outFilter> --out_expand <outExpand> -out_lca <outLca> --frag_num 5000` |
| formatCluster | ../Tools/format_cluster.tool.cwl (CommandLineTool) | cluster file reformat | Reformats a CD-HIT .clstr file into an MG-RAST .mapping file: `format_cluster.pl --input <input> --output <output>` |
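
The command templates quoted in the Doc column can be turned into argument lists for inspection or dry runs. The sketch below is an illustration only: the `TEMPLATES` dictionary and `build_argv` helper are hypothetical names, the template strings are copied from the step documentation above with `<name>` placeholders rewritten as `{name}`, and nothing here reflects how the CWL tool wrappers themselves build their command lines. In particular, the `-c 0.9` value is the one shown in the Doc string; whether the optional `protIdentity` input replaces it is not stated on this page.

```python
import shlex

# Command templates copied from the step Doc strings; placeholders are
# written as {name} so that str.format can fill them in.
TEMPLATES = {
    "superblat": "superblat -fastMap -prot -out blast8 {database} {query} {output}",
    "bleachSims": "bleachsims -s {input} -o {output} -m 20 -r 0 -c 3",
    "protCluster": "cdhit -n 5 -d 0 -T 0 -M 0 -c 0.9 -i {input} -o {output}",
    "protFeature": (
        "run_FragGeneScan.pl --genome {input} --out {output} "
        "--complete 0 --train 454_30"
    ),
    "formatCluster": "format_cluster.pl --input {input} --output {output}",
}


def build_argv(step, **paths):
    """Return the argument list for a step, filling placeholders from paths.

    The template is split into tokens before substitution, so a path that
    contains spaces still ends up as a single argument.
    """
    tokens = shlex.split(TEMPLATES[step])
    return [token.format(**paths) for token in tokens]


if __name__ == "__main__":
    # File names are placeholders chosen for the example.
    argv = build_argv("protCluster", input="genes.faa", output="clusters.faa")
    print(shlex.join(argv))
```

Returning a list rather than a single string keeps the result suitable for `subprocess.run` without shell quoting concerns.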
Outputs
| ID | Type | Label | Doc |
|---|---|---|---|
| protLCAOut | File | | |
| protSimsOut | File | | |
| protExpandOut | File | | |
| protFilterOut | File | | |
| protFeatureOut | File | | |
| protClustMapOut | File | | |
| protClustSeqOut | File | | |
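
After a run, the output object can be checked against the seven output IDs listed above. This is a rough sketch under the assumption that the runner emits the CWL output object as JSON keyed by output ID, as reference runners such as cwltool do on standard output; the `missing_outputs` helper and the sample object are hypothetical.

```python
import json

# Output IDs taken from the Outputs table.
EXPECTED_OUTPUTS = (
    "protLCAOut",
    "protSimsOut",
    "protExpandOut",
    "protFilterOut",
    "protFeatureOut",
    "protClustMapOut",
    "protClustSeqOut",
)


def missing_outputs(result):
    """Return the expected output IDs that are absent or null in result."""
    return [name for name in EXPECTED_OUTPUTS if result.get(name) is None]


if __name__ == "__main__":
    # A made-up, partial output object used only to exercise the check.
    sample = json.loads(
        '{"protSimsOut": {"class": "File", "location": "file:///tmp/example.sims"}}'
    )
    print(missing_outputs(sample))
```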
https://w3id.org/cwl/view/git/6a8727124baf77416ca797982fd4e0689c2a593a/CWL/Workflows/protein-annotation.workflow.cwl
