Workflow: protein annotation
Proteins - predict, filter, cluster, identify, annotate
Inputs
ID | Type | Title | Doc |
---|---|---|---|
jobid | String | ||
m5nrBDB | File | ||
m5nrSCG | File | ||
rnaSims | File | ||
m5nrFull | File[] | ||
sequences | File | ||
rnaClustMap | File | ||
protIdentity | Float (Optional) | ||
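
A job file for this workflow has to supply every non-optional input in the table above. As a minimal sketch, the following Python builds such a job object and serializes it to JSON, a format CWL runners accept for job files. The helper names `cwl_file` and `build_job` and every file path are hypothetical placeholders, not part of the workflow. The optional `protIdentity` is left out of the example call because the page does not state its scale.

```python
import json


def cwl_file(path):
    """Wrap a path in a CWL File object."""
    return {"class": "File", "path": path}


def build_job(jobid, sequences, rna_sims, rna_clust_map,
              m5nr_bdb, m5nr_scg, m5nr_full, prot_identity=None):
    """Build a job object keyed by the input IDs listed in the Inputs table."""
    job = {
        "jobid": jobid,                                # String
        "sequences": cwl_file(sequences),              # File
        "rnaSims": cwl_file(rna_sims),                 # File
        "rnaClustMap": cwl_file(rna_clust_map),        # File
        "m5nrBDB": cwl_file(m5nr_bdb),                 # File
        "m5nrSCG": cwl_file(m5nr_scg),                 # File
        "m5nrFull": [cwl_file(p) for p in m5nr_full],  # File[]
    }
    # protIdentity is Float (Optional): include it only when given.
    if prot_identity is not None:
        job["protIdentity"] = float(prot_identity)
    return job


# All values below are made-up placeholders.
job = build_job(
    jobid="example-job",
    sequences="input/sequences.fna",
    rna_sims="input/rna.sims",
    rna_clust_map="input/rna.mapping",
    m5nr_bdb="refdata/m5nr.bdb",
    m5nr_scg="refdata/m5nr.scg",
    m5nr_full=["refdata/m5nr_1.fasta", "refdata/m5nr_2.fasta"],
)
print(json.dumps(job, indent=2))
```

The printed JSON can be saved to a file and passed to a CWL runner together with the workflow document.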
Steps
ID | Runs | Label | Doc |
---|---|---|---|
catSims | ../Tools/cat.tool.cwl (CommandLineTool) | GNU cat | Concatenate FILE(s) to standard output |
sortProt | ../Tools/seqUtil.tool.cwl (CommandLineTool) | seqUtil | Utility tool for various sequence file transformations. |
sortSims | ../Tools/sort.tool.cwl (CommandLineTool) | GNU sort | sort text file based on given field(s) |
superblat | ../Tools/superblat.tool.cwl (CommandLineTool) | superBLAT | multi-threaded fast sequence search command-line tool, protein only >superblat -fastMap -prot -out blast8 <database> <query> <output> |
bleachSims | ../Tools/bleachsims.tool.cwl (CommandLineTool) | bleachsims | filter similarity file by E-value and number of hits >bleachsims -s <input> -o <output> -m 20 -r 0 -c 3 |
protFilter | ../Tools/filter_feature.tool.cwl (CommandLineTool) | filter features | remove predicted genes that overlap with identified rRNAs >filter_feature.pl --seq <sequences> --sim <similarity> --clust <cluster> --output <output> --overlap <overlap> --memory <memory in MB> --tmp_dir <temp directory> |
protCluster | ../Tools/cdhit.tool.cwl (CommandLineTool) | CD-HIT | cluster protein sequences, using the maximum available CPUs and memory >cdhit -n 5 -d 0 -T 0 -M 0 -c 0.9 -i <input> -o <output> |
protFeature | ../Tools/fraggenescan.tool.cwl (CommandLineTool) | FragGeneScan | hidden Markov model for predicting prokaryotic coding regions >run_FragGeneScan.pl --genome <input> --out <output> --complete 0 --train 454_30 |
annotateSims | ../Tools/sims_annotate.tool.cwl (CommandLineTool) | annotate sims | create expanded annotated sims files from input md5 sim file and m5nr db >sims_annotate.pl --verbose --in_sim <input> --in_scg <scgs> --ann_file <database> --format <seqFormat> --out_filter <outFilter> --out_expand <outExpand> -out_lca <outLca> --frag_num 5000 |
formatCluster | ../Tools/format_cluster.tool.cwl (CommandLineTool) | cluster file reformat | re-formats cd-hit .clstr file into mg-rast .mapping file >format_cluster.pl --input <input> --output <output> |
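
Several Doc entries above end with a command template in which `<name>` tokens stand for file arguments. The sketch below shows one way to turn such a template into an argument list. It is an illustration of the templates only, not how the CWL tool wrappers build their command lines, and the names `TEMPLATES`, `render` and the example file names are hypothetical. Only templates whose placeholders contain no spaces are included, since the sketch splits on whitespace.

```python
import re
import shlex

# Command templates copied from the Doc column of the Steps table.
TEMPLATES = {
    "superblat": "superblat -fastMap -prot -out blast8 <database> <query> <output>",
    "bleachSims": "bleachsims -s <input> -o <output> -m 20 -r 0 -c 3",
    "protCluster": "cdhit -n 5 -d 0 -T 0 -M 0 -c 0.9 -i <input> -o <output>",
    "formatCluster": "format_cluster.pl --input <input> --output <output>",
}

# A token that consists entirely of one <name> placeholder.
PLACEHOLDER = re.compile(r"<([^<>]+)>")


def render(step, **values):
    """Return the argument list for `step` with each <name> token filled in."""
    argv = []
    for token in TEMPLATES[step].split():
        match = PLACEHOLDER.fullmatch(token)
        if match is None:
            argv.append(token)  # literal flag or fixed value
            continue
        name = match.group(1)
        if name not in values:
            raise KeyError(f"missing value for <{name}> in step {step}")
        argv.append(str(values[name]))
    return argv


# Hypothetical file names, for illustration only.
argv = render("protCluster", input="proteins.faa", output="proteins.clust.faa")
print(shlex.join(argv))
```

Returning a list rather than a formatted string keeps each argument intact even if a path contains spaces; `shlex.join` is used only to display a safely quoted command line.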
Outputs
ID | Type | Label | Doc |
---|---|---|---|
protLCAOut | File | ||
protSimsOut | File | ||
protExpandOut | File | ||
protFilterOut | File | ||
protFeatureOut | File | ||
protClustMapOut | File | ||
protClustSeqOut | File | ||
protFilterFeatureOut | File | ||
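
After a run, it can be useful to confirm that all eight outputs listed above were produced. The sketch below assumes the runner reports results as a JSON object keyed by output ID, with each file represented as a CWL File object (cwltool, for example, prints such an object on standard output). The function name `missing_outputs` and the sample data are hypothetical.

```python
import json

# Output IDs from the Outputs table, in the order listed there.
EXPECTED_OUTPUTS = (
    "protLCAOut",
    "protSimsOut",
    "protExpandOut",
    "protFilterOut",
    "protFeatureOut",
    "protClustMapOut",
    "protClustSeqOut",
    "protFilterFeatureOut",
)


def missing_outputs(result_json):
    """Return the expected output IDs that are absent or are not File objects."""
    result = json.loads(result_json)
    missing = []
    for name in EXPECTED_OUTPUTS:
        value = result.get(name)
        if not isinstance(value, dict) or value.get("class") != "File":
            missing.append(name)
    return missing


# Made-up result object that lacks the last expected output.
sample = json.dumps({
    name: {"class": "File", "location": f"file:///tmp/{name}"}
    for name in EXPECTED_OUTPUTS[:-1]
})
print(missing_outputs(sample))  # prints ['protFilterFeatureOut']
```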
https://w3id.org/cwl/view/git/f5839797da8209a9d3e441023f88130219751020/CWL/Workflows/protein-filter-annotation.workflow.cwl