Workflow: plant2human workflow
A workflow for novel gene discovery that compares plant species and model organisms with humans using structural similarity search.
Inputs
ID | Type | Title | Doc |
---|---|---|---|
EVALUE | Double | e-value (foldseek easy-search) | |
THREADS | Integer | threads (foldseek easy-search) | |
ROUTE_DATASET | String | route dataset (togoid convert) | |
FOLDSEEK_INDEX | File | foldseek index file | |
INPUT_DIRECTORY | Directory | query protein structure cif file directory | |
TAXONOMY_ID_LIST | String | taxonomy id list (foldseek easy-search) | Comma-separated list of taxonomy IDs. Be sure to include “9606” (human). |
OUTPUT_FILE_NAME1 | String | output file name (foldseek easy-search) | |
OUTPUT_FILE_NAME2 | String | | |
OUTPUT_FILE_NAME3 | String | output file name (togoid convert) | |
OUT_NOTEBOOK_NAME | String | output notebook name (papermill) | |
FILE_MATCH_PATTERN | String | file match pattern for listing | |
SPLIT_MEMORY_LIMIT | String | split memory limit (foldseek easy-search) | |
QUERY_GENE_LIST_TSV | File [TSV] | query gene list tsv (papermill) | |
QUERY_IDMAPPING_TSV | File [TSV] | query idmapping tsv (papermill) | |
OUTPUT_FILE_NAME_HIT_SPECIES | String | | |
WF_COLUMN_NUMBER_HIT_SPECIES | Integer | column number of hit species | |
OUTPUT_FILE_NAME_QUERY_SPECIES | String | | |
WF_COLUMN_NUMBER_QUERY_SPECIES | Integer | column number of query species | |
SW_INPUT_FASTA_FILE_HIT_SPECIES | File [FASTA search results format] | input fasta file (for blastdbcmd) | input fasta file |
SW_INPUT_FASTA_FILE_QUERY_SPECIES | File [FASTA search results format] | input fasta file (for blastdbcmd) | input fasta file |
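
The TAXONOMY_ID_LIST input is a comma-separated string that must contain 9606. A minimal Python sketch of a pre-flight check for that value is shown here. The function name `parse_taxonomy_ids` is hypothetical and not part of the workflow, and the numeric-ID check is an added assumption about what a valid entry looks like.

```python
def parse_taxonomy_ids(raw: str, required: str = "9606") -> list[str]:
    """Split a comma-separated taxonomy ID string and check the required ID.

    Hypothetical helper: the workflow itself passes the string straight to
    foldseek easy-search; this only sanity-checks it beforehand.
    """
    # Drop surrounding whitespace and ignore empty entries such as "9606,,".
    ids = [part.strip() for part in raw.split(",") if part.strip()]

    # Assumption: every entry is a plain numeric taxonomy ID.
    for tax_id in ids:
        if not tax_id.isdigit():
            raise ValueError(f"not a numeric taxonomy ID: {tax_id!r}")

    # The input description asks for 9606 to always be present.
    if required not in ids:
        raise ValueError(f"taxonomy ID list must include {required}")
    return ids
```

Calling `parse_taxonomy_ids("9606,10090")` returns the two IDs as a list, while a string without 9606 raises `ValueError` before any workflow run is started.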
Steps
ID | Runs | Label | Doc |
---|---|---|---|
papermill | ../Tools/19_papermill.cwl (CommandLineTool) | | |
list_files | ../Tools/10_listing.cwl (CommandLineTool) | List files in a directory for foldseek easy-search process. e.g. ../Data/rice_up_mmCIFfile/*.cif Reference: https://qiita.com/kyusque/items/a291fd251a10f783390e#3-glob%E7%94%A8%E3%81%AEclt%E3%82%92%E4%BD%BF%E3%81%86 | |
togoid_convert | ../Tools/18_togoid_convert.cwl (CommandLineTool) | | |
foldseek_easy_search | ../Tools/11_foldseek_easy_search.cwl (CommandLineTool) | | |
extract_target_species | ../Tools/12_extract_target_species.cwl (CommandLineTool) | | |
extract_hit_species_column | ../Tools/13_extract_id.cwl (CommandLineTool) | awk -> sort -> uniq -> redirect to uniprot_id.txt | |
extract_query_species_column | ../Tools/13_extract_id.cwl (CommandLineTool) | awk -> sort -> uniq -> redirect to uniprot_id.txt | |
sub_workflow_retrieve_sequence_query_species | 11_retrieve_sequence_wf.cwl (Workflow) | foldseek easy-search sub-workflow | retrieve sequence from blastdbcmd result. makeblastdb: ../Tools/14_makeblastdb.cwl; blastdbcmd: ../Tools/15_blastdbcmd.cwl; seqretsplit: ../Tools/16_seqretsplit.cwl; needle (Global alignment): ../Tools/17_needle.cwl; water (Local alignment): ../Tools/17_water.cwl |
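
The two extract steps are described as "awk -> sort -> uniq", with the column chosen by WF_COLUMN_NUMBER_HIT_SPECIES or WF_COLUMN_NUMBER_QUERY_SPECIES. The same idea can be sketched in Python as follows. This is an illustration only, not the contents of 13_extract_id.cwl. The function name is hypothetical, and the tab delimiter and 1-based column numbering (as in awk's `$1`, `$2`) are assumptions.

```python
from typing import Iterable


def unique_column_values(
    lines: Iterable[str], column_number: int, delimiter: str = "\t"
) -> list[str]:
    """Return the sorted, de-duplicated values of one column.

    Hypothetical stand-in for an awk | sort | uniq pipeline.
    column_number is 1-based, mirroring awk field numbering (assumption).
    """
    if column_number < 1:
        raise ValueError("column_number must be 1 or greater")

    values = set()
    for line in lines:
        stripped = line.rstrip("\n")
        if not stripped:
            continue  # skip blank lines
        fields = stripped.split(delimiter)
        # Rows that are too short to contain the column are ignored.
        if len(fields) >= column_number:
            values.add(fields[column_number - 1])

    # sorted() orders by code point, which may differ from a
    # locale-aware `sort` on the command line.
    return sorted(values)
```

To mirror the redirect to uniprot_id.txt, the returned list could be written out one ID per line, for example with `Path("uniprot_id.txt").write_text("\n".join(ids) + "\n")`.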
Outputs
ID | Type | Label | Doc |
---|---|---|---|
dir1 | Directory | directory (seqretsplit query species) | |
dir2 | Directory | directory (seqretsplit hit species) | |
dir3 | Directory | needle result directory | |
dir4 | Directory | water result directory | |
water | File[] | water result file (.water) | |
needle | File[] | needle result file (.needle) | |
idlist1 | File | output file (extract query species column) | |
idlist2 | File [TSV] | output file (extract hit species column) | |
logfile1 | File | logfile (blastdbcmd query species) | |
logfile2 | File | logfile (blastdbcmd hit species) | |
tsvfile1 | File [TSV] | output file (foldseek easy-search) | |
tsvfile2 | File [TSV] | output file (extract target species) | |
tsvfile3 | File [TSV] | output file (togoid convert) | |
index_dir1 | Directory | index directory (query species) | |
index_dir2 | Directory | index directory (hit species) | |
fasta_files1 | File[] | split fasta files (seqretsplit query species) | |
fasta_files2 | File[] | split fasta files (seqretsplit hit species) | |
index_files1 | File | index file (query species) | |
index_files2 | File | index file (hit species) | |
report_notebook | File | output notebook (papermill) | |
blastdbcmd_result1 | File | blastdbcmd result (query species) | |
blastdbcmd_result2 | File | blastdbcmd result (hit species) |
https://w3id.org/cwl/view/git/9bd80581d7ced3ee307b020eb4b091e411c3cbfb/Workflow/plant2human_v1_1.cwl