Workflow: plant2human workflow

Fetched 2024-11-08 11:41:56 GMT

A workflow for novel gene discovery that compares proteins of plant species and model organisms with human proteins by structural similarity search.


Inputs

ID | Type | Title | Doc
EVALUE | Double | e-value (foldseek easy-search) |
THREADS | Integer | threads (foldseek easy-search) |
ROUTE_DATASET | String | route dataset (togoid convert) |
FOLDSEEK_INDEX | File | foldseek index file |
INPUT_DIRECTORY | Directory | | Directory containing the query protein structure files (.cif)
TAXONOMY_ID_LIST | String | taxonomy id list (foldseek easy-search) | Comma-separated list of taxonomy IDs. Be sure to include “9606” (Homo sapiens).
OUTPUT_FILE_NAME1 | String | output file name (foldseek easy-search) |
OUTPUT_FILE_NAME2 | String | |
OUTPUT_FILE_NAME3 | String | output file name (togoid convert) |
OUT_NOTEBOOK_NAME | String | output notebook name (papermill) |
FILE_MATCH_PATTERN | String | | File match pattern used to list the input files
SPLIT_MEMORY_LIMIT | String | split memory limit (foldseek easy-search) |
QUERY_GENE_LIST_TSV | File [TSV] | query gene list tsv (papermill) |
QUERY_IDMAPPING_TSV | File [TSV] | query idmapping tsv (papermill) |
OUTPUT_FILE_NAME_HIT_SPECIES | String | |
WF_COLUMN_NUMBER_HIT_SPECIES | Integer | column number of hit species |
OUTPUT_FILE_NAME_QUERY_SPECIES | String | |
WF_COLUMN_NUMBER_QUERY_SPECIES | Integer | column number of query species |
SW_INPUT_FASTA_FILE_HIT_SPECIES | File [FASTA search results format] | input fasta file (for blastdbcmd) | Input FASTA file
SW_INPUT_FASTA_FILE_QUERY_SPECIES | File [FASTA search results format] | input fasta file (for blastdbcmd) | Input FASTA file
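For illustration, a CWL job order for this workflow can be written as JSON keyed by the input IDs in the table above. The sketch below is partial and hypothetical: the helper name build_job_order, the paths and every numeric or string value are placeholders, not settings from the workflow authors, and a real run must also supply the remaining inputs unless they have defaults. It only encodes two facts from the table: File/Directory inputs are CWL objects, and TAXONOMY_ID_LIST is a comma-separated string that should contain 9606.

```python
import json


def build_job_order(input_directory: str, foldseek_index: str, taxonomy_ids: list[str]) -> dict:
    """Return a partial CWL job order keyed by the workflow's input IDs.

    Hypothetical helper: all values are placeholders, not recommended settings.
    """
    ids = [t.strip() for t in taxonomy_ids if t.strip()]
    if "9606" not in ids:
        # The TAXONOMY_ID_LIST doc asks for 9606 (Homo sapiens) to be set.
        ids.append("9606")
    return {
        # CWL File/Directory inputs are objects with "class" and "path".
        "INPUT_DIRECTORY": {"class": "Directory", "path": input_directory},
        "FILE_MATCH_PATTERN": "*.cif",  # assumed pattern
        "FOLDSEEK_INDEX": {"class": "File", "path": foldseek_index},
        "EVALUE": 0.1,  # Double (placeholder)
        "THREADS": 8,  # Integer (placeholder)
        "SPLIT_MEMORY_LIMIT": "64G",  # String (placeholder)
        "TAXONOMY_ID_LIST": ",".join(ids),  # comma-separated string
    }


if __name__ == "__main__":
    job = build_job_order("../Data/rice_up_mmCIFfile", "foldseek_index", ["3702"])
    print(json.dumps(job, indent=2))
```

Saved to a file such as job.json, the output can be passed to a CWL runner together with the workflow file, for example `cwltool plant2human_v1_1.cwl job.json`, once the remaining inputs are filled in.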

Steps

ID | Runs | Label | Doc
papermill | ../Tools/19_papermill.cwl (CommandLineTool) | |
list_files | ../Tools/10_listing.cwl (CommandLineTool) | | Lists the files in a directory for the foldseek easy-search step, e.g. ../Data/rice_up_mmCIFfile/*.cif. Reference: https://qiita.com/kyusque/items/a291fd251a10f783390e#3-glob%E7%94%A8%E3%81%AEclt%E3%82%92%E4%BD%BF%E3%81%86
togoid_convert | ../Tools/18_togoid_convert.cwl (CommandLineTool) | |
foldseek_easy_search | ../Tools/11_foldseek_easy_search.cwl (CommandLineTool) | |
extract_target_species | ../Tools/12_extract_target_species.cwl (CommandLineTool) | |
extract_hit_species_column | ../Tools/13_extract_id.cwl (CommandLineTool) | | awk -> sort -> uniq, with the result redirected to uniprot_id.txt
extract_query_species_column | ../Tools/13_extract_id.cwl (CommandLineTool) | | awk -> sort -> uniq, with the result redirected to uniprot_id.txt
sub_workflow_retrieve_sequence_query_species | | foldseek easy-search sub-workflow | Retrieves sequences from the blastdbcmd result. makeblastdb: ../Tools/14_makeblastdb.cwl; blastdbcmd: ../Tools/15_blastdbcmd.cwl; seqretsplit: ../Tools/16_seqretsplit.cwl; needle (global alignment): ../Tools/17_needle.cwl; water (local alignment): ../Tools/17_water.cwl
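The list_files step is described as a glob-based file listing (see the Qiita reference in its doc). As a rough illustration only, the same selection can be expressed with Python's pathlib. The function name list_query_files and the default "*.cif" pattern are assumptions; this is not the code inside ../Tools/10_listing.cwl.

```python
from pathlib import Path


def list_query_files(input_directory: str, file_match_pattern: str = "*.cif") -> list[Path]:
    """Return regular files in input_directory that match the glob pattern.

    Hypothetical analogue of the list_files step; results are sorted by path
    so the order passed on to foldseek easy-search is deterministic.
    """
    matches = Path(input_directory).glob(file_match_pattern)
    # Skip directories that happen to match the pattern (e.g. "sub.cif/").
    return sorted(p for p in matches if p.is_file())


if __name__ == "__main__":
    for path in list_query_files("../Data/rice_up_mmCIFfile"):
        print(path)
```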
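Both extract_*_column steps are documented as an awk -> sort -> uniq pipeline redirected to uniprot_id.txt. The sketch below is a hedged Python analogue, assuming a tab-separated input and a 1-based column number as in awk's `$N` (presumably what WF_COLUMN_NUMBER_HIT_SPECIES and WF_COLUMN_NUMBER_QUERY_SPECIES select). The function name extract_unique_ids is hypothetical, and the actual ../Tools/13_extract_id.cwl command may differ in field separator and edge-case handling.

```python
from pathlib import Path


def extract_unique_ids(tsv_path: str, column_number: int, out_path: str = "uniprot_id.txt") -> list[str]:
    r"""Rough analogue of: awk -F'\t' '{print $N}' in.tsv | sort | uniq > uniprot_id.txt

    column_number is 1-based, like awk's $N. Rows with too few columns are
    skipped (awk would print an empty line for them instead).
    """
    if column_number < 1:
        raise ValueError("column_number is 1-based and must be >= 1")
    ids: set[str] = set()
    with open(tsv_path) as fh:
        for line in fh:
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= column_number:
                ids.add(fields[column_number - 1])
    # sort | uniq keeps one copy of each value in sorted order; Python's
    # codepoint ordering matches `LC_ALL=C sort`, not locale-aware sorting.
    unique_sorted = sorted(ids)
    Path(out_path).write_text("".join(f"{value}\n" for value in unique_sorted))
    return unique_sorted
```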
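The sub-workflow ends with EMBOSS needle (global alignment, Needleman-Wunsch) and water (local alignment, Smith-Waterman). To show how the two modes differ, here is a simplified dynamic-programming sketch that returns only the optimal score. It is a swapped-in illustration, not the EMBOSS implementation: the function name alignment_score and its fixed match/mismatch values with a linear gap penalty are assumptions, whereas needle and water use substitution matrices and affine gap penalties, so their scores will not match these.

```python
def alignment_score(a: str, b: str, local: bool, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Optimal global (Needleman-Wunsch) or local (Smith-Waterman) score.

    Simplified scoring: fixed match/mismatch values and a linear gap penalty.
    Only two DP rows are kept, so memory is O(len(b)).
    """
    cols = len(b) + 1
    # Row 0: global alignment pays for leading gaps; local alignment starts at 0.
    prev = [0 if local else j * gap for j in range(cols)]
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0 if local else i * gap] + [0] * (cols - 1)
        for j in range(1, cols):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, prev[j] + gap, cur[j - 1] + gap)
            if local:
                # Local alignment may restart anywhere, so scores never drop below 0.
                cell = max(cell, 0)
                best = max(best, cell)
            cur[j] = cell
        prev = cur
    # Global: score of aligning both full sequences; local: best cell anywhere.
    return best if local else prev[-1]


if __name__ == "__main__":
    print(alignment_score("TTACGTTT", "GGACGTGG", local=False))  # needle-like
    print(alignment_score("TTACGTTT", "GGACGTGG", local=True))  # water-like
```

The local score rewards the shared core "ACGT" while ignoring the mismatched flanks, which is why water-style results can highlight a conserved region that a global needle-style score dilutes.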

Outputs

ID | Type | Label | Doc
dir1 | Directory | directory (seqretsplit query species) |
dir2 | Directory | directory (seqretsplit hit species) |
dir3 | Directory | needle result directory |
dir4 | Directory | water result directory |
water | File[] | water result file (.water) |
needle | File[] | needle result file (.needle) |
idlist1 | File | output file (extract query species column) |
idlist2 | File [TSV] | output file (extract hit species column) |
logfile1 | File | logfile (blastdbcmd query species) |
logfile2 | File | logfile (blastdbcmd hit species) |
tsvfile1 | File [TSV] | output file (foldseek easy-search) |
tsvfile2 | File [TSV] | output file (extract target species) |
tsvfile3 | File [TSV] | output file (togoid convert) |
index_dir1 | Directory | index directory (query species) |
index_dir2 | Directory | index directory (hit species) |
fasta_files1 | File[] | split fasta files (seqretsplit query species) |
fasta_files2 | File[] | split fasta files (seqretsplit hit species) |
index_files1 | File | index file (query species) |
index_files2 | File | index file (hit species) |
report_notebook | File | output notebook (papermill) |
blastdbcmd_result1 | File | blastdbcmd result (query species) |
blastdbcmd_result2 | File | blastdbcmd result (hit species) |
Permalink: https://w3id.org/cwl/view/git/9bd80581d7ced3ee307b020eb4b091e411c3cbfb/Workflow/plant2human_v1_1.cwl