Workflow: Produce a list of residue-mapped structural domain instances from CATH IDs

Retrieves and processes the PDB structures corresponding to the given CATH superfamily IDs, producing a list of residue-mapped structural domain instances along with the lost structural instances. Requires Data/cath_domain_description_file.txt, downloaded from CATH, and uses the SIFTS resource for PDB-to-UniProt residue mapping.

| ID | Type | Title | Doc |
|---|---|---|---|
| siftsdir | Directory | Directory for storing all SIFTS files | |
| lost_merged | String | Filename for Pfam inconsistent domain StIs | |
| min_dom_size | Integer | Threshold for minimum domain length | |
| family_idsfile | File [JSON] | File with the family IDs per iteration | |
| resmapped_file | String | Filename for CATH inconsistent domain StIs | |
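
As a rough illustration of how the inputs above could be supplied, the sketch below builds a job (input) object and writes it as JSON, which CWL runners accept as a job file. Only the input IDs and their types come from the table; every path, filename, and the threshold value are placeholders chosen for illustration, as is the helper name write_job.

```python
import json

# Input object for the workflow. The keys are the input IDs listed above;
# all values are hypothetical placeholders, not taken from the workflow itself.
job = {
    "siftsdir": {"class": "Directory", "path": "Data/SIFTS"},        # assumed path
    "lost_merged": "lost_domain_StIs.json",                           # assumed filename
    "min_dom_size": 31,                                               # assumed threshold
    "family_idsfile": {"class": "File", "path": "family_ids.json"},   # assumed path
    "resmapped_file": "resmapped_domain_StIs.csv",                    # assumed filename
}


def write_job(path, params):
    """Write the input object to `path` as JSON and return the path."""
    with open(path, "w") as handle:
        json.dump(params, handle, indent=2)
    return path


if __name__ == "__main__":
    # The resulting file can be passed to a CWL runner after the workflow file.
    print(write_job("resmap_cath_job.json", job))
```

File and Directory inputs are written as objects with a class and a path, while String and Integer inputs are plain values.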


| ID | Runs | Label | Doc |
|---|---|---|---|
| | add_domain_num.cwl (CommandLineTool) | Add domain position labels to residue-mapped instances | The tool adds domain position labels to each structural instance within the protein with respect to the given list. |
| | gather_lost_resmap.cwl (CommandLineTool) | Changes the format for core structural instances (only 1st iteration) | The tool reads the given family IDs from the parameter file (.yml) and writes them to a separate file for each iteration. |
| | separate_cath.cwl (CommandLineTool) | Filter all structural instances for given CATH superfamilies | The tool filters the raw files from CATH to retrieve all available structural instances from the given CATH superfamilies. Example invocation: cwl-runner --cachedir=tmp_files/ --outdir=Results/ Workflow/separate_structures.cwl yml/separate_structures.yml |
| resmapping_cath_structs | | Mapping of residue numbering from PDB to UniProt | |
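
The residue-mapping step translates PDB residue numbers into UniProt positions. The sketch below shows one way such a mapping could work. It assumes the SIFTS mapping is already available as contiguous segments with integer residue numbers. The Segment record, the function names, and the point at which min_dom_size is applied are all illustrative assumptions, not the workflow's actual implementation.

```python
from typing import List, NamedTuple, Optional, Tuple


class Segment(NamedTuple):
    """One contiguous PDB-to-UniProt segment (hypothetical, simplified record)."""
    pdb_start: int
    pdb_end: int
    unp_start: int


def map_residue(pdb_resnum: int, segments: List[Segment]) -> Optional[int]:
    """Return the UniProt position for a PDB residue, or None if it is unmapped."""
    for seg in segments:
        if seg.pdb_start <= pdb_resnum <= seg.pdb_end:
            # Within a segment the two numberings differ by a constant offset.
            return seg.unp_start + (pdb_resnum - seg.pdb_start)
    return None


def map_domain(start: int, end: int, segments: List[Segment],
               min_dom_size: int) -> Optional[Tuple[int, int]]:
    """Map domain boundaries to UniProt numbering.

    Returns None when a boundary cannot be mapped or when the mapped domain
    is shorter than min_dom_size (assumed here to be applied after mapping).
    """
    unp_start = map_residue(start, segments)
    unp_end = map_residue(end, segments)
    if unp_start is None or unp_end is None:
        return None
    if unp_end - unp_start + 1 < min_dom_size:
        return None
    return (unp_start, unp_end)


if __name__ == "__main__":
    segments = [Segment(1, 100, 25), Segment(101, 150, 130)]
    print(map_domain(10, 60, segments, min_dom_size=40))
    print(map_domain(10, 20, segments, min_dom_size=40))
```

In this sketch, a domain whose boundaries cannot be mapped or that falls below the length threshold returns None. Such cases would correspond to the lost structural instances that the workflow reports separately.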


| ID | Type | Label | Doc |
|---|---|---|---|
| cath_domain_posi_file | File [CSV] | All residue-mapped domain StIs with domain labels | |
| cath_total_lost_structures | File [JSON] | Obsolete and inconsistent domain StIs together | |