CWL Workflow: UW GAC (GENESIS) VCF to GDS

Fetched 2025-05-24 07:25:59 GMT

Verified with cwltool version 3.1.20221201130942

The **VCF to GDS** workflow converts VCF or BCF files into Genomic Data Structure (GDS) format. GDS files are required by all workflows that use the GENESIS or SNPRelate R packages.

_Filename requirements_: Input file names should follow the pattern <A>chr<X>.<y>, for example 1KG_phase3_subset_chr1.vcf.gz. Some of the tools inside the workflow infer the chromosome number from the file name and expect this pattern.
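The filename requirement can be checked before submitting a run. The following is a minimal sketch, not the workflow's own parsing logic: it assumes that <A> is an arbitrary prefix, that <X> is a chromosome label (1 or 2 digits, or X, Y, M, which is an assumption), and that <y> is the file extension. The function name `infer_chromosome` is hypothetical.

```python
import re

# Assumed reading of the pattern <A>chr<X>.<y>:
#   <A>  any prefix (may be empty)
#   <X>  chromosome label; 1-2 digits or X/Y/M is an assumption, not from the workflow docs
#   <y>  the remaining extension, e.g. "vcf.gz" or "bcf"
CHR_PATTERN = re.compile(r"^(?P<prefix>.*)chr(?P<chrom>[0-9]{1,2}|[XYM])\.(?P<ext>.+)$")


def infer_chromosome(filename):
    """Return the chromosome label found in the file name, or None if the name does not fit."""
    match = CHR_PATTERN.match(filename)
    return match.group("chrom") if match else None


if __name__ == "__main__":
    for name in ["1KG_phase3_subset_chr1.vcf.gz", "cohort_chrX.bcf", "cohort_all.vcf.gz"]:
        print(name, "->", infer_chromosome(name))
```

A name for which the function returns None would likely need renaming before it is passed to the workflow.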

This workflow is Open Source and may be reused according to the terms of: Apache License 2.0

Note that the tools invoked by the workflow may have separate licenses.

Inputs

ID	Type	Title	Doc
cpu	Integer (Optional)	Number of CPUs	Number of CPUs for each tool job.
format	String[] (Optional)	Format	Format fields to keep in GDS file. Default: GT
memory_gb	Float (Optional)	memory GB	Memory to allocate per job, in GB. For a small number of samples (up to 10k), the default of 1 GB is usually enough. For larger sample counts, set a higher value (50k samples ~ 4 GB). Default: 1
vcf_files	File[]	Variants Files	Input Variants Files.
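
The inputs above can be supplied to a CWL runner as a job file. The sketch below builds such a job object in Python and prints it as JSON. It assumes the standard CWL convention of describing a file input as an object with `class: File` and a `path`; the helper name `build_job` and the example file names are hypothetical, and optional inputs are left out so the workflow defaults apply.

```python
import json


def build_job(vcf_paths, cpu=None, memory_gb=None, fmt=None):
    """Build a CWL job object for the workflow inputs listed in the table above."""
    # vcf_files is the only required input; each entry is a CWL File object.
    job = {"vcf_files": [{"class": "File", "path": p} for p in vcf_paths]}
    # Optional inputs are added only when set, so the workflow defaults are used otherwise.
    if cpu is not None:
        job["cpu"] = cpu
    if memory_gb is not None:
        job["memory_gb"] = memory_gb
    if fmt is not None:
        job["format"] = list(fmt)
    return job


# Example values only: two per-chromosome files and 4 GB per job,
# following the table's guidance for roughly 50k samples.
job = build_job(
    ["1KG_phase3_subset_chr1.vcf.gz", "1KG_phase3_subset_chr2.vcf.gz"],
    memory_gb=4.0,
    fmt=["GT"],
)

if __name__ == "__main__":
    print(json.dumps(job, indent=2))
```

Saved to a file, the printed JSON could be passed to cwltool as the job argument after the workflow document.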

Steps

ID	Runs	Label	Doc
vcf2gds	vcf2gds.cwl (CommandLineTool)	vcf2gds	Convert VCF to GDS. Output file name is <input filename>.gds
check_gds	check_gds.cwl (CommandLineTool)	check_gds
sniff_filename	splitfilename.cwl (CommandLineTool)	Split Filename
unique_variant_id	unique_variant_id.cwl (CommandLineTool)	unique_variant_id	Ensures that each variant has a unique integer ID across the genome, so that the variant.id field is consistent between per-chromosome files and combined files. Expects From: https://github.com/UW-GAC/analysis_pipeline

Outputs

ID	Type	Label	Doc
check_logs	File[]
unique_variant_id_gds_per_chr	File[] (Optional)	Unique variant ID corrected GDS files per chromosome	Corrected GDS files per chromosome.