Workflow: UW GAC (GENESIS) VCF to GDS

Fetched 2025-05-24 07:25:59 GMT

The **VCF to GDS** workflow converts VCF or BCF files into Genomic Data Structure (GDS) format. GDS files are required by all workflows that use the GENESIS or SNPRelate R packages. _Filename requirements_: Input file names should follow the pattern <A>chr<X>.<y>, for example 1KG_phase3_subset_chr1.vcf.gz. Some tools in the workflow infer the chromosome number from the file name and expect this pattern.
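To make the filename requirement concrete, the following is a minimal Python sketch of how a name matching <A>chr<X>.<y> could be split into its parts. It is not the code used by the workflow's own tools. The function name `split_filename`, the regular expression, and the accepted chromosome labels (one or two digits, X, Y, M) are assumptions made for illustration.

```python
import re

# Assumed reading of the pattern <A>chr<X>.<y>:
#   <A> = arbitrary prefix, <X> = chromosome label, <y> = file extension.
# The set of accepted chromosome labels is an assumption, not taken from the workflow.
CHR_PATTERN = re.compile(r"^(?P<prefix>.*)chr(?P<chrom>[0-9]{1,2}|[XYM])\.(?P<ext>.+)$")


def split_filename(name):
    """Return (prefix, chromosome, extension) for a name like <A>chr<X>.<y>."""
    match = CHR_PATTERN.match(name)
    if match is None:
        raise ValueError(f"file name does not follow <A>chr<X>.<y>: {name!r}")
    return match.group("prefix"), match.group("chrom"), match.group("ext")


prefix, chrom, ext = split_filename("1KG_phase3_subset_chr1.vcf.gz")
print(prefix, chrom, ext)
```

A name without a `chr<X>.` segment, such as `cohort.vcf.gz`, raises `ValueError` in this sketch, which is a simple way to catch misnamed inputs before submitting a run.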

Inputs

| ID | Type | Title | Doc |
| --- | --- | --- | --- |
| cpu | Integer (Optional) | Number of CPUs | Number of CPUs for each tool job. |
| format | String[] (Optional) | Format | Format fields to keep in GDS file. Default: GT |
| memory_gb | Float (Optional) | memory GB | Memory to allocate per job. For a small number of samples (up to 10k), the default of 1 GB is usually enough. For larger sample counts, set a higher value (50k samples ~ 4 GB). Default: 1 |
| vcf_files | File[] | Variants Files | Input Variants Files. |
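As an illustration of how these inputs fit together, here is a small Python sketch that builds a CWL job object and prints it as JSON. The input IDs (`cpu`, `format`, `memory_gb`, `vcf_files`) come from the list above. The helper name `build_job`, the example file names, and the chosen values are hypothetical.

```python
import json


def build_job(vcf_paths, cpu=None, memory_gb=None, fmt=None):
    """Build a CWL job object; optional inputs are omitted unless given."""
    # CWL represents File inputs as objects with class "File" and a path.
    job = {"vcf_files": [{"class": "File", "path": p} for p in vcf_paths]}
    if cpu is not None:
        job["cpu"] = cpu
    if memory_gb is not None:
        job["memory_gb"] = memory_gb
    if fmt is not None:
        job["format"] = list(fmt)
    return job


# Hypothetical example: two per-chromosome files, 4 GB per job, keep only GT.
job = build_job(
    ["1KG_phase3_subset_chr1.vcf.gz", "1KG_phase3_subset_chr2.vcf.gz"],
    cpu=2,
    memory_gb=4.0,
    fmt=["GT"],
)
print(json.dumps(job, indent=2))
```

Saved to a file, the printed JSON could be passed to a CWL runner together with the workflow document. Leaving out the optional inputs lets the workflow fall back to its documented defaults.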

Steps

| ID | Runs | Label | Doc |
| --- | --- | --- | --- |
| vcf2gds | vcf2gds.cwl (CommandLineTool) | vcf2gds | Convert VCF to GDS. Output file name is <input filename>.gds |
| check_gds | check_gds.cwl (CommandLineTool) | check_gds | |
| sniff_filename | splitfilename.cwl (CommandLineTool) | Split Filename | |
| unique_variant_id | unique_variant_id.cwl (CommandLineTool) | unique_variant_id | Ensures that each variant has a unique integer ID across the genome, so that the variant.id field is consistent between per-chromosome files and combined files. |

Expects

From: https://github.com/UW-GAC/analysis_pipeline
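The idea behind the unique_variant_id step can be sketched in a few lines of Python. This is only an illustration of assigning non-overlapping integer IDs across chromosomes, not the implementation in unique_variant_id.cwl or the UW-GAC pipeline. The function name `assign_unique_ids`, the 1-based numbering, and the variant counts are assumptions.

```python
def assign_unique_ids(variant_counts):
    """Map each chromosome to a contiguous, non-overlapping range of integer IDs.

    variant_counts: list of (chromosome, number_of_variants) in genome order.
    """
    ranges = {}
    next_id = 1  # assumed starting ID for this sketch
    for chrom, n_variants in variant_counts:
        # Each chromosome continues where the previous one stopped.
        ranges[chrom] = range(next_id, next_id + n_variants)
        next_id += n_variants
    return ranges


# Hypothetical counts: 3 variants on chromosome 1, 2 on chromosome 2.
ranges = assign_unique_ids([("1", 3), ("2", 2)])
for chrom, ids in ranges.items():
    print(chrom, list(ids))
```

Because every chromosome receives its own ID range, a variant keeps the same ID whether it is read from a per-chromosome file or from a combined file.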

Outputs

| ID | Type | Label | Doc |
| --- | --- | --- | --- |
| check_logs | File[] | | |
| unique_variant_id_gds_per_chr | File[] (Optional) | Unique variant ID corrected GDS files per chromosome | Corrected GDS files per chromosome. |

Permalink: https://w3id.org/cwl/view/git/5f17ca875ec5b0e324fa899ed0e3175ef9ddf9d0/vcftogds/vcf-to-gds-wf.cwl