Workflow: UW GAC (GENESIS) VCF to GDS
The **VCF to GDS** workflow converts VCF or BCF files into Genomic Data Structure (GDS) format. GDS files are required by all workflows that use the GENESIS or SNPRelate R packages. _Filename requirements_: Input file names should follow the pattern <A>chr<X>.<y>, for example 1KG_phase3_subset_chr1.vcf.gz. Some of the tools inside the workflow infer the chromosome number from the file name and expect this pattern.
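The filename requirement can be checked before submitting a run. The following is a minimal sketch, assuming the pattern <A>chr<X>.<y> means "any prefix, the literal text chr, a chromosome label without dots, then an extension". The function name `split_chr_filename` and the regular expression are illustrative assumptions, not the logic used by the workflow's own tools.

```python
import re

# Hypothetical pattern for <A>chr<X>.<y>: prefix, literal "chr",
# chromosome label (no dots), then the remaining extension.
CHR_PATTERN = re.compile(r"^(?P<prefix>.*)chr(?P<chrom>[^.]+)\.(?P<ext>.+)$")


def split_chr_filename(filename):
    """Return (prefix, chromosome, extension), or None if the name does not match."""
    match = CHR_PATTERN.match(filename)
    if match is None:
        return None
    return match.group("prefix"), match.group("chrom"), match.group("ext")


parts = split_chr_filename("1KG_phase3_subset_chr1.vcf.gz")
print(parts)  # ('1KG_phase3_subset_', '1', 'vcf.gz')
```

A name without the chr marker, such as sample.vcf.gz, makes the function return None, which is a cheap way to flag inputs that may not work with the workflow.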
Inputs
ID | Type | Title | Doc |
---|---|---|---|
cpu | Integer (Optional) | Number of CPUs | Number of CPUs for each tool job. |
format | String[] (Optional) | Format | Format fields to keep in GDS file. Default: GT |
memory_gb | Float (Optional) | memory GB | Memory to allocate per job. For a small number of samples (up to 10k), the default of 1 GB is usually enough. For a larger number of samples, set the value higher (50k samples ~ 4 GB). Default: 1 |
vcf_files | File[] | Variants Files | Input Variants Files. |
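The inputs above map directly onto a CWL job file. The sketch below builds such a job object in Python and serializes it to JSON. The input IDs come from the table, and File inputs use the standard CWL object form with `class` and `path`. The helper name `build_job` and the example path are assumptions made for illustration.

```python
import json


def build_job(vcf_paths, cpu=None, memory_gb=None, fmt=None):
    """Build a CWL job object for the VCF to GDS workflow inputs.

    Optional inputs are omitted when not given, so the workflow's
    own defaults apply.
    """
    job = {"vcf_files": [{"class": "File", "path": p} for p in vcf_paths]}
    if cpu is not None:
        job["cpu"] = cpu
    if memory_gb is not None:
        job["memory_gb"] = memory_gb
    if fmt is not None:
        job["format"] = list(fmt)
    return job


# Hypothetical input path following the <A>chr<X>.<y> naming pattern.
job = build_job(["1KG_phase3_subset_chr1.vcf.gz"], memory_gb=4.0, fmt=["GT"])
job_json = json.dumps(job, indent=2)
print(job_json)
```

The resulting JSON can be saved to a file and passed to a CWL runner together with the workflow document, for example with cwltool.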
Steps
ID | Runs | Label | Doc |
---|---|---|---|
vcf2gds | vcf2gds.cwl (CommandLineTool) | vcf2gds | Convert VCF to GDS. Output file name is <input filename>.gds |
check_gds | check_gds.cwl (CommandLineTool) | check_gds | |
sniff_filename | splitfilename.cwl (CommandLineTool) | Split Filename | |
unique_variant_id | unique_variant_id.cwl (CommandLineTool) | unique_variant_id | Ensures that each variant has a unique integer ID across the genome, so that the variant.id field is consistent between per-chromosome files and combined files. |
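The idea behind the unique_variant_id step can be illustrated with a running offset: each chromosome's variants receive IDs that continue where the previous chromosome stopped. This is only a sketch of the concept, assuming IDs start at 1 and chromosomes are processed in the order given. The function `assign_unique_ids` is hypothetical and does not reproduce the actual tool, which operates on GDS files.

```python
def assign_unique_ids(variant_counts):
    """Map each chromosome to a contiguous, non-overlapping range of integer IDs.

    variant_counts: ordered list of (chromosome, number_of_variants) pairs.
    """
    id_ranges = {}
    next_id = 1  # Assumption: IDs start at 1.
    for chrom, n_variants in variant_counts:
        id_ranges[chrom] = range(next_id, next_id + n_variants)
        next_id += n_variants
    return id_ranges


ranges = assign_unique_ids([("1", 3), ("2", 2)])
print(list(ranges["1"]))  # [1, 2, 3]
print(list(ranges["2"]))  # [4, 5]
```

Because the ranges never overlap, a variant keeps the same ID whether it is read from its per-chromosome file or from a combined file.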
Outputs
ID | Type | Label | Doc |
---|---|---|---|
check_logs | File[] | | |
unique_variant_id_gds_per_chr | File[] (Optional) | Unique variant ID corrected GDS files per chromosome | Corrected GDS files per chromosome. |
https://w3id.org/cwl/view/git/5f17ca875ec5b0e324fa899ed0e3175ef9ddf9d0/vcftogds/vcf-to-gds-wf.cwl