CWL Workflow: PrediXcan

Fetched 2025-10-29 06:50:48 GMT

Verified with cwltool version 3.1.20221201130942

Predict.py has been wrapped in CWL, based on the information at: https://github.com/hakyimlab/MetaXcan/wiki/Individual-level-PrediXcan:-introduction,-tutorials-and-manual

Here is a snippet from that page:

In the following, we focus on the individual-level implementation of PrediXcan. The method was originally implemented in this repository. PrediXcan consists of two steps:

1. Predict gene expression (or whatever biology the models predict) in a cohort with available genotypes.
2. Run associations to a trait measured in the cohort.

The first step is implemented in Predict.py. The prediction models are trained and pre-compiled on specific data sets with their own human genome releases and variant definitions. We implemented a few rules to support variant matching from genotypes based on different variant definitions. In the following, mapping refers to the process of assigning a model variant to a genotype variant.

Originally, PrediXcan was applied to genes, so we say "gene expression" a lot as it was the mechanism we initially studied. But conceptually, everything said here applies to any intermediate/molecular mechanism such as splicing or brain morphology. Whenever we say "gene", it generally could mean a splicing intron event, etc.
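The two steps described above can be sketched in a few lines of Python. This is only an illustration, not the Predict.py implementation: it assumes a linear model in which a gene's predicted expression is the weighted sum of variant dosages, and it replaces the association step with a plain least-squares slope that ignores covariates and kinship. The function names, variant ids, weights, and phenotype values are all made up for the example.

```python
# Illustrative sketch only; not the Predict.py implementation.

def predict_expression(dosages, weights):
    """Step 1: predicted expression per sample for one gene.

    dosages: {sample_id: {variant_id: dosage}}
    weights: {variant_id: weight} taken from a (hypothetical) gene model
    Model variants missing from a sample's genotypes contribute nothing.
    """
    return {
        sample: sum(w * geno.get(variant, 0.0) for variant, w in weights.items())
        for sample, geno in dosages.items()
    }


def ols_slope(x, y):
    """Step 2 (simplified): slope of a least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


# Toy data, invented for illustration.
weights = {"rs1": 0.5, "rs2": -0.25}
dosages = {
    "s1": {"rs1": 0, "rs2": 2},
    "s2": {"rs1": 1, "rs2": 1},
    "s3": {"rs1": 2, "rs2": 0},
}
phenotype = {"s1": 1.0, "s2": 2.0, "s3": 3.0}

predicted = predict_expression(dosages, weights)
samples = sorted(predicted)
slope = ols_slope(
    [predicted[s] for s in samples],
    [phenotype[s] for s in samples],
)
print(predicted, slope)
```

A real analysis would adjust for the covariates and, where provided, the kinship matrix listed among the inputs; the sketch leaves both out to keep the two-step structure visible.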

Unknown workflow license, check source repository.

Inputs

ID	Type	Doc
vcf_mode	String (Optional)	"genotyped" is meant for phased, genotyped VCFs that contain counts of each allele at each chromosome pair. "imputed" loads the DS field as dosage; it is meant to work with imputed VCFs as generated by the Michigan Imputation Server.
covariates	String	Column names of any additional covariates to account for. Enter each name exactly as it appears in the phenotype file, wrapped in quotation marks and separated by commas with no spaces, e.g. "sex","age","PC1".
model_db_path	File (Optional)	Path to a SQLite file containing prediction models.
output_prefix	String	[REQUIRED] File name prefix for output files.
vcf_genotypes	File[] (Optional)	Pattern of VCF genotype files.
kinship_matrix	File (Optional)	A delimited text file with a .txt extension, or an R data file with a .RData extension, containing a matrix of size M × M, where rows and columns are the sample/subject IDs.
phenotype_file	File (Optional)	[REQUIRED] A delimited text file with a .txt extension containing a matrix of size (M + 1) × (C + 1), where M >= N is the number of samples for which covariate data is provided.
model_db_snp_key	String (Optional)	Optional. If provided, variant ids are loaded from an alternative column in the db. By default, PrediXcan uses rsids, which works with Elastic Net models. For the more sophisticated MASHR models, --model_db_snp_key varID must be specified.
prediction_output	String	Specify output (and output type) of predicted expression matrix
on_the_fly_mapping	String (Optional)	Optional. Specify a pattern to build a variant id from genotype variant properties. For example, --on_the_fly_mapping METADATA "chr{}_{}_{}_{}_b38" takes the genotype variant's chromosome, position, and alleles to build a variant id like chr1_123_A_G_b38. This uses the genotype properties or, if liftover is specified, the lifted coordinates.
prediction_summary_output	String	A separate file that will contain some additional information on the predictions (such as the number of SNPs in the gene's models, the number of SNPs used, etc.).
main_phenotype_of_interest	String	[REQUIRED] The column name of the phenotype of interest, which should be a dichotomous or continuous variable. Enter it exactly as it appears in the phenotype file, without quotation marks, e.g. main_interest. If the phenotype is dichotomous, make sure it is coded in the file as a categorical variable where 0 is absence and 1 is presence of the phenotype. 0 is then the reference level, and the output reflects this.
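Two of the string inputs above follow a fixed format that is easy to get wrong: the on_the_fly_mapping pattern and the quoted, comma-separated covariates list. The sketch below shows how such strings could be built and checked before submitting a job. Both helper functions are hypothetical and are not part of PrediXcan or the CWL wrapper; the reference-then-alternate allele order is an assumption, since the documentation above only says "alleles".

```python
# Hypothetical helpers for preparing input values; not part of PrediXcan.

def build_variant_id(pattern, chrom, pos, ref, alt):
    """Fill a pattern such as "chr{}_{}_{}_{}_b38" with variant properties.

    Assumes the four placeholders are chromosome, position, and the two
    alleles in reference-then-alternate order.
    """
    return pattern.format(chrom, pos, ref, alt)


def parse_covariates(raw):
    """Split a string like '"sex","age","PC1"' into ['sex', 'age', 'PC1'].

    Raises ValueError when a name is not wrapped in double quotes or when
    spaces surround the commas, mirroring the format requested above.
    """
    names = []
    for token in raw.split(","):
        if len(token) < 3 or token[0] != '"' or token[-1] != '"':
            raise ValueError(f"covariate not quoted correctly: {token!r}")
        names.append(token[1:-1])
    return names


variant_id = build_variant_id("chr{}_{}_{}_{}_b38", 1, 123, "A", "G")
covariate_names = parse_covariates('"sex","age","PC1"')
print(variant_id, covariate_names)
```

Checking the covariates string this way catches a stray space after a comma, which the description above explicitly disallows.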

Steps

There are no steps in this workflow

Outputs

ID	Type	Label	Doc
summary	File (Optional)
Association_output	File
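To run the workflow, the inputs listed above have to be supplied in a job file, which CWL runners accept as YAML or JSON. The sketch below builds a JSON job object in Python using the input IDs from the table. Every value is a placeholder: the file paths are hypothetical, and the exact format expected for prediction_output is not documented on this page, so the file name used for it is a guess.

```python
import json

# Sketch of a job object for this workflow. All values are placeholders.

def file_input(path):
    """CWL represents a File input as an object with class and path."""
    return {"class": "File", "path": path}


job = {
    "output_prefix": "predixcan_run",
    "main_phenotype_of_interest": "main_interest",
    "covariates": '"sex","age","PC1"',
    # Format of this value is an assumption; see the input description.
    "prediction_output": "predicted_expression.txt",
    "prediction_summary_output": "prediction_summary.txt",
    "phenotype_file": file_input("phenotypes.txt"),
    "model_db_path": file_input("model.db"),
    "vcf_genotypes": [file_input("chr1.vcf.gz")],
    "vcf_mode": "genotyped",
}

job_json = json.dumps(job, indent=2)
print(job_json)
```

Saved as, say, job.json, this could then be passed to a runner together with the workflow file, for example `cwltool predixcan_unpack.cwl job.json`.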

Permalink: https://w3id.org/cwl/view/git/5b49ef07b994963d190f4f508bc08e4bec8b8a0b/predixcan/predixcan_unpack.cwl