Workflow: PrediXcan

Fetched 2024-11-28 04:36:49 GMT

Predict.py has been wrapped in CWL, using information from: https://github.com/hakyimlab/MetaXcan/wiki/Individual-level-PrediXcan:-introduction,-tutorials-and-manual

Here is a snippet from the same page:

In the following, we focus on the individual-level implementation of PrediXcan. The method was originally implemented in this repository. PrediXcan consists of two steps:

  • Predict gene expression (or whatever biology the models predict) in a cohort with available genotypes
  • Run associations to a trait measured in the cohort

The first step is implemented in Predict.py. The prediction models are trained and pre-compiled on specific data sets with their own human genome releases and variant definitions. We implemented a few rules to support variant matching from genotypes based on different variant definitions. In the following, mapping refers to the process of assigning a model variant to a genotype variant.

Originally, PrediXcan was applied to genes so we say "gene expression" a lot as it was the mechanism we initially studied. But conceptually, everything said here applies to any intermediate/molecular mechanism such as splicing or brain morphology. Whenever we say "gene", it generally could mean a splicing intron event, etc.

Inputs

ID Type Title Doc
vcf_mode String (Optional)

- "genotyped" is meant for phased, genotyped VCFs that contain counts of each allele at each chromosome pair.
- "imputed" will load the DS field as dosage. This is meant to work with imputed VCFs as generated by the Michigan Imputation Server.
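
As a rough illustration of how the two modes differ, the sketch below derives a per-sample dosage either from a phased genotype call or from a DS value. The function names, the assumption of a biallelic site, and the choice to count the alternate allele are illustrative only; this is not Predict.py's own parsing code.

```python
def dosage_from_gt(gt: str) -> int:
    """Count alternate alleles in a phased genotype call such as '0|1'.

    Assumption: biallelic site, alleles coded 0 (reference) and 1 (alternate).
    """
    return sum(1 for allele in gt.split("|") if allele == "1")


def dosage_from_ds(ds: str) -> float:
    """Use an imputed dosage (the VCF DS field) as given."""
    return float(ds)


def dosage(mode: str, gt: str = "", ds: str = "") -> float:
    # Hypothetical dispatcher mirroring the two vcf_mode values.
    if mode == "genotyped":
        return float(dosage_from_gt(gt))
    if mode == "imputed":
        return dosage_from_ds(ds)
    raise ValueError(f"unknown vcf_mode: {mode!r}")
```

In "genotyped" mode the result is always a whole number of allele copies, while "imputed" mode can return fractional values.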

covariates String

Enter the column names of any additional covariates you would like to account for. Enter each covariate exactly as it appears in the phenotype file, wrapped in quotation marks and separated by commas with no spaces. ex) "sex","age","PC1"
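
To make the expected format concrete, here is a minimal sketch that splits a covariates string of this form into column names and reports any that are absent from a phenotype file header. The helper names are hypothetical and the workflow itself may parse the value differently.

```python
import csv
from typing import List


def parse_covariates(value: str) -> List[str]:
    """Split a string like '"sex","age","PC1"' into column names."""
    if not value.strip():
        return []
    # The csv module handles the double-quoted, comma-separated form directly.
    return next(csv.reader([value]))


def missing_covariates(value: str, header: List[str]) -> List[str]:
    """Return the requested covariates that do not appear in the header."""
    return [name for name in parse_covariates(value) if name not in header]


covariates = parse_covariates('"sex","age","PC1"')
absent = missing_covariates('"sex","age","PC1"', ["id", "sex", "age"])
```

Here `covariates` holds the three names in order, and `absent` lists only `PC1`, since that column is not in the example header.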

model_db_path File (Optional)

Path to a SQLite file containing prediction models.

output_prefix String

[REQUIRED] File name prefix for output files.

vcf_genotypes File[] (Optional)

Pattern of vcf genotype files.

kinship_matrix File (Optional)

A delimited text file with a .txt file extension, or an R data file with a .RData file extension, containing a matrix of size M × M, where rows and columns are the sample/subject IDs.

phenotype_file File (Optional)

[REQUIRED] A delimited text file with a .txt file extension containing a matrix of size (M + 1) × (C + 1), where M >= N and M is the number of samples for which covariate data is provided.

model_db_snp_key String (Optional)

Optional. If provided, variant ids are loaded from an alternative column in the db. By default, PrediXcan uses rsids, which works with Elastic Net models. For the more sophisticated MASHR models, --model_db_snp_key varID must be specified.

prediction_output String

Specify output (and output type) of predicted expression matrix

on_the_fly_mapping String (Optional)

Optional. Specify a pattern to build a variant id from genotype variant properties. e.g. --on_the_fly_mapping METADATA "chr{}_{}_{}_{}_b38" will take the genotype variant's chromosome, position, and alleles to build a variant id like chr1_123_A_G_b38. This will use the genotype properties, or if liftover is specified, the lifted coordinates.
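
The pattern behaves like a positional format string. The sketch below reproduces the example id with Python's `str.format`; the function name and the allele order (reference first, then alternate) are assumptions made for illustration, not a description of Predict.py's internals.

```python
def build_variant_id(pattern: str, chromosome, position, ref_allele: str, alt_allele: str) -> str:
    """Fill the four positional slots of the pattern with variant properties.

    Assumption: slots are filled as chromosome, position, reference allele,
    alternate allele, in that order.
    """
    return pattern.format(chromosome, position, ref_allele, alt_allele)


variant_id = build_variant_id("chr{}_{}_{}_{}_b38", 1, 123, "A", "G")
print(variant_id)  # chr1_123_A_G_b38
```

The ids built this way need to match the variant id format stored in the model db for the mapping to find any variants.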

prediction_summary_output String

A separate file that will contain some additional information on the predictions (such as number of snps in the gene's models, number of snps used, etc).

main_phenotype_of_interest String

[REQUIRED] A string value defining the column name of the phenotype of interest. Should be a dichotomous or continuous variable. Enter it exactly as it appears in the phenotype file, not surrounded by quotations. ex) main_interest

If dichotomous, make sure that in the file the main phenotype of interest is coded as a categorical variable where 0 is absence and 1 is presence of phenotype of interest. 0 will then be the reference and the output will reflect this.
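
A quick check of the 0/1 coding before submitting a job could look like the sketch below. It assumes a tab-delimited phenotype file with a header row; the function name, the delimiter, and the example data are hypothetical.

```python
import csv
import io


def unexpected_codes(text: str, column: str, delimiter: str = "\t") -> set:
    """Return the values in `column` that are neither '0' nor '1'."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return {row[column] for row in reader} - {"0", "1"}


# Made-up phenotype table: the third sample uses 'yes' instead of 1.
example = "sample_id\tmain_interest\tsex\nS1\t0\t1\nS2\t1\t2\nS3\tyes\t1\n"
bad_values = unexpected_codes(example, "main_interest")
```

An empty result means the column uses only 0 and 1; anything else, such as `yes` in the example, should be recoded first.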

Steps

There are no steps in this workflow

Outputs

ID Type Label Doc
summary File (Optional)
Association_output File
Permalink: https://w3id.org/cwl/view/git/5b49ef07b994963d190f4f508bc08e4bec8b8a0b/predixcan/predixcan_unpack.cwl