A fast and powerful W-test for pairwise epistasis testing
Thus, an epistasis test can only be performed on two mutations in different loci and with opposite phenotypes; epistatic relationships cannot be determined using ... In the standard approach, SNPs are tested one by one for ... statistical model that depicts the relationship between a linear combination of ... Epistasis is the phenomenon where the effect of one gene (locus) is dependent on ... the advent of molecular biology, epistasis started to be studied in relation to Quantitative Trait Loci (QTL) and polygenic inheritance. ... In addition, in those tests which used artificial gene networks, negative epistasis is only found in more ...
Moreover, epistasis is sometimes investigated in the context of epistatic variance. The epistatic variance depends not only on the genetic model for the action of two or more loci, but also on population parameters such as multilocus genotype frequencies [22, 23], in the same way that additive and dominance variances at a single locus depend not just on the model of dominance assumed but also on population genotype or allele frequencies.
Confusions of definition and terminology apart, the main problem with the interpretation of epistasis is that the word itself suggests that we are dealing with a biologically interesting phenomenon. If epistasis is detected, the assumption is that this tells us something of interest about the mechanisms and pathways involved in disease, in particular in relation to the biological interaction between implicated proteins. Indeed, the description in [5] hints strongly at a biological or causal interpretation of the models defined there.
However, statistical tests of interaction are limited to testing specific hypotheses concerning precisely defined quantities. Unfortunately, as we have seen, there is not a precise correspondence between biological models of epistasis and those that are more statistically motivated. We should like to perform a statistical test and interpret the outcome biologically, but this is in general not permissible.
Statistical interaction does not necessarily imply interaction on the biological or mechanistic level. A brief survey of the epidemiological literature reveals the major difficulties that exist in inferring biological meaning from quantitative data measuring disease risk as outcome [3, 25]. The problem is that any given data pattern and statistical model can usually be obtained from a number of completely different underlying mechanisms or models for disease development [3, 26]. For instance, five very different causal mechanisms can be shown to all lead to a multiplicative model for the data used in investigating the joint effects of two risk factors. Only if the prior biological model can be postulated in some detail is it likely that statistical modelling of this kind will allow insight into the underlying biological mechanisms.
Although the discovery of epistasis may be of limited value for elucidating the underlying biological disease process, allowing for different modes of interaction between potential disease loci can lead to improved power for detection of genetic effects. Simulation studies [12, 28, 29] suggest that this improvement in power may be relatively modest. Nevertheless, in analysis of real data for type 1 diabetes [12, 28], type 2 diabetes [30] and inflammatory bowel disease [31], increased evidence for linkage at one locus was seen when the interaction with another locus was taken into account.
Methods for the detection of epistasis vary according to whether one is performing association or linkage analysis, and according to whether one is dealing with a quantitative or a qualitative (in particular, a dichotomous) trait. For genetic association studies, standard methods for epidemiological studies may be employed, with genotypes at the various loci considered as risk factors for disease. Comparing a logistic regression model that includes interaction terms between the two loci with one that contains only main effects provides an overall 4 degree-of-freedom (df) test for interaction, but the interaction terms could each be tested individually on 1 df by removal from the first model, if required.
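As a sketch of where the 4 df come from, the two nested logistic models can be written as follows. The symbols are assumptions introduced here for illustration: p is the probability of disease, x_i and z_i are dummy variables coding the additive and dominance effects of the genotype at locus i, and the i coefficients are the four interaction terms. The source does not give this parameterization explicitly.

```latex
\begin{align}
\text{main effects only:}\quad
  \log\frac{p}{1-p} &= \mu + a_1 x_1 + d_1 z_1 + a_2 x_2 + d_2 z_2 \\
\text{with interaction:}\quad
  \log\frac{p}{1-p} &= \mu + a_1 x_1 + d_1 z_1 + a_2 x_2 + d_2 z_2 \\
  &\quad + i_{aa}\, x_1 x_2 + i_{ad}\, x_1 z_2 + i_{da}\, z_1 x_2 + i_{dd}\, z_1 z_2
\end{align}
```

Under this coding the second model has four more parameters than the first, so a likelihood ratio comparison of the two has 4 df, and dropping a single interaction coefficient gives a 1 df test.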
Note that this procedure implicitly assumes that the log odds scale is the scale of interest. Quantitative traits can be analysed in a similar way by use of standard multiple linear (as opposed to logistic) regression. Note that these regression procedures are actually designed for testing epistasis between loci that have been genotyped.
If it is believed that these loci are not themselves the etiological variants but rather are in linkage disequilibrium (LD) with the true disease-causing variants, then epistasis between the surrogate genotyped loci is likely to be diluted compared with epistasis between the true variants, although the extent to which this occurs will depend on the magnitude of the LD.
A related method for analysis of nuclear family data involves a generalization of the genotype relative risk approach proposed by Schaid. Conditional logistic regression is used to fit models for the genotype relative risks.
This method can be extended to fit models for genotype relative risks at two unlinked loci by generating not three but fifteen matched pseudocontrols for each case, where the genotype at the two loci for each pseudocontrol consists of one of the two-locus genotypes that could have been, but was not, transmitted to the case. Two-locus models for the genotype relative risks at the two loci are fitted using conditional logistic regression.
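The count of fifteen pseudocontrols can be checked with a short sketch. It assumes, for illustration only, that each parental allele carries a distinct label, so that the four possible transmissions at each locus can be told apart. The function names and the example family are hypothetical.

```python
from itertools import product

def transmissible_genotypes(father, mother):
    """All four (paternal allele, maternal allele) combinations at one locus."""
    return [(f, m) for f in father for m in mother]

def two_locus_pseudocontrols(parents_locus1, parents_locus2, case):
    """Return the two-locus genotypes that could have been, but were not, transmitted.

    parents_locusN: (father_alleles, mother_alleles), each a pair of labelled alleles.
    case: (transmitted pair at locus 1, transmitted pair at locus 2).
    """
    options1 = transmissible_genotypes(*parents_locus1)
    options2 = transmissible_genotypes(*parents_locus2)
    # The loci are unlinked, so every combination of the 4 x 4 options is possible.
    all_combinations = list(product(options1, options2))
    return [g for g in all_combinations if g != case]

# Hypothetical family: alleles labelled by parental origin.
locus1 = (("A1", "A2"), ("A3", "A4"))
locus2 = (("B1", "B2"), ("B3", "B4"))
case = (("A1", "A3"), ("B2", "B4"))

pseudocontrols = two_locus_pseudocontrols(locus1, locus2, case)
print(len(pseudocontrols))
```

Each locus contributes four possible transmissions, giving sixteen two-locus combinations, of which one is the case itself. When a parent is homozygous, some pseudocontrols enumerated this way will carry identical genotypes.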
A survey about methods dedicated to epistasis detection
Standard statistical software can be used to fit models that involve departure from multiplicativity in the penetrances (and hence in the genotype relative risks); more specialist software or user programming will be required for detecting epistasis defined as departure from additivity in the penetrances. A variety of related approaches that focus on the issue of association testing but can be used to detect or allow for epistasis in family-based analysis of quantitative traits have also been proposed [34 ...]. Epistasis is relatively easily incorporated into standard non-parametric (model-free) methods of linkage analysis for quantitative traits.
One popular method is the variance components method, in which the phenotypic covariance between relatives is modelled in terms of variance component parameters and underlying identity-by-descent (IBD) sharing probabilities at one or more genetic loci, assuming underlying multivariate normality of the trait within pedigrees. Models that include epistatic (in the sense of departure from additive) components of variance may be fitted and compared with models that do not contain these components using maximum-likelihood methods implemented in such programs as SOLAR. Another popular method of linkage analysis for quantitative traits is the Haseman-Elston method [39] and extensions.
When a mutation has a large number of epistatic effects, each accumulated mutation drastically changes the set of available beneficial mutations.
Therefore, the evolutionary trajectory followed depends highly on which early mutations were accepted. Thus, repeats of evolution from the same starting point tend to diverge to different local maxima rather than converge on a single global maximum as they would in a smooth, additive landscape. Experimentally, this idea has been tested using digital simulations of asexual and sexual populations.
Over time, sexual populations move towards more negative epistasis, or the lowering of fitness by two interacting alleles. It is thought that negative epistasis allows individuals carrying the interacting deleterious mutations to be removed from the populations efficiently.
This removes those alleles from the population, resulting in an overall more fit population. This hypothesis was proposed by Alexey Kondrashov, is sometimes known as the deterministic mutation hypothesis, and has also been tested using artificial gene networks. Any two-locus interaction at a particular gene frequency can be decomposed into eight independent genetic effects using a weighted regression.
In this regression, the observed two-locus genetic effects are treated as dependent variables and the "pure" genetic effects are used as the independent variables.
Because the regression is weighted, the partitioning among the variance components will change as a function of gene frequency.
By analogy it is possible to expand this system to three or more loci, or to cytonuclear interactions.
Double mutant cycles
When assaying epistasis within a gene, site-directed mutagenesis can be used to generate the different genes, and their protein products can be assayed.
This is sometimes called a double mutant cycle and involves producing and assaying the wild type protein, the two single mutants and the double mutant. Epistasis is measured as the difference between the effects of the mutations together versus the sum of their individual effects. The same methodology can be used to investigate the interactions between larger sets of mutations, but all combinations have to be produced and assayed. For example, there are different combinations of 5 mutations, some or all of which may show epistasis.
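The double mutant cycle calculation can be illustrated with a minimal sketch. The function name and the measurement values are hypothetical, and the assayed quantity is assumed to be on a scale where independent effects add (for example a free energy or a log-transformed activity).

```python
def pairwise_epistasis(wild_type, mutant_a, mutant_b, double_mutant):
    """Epistasis = combined effect minus the sum of the individual effects.

    All four arguments are measurements of the same assayed property,
    each effect being taken relative to the wild type.
    """
    effect_a = mutant_a - wild_type
    effect_b = mutant_b - wild_type
    effect_ab = double_mutant - wild_type
    return effect_ab - (effect_a + effect_b)

# Hypothetical assay values for the four variants of one cycle.
measurements = {"wt": 10.0, "A": 8.0, "B": 7.0, "AB": 6.5}

epsilon = pairwise_epistasis(
    measurements["wt"], measurements["A"], measurements["B"], measurements["AB"]
)
print(epsilon)
```

A value of zero would mean the two mutations act additively on the chosen scale. Here the double mutant loses less than the two single effects combined, so the value is positive.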
Computational prediction
Numerous computational methods have been developed for the detection and characterization of epistasis.
In principle, the W-test follows a chi-squared distribution, and its degrees of freedom are estimated from the covariance structure of a contingency table formed by the interaction set. The data-dependent degrees of freedom allow the method to cope with low-frequency genotypes, which, for classic tests, result in low power because the test statistic departs from its assumed distribution. The W-test showed robust power and reasonable type I error in various genetic environments; when the variant frequency is low, it outperforms all alternative methods.
The remainder of the article is organized as follows.
In the next section, we describe the proposed method, including its formulation and distribution. We then test the power and type I error of the proposed method and alternative methods under different genetic models and genetic architectures, using simulated phenotypes generated from real data.
We identified a number of genes that are highly relevant to neuronal function and depressive disorders, and these findings are replicated in the two datasets. To our knowledge, this is also the first report of successful replication of genes with significant epistatic effects in GWAS.
The proposed method is also generally applicable to identifying disease-susceptible interactions in other types of data. Under a co-dominant model, the genotype data X can be coded by minor allele count to take values 0, 1, 2. The phenotype Y is binary for the case and control dataset. The cell distribution of X1 and X2 in the case and control groups forms a contingency table, and k denotes the number of columns of that table. The method can also accommodate main effect testing.
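To make the data layout concrete, the following sketch builds the case and control cell counts for a pair of SNPs coded 0, 1, 2. It is only an illustration of the contingency table, not an implementation of the W-test statistic or its degrees-of-freedom estimation. The function names and the toy data are hypothetical, and the choice of nine columns for a SNP pair (with three presumably applying to a single SNP in main effect testing) is an assumption drawn from the coding described above.

```python
from collections import Counter
from itertools import product

GENOTYPES = (0, 1, 2)  # minor allele count under a co-dominant coding

def pair_table(x1, x2, y):
    """Count each (X1, X2) genotype combination separately in controls (0) and cases (1)."""
    cells = list(product(GENOTYPES, repeat=2))  # k = 9 columns for a SNP pair
    counts = {0: Counter(), 1: Counter()}
    for g1, g2, status in zip(x1, x2, y):
        counts[status][(g1, g2)] += 1
    table = {status: [counts[status][c] for c in cells] for status in (0, 1)}
    return table, cells

def cell_proportions(row):
    """Convert one row of counts into within-group cell proportions."""
    total = sum(row)
    return [n / total for n in row]

# Toy data: eight subjects, four cases (1) and four controls (0).
x1 = [0, 1, 2, 1, 0, 2, 1, 0]
x2 = [0, 0, 1, 2, 1, 2, 1, 0]
y = [1, 1, 1, 1, 0, 0, 0, 0]

table, cells = pair_table(x1, x2, y)
case_props = cell_proportions(table[1])
control_props = cell_proportions(table[0])
print(len(cells), table[1], table[0])
```

With real data, sparse cells are common when one or both variants have a low minor allele frequency, which is the situation the data-dependent degrees of freedom are meant to address.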
For both case and control samples, we have: