Epistasis & the Shifting Balance
What Is Epistasis?
Epistasis is the interaction between genes. More precisely, epistasis occurs when the phenotypic effect of an allele at one locus depends on the genotype at another locus. This is distinct from additive effects, where each allele contributes independently and predictably to the phenotype regardless of the genetic background.
Why does this matter? If genetic architectures are primarily additive, populations will tend to evolve smoothly toward a single optimum — a process well described by Fisher's infinitesimal model. But if epistasis is common, populations can get "trapped" on local fitness peaks. The response to selection depends critically on genetic background, and the adaptive landscape becomes rugged and multi-peaked.
Types of Epistasis
Magnitude epistasis: The interaction changes the size of an allele's effect. An allele that increases body size by 2mm in one genetic background might increase it by only 0.5mm in another. The direction is the same, but the magnitude differs.
Sign epistasis: The interaction reverses the direction of an allele's effect. An allele that is beneficial in one genetic background becomes deleterious in another. This is more consequential because it means that whether an allele is favored by selection depends on what else is in the genome.
Reciprocal sign epistasis: Both alleles at the interacting loci reverse their effects depending on the other. This is the most important type for evolutionary theory because it is a necessary (though not by itself sufficient) condition for the existence of multiple fitness peaks. When two peaks are separated by reciprocal sign epistasis, no path of single mutational steps between them is uphill all the way; the population must cross a valley.
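The three categories above can be told apart mechanically from the four genotype values of a two-locus system. The following is a minimal sketch, assuming haploid genotypes (ab, Ab, aB, AB) and a hypothetical helper name `classify_epistasis`; it treats an effect of exactly zero as "no sign change", which is a simplifying choice rather than a standard convention.

```python
import math


def classify_epistasis(w_ab, w_Ab, w_aB, w_AB):
    """Classify the interaction between two biallelic loci.

    Arguments are fitness (or trait) values of the haploid genotypes
    ab, Ab, aB and AB. Returns one of: "none", "magnitude", "sign",
    "reciprocal sign".
    """
    # Effect of the a -> A substitution on each background at locus 2
    effect_A_on_b = w_Ab - w_ab
    effect_A_on_B = w_AB - w_aB
    # Effect of the b -> B substitution on each background at locus 1
    effect_B_on_a = w_aB - w_ab
    effect_B_on_A = w_AB - w_Ab

    # No epistasis: an allele has the same effect on both backgrounds.
    # (If this holds for one locus it automatically holds for the other.)
    if math.isclose(effect_A_on_b, effect_A_on_B, abs_tol=1e-12):
        return "none"

    # A sign flip means the effect is positive on one background
    # and negative on the other.
    a_flips = effect_A_on_b * effect_A_on_B < 0
    b_flips = effect_B_on_a * effect_B_on_A < 0

    if a_flips and b_flips:
        return "reciprocal sign"
    if a_flips or b_flips:
        return "sign"
    return "magnitude"


print(classify_epistasis(1.0, 2.0, 3.0, 4.0))   # additive
print(classify_epistasis(1.0, 2.0, 2.0, 5.0))   # same direction, different size
print(classify_epistasis(1.0, 3.0, 0.5, 4.0))   # B is harmful only without A
print(classify_epistasis(1.0, 0.5, 0.5, 2.0))   # two peaks separated by a valley
```

In the last example both single mutants are less fit than either ab or AB, which is the valley between two peaks described above.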
The Papers That Launched a Field
Wright vs. Fisher
The most consequential intellectual conflict in 20th-century evolutionary genetics pitted two titans against each other: Ronald A. Fisher and Sewall Wright. Their disagreement was not about whether evolution occurs, but about how it works at the genetic level — and their differing views continue to shape the field today.
Fisher's View
Fisher argued that evolution is primarily driven by natural selection acting on additive genetic variance in large, panmictic (freely interbreeding) populations. In this framework, epistasis is statistical noise: it contributes little to the response to selection because recombination breaks up favorable gene combinations every generation. His Fundamental Theorem of Natural Selection states that the rate of increase in mean fitness due to natural selection equals the additive genetic variance in fitness. Evolution, in Fisher's view, is a smooth, deterministic, hill-climbing process. Drift is negligible in populations of realistic size, and population structure is an unnecessary complication.
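The Fundamental Theorem can be checked numerically in the simplest possible setting. The sketch below assumes a haploid population with one locus and discrete generations, a case in which all genetic variance in fitness is additive; in discrete time the change in mean fitness equals the variance in fitness divided by mean fitness. The function name and the fitness values are illustrative assumptions.

```python
def haploid_selection_step(p, w_A, w_a):
    """One generation of selection at a haploid locus.

    p is the frequency of allele A; w_A and w_a are the allele fitnesses.
    Returns (new frequency, observed change in mean fitness,
             change predicted as variance in fitness / mean fitness).
    """
    q = 1.0 - p
    mean_w = p * w_A + q * w_a
    # Variance in fitness among individuals before selection
    var_w = p * (w_A - mean_w) ** 2 + q * (w_a - mean_w) ** 2

    # Standard haploid selection recursion
    p_next = p * w_A / mean_w
    mean_w_next = p_next * w_A + (1.0 - p_next) * w_a

    observed_change = mean_w_next - mean_w
    predicted_change = var_w / mean_w
    return p_next, observed_change, predicted_change


p = 0.3
for generation in range(5):
    p, observed, predicted = haploid_selection_step(p, w_A=1.2, w_a=1.0)
    print(generation, round(p, 4), observed, predicted)
```

The observed and predicted columns agree up to floating-point rounding, and both shrink as the favored allele approaches fixation and the variance in fitness is used up.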
Wright's View
Wright saw a fundamentally different picture. He argued that real populations are structured into small, partially isolated demes (subpopulations). In small demes, genetic drift is a powerful force that can push populations away from their current fitness peak. Gene interactions (epistasis) create multiple adaptive peaks in the fitness landscape. Wright proposed that drift, migration, and selection work together to allow populations to explore and ultimately find the highest peaks — his famous shifting balance theory.
The Three Phases of the Shifting Balance
Phase 1 — Random drift: Within small demes, genetic drift causes random fluctuations in allele frequencies. This exploration of genotype space can move a population off its current local fitness peak and into the domain of attraction of a different peak.
Phase 2 — Mass selection within demes: Once drift has carried a deme into the basin of attraction of a new (possibly higher) peak, ordinary natural selection pushes the population uphill toward that new peak.
Phase 3 — Interdeme selection: Demes sitting on higher peaks have greater average fitness, producing more emigrants. These migrants carry favorable gene combinations to neighboring demes, pulling the entire metapopulation toward the global optimum. This is a form of group selection, mediated by differential migration.
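Why drift is needed at all can be seen in a deliberately simplified stand-in for a two-peaked landscape: a single diploid locus whose heterozygote is less fit than either homozygote. This is not Wright's full model (it has no drift, no migration, and only one locus), and the fitness values are assumptions chosen for illustration. It shows only that deterministic selection (Phase 2) climbs whichever peak's basin the population already sits in.

```python
def next_frequency(p, w_AA, w_Aa, w_aa):
    """Deterministic change in the frequency of allele A after one
    generation of viability selection with random mating."""
    q = 1.0 - p
    mean_w = p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa
    marginal_w_A = p * w_AA + q * w_Aa
    return p * marginal_w_A / mean_w


def run_selection(p0, generations=500, w_AA=1.2, w_Aa=0.8, w_aa=1.0):
    """Iterate selection from starting frequency p0 and return the final
    frequency of A. Default fitnesses make AA the higher peak, aa the
    lower peak, and Aa the valley."""
    p = p0
    for _ in range(generations):
        p = next_frequency(p, w_AA, w_Aa, w_aa)
    return p


def unstable_equilibrium(w_AA=1.2, w_Aa=0.8, w_aa=1.0):
    """Frequency of A at the valley floor, where the two alleles have
    equal marginal fitness. It separates the two basins of attraction."""
    return (w_aa - w_Aa) / (w_AA - 2 * w_Aa + w_aa)


print("boundary between basins:", unstable_equilibrium())
print("start at p = 0.05 ->", run_selection(0.05))
print("start at p = 0.40 ->", run_selection(0.40))
```

A deme that starts below the boundary is driven to the lower peak even though a higher one exists; only a chance fluctuation in allele frequency (Phase 1) can carry it past the boundary, after which selection completes the shift.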
Coyne, Barton, Turelli vs. Wade & Goodnight
The Wright-Fisher debate was dramatically revived in the late 1990s, when Jerry Coyne, Nick Barton, and Michael Turelli published a provocative critique arguing that the shifting balance theory was unnecessary and unsupported by evidence. The response from Michael Wade and Charles Goodnight was equally forceful. What followed was one of the great intellectual exchanges in evolutionary biology.
The Critique (Coyne, Barton, Turelli 1997)
They attacked the phases of the shifting balance, most pointedly the first and third:
Against Phase 1: Random drift is too weak to move populations off fitness peaks in populations of realistic size. The valley-crossing required is implausible unless populations are extremely small.
Against Phase 3: Interdeme selection (the mechanism by which higher-fitness demes spread their gene combinations to lower-fitness demes) has never been convincingly demonstrated in nature. The required conditions are very specific and unlikely to be commonly met.
The bottom line: Simpler models based on mass selection acting on additive variation in large populations can explain observed patterns of adaptation. The shifting balance is "of only minor importance in evolution."
The Defense (Wade & Goodnight 1998)
Wade and Goodnight argued that Coyne et al. had set up straw man versions of Wright's theory. Their key points:
Tribolium experiments: Laboratory experiments with flour beetles demonstrated that population structure and epistasis interact to produce evolutionary outcomes that mass selection cannot. When populations were subdivided and subjected to group selection, they responded far more than predicted by additive models alone.
Variance conversion: This is perhaps the most important theoretical insight. When populations go through bottlenecks (genetic drift changes allele frequencies), epistatic variance can be "converted" to additive variance. This happens because drift changes the genetic background against which other alleles are measured. An allele whose effect was previously masked by interactions becomes exposed when its interacting partners change in frequency. This means drift doesn't just add noise — it fundamentally changes the substrate on which selection acts.
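Variance conversion can be made concrete with a small calculation. The sketch below assumes two haploid loci in linkage equilibrium with a pure additive × additive interaction (genotypic value +1 when the alleles "match" and -1 otherwise); these are simplifying assumptions, and the function name is hypothetical. It partitions the genetic variance into an additive part (variance of the marginal allele means) and an epistatic remainder at different allele frequencies.

```python
def variance_components(p1, p2, values=((1.0, -1.0), (-1.0, 1.0))):
    """Partition genetic variance for two haploid loci in linkage equilibrium.

    p1, p2: frequency of the first allele at locus 1 and locus 2.
    values[i][j]: genotypic value with allele i at locus 1 and allele j
    at locus 2 (default: pure additive-by-additive interaction).
    Returns (additive variance, epistatic variance, total variance).
    """
    f1 = (p1, 1.0 - p1)
    f2 = (p2, 1.0 - p2)
    pairs = [(i, j) for i in range(2) for j in range(2)]

    mean = sum(f1[i] * f2[j] * values[i][j] for i, j in pairs)
    total = sum(f1[i] * f2[j] * (values[i][j] - mean) ** 2 for i, j in pairs)

    # Marginal mean of each allele, averaged over the other locus.
    # These depend on the allele frequencies at the partner locus.
    marg1 = [sum(f2[j] * values[i][j] for j in range(2)) for i in range(2)]
    marg2 = [sum(f1[i] * values[i][j] for i in range(2)) for j in range(2)]

    additive = (sum(f1[i] * (marg1[i] - mean) ** 2 for i in range(2))
                + sum(f2[j] * (marg2[j] - mean) ** 2 for j in range(2)))
    epistatic = total - additive
    return additive, epistatic, total


for p in (0.5, 0.7, 0.9):
    va, vi, vt = variance_components(p, p)
    print(f"p = {p}: additive = {va:.4f}, epistatic = {vi:.4f}, total = {vt:.4f}")
```

At intermediate frequencies the interaction contributes no additive variance at all; once the frequencies have moved away from 0.5, the same gene action shows up largely as additive variance that selection can act on.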
The resolution? The debate remains formally unresolved. But what is not debated is that epistasis is real and common. The question is: how much does it matter for adaptive evolution in nature?
Measuring Epistasis — Line Cross Analysis
How do we actually detect and measure epistasis? One of the most powerful classical approaches is line cross analysis (LCA).
The Basic Idea
Cross two divergent lines — these can be different populations, species, or artificially selected lines that differ in a trait of interest. Then create a series of composite generations: the F1 (first filial generation), F2 (second filial), backcrosses to each parent (BC1 and BC2), and potentially additional generations. Measure the trait in every generation. The pattern of generation means reveals the genetic architecture underlying the trait difference.
What the Patterns Tell Us
If inheritance is purely additive: The F1 mean falls exactly at the midparent value (halfway between P1 and P2). The F2 mean equals the F1 mean. Backcross means fall midway between the F1 and the respective parent. Everything is clean, predictable, and linear.
If dominance is present: The F1 deviates from the midparent value (toward one parent or the other). But F2 still behaves predictably relative to the F1 and backcrosses.
If epistasis is present: These simple expectations break down. The F2 and backcross means no longer sit where the parental and F1 means predict. (Dominance alone also separates the F2 from the F1, but predictably: the F2 falls halfway between the F1 and the midparent.) The specific pattern of deviations reveals which types of epistasis are at play: additive × additive ([aa]), additive × dominance ([ad] and [da]), or dominance × dominance ([dd]).
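These expectations can be written down directly. The sketch below assumes a Mather and Jinks style parameterization in which each generation is described by its net parental ancestry (theta) and expected heterozygosity (H), with epistatic coefficients formed as products of the two. It collapses the additive × dominance terms into a single [ad] parameter, and the function names and parameter values are illustrative. The A, B and C combinations are the classical scaling tests, each of which is expected to be zero when only additive and dominance effects are present.

```python
# (theta, H) per generation: theta is the net dose of P1 ancestry
# (+1 = pure P1, -1 = pure P2); H is the expected heterozygosity.
GENERATIONS = {
    "P1": (1.0, 0.0),
    "P2": (-1.0, 0.0),
    "F1": (0.0, 1.0),
    "F2": (0.0, 0.5),
    "BC1": (0.5, 0.5),   # F1 backcrossed to P1
    "BC2": (-0.5, 0.5),  # F1 backcrossed to P2
}


def expected_means(m, a, d, aa=0.0, ad=0.0, dd=0.0):
    """Expected generation means for given composite genetic effects."""
    means = {}
    for generation, (theta, h) in GENERATIONS.items():
        means[generation] = (m + theta * a + h * d
                             + theta ** 2 * aa + theta * h * ad + h ** 2 * dd)
    return means


def scaling_tests(means):
    """Classical A, B, C scaling tests; all are zero without epistasis."""
    A = 2 * means["BC1"] - means["P1"] - means["F1"]
    B = 2 * means["BC2"] - means["P2"] - means["F1"]
    C = 4 * means["F2"] - 2 * means["F1"] - means["P1"] - means["P2"]
    return A, B, C


additive_dominance = expected_means(m=10, a=2, d=1)
with_epistasis = expected_means(m=10, a=2, d=1, aa=1)
print(scaling_tests(additive_dominance))
print(scaling_tests(with_epistasis))
```

With only additive and dominance effects all three tests return zero; adding an additive × additive term moves them away from zero, which is the kind of deviation line cross analysis looks for.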
The Traditional Approach
The classical method is the joint-scaling test (Cavalli 1952, Hayman 1958, Mather & Jinks 1982). This involves fitting models of increasing complexity (additive only, additive + dominance, additive + dominance + epistasis) and testing whether each additional parameter significantly improves the fit. The problem: this relies on sequential hypothesis testing, which has well-known issues with statistical power, model selection bias, and the arbitrary nature of p-value thresholds.
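A single step of this procedure can be sketched as a weighted least-squares fit of generation means, with each mean weighted by the inverse of its squared standard error and the weighted residual sum of squares used as a chi-squared statistic. The code below is a minimal illustration under assumed data (six cohort means with equal, made-up standard errors) and an assumed additive + dominance design matrix; it is not the implementation used by any particular package.

```python
def solve_linear(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def joint_scaling_fit(design, means, std_errors):
    """Weighted least-squares fit of generation means.

    Returns (parameter estimates, chi-squared statistic, degrees of freedom).
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    k = len(design[0])
    rows = list(zip(weights, design, means))
    # Normal equations: (X' W X) beta = X' W y
    xtwx = [[sum(w * x[i] * x[j] for w, x, _ in rows) for j in range(k)]
            for i in range(k)]
    xtwy = [sum(w * x[i] * y for w, x, y in rows) for i in range(k)]
    estimates = solve_linear(xtwx, xtwy)
    fitted = [sum(c * e for c, e in zip(x, estimates)) for x in design]
    chi_sq = sum(w * (y - f) ** 2 for w, y, f in zip(weights, means, fitted))
    return estimates, chi_sq, len(means) - k


# Design matrix columns: mean, additive, dominance.
# Rows: P1, P2, F1, F2, BC1, BC2 (assumed coefficients).
DESIGN_AD = [[1, 1, 0], [1, -1, 0], [1, 0, 1],
             [1, 0, 0.5], [1, 0.5, 0.5], [1, -0.5, 0.5]]

# Hypothetical cohort means generated with an additive x additive effect
observed = [13.0, 9.0, 11.0, 10.5, 11.75, 9.75]
estimates, chi_sq, df = joint_scaling_fit(DESIGN_AD, observed, [0.1] * 6)
print(estimates, chi_sq, df)
```

For these made-up data the statistic exceeds 7.81, the 5% critical value for a chi-squared distribution with 3 degrees of freedom, so the additive + dominance model would be rejected and the sequential procedure would go on to add epistatic terms.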
The Joint-Scaling Test
Comparing the additive and epistatic models shows how gene interactions distort generation means from their expected values. In the epistatic model, the F2 deviates from the F1 and the backcross means are pulled asymmetrically; these are the signatures of epistasis that line cross analysis detects. (Illustrative data: values are theoretical expectations, not from a specific experiment.)
SAGA — A Better Approach to Line Cross Analysis
The Blackmon Lab developed SAGA (Statistical Analysis of Genetic Architecture) to address the fundamental limitations of the traditional joint-scaling test.
The Problem with Sequential Testing
The traditional approach fits a sequence of models — additive, then additive + dominance, then additive + dominance + epistasis — and uses chi-squared tests to determine when to stop adding parameters. This has several well-known problems:
Multiple testing: Each sequential test increases the overall false positive rate. The order in which parameters are tested matters, and different orders can give different conclusions.
All-or-nothing: You either reject or fail to reject each model. There is no way to express uncertainty about which model is best, or to average across models when several are similarly supported.
Lack of parsimony penalty: Chi-squared tests do not inherently penalize model complexity. A model with more parameters will always fit at least as well as a simpler one nested within it, even if the additional parameters are capturing noise rather than signal.
The SAGA Solution
SAGA uses an information-theoretic approach based on AICc (corrected Akaike Information Criterion). Instead of asking "Is this model significantly better than the simpler one?" it asks "What is the relative support for each model given the data?"
Key advantages:
Simultaneous model comparison: All candidate genetic architecture models are evaluated at once, not sequentially. This includes all possible combinations of additive, dominance, and epistatic effects.
Natural complexity penalty: AICc includes a penalty for the number of parameters, which increases when sample sizes are small. This automatically guards against overfitting.
Model averaging: Instead of choosing a single "best" model, SAGA provides model-averaged parameter estimates weighted by the relative support for each model. This means your estimates of genetic effects incorporate model uncertainty.
Confidence sets: SAGA produces a confidence set of models (e.g., the set of models within 2 AICc units of the best), giving you a clear picture of which architectures are plausible and which are not.
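The arithmetic behind these ideas is short enough to sketch. The code below is a generic illustration of AICc, Akaike weights, a simple "within 2 units" model set, and model averaging; it is not SAGA2's code or API. The log-likelihoods, parameter counts, estimates and the sample size are hypothetical, and treating a parameter as zero in models that omit it is one common averaging convention adopted here as an assumption.

```python
import math


def aicc(log_likelihood, k, n):
    """Small-sample corrected AIC for a model with k parameters fit to n data points."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    return -2.0 * log_likelihood + 2.0 * k + (2.0 * k * (k + 1)) / (n - k - 1)


def akaike_weights(scores):
    """Convert a {model: AICc} mapping into relative support summing to 1."""
    best = min(scores.values())
    relative = {name: math.exp(-0.5 * (s - best)) for name, s in scores.items()}
    total = sum(relative.values())
    return {name: r / total for name, r in relative.items()}


def confidence_set(scores, max_delta=2.0):
    """Models whose AICc is within max_delta of the best, best first."""
    best = min(scores.values())
    kept = [name for name, s in scores.items() if s - best <= max_delta]
    return sorted(kept, key=lambda name: scores[name])


def model_average(weights, estimates, parameter):
    """Weight each model's estimate by its support; absent parameters count as 0."""
    return sum(weights[m] * estimates[m].get(parameter, 0.0) for m in weights)


# Hypothetical fits of three candidate architectures to 12 cohort means
fits = {
    "a":      {"logL": -14.0, "k": 2, "est": {"a": 2.1}},
    "a+d":    {"logL": -12.5, "k": 3, "est": {"a": 2.0, "d": 0.9}},
    "a+d+aa": {"logL": -12.0, "k": 4, "est": {"a": 2.0, "d": 0.8, "aa": 0.6}},
}
scores = {name: aicc(f["logL"], f["k"], n=12) for name, f in fits.items()}
weights = akaike_weights(scores)
estimates = {name: f["est"] for name, f in fits.items()}

print(weights)
print(confidence_set(scores))
print(model_average(weights, estimates, "aa"))
```

Because the weights are relative, they describe support only among the candidate models supplied; a poorly chosen candidate set still produces weights that sum to 1.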
SAGA2
The current implementation is the SAGA2 R package, freely available on GitHub. It extends the original SAGA framework to handle sexual dimorphism and genotype-by-environment interactions, allowing researchers to partition genetic architecture into even finer components.
Install: devtools::install_github("coleoguy/SAGA2")
GitHub: github.com/coleoguy/SAGA2
Vignette: saga.pdf
This chart illustrates how SAGA evaluates all candidate models simultaneously using AICc weights. The traditional approach (sequential testing) would select the first "non-significant" model and stop. SAGA instead provides the relative support for each architecture, revealing that multiple models may be similarly supported. (Illustrative data — values are simulated to demonstrate the method.)
Wright Was Right
The Blackmon Lab put Wright's ideas to the most comprehensive empirical test ever conducted. In Burch et al. 2024, the lab analyzed over 1,600 line cross datasets spanning plants and animals — the largest survey of epistasis in the history of genetics.
Key Findings
Epistasis is pervasive. It was detected in the majority of crosses examined. This is not a marginal effect confined to a few unusual systems — it is a widespread feature of genetic architecture across the tree of life.
The importance varies, but it is rarely absent. Different taxa and trait categories show different levels of epistasis, but purely additive models were rarely the best-supported architecture. Morphological traits, life history traits, physiological traits, and behavioral traits all showed substantial epistatic contributions.
The patterns are consistent with Wright's vision. Complex genetic architectures where gene interactions matter are the norm, not the exception. The additive-variance-centric view of evolution — Fisher's view — captures an important part of the picture, but it is incomplete.
Recognition
This paper was selected for the 2025 Society for the Study of Evolution President's Award for Outstanding Dissertation Paper — one of SSE's most prestigious honors, recognizing the most impactful dissertation-based publications in the field.
Implications
If epistasis is truly pervasive, then several major conclusions follow:
The response to selection is context-dependent. An allele that is beneficial in one population may be neutral or harmful in another, depending on the epistatic context. This complicates predictions about evolutionary trajectories.
Population structure matters. If the fitness landscape is rugged (multi-peaked), then the size and connectivity of populations affects which peaks can be reached. Wright's emphasis on population structure was prescient.
Additive variance is not the whole story. Fisher's fundamental theorem, while mathematically elegant, applies only to the additive component of fitness variation. If drift converts epistatic variance into additive variance (as Wright and later Goodnight showed), then the interplay between drift and selection becomes crucial, exactly as Wright argued.
Beyond Line Crosses — Modern Approaches
Line cross analysis is powerful but limited to systems where controlled crosses are possible. Modern genomics has opened entirely new windows on epistasis.
GWAS Epistasis Scans
Genome-wide association studies can be extended to test all pairwise (or higher-order) combinations of SNPs for interactions. With n SNPs, there are n(n − 1)/2 pairwise interactions to test (roughly n²/2 for large n), a massive multiple testing burden that requires enormous sample sizes and sophisticated computational methods. Despite these challenges, GWAS epistasis scans have revealed significant gene-gene interactions for complex traits including height, body mass index, and disease risk in humans.
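The scale of the multiple testing problem is easy to quantify. The sketch below counts the pairwise tests for a given number of SNPs and applies a simple Bonferroni correction; the SNP count of 500,000 and the 0.05 family-wise error rate are illustrative assumptions, and real scans often use less conservative corrections.

```python
import math


def pairwise_tests(n_snps):
    """Number of distinct SNP pairs: n choose 2 = n(n - 1)/2."""
    return math.comb(n_snps, 2)


def bonferroni_threshold(n_snps, alpha=0.05):
    """Per-test p-value threshold that keeps the family-wise error rate at alpha."""
    return alpha / pairwise_tests(n_snps)


n = 500_000
print(pairwise_tests(n))        # 124999750000 pairs
print(bonferroni_threshold(n))  # on the order of 4e-13
```

For comparison, a single-SNP scan of the same panel under the same correction would use a threshold of 0.05 / 500,000 = 1e-7, so the pairwise scan demands evidence several orders of magnitude stronger.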
QTL Mapping
In structured experimental crosses (F2 populations, recombinant inbred lines, MAGIC populations), researchers can map epistatic QTL interactions with greater statistical power than GWAS. The advantage is that linkage disequilibrium is controlled by the experimental design. Studies in model organisms have repeatedly found that epistatic QTL are common and can explain substantial fractions of phenotypic variance that additive QTL miss.
Mutation Accumulation Experiments
Allow mutations to accumulate in replicate lines under minimal selection, then measure fitness in different genetic backgrounds. If the fitness effects of mutations depend on the background they are tested in, epistasis is present. These experiments have provided some of the clearest evidence for widespread epistasis, particularly in microbial systems where large populations and many generations are feasible.
Theoretical Connections
Fisher's geometric model (FGM) predicts specific patterns of epistasis: mutations of large effect should show diminishing returns (negative epistasis), while mutations near an optimum should show increasingly negative interactions. In contrast, Wright's landscape model predicts a mix of positive and negative epistasis depending on the ruggedness of the landscape and where the population sits relative to peaks and valleys. Empirical patterns tend to show more complexity than either model alone predicts.
Machine Learning Approaches
Random forests, gradient boosting, and neural networks can capture non-linear (epistatic) genotype-phenotype relationships without requiring explicit specification of interaction terms. These approaches are increasingly used for phenotype prediction and have shown improved accuracy when epistasis is present. The major challenge is interpretability — a neural network may capture epistatic effects perfectly but tell you nothing about which specific genes are interacting or why.
Why Epistasis Matters
Speciation
Dobzhansky-Muller incompatibilities are, at their core, epistasis. An allele that functions perfectly well in its home genetic background causes problems — reduced fitness, sterility, or inviability — when placed into a hybrid genetic background. This is the genetic basis of reproductive isolation, and therefore speciation. Without epistasis, speciation through the accumulation of genetic incompatibilities would be impossible.
Disease
Many complex diseases involve gene-gene interactions. The "missing heritability" problem (the observation that identified GWAS variants explain only a fraction of the heritability estimated from family studies) may partly reflect undetected epistatic interactions. If the effect of a risk allele depends on variants at other loci, standard additive GWAS approaches can underestimate its contribution. Accounting for epistasis could help narrow the gap between the heritability explained by identified variants and the heritability estimated from family studies for diseases like diabetes, heart disease, and psychiatric disorders.
Agriculture
Heterosis (hybrid vigor) — the phenomenon where F1 hybrids outperform both parents — involves non-additive genetic effects. Whether heterosis is primarily due to dominance (masking of deleterious recessives) or epistasis (favorable interactions between alleles from different parents) remains debated, but epistatic contributions are increasingly recognized. Understanding these interactions is crucial for modern breeding programs that seek to predict hybrid performance from parental genotypes.
Drug Resistance
Combinations of resistance mutations can interact in unexpected ways. Some combinations are more than the sum of their parts (positive epistasis, accelerating resistance evolution), while others are less than the sum (negative epistasis, potentially constraining it). Understanding the epistatic landscape of drug resistance is critical for designing combination therapies and predicting the evolutionary trajectory of pathogens and cancers.
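Positive and negative epistasis between two resistance mutations can be expressed as a single number. The sketch below measures it on a log-fitness scale, so that "the sum of the parts" means the two single-mutant effects added in log space (equivalently, multiplied on the raw fitness scale). That choice of scale, the function name, and the example fitness values are assumptions for illustration.

```python
import math


def log_epistasis(w_single_1, w_single_2, w_double, w_wildtype=1.0):
    """Deviation of the double mutant from the sum of the single-mutant
    effects on a log-fitness scale.

    > 0: positive epistasis (double mutant better than expected)
    < 0: negative epistasis (double mutant worse than expected)
    """
    effect_1 = math.log(w_single_1 / w_wildtype)
    effect_2 = math.log(w_single_2 / w_wildtype)
    effect_both = math.log(w_double / w_wildtype)
    return effect_both - (effect_1 + effect_2)


# Hypothetical relative growth rates in the presence of a drug
print(log_epistasis(1.2, 1.5, 1.8))  # double mutant exactly as expected
print(log_epistasis(1.2, 1.5, 2.5))  # more than the sum of the parts
print(log_epistasis(1.2, 1.5, 1.6))  # less than the sum of the parts
```

The sign of the result can change with the scale on which fitness is measured, so the scale should be stated whenever an interaction is reported as positive or negative.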
The Fundamental Question
Is evolution primarily a smooth, additive, hill-climbing process (Fisher)? Or is it navigation of a rugged landscape requiring drift, population structure, and gene interaction to find global optima (Wright)? A century of accumulating data increasingly suggests: both views capture important truths, but Wright's emphasis on epistasis was more prescient than the field long acknowledged. The Blackmon Lab's comprehensive analysis of line cross data provides perhaps the strongest evidence yet that epistasis is a fundamental and pervasive feature of genetic architecture — not a statistical curiosity to be swept into the error term.