Discrete Trait Evolution
Discrete traits in biology
Not all traits vary continuously. Many important characters come in discrete states: a species either has wings or does not, has an XY sex determination system or an X0 system, is herbivorous or carnivorous, or lives on land or in water.
Analyzing these traits requires different models. You cannot use Brownian motion, which assumes continuous change along a random walk. Instead, you need a Markov chain model that describes transitions between discrete states.
The Mk model
The Mk model (for "Markov model of k states") is the standard framework for discrete trait evolution. Imagine a trait with two states: State A and State B. At any point in time, a lineage can change from A to B, or from B to A, according to certain rates.
These rates are encoded in a matrix called Q. For a two-state system, Q has four entries: q01 (rate of change from A to B) and q10 (rate from B to A). Under a Markov process, the probability of being in each state evolves according to these rates.
Ancestral state reconstruction
A key question in evolutionary biology: what state did the common ancestor have? If you observe state A in one species and state B in another sister species, did their ancestor have state A, state B, or are both equally plausible?
Ancestral state reconstruction (ASR) uses maximum likelihood to estimate the probability of each state at each internal node of the tree. The reconstruction accounts for the tip states and the model of evolution.
Stochastic character mapping
ASR gives you point estimates (maximum likelihood) of ancestral states. But there is uncertainty. The actual history of character evolution could have taken many different paths, all consistent with the tip data.
Stochastic character mapping samples many possible histories from the posterior distribution, conditional on the data and the model. You can then describe what happened: how many transitions occurred, on which branches, and in which direction.
Both of these histories are consistent with the observed data. Stochastic character mapping samples many such histories, weighted by their likelihood under the model. This lets you estimate the expected number of transitions and their locations on the tree.
Testing correlations between discrete traits
Suppose you want to know: does having wings correlate with flying behavior? Or does having an XY sex determination system correlate with having sex-limited traits? Pagel's test is a phylogenetic comparative method that tests whether two discrete traits evolve independently or are correlated.
The test compares two models: one where the traits evolve independently (simpler), and one where they can influence each other's evolution (more complex). A significant result suggests the traits are correlated in their evolution.
What you actually do in R
The ape package has the ace function for maximum likelihood ancestral state reconstruction. The phytools package has more advanced functions including stochastic character mapping:
library("ape")
library("phytools")
# tree: your phylogeny
# trait: vector of tip states (names must match tree$tip.label)
asr <- ace(trait, tree,
type = "discrete",
method = "ML")
# Stochastic character mapping: sample 100 evolutionary histories
sims <- make.simmap(tree, trait, nsim = 100)
# Count transitions
trans <- describe.simmap(sims)
Chromosome number as a discrete trait: chromePlus
Chromosome number is one of the most dynamic features of eukaryotic genomes. Across species, chromosomes fuse, split, and duplicate — sometimes driving reproductive isolation and speciation. Modeling how chromosome number evolves requires treating it as a special kind of discrete trait where the state space is large (chromosome numbers from 1 to 100+) and the transitions have biological structure: gains of one, losses of one, and whole-genome duplications (polyploidy) all have distinct rates. For a history of how scientists developed these ideas — from Stebbins and White to Mayrose and Zenil-Ferguson — see the Intellectual History of Chromosome Number Evolution. Empirical karyotype data for thousands of species is available in the Karyotype Databases.
chromePlus is an R package developed in this lab that extends the chromEvol framework within the diversitree ecosystem. It fits Markov chain models of chromosome evolution on phylogenies using MCMC, and critically, allows a binary trait (such as sex determination system, mating strategy, or presence of B chromosomes) to influence the rates of chromosome change. This lets you ask: does having sex chromosomes speed up or slow down karyotype evolution?
The model tracks a state space combining chromosome number and binary trait. From any state, five types of events can occur:
Each of these rates can be conditioned on the binary trait state — so you can model a world where sex-chromosome-bearing lineages (state B) have faster dysploidy rates than those without (state A).
A polyploidy event in the ancestor of the gold clade doubled the chromosome number. dysploidy then reduced n=18 → n=17 in one lineage.
In R, chromePlus wraps MCMC inference around these models, returning posterior distributions over rate parameters. You can compare models (e.g., rates equal vs rates free between binary trait states) using Bayes factors from the MCMC output.
devtools::install_github("coleoguy/chromePlus")
library("chromePlus")
# Simulate chromosome evolution on a tree
# tree: phylo object; start: ancestral chr number; rates: c(gain, loss, poly, demi)
sim <- simChrom(tree, pars = c(0.1, 0.1, 0.01, 0.001),
root.value = 8, chrRange = c(1, 50))
# Fit a model allowing a binary trait to modulate rates
fit <- makeSSEchrom(tree, chr.data, trait.data,
model = "chromPlus")
# Plot MCMC results
plotChromeplus(fit)
Maximum likelihood vs Bayesian inference
All of the methods on this page require fitting a model to data — estimating rate parameters like q01 and q10. There are two major philosophies for doing this, and both appear in phylogenetic software. Understanding the difference matters because they answer subtly different questions and have very different computational demands.
Maximum Likelihood (ML)
The likelihood function L(θ | data) describes how probable the observed data are for each possible value of the parameter θ. Maximum likelihood finds the single value of θ that makes the data most probable — the peak of this surface.
This is computationally fast because you only need to find one point. Tools like ape::ace(method="ML") and corHMM use numerical optimization (gradient descent or Nelder–Mead) to locate the peak quickly, even for large trees.
The cost: you get a point estimate, not a distribution. Standard errors can be approximated from the curvature of the likelihood surface, but this assumes the surface is roughly parabolic near the peak — an assumption that can break down with sparse data.
Bayesian Inference
Bayes' theorem says: posterior ∝ likelihood × prior. Rather than finding the single best θ, Bayesian inference characterizes the full posterior distribution over θ — the probability of each parameter value given the data and your prior beliefs.
This is elegant because it propagates uncertainty naturally. You don't just get a best estimate — you get a credible interval that directly means "there is 95% posterior probability the parameter lies here." Stochastic character mapping is inherently Bayesian for exactly this reason.
The cost: the posterior is rarely tractable analytically. In practice, MCMC (Markov chain Monte Carlo) is used to sample from it, which requires running a chain for tens of thousands of iterations. For complex models on large trees, this can take hours or days.
Adjust the sliders to see how the likelihood surface and prior shape the posterior distribution. The ML estimate is the peak of the likelihood curve; the Bayesian posterior is the product of likelihood × prior (renormalized).
In practice: use ML when you need fast answers or are running many model comparisons (e.g., testing 10 different rate matrices). Use Bayesian methods (MCMC) when you need full uncertainty quantification, want to formally incorporate prior information, or are working with models too complex for numerical optimization (like chromePlus, or joint inference of rates and ancestral states). There is also a practical numerical reason to prefer MCMC: discrete trait models can have flat likelihood ridges at unrealistically high rate values, where many parameter combinations fit the data nearly equally well — MCMC with priors prevents optimizers from wandering into this region.