Discrete trait evolution.
Many traits do not vary continuously. A species either has wings or does not, has an XY sex determination system or an X0 system, is herbivorous or carnivorous. Discrete-trait evolution uses Markov-chain models on phylogenies to describe how lineages move between states, reconstruct ancestors, and test whether traits are correlated in their evolution.
Discrete traits in biology
Not all traits vary continuously. Many important characters come in discrete states: a species either has wings or does not, has an XY sex determination system or an X0 system, is herbivorous or carnivorous, or lives on land or in water.
Analyzing these traits requires different models. You cannot use Brownian motion, which assumes continuous change along a random walk. Instead, you need a Markov chain model that describes transitions between discrete states.
The Mk model
The Mk model (Lewis 2001, Syst. Biol. 50:913–925; likelihood computation follows Pagel 1994 and Nielsen 2002) is the standard framework for discrete trait evolution. Imagine a trait with two states, A and B. At any point in time, a lineage can change from A to B, or from B to A, according to instantaneous rates: entries in a rate matrix Q, not directly observable counts of changes. One warning that matters later: if branch lengths are not in absolute units, the Mk model is identifiable only up to a constant rescaling of all rates; this underlies the well-known likelihood-ridge problem, in which transition rates wander to unrealistically high values during ML optimization.
These rates are encoded in a matrix called Q. For a two-state system, Q has four entries: the off-diagonal rates q01 (rate of change from A to B) and q10 (rate from B to A), and diagonal entries −q01 and −q10 that make each row sum to zero. Under a Markov process, the probability of being in each state evolves according to these rates.
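For a two-state Markov model, the transition probabilities over a branch of length t have a closed form, so the machinery can be sketched in a few lines of base R (the rate values here are arbitrary illustrations):

```r
# Two-state Mk model: closed-form transition probabilities over time t.
# q01 and q10 are the instantaneous rates (arbitrary illustrative values).
transition_probs <- function(q01, q10, t) {
  s   <- q01 + q10
  pi0 <- q10 / s          # stationary probability of state A
  pi1 <- q01 / s          # stationary probability of state B
  e   <- exp(-s * t)
  # Rows = starting state, columns = ending state
  matrix(c(pi0 + pi1 * e, pi1 - pi1 * e,
           pi0 - pi0 * e, pi1 + pi0 * e),
         nrow = 2, byrow = TRUE,
         dimnames = list(from = c("A", "B"), to = c("A", "B")))
}

P <- transition_probs(q01 = 0.3, q10 = 0.1, t = 2)
rowSums(P)                              # each row sums to 1
transition_probs(0.3, 0.1, t = 1000)    # on very long branches every row
                                        # approaches the stationary
                                        # distribution (0.25, 0.75)
```

Notice that on a long branch the starting state is forgotten: both rows converge to the stationary distribution, which is why very long branches carry little information about ancestral states.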
Ancestral state reconstruction
A key question in evolutionary biology: what state did the common ancestor have? If you observe state A in one species and state B in another sister species, did their ancestor have state A, state B, or are both equally plausible?
Ancestral state reconstruction (ASR) uses maximum likelihood to estimate the probability of each state at each internal node of the tree. The reconstruction accounts for the tip states and the model of evolution.
Stochastic character mapping
ASR gives you point estimates (maximum likelihood) of ancestral states. But there is uncertainty. The actual history of character evolution could have taken many different paths, all consistent with the tip data. Crucially, the probabilities at internal nodes are conditional on the fitted model: if you assumed a symmetric Mk2 but the true rate matrix is biased toward losses, your ASR will be confidently wrong. Sensitivity to model choice is one of the most underappreciated issues in PCMs. Reconstructing under several plausible models and reporting the consensus, or formally averaging across models, is much safer than reporting a single ASR as a point estimate.
Stochastic character mapping samples many possible histories from the conditional distribution given the tip data, the tree, and the fitted model. The samples represent uncertainty about where and when changes happened on the tree, not uncertainty about the model itself. You can then describe what happened: how many transitions occurred, on which branches, and in which direction.
Many different histories can be equally consistent with the same tip data: for example, a single transition deep in the tree versus two independent transitions near the tips. Sampling histories in proportion to their probability under the model lets you estimate the expected number of transitions and their locations on the tree.
Testing correlations between discrete traits
Suppose you want to know: does having wings correlate with flying behavior? Or does having an XY sex determination system correlate with having sex-limited traits? Pagel's test is a phylogenetic comparative method that tests whether two discrete traits evolve independently or are correlated.
The test compares two models: one where the traits evolve independently (simpler), and one where they can influence each other's evolution (more complex). A significant result suggests the traits are correlated in their evolution.
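The comparison boils down to a likelihood-ratio test: the independent model has 4 transition-rate parameters, the dependent model has 8, so the statistic is compared to a chi-squared distribution with 4 degrees of freedom. With hypothetical log-likelihoods standing in for real model fits (in practice, phytools::fitPagel reports both models for you):

```r
# Likelihood-ratio test behind Pagel's correlation test.
# The log-likelihoods below are hypothetical stand-ins for real fits.
logL_indep <- -85.2    # independent model: 4 rate parameters
logL_dep   <- -78.9    # dependent model:   8 rate parameters

LR <- 2 * (logL_dep - logL_indep)   # likelihood-ratio statistic
df <- 8 - 4                         # difference in parameter count
p  <- pchisq(LR, df = df, lower.tail = FALSE)

LR    # 12.6
p     # ~0.013: reject independent evolution at alpha = 0.05
```

The dependent model can only fit better (it nests the independent one), so the question is whether the improvement is larger than expected by chance for 4 extra parameters.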
What you actually do in R
The ape package has the ace function for maximum likelihood ancestral state reconstruction. The phytools package has more advanced functions including stochastic character mapping:
library("ape")
library("phytools")

# tree: your phylogeny
# trait: vector of tip states (names must match tree$tip.label)
asr <- ace(trait, tree, type = "discrete", method = "ML")

# Stochastic character mapping: sample 100 evolutionary histories
sims <- make.simmap(tree, trait, nsim = 100)

# Count transitions across the sampled histories
trans <- describe.simmap(sims)
Chromosome number as a discrete trait: chromePlus
Chromosome number is one of the most dynamic features of eukaryotic genomes. Across species, chromosomes fuse, split, and duplicate, sometimes driving reproductive isolation and speciation. Modeling how chromosome number evolves requires treating it as a special kind of discrete trait where the state space is large (chromosome numbers from 1 to 100+) and the transitions have biological structure: gains of one, losses of one, and whole-genome duplications (polyploidy) all have distinct rates. For a history of how scientists developed these ideas, from Stebbins and White to Mayrose and Zenil-Ferguson, see the Intellectual History of Chromosome Number Evolution. Empirical karyotype data for thousands of species is available in the Karyotype Databases.
chromePlus is an R package developed in this lab that extends the chromEvol framework within the diversitree ecosystem. It fits Markov chain models of chromosome evolution on phylogenies using MCMC, and critically, allows a binary trait (such as sex determination system, mating strategy, or presence of B chromosomes) to influence the rates of chromosome change. This lets you ask: does having sex chromosomes speed up or slow down karyotype evolution?
The model tracks a state space combining chromosome number and a binary trait. From any state, five types of events can occur:
Ascending dysploidy: gain of a single chromosome (asc)
Descending dysploidy: loss of a single chromosome (desc)
Polyploidy: whole-genome duplication (pol)
Demi-polyploidy: multiplication of the chromosome number by 1.5 (dem)
Binary trait transition: A to B or B to A (tran)
Each of these rates can be conditioned on the binary trait state, so you can model a world where sex-chromosome-bearing lineages (state B) have faster dysploidy rates than those without (state A).
(Figure: a polyploidy event in the ancestor of the gold clade doubled the chromosome number; dysploidy then reduced n = 18 to n = 17 in one lineage.)
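The combined state space gets large quickly. A quick back-of-the-envelope in base R (the range and labels mirror the worked example later on this page, but the enumeration itself is purely illustrative; the real matrix is built by chromePlus::datatoMatrix):

```r
# Combined state space: chromosome counts crossed with a binary trait.
# Illustrative only; chromePlus::datatoMatrix builds the actual matrix.
chrom_range   <- 5:25            # haploid chromosome numbers considered
binary_states <- c("A", "B")

states <- as.vector(outer(chrom_range, binary_states, paste0))
length(states)   # 42 combined states -> a 42 x 42 rate matrix Q
head(states)     # "5A" "6A" "7A" "8A" "9A" "10A"
```

Even a modest range of 21 chromosome counts crossed with two binary states yields a 42 x 42 rate matrix, which is why these models are fit with structured constraints rather than free rates.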
In practice, chromePlus builds a constrained likelihood on top of diversitree's Markov-model machinery: you hand it a species-by-state probability matrix, and it returns a likelihood function with the chromePlus rate structure (asc, desc, pol, dem in each binary state, plus A↔B transitions) wired in. That likelihood plugs straight into diversitree::mcmc(), and the posterior samples it returns can then be visualized with plotChromeplus(). Model comparison (e.g., rates equal vs. rates free between binary-trait states) is done by refitting under a tighter constraint list and comparing the resulting posteriors.
install.packages("remotes")
remotes::install_github("coleoguy/chromePlus")

library("chromePlus")
library("diversitree")

# dat: data.frame(species, haploid_chrom, prob_state_A)
# tree: phylo object with tip.label matching dat$species

# 1. Build a probability matrix over chromosome states x binary states
pmat <- datatoMatrix(x = dat, range = c(5, 25),
                     hyper = TRUE,
                     state.names = c("A", "B"))

# 2. Build a diversitree mkn likelihood from the matrix
lik <- make.mkn(tree, pmat, strict = FALSE,
                control = list(method = "ode"))

# 3. Constrain it to the chromePlus rate scheme
#    Parameter names come out as asc.A, desc.A, pol.A, dem.A,
#    asc.B, desc.B, pol.B, dem.B, tran.A.to.B, tran.B.to.A
con.lik <- constrainMkn(data = pmat, lik = lik,
                        state.names = c("A", "B"))

# 4. Run MCMC through diversitree
inits <- startVals(n = 10, min = 0, max = 0.2)
post <- mcmc(con.lik, x.init = inits, nsteps = 2000, w = 0.1)

# 5. Plot posterior densities of the gain rate in each state
plotChromeplus(data = post[, c("asc.A", "asc.B")],
               colors = c("#fb923c", "#500000"),
               x_title = "chromosome gain rate",
               main_title = "Gain rate in state A vs B")
Maximum likelihood vs Bayesian inference
All of the methods on this page require fitting a model to data, estimating rate parameters like q01 and q10. There are two major philosophies for doing this, and both appear in phylogenetic software. Understanding the difference matters because they answer subtly different questions and have very different computational demands.
Maximum Likelihood (ML)
The likelihood function L(θ | data) describes how probable the observed data are for each possible value of the parameter θ. Maximum likelihood finds the single value of θ that makes the data most probable, the peak of this surface.
This is computationally fast because you only need to find one point. Tools like ape::ace(method="ML") and corHMM use numerical optimization (e.g., Nelder–Mead or quasi-Newton methods) to locate the peak quickly, even for large trees.
The cost: you get a point estimate, not a distribution. Standard errors can be approximated from the curvature of the likelihood surface, but this assumes the surface is roughly parabolic near the peak, an assumption that can break down with sparse data.
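The recipe of "point estimate plus curvature-based standard error" can be seen on a toy problem, here ML estimation of an exponential rate in base R (nothing phylogenetic, just the mechanics):

```r
# ML point estimate with a curvature-based standard error, toy example:
# estimate the rate of an exponential distribution from simulated data.
set.seed(1)
x <- rexp(200, rate = 2)

negloglik <- function(log_rate) {
  rate <- exp(log_rate)          # optimize on the log scale so rate > 0
  -sum(dexp(x, rate = rate, log = TRUE))
}

fit <- optim(par = 0, fn = negloglik, method = "BFGS", hessian = TRUE)

rate_hat <- exp(fit$par)                  # ML estimate, near the true rate 2
se_log   <- sqrt(1 / fit$hessian[1, 1])   # SE from the likelihood curvature
# Approximate 95% confidence interval, back-transformed from log scale:
exp(fit$par + c(-1.96, 1.96) * se_log)
```

The interval above rests on the quadratic (parabolic) approximation described in the text; with 200 observations it is accurate, but with a handful of tips on a tree it can be badly misleading.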
Bayesian Inference
Bayes' theorem says: posterior ∝ likelihood × prior. Rather than finding the single best θ, Bayesian inference characterizes the full posterior distribution over θ. That is the probability of each parameter value given the data and your prior beliefs.
Uncertainty propagates naturally. You get a credible interval that directly means "there is 95% posterior probability the parameter lies here." Stochastic character mapping is inherently Bayesian for exactly this reason.
The cost: the posterior is rarely tractable analytically. In practice, MCMC (Markov chain Monte Carlo) is used to sample from it, which requires running a chain for tens of thousands of iterations. For complex models on large trees, this can take hours or days.
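The core of MCMC is simple to sketch. A minimal Metropolis sampler for a Poisson rate with an exponential prior (a toy conjugate problem, not a phylogenetic likelihood, so the answer can be checked analytically):

```r
# Minimal Metropolis sampler: posterior of a Poisson rate lambda with an
# exponential(1) prior, given observed counts. Toy example only.
set.seed(42)
counts <- c(3, 5, 2, 4, 6, 3)

log_post <- function(lambda) {
  if (lambda <= 0) return(-Inf)   # prior support is lambda > 0
  sum(dpois(counts, lambda, log = TRUE)) + dexp(lambda, rate = 1, log = TRUE)
}

n_steps  <- 20000
chain    <- numeric(n_steps)
chain[1] <- 1
for (i in 2:n_steps) {
  prop      <- chain[i - 1] + rnorm(1, sd = 0.5)   # symmetric proposal
  log_ratio <- log_post(prop) - log_post(chain[i - 1])
  chain[i]  <- if (log(runif(1)) < log_ratio) prop else chain[i - 1]
}

post <- chain[-(1:2000)]            # discard burn-in
mean(post)                          # posterior mean
quantile(post, c(0.025, 0.975))     # 95% credible interval
```

Because this model is conjugate, the true posterior is Gamma(24, 7) with mean 24/7 ≈ 3.43, so the sampler's output can be validated directly; real phylogenetic posteriors have no such closed form, which is exactly why MCMC is needed.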
The ML estimate is the peak of the likelihood curve; the Bayesian posterior is the product of likelihood × prior, renormalized to integrate to one.
In practice: use ML when you need fast answers or are running many model comparisons (e.g., testing 10 different rate matrices). Use Bayesian methods (MCMC) when you need full uncertainty quantification, want to formally incorporate prior information, or are working with models too complex for numerical optimization (like chromePlus, or joint inference of rates and ancestral states). There is also a practical numerical reason to prefer MCMC: discrete trait models can have flat likelihood ridges at unrealistically high rate values, where many parameter combinations fit the data nearly equally well. MCMC with a sensible prior keeps optimizers from wandering into that region.