Ancestral state reconstruction

What it does. Ancestral state reconstruction (ASR) estimates the most likely character state at internal nodes of a phylogeny given tip states and a model of evolution. Three main approaches are in common use:

  1. Parsimony — assigns node states that minimize the total number of changes. Fast and model-free, but ignores branch lengths and gives no uncertainty estimate.
  2. Maximum likelihood (ML) — optimizes a Markov model (Mk) for the tip data, then returns marginal posterior probabilities for each node under the fitted model. Implemented in ape::ace and phangorn::ancestral.pars/ancestral.mlphylo.
  3. Stochastic character mapping (Bayesian) — draws full histories of the character across the tree from the posterior distribution. Implemented in phytools::make.simmap. See the simmap page for full details.

When to use it.

When NOT to use it.

Worked example

library(ape)
library(phangorn)
library(phytools)

# tree: ape phylo object (ultrametric)
# x: named character vector of discrete states, names = tip labels

# ---- Method 1: Parsimony (phangorn) ----
# Convert tip data to phyDat format
x_phydat <- phyDat(matrix(x, ncol = 1,
                           dimnames = list(names(x), "trait")),
                   type = "USER",
                   levels = unique(x))

pars_anc <- ancestral.pars(tree, x_phydat, type = "MPR")
# Returns a matrix: rows = nodes, columns = states
# Values are 0 or 1 (parsimony-consistent states at each node)
plotAnc(tree, pars_anc, i = 1)   # plot first character


# ---- Method 2: ML via ape::ace ----
fit <- ace(x, tree, type = "discrete", model = "ARD")
fit$loglik
fit$rates        # q01, q10 ML estimates
fit$lik.anc      # marginal probabilities: matrix(nodes x states)

# Plot ancestral states on the tree
plot(tree, cex = 0.6, no.margin = TRUE)
nodelabels(pie = fit$lik.anc,
           piecol = c("orange", "darkred"),
           cex = 0.5)
tiplabels(pie = to.matrix(x, c("XO", "XY")),
          piecol = c("orange", "darkred"),
          cex = 0.3)


# ---- Method 3: Stochastic (phytools::make.simmap) ----
# For full details see the simmap method page.
maps <- make.simmap(tree, x, model = "ARD", nsim = 500,
                    message = FALSE)
sm  <- describe.simmap(maps, plot = FALSE)

# Marginal node probabilities averaged across maps
head(sm$ace)

# Expected number of XO -> XY transitions
mean(sapply(maps, function(m) countSimmap(m)$Tr["XO,XY"]))


# ---- Comparing methods ----
# Parsimony often gives confident (0/1) node assignments that can be
# misleading when transitions are sparse. The ML and stochastic methods
# propagate uncertainty — nodes with ~ 0.5 probability in either state
# honestly reflect ambiguity.
cat("Parsimony node 1:", which.max(pars_anc[1, ]), "\n")
cat("ML node 1 probs:", round(fit$lik.anc[1, ], 3), "\n")

Gotchas we’ve hit

Key papers that use this method in the lab

Question copied. Paste it into the NotebookLM tab.