Ancestral State Reconstruction

One-sentence definition. Ancestral state reconstruction (ASR) is a phylogenetic method for inferring the most probable trait values of ancestral species given a tree topology, branch lengths, a model of trait evolution, and the observed trait values at the tips.

One-sentence analogy. ASR is like reconstructing a great-grandmother you never met from the stories her great-grandchildren tell: you triangulate what she was probably like from the pattern of traits distributed across her surviving descendants.

Why it matters. ASR is how the lab establishes evolutionary polarity: whether a given state is ancestral or derived. For sex chromosome evolution, likelihood-based ASR across a database of more than 13,000 insect species assigns 100% probability to male heterogamety at the node leading to insects, strongly supporting it as the ancestral state. Model choice matters critically: allowing only irreversible transitions versus reversible ones can change the inferred number of origins substantially (e.g., 7.9 vs. 12.9 origins of haplodiploidy in mites depending on the model, where the lower estimate is about 40% below the higher one).

Where you meet it in the wiki.

Primary citation.

“We find strong evidence for the node leading to insects being male heterogametic (100% probability), but we have little power to distinguish between XY and XO sex chromosome systems (60% and 40% probability, respectively).” — Blackmon et al. 2017, Finding 1

Prerequisites: Mk model
Next, learn about: SIMMAP, BiSSE

Background

Ancestral state reconstruction has a long history in comparative biology. Fitch (1971) and Farris (1970) each proposed parsimony algorithms that minimize the number of character-state changes required to explain observed tip values. Parsimony is fast and intuitive, but it is not model-based and gives no uncertainty estimates at ancestral nodes.
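The first (down) pass of Fitch's algorithm fits in a few lines. This is a minimal sketch, assuming a rooted binary tree written as nested tuples (a hypothetical format chosen for brevity) and an unordered character; it returns the state set at the root and the minimum number of changes, and it ignores branch lengths, just as parsimony does.

```python
# Minimal sketch of the down-pass of Fitch (1971) parsimony for one unordered
# character. The nested-tuple tree format and the tip data are hypothetical.

def fitch(tree, tip_states):
    """Return (state set at this node, minimum number of changes below it)."""
    if isinstance(tree, str):  # a tip: its set is just its observed state
        return {tip_states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, tip_states)
    right_set, right_cost = fitch(right, tip_states)
    common = left_set & right_set
    if common:  # children share a state: no extra change is needed here
        return common, left_cost + right_cost
    # children disagree: keep the union and count one change
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical example: four taxa scored for sex chromosome system
tree = (("A", "B"), ("C", "D"))
states = {"A": "XY", "B": "XY", "C": "XO", "D": "XY"}
root_set, changes = fitch(tree, states)
print(root_set, changes)  # {'XY'} 1
```

A second, root-to-tips pass (not shown) then picks one state per internal node from these sets. Note that the output is a set and a count, with no probability attached to either.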

Pagel (1994, 1999) developed likelihood-based ASR, which evaluates ancestral states under an explicit continuous-time Markov model of trait evolution and uses maximum likelihood to find the ancestral states that best fit the data. SIMMAP (Bollback 2006) implements stochastic character mapping in a Bayesian framework, sampling from the posterior distribution of complete character histories rather than returning a single best reconstruction.

We use ASR to ask whether a given state is ancestral or derived in a clade, and how many times it arose independently. Those questions sit at the center of karyotype evolution research.

How it works

Likelihood ASR works in two passes over the tree. The down-pass (tips to root), Felsenstein's pruning algorithm, computes at each node the conditional likelihood of the tip data descended from that node, given each possible state at the node. The up-pass (root to tips) adds the information from the rest of the tree, so that combining both directions yields a marginal probability distribution over states at each node. These marginal reconstructions are the pie charts drawn at nodes on published phylogenies.
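The down-pass can be sketched for a k-state equal-rates Mk model, followed by the marginal distribution at the root. This is a minimal sketch, not a production implementation: the tree format, the rate `q`, the tip data, and the uniform root prior (which equals the stationary distribution of the equal-rates model) are all assumptions made for illustration. Marginals at other internal nodes also need the up-pass, which is omitted here.

```python
import math

# Minimal sketch: Felsenstein's pruning (down-pass) under a k-state
# equal-rates Mk model, then the marginal distribution at the root.
# Assumed for illustration: uniform root prior, hypothetical rate q, and a
# tree where a tip is a string and an internal node is a list of
# (child, branch_length) pairs.

def mk_prob(i, j, t, q, k):
    """Transition probability P_ij(t) under the equal-rates Mk model."""
    decay = math.exp(-k * q * t)
    return 1 / k + (k - 1) / k * decay if i == j else 1 / k - decay / k


def conditional_likelihood(node, tip_states, q, k):
    """L[s] = P(tip data below this node | node is in state s)."""
    if isinstance(node, str):  # tip: indicator vector for the observed state
        return [1.0 if s == tip_states[node] else 0.0 for s in range(k)]
    result = [1.0] * k
    for child, t in node:
        child_lik = conditional_likelihood(child, tip_states, q, k)
        for s in range(k):
            # sum over the unknown state c at the child end of the branch
            result[s] *= sum(mk_prob(s, c, t, q, k) * child_lik[c] for c in range(k))
    return result


def root_marginal(tree, tip_states, q, k):
    """Marginal probability of each root state under a uniform root prior."""
    lik = conditional_likelihood(tree, tip_states, q, k)
    total = sum(x / k for x in lik)  # likelihood of all the tip data
    return [(x / k) / total for x in lik]


# Hypothetical 2-state example: ((A:1, B:1):0.5, C:1.5)
tree = [([("A", 1.0), ("B", 1.0)], 0.5), ("C", 1.5)]
tips = {"A": 0, "B": 0, "C": 1}
print(root_marginal(tree, tips, q=0.3, k=2))
```

Each internal node's likelihood vector depends only on its children's vectors, which is why the down-pass is linear in the number of nodes.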

Marginal reconstruction estimates the probability of each state at one node at a time. Joint reconstruction finds the single combination of states across all nodes that maximizes the likelihood of the whole tree. For questions about a single deep node, marginal reconstruction is appropriate.
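On a tree small enough to enumerate every combination of internal-node states, both kinds of reconstruction can be computed directly from the same joint likelihoods. This brute-force sketch is an illustration under assumed values (a hypothetical three-taxon tree, a 2-state equal-rates Mk model with rate 0.3, and a uniform root prior); practical software uses dynamic programming instead of enumeration.

```python
import itertools
import math

# Minimal sketch contrasting joint and marginal reconstruction by enumerating
# every assignment of states to internal nodes. Tree, rate, and tip data are
# hypothetical; feasible only for tiny trees.

K, Q = 2, 0.3
EDGES = [("r", "n1", 0.5), ("r", "C", 1.5), ("n1", "A", 1.0), ("n1", "B", 1.0)]
TIPS = {"A": 0, "B": 0, "C": 1}
INTERNAL = ["r", "n1"]


def p(i, j, t):
    """Equal-rates Mk transition probability."""
    decay = math.exp(-K * Q * t)
    return 1 / K + (K - 1) / K * decay if i == j else 1 / K - decay / K


def joint_likelihood(assignment):
    """P(tip data, internal states) for one full assignment of internal states."""
    states = {**TIPS, **assignment}
    lik = 1 / K  # uniform prior on the root state
    for parent, child, t in EDGES:
        lik *= p(states[parent], states[child], t)
    return lik


assignments = [dict(zip(INTERNAL, combo))
               for combo in itertools.product(range(K), repeat=len(INTERNAL))]
scores = [joint_likelihood(a) for a in assignments]
total = sum(scores)  # likelihood of the tip data alone

# Joint: the single best combination of states across all internal nodes
joint_best = assignments[scores.index(max(scores))]

# Marginal: for one node at a time, sum over the states of all other nodes
marginal = {node: [sum(s for a, s in zip(assignments, scores) if a[node] == state) / total
                   for state in range(K)]
            for node in INTERNAL}

print("joint:", joint_best)
print("marginal:", marginal)
```

Here both methods favor state 0 at each node, but on larger trees the jointly best combination need not agree with the node-by-node marginal maxima.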

Stochastic character mapping (the method implemented in SIMMAP) samples complete histories of state changes along every branch, in proportion to their posterior probability. Summary statistics from many sampled histories (e.g., the expected number of transitions, or the time spent in each state on each branch) come with uncertainty estimates that a single marginal reconstruction cannot provide.
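The core step of stochastic mapping is simulating a branch history that is consistent with the states at both ends of the branch. The sketch below does this by plain rejection sampling: simulate forward from the upper state and keep the history only if it ends in the required lower state. It is a minimal illustration, assuming a hypothetical 2-state rate matrix, branch length, and endpoint states; a full analysis would first sample node states from their posterior and then repeat this for every branch, and real software uses more efficient samplers.

```python
import random

# Minimal sketch: rejection sampling of one branch history under a
# continuous-time Markov model, conditioned on the states at both ends.
# The rate matrix, branch length, and endpoint states are hypothetical.

def simulate_branch(q, start, t, rng):
    """Return a list of (time, new_state) changes along a branch of length t."""
    state, clock, changes = start, 0.0, []
    while True:
        leave_rate = -q[state][state]
        if leave_rate == 0:  # absorbing state: no further changes possible
            return changes
        clock += rng.expovariate(leave_rate)  # waiting time to the next change
        if clock > t:
            return changes
        # pick the destination in proportion to the off-diagonal rates
        targets = [j for j in range(len(q)) if j != state]
        weights = [q[state][j] for j in targets]
        state = rng.choices(targets, weights=weights)[0]
        changes.append((clock, state))


def sample_conditioned(q, start, end, t, rng, max_tries=100_000):
    """Sample one history that starts in `start` and ends in `end`."""
    for _ in range(max_tries):
        changes = simulate_branch(q, start, t, rng)
        final = changes[-1][1] if changes else start
        if final == end:
            return changes
    raise RuntimeError("too many rejections; the end state is improbable")


rng = random.Random(1)
q = [[-0.5, 0.5], [0.2, -0.2]]  # hypothetical 2-state rate matrix
histories = [sample_conditioned(q, 0, 1, 2.0, rng) for _ in range(1000)]
mean_changes = sum(len(h) for h in histories) / len(histories)
print(f"mean number of changes on this branch: {mean_changes:.2f}")
```

Because every kept history runs from state 0 to state 1, each contains an odd number of changes; averaging over many histories is what turns the mapping into an estimate with uncertainty.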

The rate model matters. An equal-rates (ER) model forces all transition rates to be equal; an all-rates-different (ARD) model allows each ordered pair of states its own rate; an irreversible model permits transitions in only one direction. Each model restricts which histories are possible and how probable they are, and so shapes the inferred ancestral states. In karyotype analyses, model choice can change the inferred number of origins by roughly 40%, so model selection is not a formality.
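The three models differ only in which entries of the rate matrix Q are free, tied, or fixed at zero. The sketch below builds each one, plus a small AIC helper (AIC = 2 × number of parameters − 2 × log-likelihood), since AIC is one common criterion for choosing among fitted models. Function names and rate values are hypothetical illustrations, not any package's API.

```python
# Minimal sketch of ER, ARD, and irreversible rate matrices Q
# (rows = current state, columns = new state, each row sums to 0).
# Function names and rate values are hypothetical.

def fill_diagonal(q):
    """Set each diagonal entry so that its row sums to zero."""
    for i, row in enumerate(q):
        row[i] = 0.0 - sum(r for j, r in enumerate(row) if j != i)
    return q


def equal_rates(k, rate):
    """ER: every transition shares one rate (1 free parameter)."""
    return fill_diagonal([[0.0 if i == j else rate for j in range(k)] for i in range(k)])


def all_rates_different(rates):
    """ARD: each ordered pair (i, j) has its own rate (k*(k-1) parameters)."""
    k = len(rates)
    return fill_diagonal([[0.0 if i == j else rates[i][j] for j in range(k)]
                          for i in range(k)])


def irreversible(k, rate):
    """Irreversible: changes only from lower- to higher-numbered states."""
    return fill_diagonal([[rate if j > i else 0.0 for j in range(k)] for i in range(k)])


def aic(log_likelihood, n_params):
    """Akaike information criterion; lower values are preferred."""
    return 2 * n_params - 2 * log_likelihood


print(irreversible(2, 0.1))  # [[-0.1, 0.1], [0.0, 0.0]]
```

In the irreversible matrix the last state has no outgoing rate, so once a lineage reaches it, it can never leave; that single constraint is what can drive the inferred number of origins up or down relative to a reversible model.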

A worked example

Consider ASR of haploid chromosome number across beetles (order Coleoptera). We collect chromosome counts from museum databases, match them to a time-calibrated phylogeny (pruning tips that have no count), and specify an ordered Mk model that allows chromosome number to change by only one step at a time. We then run marginal reconstruction at each internal node.
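An ordered model of this kind has a tridiagonal rate matrix: only n to n+1 (gain) and n to n−1 (loss) have nonzero rates. Multi-step changes still have nonzero probability over a branch, because they can happen as a series of single steps. The sketch below shows both points. The state range (n = 8 to 12), the gain and loss rates, and the branch length are hypothetical, and transition probabilities P(t) = exp(Qt) are computed by uniformization so that only the standard library is needed.

```python
import math

# Minimal sketch of an ordered Mk model for haploid chromosome number.
# State range, gain/loss rates, and branch length are hypothetical.

def ordered_q(n_min, n_max, gain, loss):
    """Tridiagonal rate matrix over haploid numbers n_min..n_max."""
    k = n_max - n_min + 1
    q = [[0.0] * k for _ in range(k)]
    for i in range(k):
        if i + 1 < k:
            q[i][i + 1] = gain  # one-step increase
        if i > 0:
            q[i][i - 1] = loss  # one-step decrease
        q[i][i] = 0.0 - sum(q[i])  # rows sum to zero
    return q


def transition_matrix(q, t, tol=1e-12, max_terms=10_000):
    """P(t) = exp(Q t) by uniformization: sum_m Poisson(m; lam*t) * R^m."""
    k = len(q)
    identity = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    lam = max(-q[i][i] for i in range(k))
    if lam == 0:
        return identity  # no changes are possible
    r = [[identity[i][j] + q[i][j] / lam for j in range(k)] for i in range(k)]
    power = identity  # R^0
    weight = math.exp(-lam * t)  # Poisson probability of zero events
    p = [[weight * power[i][j] for j in range(k)] for i in range(k)]
    m, cumulative = 0, weight
    while 1.0 - cumulative > tol and m < max_terms:
        m += 1
        power = [[sum(power[i][x] * r[x][j] for x in range(k)) for j in range(k)]
                 for i in range(k)]
        weight *= lam * t / m
        cumulative += weight
        for i in range(k):
            for j in range(k):
                p[i][j] += weight * power[i][j]
    return p


Q = ordered_q(8, 12, gain=0.05, loss=0.05)
P = transition_matrix(Q, 10.0)
print("direct rate 9 -> 11:", Q[9 - 8][11 - 8])
print(f"P(9 -> 11 over t = 10): {P[9 - 8][11 - 8]:.4f}")
```

Uniformization is used here only to keep the example dependency-free; any reliable matrix exponential would serve.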

At a deep node, the output might show: n=9 with posterior probability 0.58, n=10 with posterior probability 0.31, and remaining probability spread across adjacent states. That spread is the honest answer. If we plot a pie chart on the tree, each slice represents that uncertainty directly.
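One way to report that spread is the most probable state together with the smallest set of states whose probabilities reach a chosen level. In this minimal sketch, the 0.58 and 0.31 values follow the example above, while the split of the remaining 0.11 between n=8 and n=11 is a hypothetical assumption; the function name is also illustrative.

```python
# Minimal sketch: summarize one node's marginal distribution as the smallest
# set of states (most probable first) whose probabilities reach `level`.
# The split of the remaining 0.11 between n=8 and n=11 is hypothetical.

def credible_states(marginal, level):
    """Smallest set of states, most probable first, reaching `level`."""
    chosen, total = [], 0.0
    for state, prob in sorted(marginal.items(), key=lambda kv: kv[1], reverse=True):
        chosen.append(state)
        total += prob
        if total >= level:
            break
    return chosen


node = {8: 0.06, 9: 0.58, 10: 0.31, 11: 0.05}
print(credible_states(node, 0.5))  # [9]
print(credible_states(node, 0.9))  # [9, 10, 8]
```

At a 50% level the node looks settled on n=9; at 90% it takes three states to cover the uncertainty, which is the same information a pie chart conveys visually.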

A SIMMAP analysis of the same data might produce 1,000 sampled histories. Summarizing across histories, we might find a mean of 14.2 transitions from n=9 to n=10 across the Coleoptera phylogeny (95% credible interval: 11 to 18). Summaries of this kind, counts of transitions along branches or across the whole tree, come from stochastic mapping, not from marginal node reconstruction alone.
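Given the count of n=9 to n=10 transitions in each sampled history, the mean and an equal-tailed interval come straight from the empirical distribution. This is a minimal sketch: the function name is hypothetical, the interval uses nearest-rank empirical quantiles (one of several reasonable conventions), and the toy counts are made up purely to show the call, not real stochastic-mapping output.

```python
import statistics

# Minimal sketch: summarize per-history transition counts from stochastic
# mapping as a mean and an equal-tailed interval (nearest-rank quantiles).
# The function name and the toy counts below are hypothetical.

def summarize_counts(counts, level=0.95):
    """Return (mean, (lower, upper)) across sampled histories."""
    ordered = sorted(counts)
    tail = (1 - level) / 2
    last = len(ordered) - 1
    lower = ordered[round(tail * last)]
    upper = ordered[round((1 - tail) * last)]
    return statistics.mean(counts), (lower, upper)


toy_counts = [2, 3, 3, 4, 4, 4, 5, 5, 6, 8]  # made-up counts, one per history
print(summarize_counts(toy_counts))  # (4.4, (2, 8))
```

With only ten histories the interval collapses to the minimum and maximum; with the 1,000 histories described above, the tails are trimmed properly.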

Common misconceptions

How to spot it in papers

Further reading

Within this wiki:
