Phylogenies: trees, branch lengths, and why evolution is not a ladder.
A phylogeny is a branching hypothesis about how species are related. This page walks through how to read one, why layout is irrelevant, what branch lengths actually encode, and why shared ancestry means you cannot treat species as independent data points in ordinary statistical tests.
What is a phylogeny?
A phylogeny is a branching diagram showing the evolutionary relationships among species. Each tip represents a species, living or extinct. Each internal node marks a hypothesized speciation event: the point at which an ancestral population split into two descendant lineages. It does not represent a single ancestral individual. The branches connecting nodes and tips represent either time or expected evolutionary change, depending on how the tree was estimated.
Think of the phylogeny as a family tree of life. Just as your grandparents had children who had children, species have common ancestors. The phylogeny is a hypothesis about how all these ancestors and descendants are related.
Below is an interactive phylogeny of 15 vertebrate species. Hover over a tip to highlight that species along with its entire ancestral lineage.
Reading a phylogeny
A key insight: the order in which the tips are drawn does not matter. You can rotate the branches around any internal node and the topology stays exactly the same. Below are three different "rotations" of the same six-species tree. They look different, but they show the same relationships.
What matters in a phylogeny is which species are nested within which clades. The visual layout is irrelevant. A and B can be on the left or right, at the top or bottom. What matters is that they share a most recent common ancestor that is not shared with C, D, E, or F.
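One way to convince yourself that rotation changes nothing is to reduce each drawing to its set of clades and compare the sets. The sketch below does that with nested tuples standing in for trees. The particular six-species topology is an assumption made for illustration (the text only says that A and B are each other's closest relatives), and the function name `clades` is hypothetical.

```python
def clades(tree):
    """Return every clade in a nested-tuple tree as a frozenset of tip names."""
    found = set()

    def tips_below(node):
        if isinstance(node, str):                      # a tip
            return frozenset([node])
        members = frozenset().union(*(tips_below(child) for child in node))
        found.add(members)                             # record this internal node
        return members

    tips_below(tree)
    return found


# Hypothetical topology: A and B are sisters, as are D and E.
tree_1 = ((("A", "B"), "C"), (("D", "E"), "F"))
# The same tree with every internal node rotated.
tree_2 = (("F", ("E", "D")), ("C", ("B", "A")))
# A genuinely different topology: A is now sister to C, not B.
tree_3 = ((("A", "C"), "B"), (("D", "E"), "F"))

print(clades(tree_1) == clades(tree_2))   # True: rotation only
print(clades(tree_1) == clades(tree_3))   # False: relationships changed
```

Comparing clade sets ignores the drawing order by construction, which is exactly the sense in which the layout is irrelevant.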
Branch lengths
In some phylogenies, the length of a branch encodes evolutionary time or the amount of change. When branch lengths measure time, a long branch means more time has passed, and short branches separating two species mean they diverged recently. When branch lengths measure change, a long branch means more genetic change has accumulated, whether because more time passed or because that lineage evolved faster.
This distinction is crucial. A cladogram shows only topology; its branch lengths carry no meaning. A chronogram scales branch lengths to time, and a phylogram scales them to the amount of evolutionary change.
In the chronogram above, the horizontal axis represents time, so every living species sits at the same distance from the root. The branch leading to Fish is the longest because it diverged from the other vertebrates around 500 million years ago. Humans and Chimpanzees share a much more recent common ancestor, so the branches separating them are shorter.
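A quick way to tell a chronogram of living species from a phylogram is to sum branch lengths from the root to each tip: in the chronogram the sums are all equal (the tree is ultrametric), in a phylogram they usually are not. The sketch below assumes a simple tuple encoding, `(tip_name_or_children, branch_length)`, and uses round, made-up branch lengths; they are not dated estimates or real substitution counts.

```python
import math


def root_to_tip(node, depth=0.0, out=None):
    """Sum branch lengths from the root to every tip."""
    if out is None:
        out = {}
    content, length = node
    depth += length
    if isinstance(content, str):          # a tip
        out[content] = depth
    else:                                 # an internal node: recurse into children
        for child in content:
            root_to_tip(child, depth, out)
    return out


def is_ultrametric(tree, rel_tol=1e-9):
    """True if every tip is the same distance from the root."""
    depths = list(root_to_tip(tree).values())
    return all(math.isclose(d, depths[0], rel_tol=rel_tol) for d in depths)


# Illustrative chronogram: branch lengths in millions of years.
chronogram = ([
    ("Fish", 500.0),
    ([
        ("Mouse", 90.0),
        ([("Human", 6.0), ("Chimpanzee", 6.0)], 84.0),
    ], 410.0),
], 0.0)

# Same topology as a phylogram: expected substitutions per site (made up).
phylogram = ([
    ("Fish", 0.90),
    ([
        ("Mouse", 0.35),
        ([("Human", 0.010), ("Chimpanzee", 0.012)], 0.12),
    ], 0.50),
], 0.0)

print(root_to_tip(chronogram))
print(is_ultrametric(chronogram), is_ultrametric(phylogram))
```

Note that the Fish branch is the longest single branch in the chronogram, yet Fish is no further from the root than Human; the other lineages simply split their 500 million years across several shorter branches.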
Why phylogenies matter for statistics
Here is a critical problem that motivates phylogenetic comparative methods: if you study 1000 species of beetles, you do not have 1000 independent observations. If 500 of those species belong to a single clade (they all descend from a common ancestor that lived 2 million years ago), then those 500 species are not independent. They inherited many of their traits from that common ancestor.
This is the problem of phylogenetic non-independence. When you analyze traits across species, you must account for the fact that species are related by descent. Otherwise, you violate the assumption of independence that underlies standard statistical tests.
Imagine you want to test whether body size predicts home range across 100 carnivore species. If you use linear regression, you assume the 100 data points are independent. But suppose 50 of the species are recently diverged relatives of the lion. They inherited similar body sizes and home ranges from their common ancestor. Now your effective sample size is much smaller than 100.
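To make "not independent" concrete, you can turn a tree into an expected covariance matrix. The sketch below assumes a Brownian-motion model of trait evolution (an assumption; the text does not name a model), under which the covariance between two tips is proportional to the branch length they share on their paths from the root. The three-species tree and its branch lengths are made up, and the function name is hypothetical.

```python
def shared_path_matrix(tree):
    """Return (tips, matrix); matrix[i][j] is the root-to-MRCA path length
    shared by tips i and j. Nodes are (tip_name_or_children, branch_length)."""
    cov = {}

    def walk(node, depth_above):
        content, length = node
        depth = depth_above + length
        if isinstance(content, str):               # a tip: variance = root-to-tip depth
            cov[(content, content)] = depth
            return [content]
        groups = [walk(child, depth) for child in content]
        # Tips in different child subtrees share history only down to this node.
        for i, left in enumerate(groups):
            for right in groups[i + 1:]:
                for a in left:
                    for b in right:
                        cov[(a, b)] = cov[(b, a)] = depth
        return [tip for group in groups for tip in group]

    tips = walk(tree, 0.0)
    return tips, [[cov[(a, b)] for b in tips] for a in tips]


# Made-up chronogram (millions of years): two close relatives and one outgroup.
tree = ([
    ([("Lion", 2.0), ("Leopard", 2.0)], 38.0),
    ("Wolf", 40.0),
], 0.0)

tips, matrix = shared_path_matrix(tree)
for name, row in zip(tips, matrix):
    print(f"{name:8s}", row)
```

In this toy tree Lion and Leopard share 38 of their 40 million years, so under the assumed model their trait values are expected to correlate at 38/40 = 0.95. An ordinary regression would treat them as two independent points; the off-diagonal entries are what a phylogenetic method uses to down-weight them.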
The phylogeny gives you the structure to correct for this non-independence. By accounting for the shared ancestry among species, phylogenetic comparative methods let you separate genuine evolutionary associations between traits from similarity that merely reflects common descent.
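One classical correction of this kind is the method of independent contrasts (Felsenstein 1985). Instead of analyzing tip values directly, it analyzes differences between sister lineages, each scaled by the branch length separating them. The sketch below is a minimal version that assumes Brownian-motion trait evolution, a strictly bifurcating tree, and positive branch lengths; the tree, trait values, and function name are hypothetical.

```python
import math


def independent_contrasts(node, traits):
    """Return (node_value, adjusted_branch_length, contrasts).

    Nodes are (tip_name_or_two_children, branch_length).
    Every branch below the root must have positive length.
    """
    content, length = node
    if isinstance(content, str):                      # a tip
        return traits[content], length, []

    left, right = content                             # bifurcating tree assumed
    x1, v1, c1 = independent_contrasts(left, traits)
    x2, v2, c2 = independent_contrasts(right, traits)

    # Standardized contrast between the two descendant lineages.
    contrast = (x1 - x2) / math.sqrt(v1 + v2)
    # Ancestral value: average weighted by inverse branch length.
    value = (x1 / v1 + x2 / v2) / (1 / v1 + 1 / v2)
    # Lengthen the branch above this node to reflect uncertainty in `value`.
    adjusted = length + v1 * v2 / (v1 + v2)
    return value, adjusted, c1 + c2 + [contrast]


# Made-up four-species tree with unit branch lengths and made-up trait values.
tree = ([
    ([("A", 1.0), ("B", 1.0)], 1.0),
    ([("C", 1.0), ("D", 1.0)], 1.0),
], 0.0)
traits = {"A": 1.0, "B": 3.0, "C": 6.0, "D": 8.0}

root_value, _, contrasts = independent_contrasts(tree, traits)
print(root_value, contrasts)
```

Four tips yield three contrasts. To relate two traits, you would compute contrasts for each trait on the same tree and regress one set on the other through the origin.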
How phylogenies are built
Phylogenies are not observed directly. They are inferred from genetic data (DNA sequences), morphological characters (body shape, skeletal features), or a combination of both. The most common method is maximum likelihood (ML), which finds the tree that makes your data most probable under a statistical model of evolution.
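To see what "most probable under a statistical model" means, consider the smallest possible case: two aligned sequences joined by a single branch, where the only thing to estimate is the branch length d in expected substitutions per site. The sketch below assumes the Jukes-Cantor (JC69) substitution model, chosen here purely for its simplicity; the sequences are made up and the function names are hypothetical. Real ML software also searches over topologies and typically uses richer models.

```python
import math


def count_differences(seq_a, seq_b):
    """Number of aligned sites at which the two sequences differ."""
    return sum(a != b for a, b in zip(seq_a, seq_b))


def jc69_log_likelihood(seq_a, seq_b, d):
    """Log-likelihood of two aligned sequences separated by distance d > 0."""
    n = len(seq_a)
    k = count_differences(seq_a, seq_b)
    decay = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * decay      # site ends in the same base
    p_diff = 0.25 - 0.25 * decay      # site ends in one specific other base
    # Each site starts from a uniform base frequency of 1/4.
    return (n - k) * math.log(0.25 * p_same) + k * math.log(0.25 * p_diff)


def jc69_ml_distance(seq_a, seq_b):
    """Closed-form ML estimate of d; requires fewer than 75% differing sites."""
    p = count_differences(seq_a, seq_b) / len(seq_a)
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


seq_a = "ACGTACGTACGTACGTACGT"
seq_b = "ACGTACGTACTTACGTACGA"       # differs from seq_a at 2 of 20 sites

d_hat = jc69_ml_distance(seq_a, seq_b)
print(round(d_hat, 4))
for d in (0.5 * d_hat, d_hat, 2.0 * d_hat):
    print(round(d, 4), round(jc69_log_likelihood(seq_a, seq_b, d), 4))
```

The log-likelihood is highest at `d_hat` and falls off on either side. The estimate is slightly larger than the raw proportion of differing sites (0.1) because the model allows for multiple substitutions at the same site.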
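Another popular approach is Bayesian inference, which uses Bayes' theorem to estimate a posterior probability distribution over trees, in practice by sampling trees rather than enumerating all of them. The two approaches handle uncertainty differently: Bayesian inference expresses topological uncertainty directly as posterior probabilities, while ML returns a single best tree whose uncertainty is usually assessed separately by bootstrapping.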
For the purposes of comparative methods, the key insight is this: the phylogeny you use is an estimate, not a fact. It comes with uncertainty. Some internal branches may be poorly supported. Good practice includes checking the support values (bootstrap percentages in ML, posterior probabilities in Bayesian methods) and sometimes using methods that account for topological uncertainty.
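Computationally, both kinds of support value are tallied the same way: as the fraction of trees in a collection that contain a given clade. For bootstrap support the collection is the set of trees estimated from resampled data; for posterior probability it is the set of trees sampled from the posterior. The sketch below shows only the tallying step, on a made-up collection of four trees written as nested tuples; the function names are hypothetical.

```python
from collections import Counter


def clades(tree):
    """Return every clade in a nested-tuple tree as a frozenset of tip names."""
    found = set()

    def tips_below(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(tips_below(child) for child in node))
        found.add(members)
        return members

    tips_below(tree)
    return found


def clade_support(trees):
    """Fraction of trees in which each clade appears."""
    counts = Counter()
    for tree in trees:
        counts.update(clades(tree))
    return {clade: count / len(trees) for clade, count in counts.items()}


# Made-up collection: three trees group A with B, one groups A with C.
trees = [
    ((("A", "B"), "C"), "D"),
    ((("B", "A"), "C"), "D"),     # same topology, drawn rotated
    (("C", ("A", "B")), "D"),
    ((("A", "C"), "B"), "D"),
]

support = clade_support(trees)
for clade, value in sorted(support.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(clade), value)
```

The clade containing all four tips appears in every tree and trivially gets support 1.0. Identical tallying does not make the two measures comparable, because the collections of trees are generated by different procedures.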
Two cautions on tree interpretation catch many practitioners.
(1) Branch lengths can mean different things. A chronogram has branch lengths in units of time (years or millions of years); a phylogram from a substitution-model analysis has branch lengths in units of expected substitutions per site. They are not interchangeable. Using a phylogram where a comparative method expects a chronogram (or vice versa) can silently bias every downstream result.
(2) Bootstrap support and posterior probability are not on the same scale. A clade with bootstrap = 80% does not have posterior probability ≈ 0.80; the two measure different things and behave differently when the data are informative but conflicting. Don't mix them.
Comparative methods today often integrate over a posterior sample of trees rather than fixing a single point estimate. That propagates topological and branch-length uncertainty into the final inference. The foundational paper on accounting for phylogeny in comparative analyses is Felsenstein (1985, Am. Nat. 125:1–15).