
Continuous trait evolution.

Body size, metabolic rate, brain mass, and genome size are continuous traits: they vary along a spectrum and evolve along the branches of a phylogeny. This page builds up from Brownian motion to phylogenetic signal and PGLS regression, with live simulations along the way.

What is a continuous trait?

Continuous traits are characters that vary along a spectrum rather than falling into discrete categories: body size, brain mass, metabolic rate, genome size. A species can have a body mass of 2.3 kg, or 4.7 kg, or any value in between.

When you measure a continuous trait across species, you get a dataset of real numbers. The question in phylogenetic comparative methods is: how much of the variation in this trait is explained by shared ancestry, and how much is explained by independent evolution in different lineages?

The problem with naive regression

Suppose you measure body size and home range for 50 species of carnivores. You want to know: do larger animals have larger home ranges? A simple linear regression seems like the obvious approach.

The problem: your 50 species are not independent data points. If 20 of them are closely related cats, they inherited similar body sizes and home ranges from their common ancestor. Your regression treats each as an independent observation, which inflates the effective sample size and gives you false confidence in your result.

This is the essence of the phylogenetic comparative problem. We need methods that acknowledge the tree structure.
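The inflation described above can be checked with a small simulation. The sketch below uses base R only and makes several illustrative assumptions that are not from the text: two clades of 10 species each, a within-clade covariance of 0.9, and 500 replicates. The two traits are simulated independently, so any "significant" slope is a false positive.

```r
set.seed(42)

n_per_clade <- 10                      # species per clade (illustrative)
n <- 2 * n_per_clade
shared <- 0.9                          # share of root-to-tip time common to a clade

# Covariance expected under BM: 1 on the diagonal, `shared` within a clade,
# 0 between the two clades.
clade <- rep(1:2, each = n_per_clade)
C <- ifelse(outer(clade, clade, "=="), shared, 0)
diag(C) <- 1

L <- t(chol(C))                        # lower Cholesky factor: L %*% z has covariance C

n_sims <- 500
p_values <- replicate(n_sims, {
  x <- as.vector(L %*% rnorm(n))       # trait 1
  y <- as.vector(L %*% rnorm(n))       # trait 2, simulated independently of x
  summary(lm(y ~ x))$coefficients["x", "Pr(>|t|)"]
})

# Share of replicates in which OLS reports p < 0.05 despite no true relationship
false_positive_rate <- mean(p_values < 0.05)
print(false_positive_rate)
```

With independent data the rate would sit near the nominal 0.05. With this clade structure it comes out far higher, because each clade behaves more like one observation than ten.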

Brownian Motion

The foundation for most continuous trait analysis is the Brownian motion (BM) model. This is a random walk: in each small interval of time, the trait changes by a small, random amount. Over an interval of length t, the net change is drawn from a normal distribution with mean 0 and variance σ²t, where σ² (sigma-squared) is the rate of evolution.

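Under Brownian motion, the variance of a trait at the tips of the tree is proportional to time. A longer branch accumulates more variance. A shorter branch accumulates less. Two tips covary in proportion to the branch length they share between the root and their most recent common ancestor, which is why close relatives are expected to look alike. This is exactly what we expect under random evolution.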
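The proportionality between variance and time can be checked numerically. This is a minimal base-R sketch with illustrative values for the rate, the elapsed time, and the number of replicate lineages (none of them from the text): it simulates many independent lineages and compares the variance among them at two time points.

```r
set.seed(1)

sigma2     <- 0.5     # BM rate (illustrative)
total_time <- 4       # elapsed time along each lineage
n_steps    <- 100     # time steps per lineage
n_lineages <- 5000    # independent replicate lineages
dt <- total_time / n_steps

# Each step is a normal draw with mean 0 and variance sigma2 * dt
steps <- matrix(rnorm(n_lineages * n_steps, mean = 0, sd = sqrt(sigma2 * dt)),
                nrow = n_lineages)

# Cumulative sums along each row give the trait trajectory of one lineage
paths <- t(apply(steps, 1, cumsum))

var_half <- var(paths[, n_steps / 2])  # variance among lineages at t = 2
var_end  <- var(paths[, n_steps])      # variance among lineages at t = 4

print(c(observed = var_half, expected = sigma2 * total_time / 2))
print(c(observed = var_end,  expected = sigma2 * total_time))
```

Doubling the elapsed time should roughly double the variance among lineages, up to sampling noise.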

What BM assumes (and when it breaks): the model presumes (1) no directional bias (the trait is as likely to increase as decrease at any moment); (2) no selection toward a fixed optimum (if traits are pulled toward an attractor, deviations damp out and BM overestimates the long-run variance); and (3) constant rate across lineages and time. When traits actually evolve under stabilizing selection, the relevant model is the Ornstein-Uhlenbeck (OU) process, which adds a deterministic pull toward an optimum θ with strength α; BM is the limit as α → 0. When rates vary across the tree, options include relaxed-rate models (BMS), early-burst (EB) models for adaptive radiation, and rate-shift detection methods like BAMM. BM is a reasonable null but rarely the truth, so a working comparative-methods study compares it against alternatives rather than assuming it.
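The relationship between BM and OU can be written compactly. The notation here is an assumption on my part rather than the page's own: X_t is the trait value at time t and W_t is a standard Wiener process; σ², α, and θ are as defined above.

```latex
\begin{aligned}
\text{BM:}\quad dX_t &= \sigma \, dW_t \\
\text{OU:}\quad dX_t &= \alpha(\theta - X_t)\,dt + \sigma \, dW_t \\
\operatorname{Var}[X_t \mid X_0] &= \frac{\sigma^2}{2\alpha}\left(1 - e^{-2\alpha t}\right)
  \;\xrightarrow{\;\alpha \to 0\;}\; \sigma^2 t
\end{aligned}
```

Under OU the variance levels off at σ²/(2α) instead of growing without bound, which is why fitting BM to a trait under stabilizing selection overestimates the long-run variance.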

How it works: Start at the root with an ancestral trait value. As you move along branches, the trait performs a random walk. Longer branches (more time) lead to larger changes. The simulation shows one random realization of this process.
[Simulation panel: Tip Trait Values]
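The root-to-tip walk described here can be sketched in a few lines of base R. The tree, its branch lengths, the rate, and the root value below are all hypothetical choices for illustration: a balanced 4-tip tree ((A,B),(C,D)) stored as a plain table of branches rather than as an ape phylo object.

```r
set.seed(7)

# Hypothetical 4-tip tree ((A,B),(C,D)), one row per branch.
# Rows are in preorder: each node appears as a child before it is used as a parent.
edges <- data.frame(
  parent = c("root", "root", "n1", "n1", "n2", "n2"),
  child  = c("n1",   "n2",   "A",  "B",  "C",  "D"),
  length = c(0.5,    0.5,    0.5,  0.5,  0.5,  0.5),
  stringsAsFactors = FALSE
)

sigma2 <- 1.0            # BM rate (illustrative)
values <- c(root = 0)    # ancestral trait value at the root

for (i in seq_len(nrow(edges))) {
  parent_value <- values[[edges$parent[i]]]
  # Change along this branch: normal, mean 0, variance sigma2 * branch length
  change <- rnorm(1, mean = 0, sd = sqrt(sigma2 * edges$length[i]))
  values[edges$child[i]] <- parent_value + change
}

tip_values <- values[c("A", "B", "C", "D")]
print(round(tip_values, 2))
```

Each run with a different seed gives a different realization. A and B share the change along the branch from the root to n1, so on average they end up more similar to each other than to C or D.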

Phylogenetic signal

Not all traits track the phylogeny equally closely. Some traits show strong phylogenetic signal: closely related species are very similar, and the trait is predictable from phylogeny. Other traits show weak signal: trait values are distributed more or less randomly across the tree, with no obvious relationship to phylogeny.

The most common way to quantify signal is Blomberg's K. K is defined relative to Brownian motion: K = 1 when the phylogenetic structure of trait variation matches the BM expectation given the tree. K > 1 means the trait is more strongly structured by phylogeny than BM predicts (e.g., conserved within clades); K < 1 means it is less structured (e.g., convergent evolution or recent rate changes).
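To make the definition concrete, here is a base-R sketch of the K calculation following the usual formulation (Blomberg et al. 2003): the ratio of two mean squared errors, divided by the value of that ratio expected under BM on the same tree. The function name blomberg_k, the 4-tip covariance matrix, and the trait vectors are illustrative assumptions; in practice a package function such as phylosig in phytools would be used.

```r
# BM covariance matrix for a hypothetical balanced tree ((A,B),(C,D)):
# root-to-tip height 1, sister tips share half of that history.
tips <- c("A", "B", "C", "D")
C <- matrix(c(1.0, 0.5, 0.0, 0.0,
              0.5, 1.0, 0.0, 0.0,
              0.0, 0.0, 1.0, 0.5,
              0.0, 0.0, 0.5, 1.0),
            nrow = 4, byrow = TRUE, dimnames = list(tips, tips))

blomberg_k <- function(x, C) {
  n <- length(x)
  C_inv <- solve(C)
  # Phylogenetically weighted (GLS) estimate of the root value
  a_hat <- sum(C_inv %*% x) / sum(C_inv)
  dev <- x - a_hat
  mse0 <- sum(dev^2) / (n - 1)                              # ignores the tree
  mse  <- as.numeric(t(dev) %*% C_inv %*% dev) / (n - 1)    # accounts for the tree
  # Ratio mse0 / mse expected under BM on this tree
  expected_ratio <- (sum(diag(C)) - n / sum(C_inv)) / (n - 1)
  (mse0 / mse) / expected_ratio
}

clustered <- c(A = 0.9, B = 1.1, C = 5.0, D = 5.1)  # sisters are similar
scattered <- c(A = 0.9, B = 5.0, C = 1.1, D = 5.1)  # same values, sisters differ

k_clustered <- blomberg_k(clustered, C)
k_scattered <- blomberg_k(scattered, C)
print(c(clustered = k_clustered, scattered = k_scattered))
```

The same four values give K above 1 when they cluster by clade and K below 1 when they are shuffled across clades. Four tips is far too few for a real analysis; the point is only to show what the statistic responds to.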

Other signal measures. Pagel's λ (covered on the discrete-continuous page) scales the off-diagonal elements of the BM covariance matrix and is estimated by maximum likelihood: λ = 0 corresponds to no phylogenetic signal, and λ = 1 to the BM expectation. It is arguably more interpretable than K because it is directly a parameter of the model. For binary traits, Fritz & Purvis's D statistic extends the signal idea to discrete characters. A common recommendation is to report at least two complementary measures (e.g., K and λ), since they differ in their sensitivity to tip-trait outliers and tree shape.
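What λ does to the covariance matrix can be shown in a few lines. This is an illustrative base-R sketch: lambda_transform is a hypothetical helper name and the 3-tip matrix is made up. It shows only the transformation, not the maximum-likelihood search for the best-fitting λ.

```r
# Scale the off-diagonal (shared-history) entries by lambda,
# leaving the diagonal (total root-to-tip variance) unchanged.
lambda_transform <- function(C, lambda) {
  C_lambda <- lambda * C
  diag(C_lambda) <- diag(C)
  C_lambda
}

# Hypothetical BM covariance matrix for three tips
C <- matrix(c(1.0, 0.6, 0.2,
              0.6, 1.0, 0.2,
              0.2, 0.2, 1.0),
            nrow = 3, byrow = TRUE)

C_star <- lambda_transform(C, 0)    # lambda = 0: tips treated as independent
C_half <- lambda_transform(C, 0.5)  # intermediate signal
C_bm   <- lambda_transform(C, 1)    # lambda = 1: the original BM matrix
print(C_half)
```

At λ = 0 the matrix is diagonal, which is the covariance implied by a star phylogeny; at λ = 1 it is unchanged.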

High Phylogenetic Signal (K > 1)
Tip values: 0.9, 1.1, 5.0, 5.1
Sister clades (green and orange) have very different trait values, but within each clade, species are similar. The trait clusters by phylogeny.
Low Phylogenetic Signal (K < 1)
Tip values: 0.8, 0.9, 0.7, 0.6
Colors are scattered across the tree. Green and orange tips appear in both clades. The trait is random with respect to phylogeny. This can indicate rapid evolution or convergence.

Phylogenetic Generalized Least Squares (PGLS)

When you regress one continuous trait against another (e.g., body size vs. home range), you want to account for the phylogenetic non-independence. This is what PGLS does.

PGLS is a regression method that uses the covariance matrix expected from the phylogeny to weight the data. Under Brownian motion, species that are closer in the tree are expected to be more similar. PGLS exploits this to give you regression coefficients that are maximum-likelihood estimates when the regression residuals follow a BM covariance structure, correcting for non-independence due to shared ancestry. Caveat: if the residuals do not follow BM (for example, under strong stabilizing selection), PGLS estimates and their standard errors can be misleading; in that case OU-based regressions or model-comparison approaches are more appropriate.

Why OLS is wrong: The red line (OLS) is influenced heavily by clusters of closely related species. If you have 10 closely related cats that all happen to be large and have large home ranges, OLS gives each of them the same weight as an isolated, distantly related species. PGLS corrects for this by down-weighting the cluster, since its members are not independent observations.
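The weighting can be made explicit with the generalized least squares estimator, β = (XᵀC⁻¹X)⁻¹ XᵀC⁻¹y, where C is the covariance matrix expected under BM. The sketch below is base R only; pgls_fit is a hypothetical helper name, and the 4-tip matrix and trait values are invented for illustration.

```r
# GLS coefficients for y ~ x given a phylogenetic covariance matrix C
pgls_fit <- function(y, x, C) {
  X <- cbind(intercept = 1, slope = x)   # design matrix
  C_inv <- solve(C)
  beta <- solve(t(X) %*% C_inv %*% X, t(X) %*% C_inv %*% y)
  drop(beta)                             # named vector: intercept, slope
}

# Hypothetical balanced tree ((A,B),(C,D)) with sister tips sharing half their history
C <- matrix(c(1.0, 0.5, 0.0, 0.0,
              0.5, 1.0, 0.0, 0.0,
              0.0, 0.0, 1.0, 0.5,
              0.0, 0.0, 0.5, 1.0),
            nrow = 4, byrow = TRUE)

x <- c(1.0, 1.2, 3.0, 3.1)   # made-up predictor values
y <- c(2.1, 2.0, 4.2, 4.5)   # made-up response values

beta_pgls <- pgls_fit(y, x, C)         # phylogenetic weighting
beta_ols  <- pgls_fit(y, x, diag(4))   # identity matrix: reduces to OLS
print(rbind(PGLS = beta_pgls, OLS = beta_ols))
```

With the identity matrix in place of C, the estimator reduces to ordinary least squares, so OLS is the special case in which every species is assumed independent.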

What you actually do in R

In practice, you use the gls function from the nlme package with a correlation structure from ape:

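# Load packages
library(nlme)  # provides gls()
library(ape)   # provides corBrownian()

# tree: your phylogeny (phylo object)
# data: your data frame with trait columns, plus a `species` column
#       (assumed name) whose values match tree$tip.label

# form = ~species matches rows to tips by name and needs a recent version of ape;
# with older versions, omit it and keep the rows in the same order as tree$tip.label.
model <- gls(trait2 ~ trait1,
             data = data,
             correlation = corBrownian(phy = tree, form = ~species))

summary(model)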

The corBrownian function encodes the phylogenetic covariance matrix. The gls function (generalized least squares) uses this matrix to compute regression coefficients that properly account for phylogenetic non-independence.
