What is a continuous trait?

Continuous traits are characters that vary along a spectrum: body size, brain mass, metabolic rate, genome size. These are not discrete categories. A species can have a mean body size of 2.3 kg, or 4.7 kg, or any value in between.

When you measure a continuous trait across species, you get a dataset of real numbers. The question in phylogenetic comparative methods is: how much of the variation in this trait is explained by shared ancestry, and how much is explained by independent evolution in different lineages?

The problem with naive regression

Suppose you measure body size and home range for 50 species of carnivores. You want to know: do larger animals have larger home ranges? A simple linear regression seems like the obvious approach.

The problem: your 50 species are not independent data points. If 20 of them are closely related cats, they inherited similar body sizes and home ranges from their common ancestor. Your regression treats each as an independent observation, which inflates the effective sample size and gives you false confidence in your result.

This is the essence of the phylogenetic comparative problem. We need methods that acknowledge the tree structure.
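The inflated confidence described above can be seen in a small simulation. The sketch below is a toy model, not a real carnivore dataset: 50 hypothetical species fall into two clades, each species inherits its clade's ancestral value plus a little independent change, and the two traits are generated with no relationship between them. The clade sizes, noise level, and number of replicates are assumptions chosen for illustration.

```r
set.seed(1)

n_sims <- 500
clade <- rep(1:2, each = 25)   # assumed: two clades of 25 species each

p_values <- replicate(n_sims, {
  # Each clade gets its own ancestral value; species deviate only slightly
  x <- rnorm(2)[clade] + rnorm(50, sd = 0.3)
  y <- rnorm(2)[clade] + rnorm(50, sd = 0.3)  # generated independently of x
  summary(lm(y ~ x))$coefficients["x", "Pr(>|t|)"]
})

# Share of simulations where OLS reports a "significant" slope at 0.05
false_positive_rate <- mean(p_values < 0.05)
false_positive_rate
```

If the species were independent, this share would sit near 0.05. In this toy setup it comes out far higher, because OLS counts what are really two clade-level events as 50 separate observations.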

Brownian motion

The foundation for most continuous trait analysis is the Brownian motion model. This is a random walk: over any short interval of time, a trait changes by a small, random amount. The change is drawn from a normal distribution with mean 0 and variance equal to sigma-squared (the rate of evolution) multiplied by the length of the interval.

Under Brownian motion, the variance of a trait at the tips of the tree is proportional to time. A longer branch accumulates more variance. A shorter branch accumulates less. And because two species evolve as a single lineage until they diverge, the covariance between their trait values is proportional to the branch length they share.
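The link between branch length and variance can be checked numerically. This base R sketch builds the change along a branch as a sum of many small normal steps and compares the variance across replicate lineages with sigma-squared times branch length. The rate, step size, branch lengths, and replicate count are all assumed values for illustration.

```r
set.seed(42)

sigma2 <- 0.5    # assumed rate of evolution
dt <- 0.01       # assumed length of one small time step
n_reps <- 2000   # number of independent lineages per branch length

# Net trait change along one branch: the sum of many small normal steps
simulate_bm_change <- function(branch_length) {
  n_steps <- round(branch_length / dt)
  sum(rnorm(n_steps, mean = 0, sd = sqrt(sigma2 * dt)))
}

branch_lengths <- c(1, 4)
simulated_var <- sapply(branch_lengths, function(len) {
  var(replicate(n_reps, simulate_bm_change(len)))
})

bm_check <- data.frame(
  branch_length = branch_lengths,
  expected_var  = sigma2 * branch_lengths,
  simulated_var = simulated_var
)
bm_check
```

The simulated variances should land close to the expected ones, with the longer branch showing roughly four times the variance of the shorter one.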

Figure: Brownian motion simulation on a tree (panel: Tip Trait Values)
How it works: Start at the root with an ancestral trait value. As you move along branches, the trait performs a random walk. Longer branches (more time) lead to larger changes. The simulation shows one random realization of this process.
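The same walk from root to tips can be written out by hand for a very small tree. The sketch below assumes a hypothetical four-species tree ((A,B),(C,D)) in which every branch has length 0.5, a root value of 0, and a rate of 1. None of these numbers come from the simulation on the original page.

```r
set.seed(3)

sigma2 <- 1        # assumed rate of evolution
root_value <- 0    # assumed ancestral trait value at the root

# Evolve a trait along one branch, starting from the parent's value
evolve <- function(start, branch_length) {
  start + rnorm(1, mean = 0, sd = sqrt(sigma2 * branch_length))
}

# Hypothetical tree ((A,B),(C,D)) with all branches of length 0.5
ancestor_ab <- evolve(root_value, 0.5)
ancestor_cd <- evolve(root_value, 0.5)

tip_values <- c(
  A = evolve(ancestor_ab, 0.5),
  B = evolve(ancestor_ab, 0.5),
  C = evolve(ancestor_cd, 0.5),
  D = evolve(ancestor_cd, 0.5)
)
tip_values
```

A and B both start from the value of their shared ancestor, which is why sister tips tend to resemble each other. Changing the seed gives a different realization of the same process.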

Phylogenetic signal

Not all traits follow the Brownian motion expectation equally closely. Some traits show strong phylogenetic signal: closely related species are very similar, and the trait is predictable from phylogeny. Other traits show weak signal: trait values are distributed almost randomly across the tree, with no obvious relationship to phylogeny.

A widely used measure of signal is Blomberg's K. Under Brownian motion, the expected value of K is 1. If K > 1, close relatives resemble each other more than expected under BM (strong signal). If K < 1, close relatives resemble each other less than expected (weak signal, consistent with convergence or recent rapid change).

Figure panel: High Phylogenetic Signal (K > 1)
Tip values: 0.9, 1.1, 5.0, 5.1
The two sister clades (green and orange) have very different trait values, but within each clade, species are similar. The trait clusters by phylogeny.
Figure panel: Low Phylogenetic Signal (K < 1)
Tip values: 0.8, 0.9, 0.7, 0.6
Colors are scattered across the tree. Green and orange tips appear in both clades. The trait is random with respect to phylogeny. This can indicate rapid evolution or convergence.
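To make the K calculation concrete, here is a minimal base R sketch of the statistic as defined by Blomberg et al. (2003). The four-tip tree ((A,B),(C,D)) and its branch lengths are assumptions. The first trait vector reuses the tip values from the high-signal panel above, and the second reshuffles the same values across the two clades. For real analyses you would more likely reach for an existing implementation, such as phylosig in the phytools package, than for this hand-rolled version.

```r
# BM covariance matrix for a hypothetical tree ((A,B),(C,D)):
# total depth 1, sister tips share 0.5 of their history
tips <- c("A", "B", "C", "D")
C <- matrix(c(1.0, 0.5, 0.0, 0.0,
              0.5, 1.0, 0.0, 0.0,
              0.0, 0.0, 1.0, 0.5,
              0.0, 0.0, 0.5, 1.0),
            nrow = 4, byrow = TRUE, dimnames = list(tips, tips))

blomberg_k <- function(x, C) {
  n <- length(x)
  inv_c <- solve(C)
  # Phylogenetically weighted estimate of the root (ancestral) value
  a <- sum(inv_c %*% x) / sum(inv_c)
  d <- x - a
  # Ratio of raw to phylogenetically corrected squared deviations
  observed <- sum(d^2) / drop(t(d) %*% inv_c %*% d)
  # The same ratio as expected under Brownian motion on this tree
  expected <- (sum(diag(C)) - n / sum(inv_c)) / (n - 1)
  observed / expected
}

clustered <- c(A = 0.9, B = 1.1, C = 5.0, D = 5.1)  # similar within clades
scattered <- c(A = 0.9, B = 5.0, C = 1.1, D = 5.1)  # same values, reshuffled

k_clustered <- blomberg_k(clustered, C)
k_scattered <- blomberg_k(scattered, C)
c(clustered = k_clustered, scattered = k_scattered)
```

On this toy tree the clustered values give K above 1 and the reshuffled values give K below 1, matching the two cases described above.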

Phylogenetic Generalized Least Squares (PGLS)

When you regress one continuous trait against another (e.g., body size vs. home range), you want to account for the phylogenetic non-independence. This is what PGLS does.

PGLS is a regression method that uses the covariance matrix expected from the phylogeny to weight the data. Under Brownian motion, species that are closer in the tree are expected to be more similar, so their residuals are correlated. PGLS builds this correlation into the fit, so the slope and its standard error account for shared ancestry instead of treating every tip as an independent observation.

Why OLS is wrong: The OLS fit (the red line) is influenced heavily by clusters of closely related species. If you have 10 closely related cat species that all happen to be large and have large home ranges, OLS gives each of them the same weight as an isolated, distantly related species. PGLS corrects for this by treating them as partially redundant observations.
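The weighting can be written out directly from the generalized least squares formula. The sketch below is a hand-rolled illustration in base R, not the implementation used by any package. The five-species tree, the covariance values, and the trait data are all made up: A to D form a tight clade and E is an isolated lineage.

```r
# Hypothetical tree: A-D form a tight clade (share 0.9 of their history),
# E is an isolated lineage that shares nothing with them after the root
species <- c("A", "B", "C", "D", "E")
C <- matrix(0.9, nrow = 5, ncol = 5, dimnames = list(species, species))
C["E", ] <- 0
C[, "E"] <- 0
diag(C) <- 1

# Made-up log body size (x) and log home range (y)
x <- c(5.0, 5.1, 4.9, 5.2, 2.0)
y <- c(4.1, 3.9, 4.0, 4.2, 1.0)

# GLS estimate: beta = (X' C^-1 X)^-1 X' C^-1 y
pgls_fit <- function(x, y, C) {
  X <- cbind(intercept = 1, slope = x)
  inv_c <- solve(C)
  xt_ci <- t(X) %*% inv_c
  beta <- solve(xt_ci %*% X, xt_ci %*% y)
  resid <- y - X %*% beta
  # Residual variance and coefficient standard errors under covariance C
  sigma2 <- drop(t(resid) %*% inv_c %*% resid) / (length(y) - ncol(X))
  se <- sqrt(diag(sigma2 * solve(xt_ci %*% X)))
  list(coef = drop(beta), se = se)
}

ols  <- pgls_fit(x, y, diag(5))  # identity matrix: every species independent
pgls <- pgls_fit(x, y, C)        # BM covariance: clade members are correlated

list(ols = ols, pgls = pgls)
```

Passing the identity matrix reproduces ordinary least squares, so the only difference between the two fits is the covariance matrix that describes how the species are related.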

What you actually do in R

In practice, you use the gls function from the nlme package with a correlation structure from ape:

# Load packages
library("nlme")
library("ape")

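# tree: your phylogeny (phylo object)
# data: your data frame with trait columns, one row per species,
#       with rows in the same order as tree$tip.label

# Fit the regression with a Brownian motion correlation structure
model <- gls(trait2 ~ trait1,
             data = data,
             correlation = corBrownian(phy = tree))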

summary(model)

The corBrownian function encodes the phylogenetic covariance matrix. The gls function (generalized least squares) uses this matrix to compute regression coefficients that properly account for phylogenetic non-independence.
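To try the call above without your own data, the following sketch simulates a tree and two traits and then fits the same model. It assumes the ape and nlme packages are installed, and a reasonably recent ape release, since it uses the form argument of corBrownian to match rows to tips by name. The tree size, the slope of 0.5, and the column name species are arbitrary choices for illustration.

```r
library(ape)
library(nlme)

set.seed(7)

# Simulated ultrametric tree with 20 tips (labels t1, t2, ...)
tree <- rcoal(20)

# Two traits evolving by Brownian motion; trait2 depends on trait1
trait1 <- rTraitCont(tree, model = "BM", sigma = 1)
trait2 <- 0.5 * trait1 + rTraitCont(tree, model = "BM", sigma = 1)

# One row per species; the species column holds the tip labels
dat <- data.frame(
  species = tree$tip.label,
  trait1  = trait1[tree$tip.label],
  trait2  = trait2[tree$tip.label]
)

# form = ~species tells corBrownian which column identifies each tip
fit <- gls(trait2 ~ trait1,
           data = dat,
           correlation = corBrownian(phy = tree, form = ~species))

coef(fit)
```

Matching rows to tips by name avoids relying on the row order of the data frame, which is easy to get wrong after sorting or merging.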