
Discrete plus continuous: mixed trait models.

Many biological questions require both data types at once. Does having an XY sex determination system correlate with chromosome number? Does the presence of a feature change the rate at which a continuous character evolves? This page covers phylogenetic ANCOVA, Pagel's lambda, and the threshold model that bridges discrete and continuous thinking.

Why you need both

In each of these questions, one trait is discrete (the sex determination system, the presence or absence of a feature) and the other is continuous (chromosome number, a rate of evolution). Body size offers another example: do species with wings have larger body sizes?

These are inherently multivariate problems. You cannot analyze them by looking at each trait independently and ignoring the phylogenetic structure. You need methods that can handle mixed data types while accounting for shared ancestry.

PGLS with a discrete predictor

The simplest case is when your predictor is discrete (e.g., "has trait X" or "doesn't have trait X") and your response is continuous (e.g., body size). With the discrete predictor alone, this is a phylogenetic ANOVA; adding a continuous covariate turns it into a phylogenetic ANCOVA (analysis of covariance).
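To make the mechanics concrete, here is a minimal base-R sketch of what PGLS does with a discrete predictor, assuming the phylogenetic covariance matrix V is already known. model.matrix() expands the factor into an intercept (the baseline group) and a 0/1 column for the other group, and the coefficients come from the generalized least squares estimator β̂ = (XᵀV⁻¹X)⁻¹XᵀV⁻¹y. The four-species tree, its covariance matrix, the trait values, and the helper gls_fit() are all hypothetical; in a real analysis V would be computed from your tree (for example with ape's vcv()).

```r
# Hypothetical 4-species tree ((sp1,sp2),(sp3,sp4)) with root-to-tip height 1;
# each sister pair shares 0.7 units of branch length below the root.
V <- matrix(c(1.0, 0.7, 0.0, 0.0,
              0.7, 1.0, 0.0, 0.0,
              0.0, 0.0, 1.0, 0.7,
              0.0, 0.0, 0.7, 1.0),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("sp", 1:4), paste0("sp", 1:4)))

# Hypothetical data: continuous response and a two-level discrete predictor
dat <- data.frame(
  body_size = c(2.1, 2.4, 3.0, 2.6),
  group     = factor(c("A", "A", "A", "B")),
  row.names = paste0("sp", 1:4)
)

# Generalized least squares: beta = (X' V^-1 X)^-1 X' V^-1 y
gls_fit <- function(X, y, V) {
  Vinv <- solve(V)
  XtVinv <- t(X) %*% Vinv
  beta <- solve(XtVinv %*% X, XtVinv %*% y)
  drop(beta)  # coefficient vector: intercept, then groupB difference
}

# Intercept = group A mean; "groupB" = difference between B and A
X <- model.matrix(~ group, data = dat)
y <- dat$body_size

beta_ols  <- gls_fit(X, y, diag(4))  # identity V: ordinary least squares
beta_pgls <- gls_fit(X, y, V)        # phylogenetic covariance
print(rbind(OLS = beta_ols, PGLS = beta_pgls))
```

Setting V to the identity matrix reproduces ordinary least squares. Here two of the three group-A species are sisters, so PGLS gives them less combined weight than OLS does, and the estimated group difference shifts accordingly.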

Figure (interactive): the naive regression (OLS), which ignores the tree; the same species mapped onto the phylogeny; and the PGLS regression, which corrects for non-independence.

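What's happening: The species are grouped by a discrete trait (Group A: green, Group B: orange). Without the phylogeny, you see two distinct clusters, which looks like strong evidence for a group effect. But when you reveal the tree, you notice that all Group A species are clustered in one part of it, so much of their similarity may trace back to a single common ancestor rather than to many independent associations. PGLS accounts for this structure, so its fitted line can differ noticeably from the OLS line, and the group difference is judged against the smaller amount of independent evidence the data actually contain.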

Phylogenetic ANCOVA: comparing groups

When you test whether a continuous trait differs between two groups while controlling for both a continuous covariate and phylogeny, you are doing a phylogenetic ANCOVA. The comparison is not a simple t-test. Instead, you fit a regression of the trait on the covariate within each group and test whether the lines differ in intercept, slope, or both (in R formula notation, y ~ covariate * group).
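The intercept-versus-slope distinction maps directly onto columns of the design matrix. The base-R sketch below assumes a known covariance matrix V for a hypothetical six-species tree and uses made-up data; it fits two nested GLS models (separate intercepts only, then separate intercepts and slopes) and compares them with an F-test on the generalized residual sum of squares rᵀV⁻¹r. The helper gls_rss() is hypothetical; with nlme you would instead fit both models with gls() using method = "ML" and compare them with anova().

```r
# Hypothetical covariance for 6 species in two clades of 3 (height 1);
# species in the same clade share 0.6 of their history.
V <- diag(2) %x% matrix(0.6, 3, 3) + diag(0.4, 6)

# Hypothetical data: response y, continuous covariate x, two-level group
dat <- data.frame(
  x     = c(1.0, 2.0, 3.0, 1.5, 2.5, 3.5),
  y     = c(2.0, 3.1, 3.9, 2.4, 3.3, 4.6),
  group = factor(c("A", "A", "B", "A", "B", "B"))
)

# GLS fit plus generalized residual sum of squares r' V^-1 r
gls_rss <- function(X, y, V) {
  Vinv <- solve(V)
  XtVinv <- t(X) %*% Vinv
  beta <- solve(XtVinv %*% X, XtVinv %*% y)  # p x 1 matrix
  r <- y - X %*% beta                        # n x 1 residuals
  list(beta = drop(beta), rss = drop(t(r) %*% Vinv %*% r))
}

# Columns of X_full: "(Intercept)" = group A intercept, "x" = group A slope,
# "groupB" = intercept difference, "x:groupB" = slope difference
X_same_slope <- model.matrix(~ x + group, data = dat)
X_full       <- model.matrix(~ x * group, data = dat)

fit_r <- gls_rss(X_same_slope, dat$y, V)
fit_f <- gls_rss(X_full, dat$y, V)

# F-test for the slope difference (one extra parameter, x:groupB)
q <- ncol(X_full) - ncol(X_same_slope)
df_resid <- nrow(X_full) - ncol(X_full)
F_slope <- ((fit_r$rss - fit_f$rss) / q) / (fit_f$rss / df_resid)
p_slope <- pf(F_slope, q, df_resid, lower.tail = FALSE)
cat("F =", round(F_slope, 3), " p =", signif(p_slope, 3), "\n")
```

Testing the intercept difference works the same way: compare y ~ x against y ~ x + group.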

Naive ANCOVA (wrong)

Treats every species as an independent data point, ignoring phylogenetic relationships within and between groups.

Phylogenetic ANCOVA (correct)

Uses the phylogenetic covariance matrix, accounting for non-independence among related species within and between groups.
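One way to see why the naive version is wrong is to compare how precisely each approach can estimate the group difference. In the base-R sketch below (a hypothetical tree of two clades of three species, with group membership identical to clade membership), the sampling variance of the group coefficient, up to the common factor σ², is the corresponding diagonal element of (XᵀV⁻¹X)⁻¹. Treating species as independent (V = I) gives 1/3 + 1/3 = 2/3, whereas the phylogenetic V gives 2 × (0.1/3 + 0.9) ≈ 1.87, close to the value of 2 you would get from comparing a single species per group.

```r
# Hypothetical tree: two clades of 3 species, root-to-tip height 1.
# Species in the same clade share 0.9 of their history; the clades share none.
V <- diag(2) %x% matrix(0.9, 3, 3) + diag(0.1, 6)

# Group membership coincides exactly with clade membership
group <- factor(rep(c("A", "B"), each = 3))
X <- model.matrix(~ group)

# Variance of the groupB coefficient (B minus A), up to the factor sigma^2:
# the [2, 2] element of (X' V^-1 X)^-1
coef_var <- function(X, V) solve(t(X) %*% solve(V) %*% X)[2, 2]

var_naive <- coef_var(X, diag(6))  # species treated as independent
var_phylo <- coef_var(X, V)        # shared ancestry accounted for
print(c(naive = var_naive, phylo = var_phylo))
```

Because the species share most of their history within clades, six species carry almost as little independent information as one species per group, and the naive analysis overstates the precision of the group difference.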

Pagel's lambda and phylogenetic correlation

In the examples above, we have assumed Brownian motion evolution. But sometimes traits do not show strong phylogenetic signal. One way to model this is with Pagel's lambda, a scaling parameter usually constrained between 0 (no phylogenetic signal) and 1 (exactly the signal expected under BM).

Lambda transforms the phylogenetic covariance matrix by multiplying all off-diagonal elements (the variance that pairs of species share through common ancestry) by λ, while leaving the diagonal (each species' total variance) unchanged. Concretely, if V is the BM-expected covariance matrix, the λ-transformed matrix is V_λ = λV + (1−λ)·diag(V). At λ = 0, the off-diagonals are zero and species are independent (a star phylogeny in covariance terms); at λ = 1, you recover the standard BM expectation.

λ is fit by maximum likelihood, so it doubles as a formal test of phylogenetic signal: a confidence interval that excludes 0 means signal is detected, and one that excludes 1 means the signal departs from the BM expectation (for λ < 1, close relatives resemble each other less than BM predicts, as if extra variance had accumulated on the terminal branches). Below is a heatmap showing how the correlations between species change as lambda varies.

Figure (interactive heatmap with a Pagel's lambda slider, initially at λ = 0.50): λ = 0 indicates no phylogenetic signal; λ = 1 indicates the Brownian motion expectation. Brighter colors mean stronger correlations between species.
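The transform and the maximum-likelihood fit can be sketched in a few lines of base R. The sketch below assumes a hypothetical eight-species ultrametric tree encoded directly as its BM covariance matrix V, simulates one trait under BM, and maximizes the profile log-likelihood of λ for an intercept-only model with optimize(), after plugging in the ML estimates of the root state and σ² for each λ. The helper names lambda_transform() and loglik_lambda() are illustrative; in practice you would use an established implementation such as ape's corPagel() correlation structure in gls(), or phytools' phylosig() with method = "lambda".

```r
set.seed(1)

# Hypothetical ultrametric tree for 8 species (height 1) as a BM covariance matrix:
# sister pairs share 0.8, species in the same half of the tree share 0.5.
V <- 0.5 * (diag(2) %x% matrix(1, 4, 4)) +
     0.3 * (diag(4) %x% matrix(1, 2, 2)) +
     0.2 * diag(8)

# Pagel's lambda: multiply off-diagonals by lambda, keep the diagonal
lambda_transform <- function(V, lambda) {
  V_l <- lambda * V
  diag(V_l) <- diag(V)
  V_l
}

# Profile log-likelihood of lambda for y ~ MVN(mu * 1, sigma2 * V_lambda),
# with mu and sigma2 replaced by their ML estimates given lambda
loglik_lambda <- function(lambda, y, V) {
  n <- length(y)
  C <- lambda_transform(V, lambda)
  Cinv <- solve(C)
  one <- rep(1, n)
  mu <- drop(t(one) %*% Cinv %*% y) / drop(t(one) %*% Cinv %*% one)
  r <- y - mu
  sigma2 <- drop(t(r) %*% Cinv %*% r) / n
  log_det_C <- as.numeric(determinant(C, logarithm = TRUE)$modulus)
  -0.5 * (n * log(2 * pi * sigma2) + log_det_C + n)
}

# Simulate one trait under Brownian motion on the hypothetical tree
y <- drop(t(chol(V)) %*% rnorm(nrow(V)))

# Maximize the log-likelihood over lambda in [0, 1]
fit <- optimize(loglik_lambda, interval = c(0, 1), y = y, V = V, maximum = TRUE)
print(c(lambda_hat = fit$maximum, logLik = fit$objective))
```

With only eight species the estimate of λ is very noisy; real tests of phylogenetic signal need larger trees and, ideally, a likelihood-ratio comparison against λ = 0 and λ = 1.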

The threshold model

A conceptually important model bridges the gap between discrete and continuous evolution. The threshold model, rooted in Wright (1934) and developed for phylogenetic use by Felsenstein (2005, Phil. Trans. R. Soc. B) and Felsenstein (2012, Am. Nat. 179: 145–156), proposes that a discrete trait (e.g., the presence or absence of a feature) is determined by an underlying continuous "liability" variable. When liability exceeds a threshold, the discrete trait is expressed.

This model is powerful because it unifies discrete and continuous thinking. The liability evolves continuously (under Brownian motion), but we only observe the discrete outcome. As a result, a discrete trait can appear to have little phylogenetic signal (for example, when close relatives fall on opposite sides of the threshold) while the underlying liability still carries substantial signal.
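A quick simulation shows how a discrete character can arise from a continuous liability. This base-R sketch assumes a hypothetical eight-species tree encoded as its BM covariance matrix, a root liability of 0, and a threshold of 0. Fixing these values is a simplifying assumption: because only the side of the threshold is observed, the absolute position and scale of the liability cannot be recovered from the discrete states alone. In a real dataset, only has_trait would be observed.

```r
set.seed(42)

# Hypothetical ultrametric tree for 8 species (height 1) as a BM covariance matrix
V <- 0.5 * (diag(2) %x% matrix(1, 4, 4)) +
     0.3 * (diag(4) %x% matrix(1, 2, 2)) +
     0.2 * diag(8)
species <- paste0("sp", 1:8)

# Liability evolves under Brownian motion from a root value of 0
liability <- drop(t(chol(V)) %*% rnorm(length(species)))
names(liability) <- species

# Only whether liability exceeds the threshold is observed
threshold <- 0
has_trait <- liability > threshold

print(data.frame(liability = round(liability, 2), has_trait))
```

Two sister species whose liabilities fall just either side of the threshold receive different discrete states even though their liabilities are nearly identical, which is how the discrete trait can look more labile than the liability that produced it.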

Example: The evolution of wings in insects. You observe either "has wings" or "no wings." But these states might be determined by an underlying developmental liability for wing formation. This liability varies continuously across species and evolves under BM. When liability exceeds a threshold, wings develop.

Putting it all together: a Blackmon lab example

The Blackmon lab studies sex determination systems and sex chromosomes in insects. A great example of a discrete-continuous question: does having an XY sex determination system (discrete) correlate with having more chromosomes (continuous)?

The answer, from the lab's research, is yes. Species with XY systems tend to have higher chromosome numbers. But this relationship is not straightforward. Different insect lineages show different patterns. Some evolved XY systems independently. Some lost them. Some changed chromosome numbers while keeping the same sex determination system.

Analyzing this requires PGLS with a categorical predictor, accounting for the fact that "XY vs non-XY" transitions have happened multiple times in the phylogeny, and each transition might be associated with changes in chromosome number.

Figure (hypothetical data): Do insects with XY sex determination have different chromosome numbers than species with other sex systems?

The figure shows a hypothetical example in which XY species (gold) have higher chromosome numbers than non-XY species (green). A PGLS analysis would test whether this difference is statistically significant after accounting for phylogenetic non-independence, by estimating a separate intercept (group mean) for each sex system. Testing whether slopes differ would additionally require a continuous covariate and its interaction with sex system.

What you actually do in R

PGLS with a categorical predictor uses the same gls function as before, but with a factor variable as the predictor:

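      # PGLS with a categorical predictor (phylogenetic ANOVA) using gls()
      library("nlme")   # gls()
      library("ape")    # corBrownian()

      # data: one row per species, with columns
      #   species           - names matching tree$tip.label (hypothetical column name)
      #   chromosome_number - continuous response
      #   sex_system        - categorical predictor
      # The predictor must be a factor
      data$sex_system <- factor(data$sex_system)

      model <- gls(chromosome_number ~ sex_system,
                   data = data,
                   correlation = corBrownian(phy = tree, form = ~species))

      summary(model)   # baseline group mean and differences for other groups
      anova(model)     # F-test for the overall sex_system effect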

For the threshold model, packages such as phytools provide fitting functions; for example, threshBayes() uses Bayesian MCMC to estimate the correlation between the liability of a discrete character and a continuous character. The analysis is more complex, but the idea is to estimate the underlying continuous liability that gives rise to the observed discrete states.