Home › Tools › evobiR

evobiR, an R package for evolutionary biology.

A catch-all package that grew out of graduate-school tooling. It bundles Shiny-based teaching apps, a collection of comparative methods, a couple of population-genetics routines, and a set of small utilities that the lab uses regularly. Maintained by Heath Blackmon and Richard A. Adams.

Install

Install from GitHub Released via GitHub using devtools Install the latest version with devtools.

evobiR is released via GitHub. Install it using the devtools package:

library(devtools)
install_github("coleoguy/evobir", build_vignettes = TRUE)

Teaching apps

All evobiR functions designed for teaching are wrapped into Shiny apps. They are available online, and they also ship inside the package so instructors can run them locally or modify the source. Launch them with ViewEvo().

Wright-Fisher simulator Drift and selection, interactively Experiment with allele frequency, population size, fitness, and mutation.
ViewEvo("wf.model")

Students set initial allele frequency, population size, generations, and mutation rate. They choose which genotype or allele to plot and set the fitness of every genotype. The app also shows the expected infinite-population outcome and reports how often each allele is lost or fixed.

Brownian motion simulator A color-blind-friendly BM simulator Generations, replicates, and rate of evolution are all tunable.
ViewEvo("bm.model")

A simple color-blind-friendly Brownian-motion simulation. Students choose the generations, replicates, and rate of evolution. A second interactive panel shows the distribution of outcomes at any selected generation.

Birth-death tree simulator Four random trees side by side Built to help students feel the variance in the birth-death process.
ViewEvo("bd.model")

Plots four random birth-death trees on the same page. Built to help students appreciate the high variance expected under a birth-death model. Users control speciation rate, extinction rate, total simulation time, and whether extinct lineages are pruned.

Statistical distributions Interactive probability density for common priors Helps undergrads learn stats and grad students pick priors.
ViewEvo("dist.model")

Built to help undergraduates grasp basic statistics and to help graduate students pick sensible priors. The user selects among normal, exponential, gamma, logistic, Poisson, or beta distributions and adjusts the parameters. The probability density is plotted in real time.

Comparative methods

AncCond Does the derived state of a binary trait arise at extreme values of a continuous one? Stochastic mapping + ancestral state reconstruction.
data(mite.trait)
data(trees.mite)
AncCond(trees, mite.trait, derived.state = "haplodiploidy", iterations = 100)

AncCond uses stochastic mapping and ancestral state reconstruction to ask whether the derived state of a binary trait tends to appear when a continuous trait sits at an extreme value. Four steps: estimate the continuous trait under Brownian motion, identify transition points in the binary character via stochastic mapping, categorize nodes as ancestral, derived, or producing, and build a null distribution from ancestral-condition node values for comparison to the observed mean at producing nodes.

CDC model Change-dependent character evolution Does trait B evolve faster right after a transition in trait A?

A model for dependent evolution of discrete traits. It uses a hidden state in one binary trait (A vs a) and asks whether transitions in a second binary trait (B vs b) occur primarily right after A to a transitions. Three or four parameters, depending on constraints.

  • rate 1: transitions from A to a1
  • rate 2: transitions from a1 to a2
  • rate 3: transitions from B to b on the A background
  • rate 4: transitions from B to b on the a1 background
  • rate 5: transitions from B to b on the a2 background

Three variants:

  • model 344: rate 4 equals rate 5 (classical Pagel-style correlation)
  • model 343: rate 3 equals rate 5 (change-dependent; bursts in B follow A transitions)
  • model 333: all three B rates equal (B evolution independent of A)

Interpretation via AIC: lowest AIC on 333 means B is independent of A; lowest on 344 means classical Pagel correlation; lowest on 343 means change-dependent evolution, where transitions in A trigger temporary increases in the rate of B.

load(url("http://coleoguy.github.io/comparative/pagel.data.sets.RData"))
tree <- set.phy[[1]]
x    <- unrepburst1[[1]]
y    <- unrepburst2[[1]]
result344 <- cdcModel(x, y, tree, model = 344)
result343 <- cdcModel(x, y, tree, model = 343)
result333 <- cdcModel(x, y, tree, model = 333)
FuzzyMatch Catch near-miss typos between trees and datasets Recover species that would otherwise be dropped.

When datasets come from different sources, typos prevent perfect matching between tree tips and data rows. FuzzyMatch finds close matches so they can be hand-curated, keeping as many species in the analysis as possible.

data(hym.tree)
names <- c("Pepsis_elegans", "Plagiolepis_alluaudi", "Pheidele_lucreti",
           "Meliturgula_scriptifronsi", "Andrena_afimbriat")
FuzzyMatch(tree = hym.tree, data = names, max.dist = 3)
PPSDiscrete Posterior predictive simulation for discrete traits Wraps Bayesian output into simulated datasets for model adequacy checks.

Posterior predictive simulation for discrete traits. Takes a collection of rate matrix estimates from a Bayesian program and simulates new datasets against each matrix on the corresponding tree.

data(trees)
data(mcmc3)
set.seed(1)
result <- evobiR::PPSDiscrete(trees, MCMC = mcmc3,
                              states = c(.33, .33, .34), N = 3)
ReorderData Reorder data rows to match phylo tip order Fixes the single most common source of comparative-methods errors.

Takes a vector, matrix, or dataframe and reorders it to match the tip order of a phylo object. Some R packages require that tips and data share an order, others require matching names, and some require both. Assuming names match, ReorderData aligns the order for you.

data(mite.trait)
data(trees.mite)
new.data <- ReorderData(tree, data, taxa.names = 1)
SampleTrees Thin a posterior sample of trees down to a manageable set Newick or NEXUS output.

Takes a large collection of trees (from MrBayes, BEAST, etc.) and lets the user draw a specified number of random trees and save them in Newick or NEXUS format.

SampleTrees(trees = system.file("trees.nex", package = "evobiR"),
            burnin = .1, final.number = 20,
            format = 'new', prefix = 'sample')
ShowTree Plot a tree with discrete tip data, colorblind-friendly by default Wraps APE's plot and tiplabels with sensible defaults.
data(mite.trait)
data(trees.mite)
ShowTree(tree = trees[[1]], tip.vals = mite.trait[, 3])

Plots a tree and discrete tip data together. Automates the combination of APE's plot and tiplabels functions. By default uses the color-blind-friendly viridis palette and picks maximally distant colors.

SimThresh3 Three-state threshold model An extension of Felsenstein's threshold model to three discrete states.

Extends Felsenstein's threshold model to a three-state unordered trait. Two liabilities evolve on an X and a Y axis, then project onto a Cartesian plane. The plane is divided into three sectors meeting at the origin: 330 to 90 degrees is state 1, 90 to 210 is state 2, 210 to 330 is state 3. The observed discrete trait is determined by the sector the (X, Y) value lands in. This allows any set of transition probabilities between states.

set.seed(3)
tree <- phytools::pbtree(n = 100)
tip.state.full <- SimThresh3(tree, liabilities = TRUE)
SuperMatrix Build a supermatrix from a folder of FASTAs Fills missing loci with user-specified missing symbol.

Reads all FASTA-format alignments in the working directory and builds a single supermatrix including every taxon in any input file. Missing loci for a taxon are filled with the supplied missing symbol. Returns a two-element list: partition data (alignment positions per input file) and a dataframe version of the supermatrix. With save = TRUE both are also written to disk.

SuperMatrix(missing = "N", prefix = "DATASET2", save = TRUE)

Population genetics

CalcD and WinCalcD D-statistic for introgression in genomic data Bootstrapping for unlinked SNPs, jackknifing for linked.

CalcD and CalcPopD implement the D-statistic for detecting introgression in genomic data. Significance comes from bootstrapping (appropriate for unlinked SNPs, for instance unmapped RAD-seq data) or jackknifing (appropriate for linked SNPs, for instance gene alignments or whole-genome alignments).

References:

  • Durand, Eric Y. et al. Testing for ancient admixture between closely related populations. Molecular Biology and Evolution 28.8 (2011): 2239-2252.
  • Eaton, D. A. R. and R. H. Ree. 2013. Inferring phylogeny and introgression using RADseq data: an example from flowering plants (Pedicularis: Orobanchaceae). Syst. Biol. 62: 689-706.

Utilities

AICc Small-sample AIC Converges on AIC as sample size grows, safer in general use.

Given a log likelihood, the number of model parameters, and sample size, returns the small-sample-size version of the AIC. In general use AICc is preferable since it converges on AIC as sample size grows and reduces the risk of overfitting.

data(cars)
fit <- lm(cars)
aic <- AIC(fit)
foo <- vector(length = 100, mode = "numeric")
for (i in 3:100) { foo[i] <- AICc(-127.3877, 3, i) }
Even and Mode Two small helpers Parity check and modal value.

Even returns TRUE if a number is even and FALSE otherwise.

Even(3)   # FALSE
Even(4)   # TRUE

Mode returns the most frequently occurring value in a vector. In the case of a tie it returns the mode with the earliest initial occurrence.

Mode(c(1, 2, 3, 4, 5, 6, 2, 5))
# [1] 2
Mode(c("jeff", "emma", "matt", "laura", "matt"))
# [1] "matt"
SlidingWindow Apply a function inside a moving window of a numeric vector Window and step sizes are user-specified.

Applies a function within a sliding window of a numeric vector. Both step size and window size are controlled by the user. Useful during data exploration when patterns appear at scales different from the raw data. As an example, running a 132-month window over the classic sunspot data reveals the Dalton minimum of the early 1800s, which the raw 11-year cycle obscures.

data(sunspot.month)
foo      <- sunspot.month
sunspots <- SlidingWindow("mean", foo, 132, 12)
years    <- round(SlidingWindow("mean",
                                rep(1749:2013, each = 12)[1:3177], 132, 12))
plot(x = years, y = sunspots, type = "l", lwd = 3)
abline(v = 1810, col = "red", lwd = 3)
ResSel Pick high and low selection lines from residuals of a linear model Originally for horn-size-on-body-size selection experiments.

Takes measurements of multiple traits, runs a linear regression, and identifies records with the largest and smallest residuals. Originally written for regressing horn size on body size to form high and low selection lines. The user chooses the trait to select on, the trait to control for, and the strength of selection (number of individuals retained).

data <- read.csv(file = system.file("horn.beetle.csv", package = "evobiR"))
ResSel(data = data, traits = c(2, 3), percent = 15,
       identifier = 1, model = "linear")

The first column of the data file contains the identifier (specimen or vial ID). The next two columns hold the traits. ResSel returns a list with high line and low line elements containing the selected ID numbers, and produces a visual summary.

Contact

Questions Bugs, feature requests, or questions Email Heath Blackmon.

For questions or comments email Heath Blackmon, or file an issue on the GitHub repository.

Question copied. Paste it into the NotebookLM tab.