Binomial test

The binomial test allows us to evaluate data from experiments in which some process has two possible outcomes (e.g., coin flips, sex determination, etc.). To use the binomial test we need a null hypothesis that specifies the probability of each of the two outcomes. Most often the binomial test is used to test for deviations from a 50/50 probability of either outcome.

For our example below we will simulate data that represent whether fish breed with their own species or with a sister species; these are called “conspecific” and “heterospecific” matings, respectively.

# this first one we will simulate where the probability of these
# two types of matings is equal. This means we wouldn't generally
# expect to see a significant result.
mat.res <- sample(c("conspecific", "heterospecific"), 250, replace=T)

# now we simulate with a very slight difference in the probability
# of outcomes. The difference is small enough that, at this sample
# size, we may often fail to get a significant result even though
# the null hypothesis is false.
mat.res2 <- sample(c("conspecific", "heterospecific"), 250,
                   replace=T, prob = c(.45, .55))

# now we will increase the sample size to illustrate how this 
# impacts our results
mat.res3 <- sample(c("conspecific", "heterospecific"), 2500,
                   replace=T, prob = c(.45, .55))

binom.test(x=sum(mat.res == "conspecific"), n=length(mat.res))
## 
##  Exact binomial test
## 
## data:  sum(mat.res == "conspecific") and length(mat.res)
## number of successes = 109, number of trials = 250, p-value = 0.04971
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
##  0.3736152 0.4999205
## sample estimates:
## probability of success 
##                  0.436
binom.test(x=sum(mat.res2 == "conspecific"), n=length(mat.res2))
## 
##  Exact binomial test
## 
## data:  sum(mat.res2 == "conspecific") and length(mat.res2)
## number of successes = 109, number of trials = 250, p-value = 0.04971
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
##  0.3736152 0.4999205
## sample estimates:
## probability of success 
##                  0.436
binom.test(x=sum(mat.res3 == "conspecific"), n=length(mat.res3))
## 
##  Exact binomial test
## 
## data:  sum(mat.res3 == "conspecific") and length(mat.res3)
## number of successes = 1177, number of trials = 2500, p-value = 0.003723
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
##  0.4510777 0.4905907
## sample estimates:
## probability of success 
##                 0.4708
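Under the hood, the exact test sums binomial probabilities. Because a null of 0.5 makes the binomial distribution symmetric, the two-sided p-value for 109 successes out of 250 trials can be reproduced with pbinom(). This is a sketch of the calculation, not part of the original analysis:

```r
# exact two-sided p-value for 109 successes in 250 trials under a
# null probability of 0.5: by symmetry, twice the lower tail P(X <= 109)
p.manual <- 2 * pbinom(109, size = 250, prob = 0.5)

# the same value reported by binom.test()
p.exact <- binom.test(x = 109, n = 250)$p.value
```

This shortcut only works because the null of 0.5 is symmetric; for other null probabilities binom.test() sums the probability of every outcome at least as extreme as the one observed.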

Chi-Square test

Similar to the binomial test, the Chi-Square test deals with data that have discrete outcomes. The difference is that now we have two different discrete variables. We use this test when we are interested in determining whether different groups have different probabilities associated with the variable we are measuring. For this example we will simulate data on whether kolaches (Czech-style sausage rolls) are spicy or not spicy in the Twin Cities (Minneapolis and St. Paul, Minnesota) and in Austin, Texas.

# create a matrix to hold our data
kolachesearch <- matrix(data = NA, nrow = 2, ncol = 2)

# naming the columns and rows (this is always a good idea)
colnames(kolachesearch) <- c("notspicy", "spicy")
rownames(kolachesearch) <- c("twincities", "austin")

# spicy probs in twin cities and austin
probs <- c(.2, .6)

# work through the rows of our table creating data
for(i in 1:nrow(kolachesearch)){
  # simulating trying out kolaches in our city of choice
  # and classifying them as spicy or not
  # using factor() with explicit levels guarantees both categories
  # appear in the table, in the same order as our column names,
  # even if one outcome never occurs in the sample
  counts <- table(factor(sample(c("notspicy", "spicy"),
                                60, replace=T,
                                prob=c(1-probs[i], probs[i])),
                         levels = c("notspicy", "spicy")))
  # recording our dining experience in the current city
  kolachesearch[i,] <- counts
}

chisq.test(kolachesearch)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  kolachesearch
## X-squared = 16.694, df = 1, p-value = 4.391e-05
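The statistic compares the observed counts to the counts we would expect if city and spiciness were independent. As a sketch (using made-up counts rather than the simulated table above), the expected counts and the Yates-corrected statistic can be computed by hand:

```r
# hypothetical 2x2 table of counts (not the simulated data above)
obs <- matrix(c(48, 24, 12, 36), nrow = 2,
              dimnames = list(c("twincities", "austin"),
                              c("notspicy", "spicy")))

# expected count in each cell = row total * column total / grand total
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Pearson statistic with Yates' continuity correction, which
# chisq.test() applies by default to 2x2 tables
x2 <- sum((abs(obs - expected) - 0.5)^2 / expected)
```

The value of x2 matches chisq.test(obs)$statistic; the expected counts are also stored in the result as chisq.test(obs)$expected.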

T-test

The T-test is our first test that focuses on continuous data. We will look at several versions:

One sample T-test: Does the mean of my sample differ from some value that I hypothesize?

Two sample T-test: Do the means of the two samples that I have differ?

Paired sample T-test: Does an intervention that I have performed change the value of a variable I measure before and after my intervention?

# let's simulate a whole lake of fish
lake <- rexp(10000)

# now let's sample 150 fish and test whether the mean differs from
# 1, the expected mean of this exponential distribution (rate = 1).
t.test(x=sample(lake, 150), mu=1)
## 
##  One Sample t-test
## 
## data:  sample(lake, 150)
## t = -1.312, df = 149, p-value = 0.1915
## alternative hypothesis: true mean is not equal to 1
## 95 percent confidence interval:
##  0.7643553 1.0475908
## sample estimates:
## mean of x 
##  0.905973
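The t statistic itself is just the distance between the sample mean and the hypothesized mean, measured in standard errors. A sketch of the calculation on a fresh sample (since the sample drawn above was not saved; the seed is only for reproducibility of this sketch):

```r
set.seed(1)
samp <- rexp(150)

# t = (sample mean - hypothesized mean) / standard error of the mean
t.manual <- (mean(samp) - 1) / (sd(samp) / sqrt(length(samp)))

# the same statistic reported by t.test()
t.fn <- t.test(samp, mu = 1)$statistic
```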
# now we will take two samples and compare them.
samp1 <- sample(lake, 150)
samp2 <- sample(lake, 150)
t.test(x=samp1, y=samp2)
## 
##  Welch Two Sample t-test
## 
## data:  samp1 and samp2
## t = 0.28327, df = 278.85, p-value = 0.7772
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.2051909  0.2741724
## sample estimates:
## mean of x mean of y 
## 0.9684642 0.9339734
# to do the paired sample t-test we need to do things a bit
# differently. Let's imagine that we measure cortisol levels
# in fish before and after an interaction that may be stressful.
cort.level1 <- rnorm(100, mean=200, sd=10)
# each post-interaction level is that fish's baseline plus a
# stress response drawn from a normal distribution
cort.level2 <- cort.level1 + rnorm(100, mean=30, sd=20)
t.test(x=cort.level1, y=cort.level2, paired=T)
## 
##  Paired t-test
## 
## data:  cort.level1 and cort.level2
## t = -15.294, df = 99, p-value < 2.2e-16
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  -32.22024 -24.81977
## sample estimates:
## mean difference 
##       -28.52001
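A useful way to see what the paired test is doing: it is exactly a one-sample t-test on the within-pair differences. A sketch with freshly simulated data (the seed and sample sizes here are illustrative, not from the analysis above):

```r
set.seed(2)
before <- rnorm(100, mean = 200, sd = 10)
after  <- before + rnorm(100, mean = 30, sd = 20)

# the paired t-test...
p.paired <- t.test(before, after, paired = TRUE)$p.value

# ...gives the same p-value as a one-sample test on the differences
p.diff <- t.test(before - after, mu = 0)$p.value
```

This is why pairing helps: the between-fish variation in baseline cortisol cancels out of the differences, leaving only the effect of the intervention plus noise.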