Midterm

Section A

Generating Data.

library(ggplot2)
set.seed(1)
n <- 30
grpA <- rnorm(n = n, mean = 10, sd = 2)
grpB <- rlnorm(n = n, meanlog = 2.3, sdlog=0.3)
grp <- factor(rep(c("A","B"), each = n))
y <- c(grpA, grpB)
df <- data.frame(y, grp)
summary(df)

##        y          grp   
##  Min.   : 5.571   A:30  
##  1st Qu.: 8.916   B:30  
##  Median :10.258         
##  Mean   :10.419         
##  3rd Qu.:11.679         
##  Max.   :18.068

ggplot(df, aes(x = grp, y = y)) +
  geom_violin(fill = "lightgray", color = "black", trim = FALSE) +
  geom_jitter(width = 0.15, alpha = 0.6, color = "steelblue", size = 1.5) +
  theme_minimal(base_size = 14) +
  labs(x = NULL, y = "Y value")

Section B

In this section I assess how well the data meet the assumptions of the tests we might use, focusing on normality.

par(mfcol=c(2,2))
qqnorm(grpA)
qqline(grpA)
qqnorm(grpB)
qqline(grpB)
plot(density(grpA))
plot(density(grpB))

shapiro.test(grpA)

## 
##  Shapiro-Wilk normality test
## 
## data:  grpA
## W = 0.95011, p-value = 0.1703

shapiro.test(grpB)

## 
##  Shapiro-Wilk normality test
## 
## data:  grpB
## W = 0.9495, p-value = 0.1639

This looks like a classic case where the sample size is too small for the Shapiro-Wilk test to reject normality (p = 0.17 for group A and p = 0.16 for group B), even though we know for a fact that group B was generated from a non-normal (log-normal) distribution.
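To put a rough number on that intuition, a small simulation can estimate how often Shapiro-Wilk rejects normality for samples drawn the same way as group B. This is only a sketch and not part of the original analysis: the seed, the number of replicates (`n_sims`), and the 0.05 cutoff (`alpha`) are assumptions chosen for illustration.

```r
# Estimate the power of Shapiro-Wilk against the log-normal used for group B.
# Assumed for illustration: the seed, n_sims, and alpha are arbitrary choices.
set.seed(42)
n_sims <- 2000
alpha <- 0.05
n_b <- 30  # same per-group sample size as in Section A

# Each replicate draws a fresh log-normal sample and records the test p-value
sw_pvals <- replicate(
  n_sims,
  shapiro.test(rlnorm(n_b, meanlog = 2.3, sdlog = 0.3))$p.value
)

# Proportion of replicates in which normality is rejected at the chosen alpha
sw_power <- mean(sw_pvals < alpha)
sw_power
```

Because this advances the random number stream, it is best run in a separate session or after the rest of the analysis; otherwise the permutation p-value in Section C would change.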

Section C

In this section we test whether the two groups have significantly different means.

t.test(grpA, grpB)

## 
##  Welch Two Sample t-test
## 
## data:  grpA and grpB
## t = -0.86583, df = 51.958, p-value = 0.3906
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1.6886323  0.6706605
## sample estimates:
## mean of x mean of y 
##  10.16492  10.67390

wilcox.test(grpA, grpB)

## 
##  Wilcoxon rank sum exact test
## 
## data:  grpA and grpB
## W = 436, p-value = 0.843
## alternative hypothesis: true location shift is not equal to 0

We have no reason to distrust the t-test at this point, since neither group failed the normality check. However, because we know group B was not generated from a normal distribution, let's go ahead and run a permutation test as well. This is always fun anyway.

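# Observed difference in group means
obsdiff <- mean(grpA) - mean(grpB)
pooled <- c(grpA, grpB)
# Preallocate the null distribution rather than growing it in the loop
null <- numeric(10000)
for (i in seq_along(null)) {
  # Shuffle group membership by permuting the pooled values
  tdat <- sample(pooled)
  null[i] <- mean(tdat[1:n]) - mean(tdat[(n + 1):(2 * n)])
}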
plot(density(abs(null)))
abline(v=abs(obsdiff), col="darkred", lwd=2)

pval <- sum(abs(null) >= abs(obsdiff))/length(null)
pval

## [1] 0.3921

The result of the permutation test (p = 0.3921) is consistent with the result of the t-test (p = 0.3906), which is good!
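The permutation procedure above can also be wrapped in a reusable function. The sketch below is one way to do that; `perm_test_mean_diff` and its arguments are hypothetical names introduced here, and the `+ 1` correction in the p-value is a common convention that differs slightly from the plain proportion used above.

```r
# Hypothetical helper (not from the original analysis): two-sided permutation
# test for a difference in means between two numeric vectors.
perm_test_mean_diff <- function(x, y, n_perm = 10000) {
  pooled <- c(x, y)
  nx <- length(x)
  obs <- mean(x) - mean(y)
  # Shuffle the pooled values and recompute the mean difference each time
  null <- replicate(n_perm, {
    shuffled <- sample(pooled)
    mean(shuffled[seq_len(nx)]) - mean(shuffled[-seq_len(nx)])
  })
  # Add 1 to numerator and denominator so the p-value is never exactly 0
  p <- (sum(abs(null) >= abs(obs)) + 1) / (n_perm + 1)
  list(obs_diff = obs, p_value = p)
}

# Example on data simulated the same way as in Section A
set.seed(1)
a <- rnorm(30, mean = 10, sd = 2)
b <- rlnorm(30, meanlog = 2.3, sdlog = 0.3)
res <- perm_test_mean_diff(a, b)
res$p_value
```

With 10,000 permutations and a p-value near 0.39, the Monte Carlo standard error of the estimate is about 0.005, so small run-to-run differences in the third decimal place are expected.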

Section D

Let's make a really beautiful plot of our data:

library(ggplot2)

# Basic violin plot with points
ggplot(df, aes(x = grp, y = y)) +
  geom_violin(trim = FALSE, fill = "lightgray", color = NA) +
  ggbeeswarm::geom_quasirandom(width = 0.2, size = 1.5, alpha = 0.6) +
  theme_bw(base_size = 14) +
  theme(
    #panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    panel.background = element_blank()
  ) +
  labs(x = "Group", y = "Size")

Figure 1. Individual sizes for members of group A and group B. The vertical axis shows size and the horizontal axis separates members by group. Group means were not significantly different (Welch t-test and permutation test, both p = 0.39).


Heath Blackmon

2025-09-30
