Use natural‑language prompts with an AI assistant (like ChatGPT) to produce clean, runnable R code in RStudio for common statistics and plots—without over‑emphasizing syntax. You’ll see prompt patterns, checklists, and ready‑to‑run examples.
“ggplot2” or “base R only.”
Copy‑me template:
I’m in RStudio. I have a data frame DF with columns: y (numeric), x (numeric), group (factor). Make a scatterplot of y vs x, color by group, add loess smoothers, and nice axis labels. Use ggplot2. Give me only runnable R code with comments.
The examples below use base R and ggplot2.
if (!requireNamespace("ggplot2", quietly = TRUE)) install.packages("ggplot2")
library(ggplot2)
Good prompt:
“Read data/my_study.csv, show first rows, column names, and a basic summary. Report missing values by column. Code only.”
path <- "data/my_study.csv"
if (file.exists(path)) {
  # Read the CSV, then report first rows, shape, columns, missing values, and summary
  dat <- read.csv(path, stringsAsFactors = FALSE)
  print(head(dat))  # first rows, as the prompt asks
  cat("Rows x Cols:", nrow(dat), "x", ncol(dat), "\n")
  cat("Column names:\n"); print(names(dat))
  cat("\nMissing values per column:\n"); print(colSums(is.na(dat)))
  cat("\nSummary:\n"); print(summary(dat))
} else {
  # Fallback so the later examples still run without the file
  cat("Demo mode: file not found; using built-in 'mtcars'.\n")
  dat <- mtcars
  dat$cyl <- factor(dat$cyl)
  print(head(dat))
}
## Demo mode: file not found; using built-in 'mtcars'.
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
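The demo branch falls back to mtcars, which has no missing values, so the missing-value report is never exercised there. A minimal sketch of the same report on the built-in airquality data (a stand-in chosen here for illustration, not part of the original example) could look like this:

```r
# Assumption: 'airquality' (ships with R) stands in for your own CSV.
aq <- airquality

# Count missing values per column, as the prompt requests
na_counts <- colSums(is.na(aq))
print(na_counts)

# How many rows have no missing values at all
n_complete <- sum(complete.cases(aq))
cat(sprintf("Complete rows: %d of %d (%.1f%%)\n",
            n_complete, nrow(aq), 100 * n_complete / nrow(aq)))
```

Asking the assistant for both a per-column count and a complete-row count tends to make the next cleaning step easier to specify.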
Good prompt:
“Recode group to factor, drop rows with missing y or x, create log_y = log(y). Base R only, with comments.”
df <- dat
if ("cyl" %in% names(df)) df$cyl <- factor(df$cyl) # demo grouping
df <- df[!is.na(df$mpg) & !is.na(df$wt), ] # demo: keep complete rows
df$log_mpg <- log(df$mpg)
summary(df$log_mpg)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.342   2.736   2.955   2.958   3.127   3.523
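The demo maps the prompt's placeholder names onto mtcars columns (cyl for group, mpg for y, wt for x). For comparison, here is a sketch of the same three steps with the literal names from the prompt, assuming a small made-up data frame DF whose values are purely illustrative:

```r
# Hypothetical data: DF, y, x, and group are the placeholder names from the prompt
DF <- data.frame(
  y     = c(2.5, 4.1, NA, 8.0, 16.3),
  x     = c(1, 2, 3, NA, 5),
  group = c("a", "b", "a", "b", "a"),
  stringsAsFactors = FALSE
)

DF$group <- factor(DF$group)                    # recode group to a factor
DF <- DF[complete.cases(DF[, c("y", "x")]), ]   # drop rows missing y or x
DF$log_y <- log(DF$y)                           # natural log; assumes y > 0

str(DF)
```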
Good prompt:
“Scatterplot mpg vs wt, color by cyl, smooth per group, minimal theme, clear labels.”
ggplot(df, aes(x = wt, y = mpg, color = cyl)) +
  geom_point(alpha = 0.8, size = 2.5) +
  # loess is already the default for small data; stating it avoids the console message
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE) +
  labs(
    title = "Fuel Efficiency vs Weight",
    x = "Weight (1000 lbs)",
    y = "Miles per Gallon",
    color = "Cylinders"
  ) +
  theme_minimal(base_size = 14)
Good prompt:
“Compare mpg between 4‑ and 6‑cyl. Welch’s t‑test, Cohen’s d, one‑sentence interpretation.”
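sub <- df[df$cyl %in% c("4", "6"), ]  # keep only 4- and 6-cylinder cars
x <- sub$mpg[sub$cyl == "4"]
y <- sub$mpg[sub$cyl == "6"]
tt <- t.test(x, y)                     # Welch by default (var.equal = FALSE)
# Cohen's d using the unweighted average of the two group variances
d <- (mean(x) - mean(y)) / sqrt((sd(x)^2 + sd(y)^2) / 2)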
tt
## 
##  Welch Two Sample t-test
## 
## data:  x and y
## t = 4.7191, df = 12.956, p-value = 0.0004048
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   3.751376 10.090182
## sample estimates:
## mean of x mean of y 
##  26.66364  19.74286
cat(sprintf("Cohen's d = %.2f\n", d))
## Cohen's d = 2.07
cat(sprintf("Interpretation: Mean mpg differs between 4- and 6-cylinder cars (t=%.2f, p=%.4f); effect size d=%.2f.\n",
            unname(tt$statistic), tt$p.value, d))
## Interpretation: Mean mpg differs between 4- and 6-cylinder cars (t=4.72, p=0.0004); effect size d=2.07.
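The d above divides by the square root of the unweighted average of the two variances, which matches the classic pooled-SD definition only when the groups are the same size (here they are 11 and 7). As a hedged alternative, the sketch below weights each variance by its degrees of freedom. The helper name cohens_d_pooled is hypothetical and does not come from any package:

```r
# Hypothetical helper: Cohen's d with the classic pooled standard deviation
cohens_d_pooled <- function(a, b) {
  n1 <- length(a)
  n2 <- length(b)
  # Weight each group's variance by its degrees of freedom
  sp <- sqrt(((n1 - 1) * var(a) + (n2 - 1) * var(b)) / (n1 + n2 - 2))
  (mean(a) - mean(b)) / sp
}

mpg4 <- mtcars$mpg[mtcars$cyl == 4]
mpg6 <- mtcars$mpg[mtcars$cyl == 6]
d_pooled <- cohens_d_pooled(mpg4, mpg6)
cat(sprintf("Cohen's d (pooled SD) = %.2f\n", d_pooled))
```

With unequal group sizes the two definitions give somewhat different values, so it helps to tell the assistant which one you want.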
Good prompt:
“ANOVA mpg ~ cyl, residual plot + QQ plot + Shapiro, Tukey post‑hoc if significant. One‑line takeaway.”
fit <- aov(mpg ~ cyl, data = df)
summary(fit)
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## cyl          2  824.8   412.4    39.7 4.98e-09 ***
## Residuals   29  301.3    10.4                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
par(mfrow = c(1, 2))
plot(fitted(fit), resid(fit), xlab="Fitted", ylab="Residuals", main="Residuals vs Fitted"); abline(h=0, lty=2)
qqnorm(resid(fit)); qqline(resid(fit))
par(mfrow = c(1, 1))
shapiro <- shapiro.test(resid(fit)); shapiro
##
## Shapiro-Wilk normality test
##
## data: resid(fit)
## W = 0.97065, p-value = 0.5177
p_anova <- summary(fit)[[1]]["cyl", "Pr(>F)"]
if (!is.na(p_anova) && p_anova < 0.05) {
  tk <- TukeyHSD(fit, "cyl"); print(tk)
  cat("Takeaway: Cylinders explain mpg differences; see Tukey pairs.\n")
} else {
  cat("Takeaway: No strong evidence that mpg differs by cylinders.\n")
}
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = mpg ~ cyl, data = df)
##
## $cyl
##           diff        lwr        upr     p adj
## 6-4  -6.920779 -10.769350 -3.0722086 0.0003424
## 8-4 -11.563636 -14.770779 -8.3564942 0.0000000
## 8-6  -4.642857  -8.327583 -0.9581313 0.0112287
##
## Takeaway: Cylinders explain mpg differences; see Tukey pairs.
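The Tukey table reports differences between groups. To look at the group means themselves with 95% confidence intervals (one of the practice tasks asks for this kind of plot), a possible base R sketch follows. It computes a separate t-based interval per group rather than using the pooled ANOVA error, which is an assumption made here for simplicity; the object names are illustrative.

```r
# Group means of mpg by cyl with t-based 95% confidence intervals
d0 <- mtcars
d0$cyl <- factor(d0$cyl)

ci_by_group <- do.call(rbind, lapply(split(d0$mpg, d0$cyl), function(v) {
  n    <- length(v)
  se   <- sd(v) / sqrt(n)
  half <- qt(0.975, df = n - 1) * se   # half-width of the 95% CI
  data.frame(n = n, mean = mean(v), lower = mean(v) - half, upper = mean(v) + half)
}))
print(round(ci_by_group, 2))

# Plot the means with CI bars
xs <- seq_len(nrow(ci_by_group))
plot(xs, ci_by_group$mean, pch = 19, xaxt = "n",
     ylim = range(ci_by_group$lower, ci_by_group$upper),
     xlab = "Cylinders", ylab = "Mean mpg (95% CI)",
     main = "Group means with 95% CIs")
axis(1, at = xs, labels = rownames(ci_by_group))
arrows(xs, ci_by_group$lower, xs, ci_by_group$upper,
       angle = 90, code = 3, length = 0.05)
```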
Good prompt:
“Fit mpg ~ wt + hp. Report standardized betas and R². One‑sentence interpretation.”
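# Standardize the predictors only (mean 0, SD 1); mpg stays in its original units,
# so each slope is the change in mpg per 1 SD increase in that predictor.
Z <- scale(df[, c("wt", "hp")])
fit_lm <- lm(df$mpg ~ Z[, "wt"] + Z[, "hp"])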
summary(fit_lm)
##
## Call:
## lm(formula = df$mpg ~ Z[, "wt"] + Z[, "hp"])
##
## Residuals:
##    Min     1Q Median     3Q    Max 
## -3.941 -1.600 -0.182  1.050  5.854 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  20.0906     0.4585  43.822  < 2e-16 ***
## Z[, "wt"]    -3.7943     0.6191  -6.129 1.12e-06 ***
## Z[, "hp"]    -2.1784     0.6191  -3.519  0.00145 ** 
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.593 on 29 degrees of freedom
## Multiple R-squared: 0.8268, Adjusted R-squared: 0.8148
## F-statistic: 69.21 on 2 and 29 DF, p-value: 9.109e-12
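cat(sprintf("Interpretation: Holding the other predictor constant, a 1 SD increase in wt goes with a %.2f change in mpg and a 1 SD increase in hp with a %.2f change (R-squared = %.2f).\n",
            coef(fit_lm)[2], coef(fit_lm)[3], summary(fit_lm)$r.squared))
## Interpretation: Holding the other predictor constant, a 1 SD increase in wt goes with a -3.79 change in mpg and a 1 SD increase in hp with a -2.18 change (R-squared = 0.83).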
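The fit above standardizes only the predictors, so its slopes are in mpg units per SD of the predictor. If fully standardized betas are wanted (outcome scaled as well), one possible sketch is below; the names d_std and fit_std are illustrative.

```r
# Standardize outcome and predictors together, then refit
d_std   <- as.data.frame(scale(mtcars[, c("mpg", "wt", "hp")]))
fit_std <- lm(mpg ~ wt + hp, data = d_std)

betas <- coef(fit_std)[c("wt", "hp")]   # fully standardized betas (SD of mpg per SD of predictor)
r2    <- summary(fit_std)$r.squared     # R-squared is unchanged by rescaling
print(round(betas, 3))
cat(sprintf("R-squared = %.3f\n", r2))
```

Spelling out in the prompt whether the outcome should be standardized too avoids getting one version when you meant the other.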
Explain + code
“Act as a stats tutor. In RStudio with DF (col1 numeric, col2 factor), run a Welch’s t‑test comparing col1 across col2 groups, include comments, and explain the output in 2 sentences.”
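For reference, code answering a prompt like this might resemble the sketch below. It uses ToothGrowth as a stand-in for DF, with len playing col1 and supp playing col2 (an assumption for illustration). The formula interface of t.test() needs the grouping factor to have exactly two levels.

```r
DF <- ToothGrowth                  # stand-in data: len (numeric), supp (2-level factor)
stopifnot(nlevels(DF$supp) == 2)   # the formula interface needs exactly two groups

# Welch's t-test: var.equal = FALSE is the default, stated here for clarity
res <- t.test(len ~ supp, data = DF, var.equal = FALSE)
print(res)

# Pull out the pieces you would describe in words
mean_diff <- unname(res$estimate[1] - res$estimate[2])
cat(sprintf("Mean difference (OJ - VC) = %.2f, p = %.3f\n", mean_diff, res$p.value))
```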
Plot with constraints
“Use base R only to draw side‑by‑side boxplots of y by group from DF. Add a title and axis labels. Code only.”
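A response to this prompt could be as short as the following sketch, which assumes ToothGrowth as stand-in data (len as y, dose as group); the title and labels are placeholders to adapt.

```r
# Stand-in data: ToothGrowth, with len as y and dose as the grouping variable
DF <- data.frame(y = ToothGrowth$len, group = factor(ToothGrowth$dose))

# boxplot() draws the plot and invisibly returns the statistics it used
bp <- boxplot(y ~ group, data = DF,
              main = "Tooth length by dose",
              xlab = "Dose", ylab = "Tooth length",
              col = "grey90")
bp$n   # observations per group
```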
Reproduce & compare
“Give two solutions to plot y ~ x with a smooth: one using base R and one using ggplot2. Label axes and set a white background.”
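The two solutions might look roughly like this sketch, assuming mtcars as stand-in data (wt as x, mpg as y). The smoothers differ (lowess in base R, loess in ggplot2), so the curves will be similar but not identical.

```r
# Stand-in data: mtcars, with wt as x and mpg as y
DF <- data.frame(x = mtcars$wt, y = mtcars$mpg)

# Solution 1: base R, with the background set to white explicitly
op <- par(bg = "white")
plot(y ~ x, data = DF, pch = 19, xlab = "x", ylab = "y", main = "Base R")
sm <- lowess(DF$x, DF$y)            # locally weighted smoother
lines(sm, col = "blue", lwd = 2)
par(op)                             # restore previous graphics settings

# Solution 2: ggplot2 (skipped if the package is not installed)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(DF, aes(x = x, y = y)) +
    geom_point() +
    geom_smooth(method = "loess", formula = y ~ x, se = FALSE) +
    labs(x = "x", y = "y", title = "ggplot2") +
    theme_bw()                      # white panel background
  print(p)
}
```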
Troubleshoot
“I get object 'Sepal.Length' not found. Here is my code: [paste]. Explain the error and give a fixed version.”
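This error usually means a column was referenced as if it were a standalone object. A small sketch that reproduces the error safely and shows two common fixes, using the built-in iris data:

```r
# Reproduce the error: Sepal.Length is a column of iris, not an object in the workspace
msg <- tryCatch(mean(Sepal.Length), error = function(e) conditionMessage(e))
print(msg)

# Fix 1: refer to the column through its data frame
m1 <- mean(iris$Sepal.Length)

# Fix 2: evaluate the expression inside the data frame
m2 <- with(iris, mean(Sepal.Length))

c(fix1 = m1, fix2 = m2)
```

Pasting the exact error text plus the offending line, as the prompt does, gives the assistant enough to tell which of these fixes applies.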
- iris: sepal/petal measurements by species. Load: data(iris); head(iris)
- mtcars: car performance stats. Load: data(mtcars); head(mtcars)
- ToothGrowth: tooth length by supplement & dose. Load: data(ToothGrowth); head(ToothGrowth)
- ToothGrowth: compare len by supp. Plot (violin/boxplot), Welch’s t‑test, effect size, one‑sentence interpretation.
- iris: test whether Sepal.Length differs by Species. Diagnostics + Tukey; plot means with 95% CIs.
- mtcars: fit mpg ~ wt + hp. Report standardized betas, R², and a partial‑effect plot for wt controlling for hp.
Prompt to paste into AI:
I’m in RStudio. I have DF with y (numeric outcome), x (numeric predictor), grp (factor). Please produce only runnable R code that:
- Removes rows with missing y or x.
- Makes a scatterplot of y vs x, color by grp, with a smooth per group.
- Fits y ~ x + grp and reports coefficients and R².
- Uses ggplot2, includes comments, runs as‑is.
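DF <- mtcars; DF$grp <- factor(DF$cyl)        # replace with your data
DF <- DF[!is.na(DF$mpg) & !is.na(DF$wt), ]    # drop rows missing y (mpg) or x (wt)
# Scatterplot with one smoother per group
ggplot(DF, aes(wt, mpg, color = grp)) +
  geom_point(alpha = 0.8) +
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE) +
  labs(x = "Weight (1000 lbs)", y = "Miles per Gallon", color = "Cylinders") +
  theme_minimal(base_size = 14)
# Linear model: coefficients and R-squared appear in the summary
fit <- lm(mpg ~ wt + grp, data = DF)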
summary(fit)
##
## Call:
## lm(formula = mpg ~ wt + grp, data = DF)
##
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5890 -1.2357 -0.5159  1.3845  5.7915 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  33.9908     1.8878  18.006  < 2e-16 ***
## wt           -3.2056     0.7539  -4.252 0.000213 ***
## grp6         -4.2556     1.3861  -3.070 0.004718 ** 
## grp8         -6.0709     1.6523  -3.674 0.000999 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.557 on 28 degrees of freedom
## Multiple R-squared: 0.8374, Adjusted R-squared: 0.82
## F-statistic: 48.08 on 3 and 28 DF, p-value: 3.594e-11
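One of the practice tasks asks for a partial-effect plot for wt controlling for hp. A base R way to sketch that idea is an added-variable plot: regress both mpg and wt on hp, then plot one set of residuals against the other. This is only one of several reasonable readings of "partial-effect plot", and the object names are illustrative.

```r
# Added-variable (partial regression) plot for wt, controlling for hp
r_y <- resid(lm(mpg ~ hp, data = mtcars))  # mpg with hp's linear effect removed
r_x <- resid(lm(wt ~ hp, data = mtcars))   # wt with hp's linear effect removed

plot(r_x, r_y, pch = 19,
     xlab = "wt | hp (residuals)", ylab = "mpg | hp (residuals)",
     main = "Partial effect of wt, controlling for hp")
av_fit <- lm(r_y ~ r_x)
abline(av_fit, lwd = 2)

# The slope of this line equals the wt coefficient from the full model
full_fit <- lm(mpg ~ wt + hp, data = mtcars)
slopes <- c(added_variable = unname(coef(av_fit)["r_x"]),
            full_model     = unname(coef(full_fit)["wt"]))
print(slopes)
```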
End.