The CUREs karyotype database.

63,682 chromosome-number records spanning 55 eukaryotic clades, each tied to its primary source. The dataset was assembled over several years by rolling cohorts of undergraduate researchers in our CUREs program, and it ships as open JSON and CSV so other labs and AI agents can build on it without scraping. This page hosts the data behind the 2026 preprint.

CUREs: 41 records added this month by undergraduate researchers
Repository: open JSON and CSV, citations preserved on every row
Preprint: Copeland, McConnell, Barboza et al. 2026, bioRxiv

How to use the data

Every record in this database comes from a primary source: a paper, dissertation, or dataset whose authors did the slow, careful work of generating these chromosome counts. Citations matter. They are how careers get evaluated, how grants get awarded, and how the people whose work we depend on get credit. Please cite responsibly.

How to cite

Copeland, M., McConnell, M., Barboza, A., Abraham, H.M., Alfieri, J., Arackal, S., Bernard, C.E., Bryant, K., Cast, S., Chien, S., Clark, E., Cruz, C.E., Diaz, A.Y., Deiterman, O., Girish, R., Harper, K., Hjelmen, C.E., Thompson, M.J., Koehl, R., Koneru, T., Laird, K., Lee, Y., Lopez, V.R., Murphy, M., Perez, N., Schmalz, S., Sylvester, T., and Blackmon, H. (2026). Dismantling Chromosomal Stasis Across the Eukaryotic Tree of Life. bioRxiv 2026.04.14.718287. https://doi.org/10.64898/2026.04.14.718287

Browse the data

The interactive viewer below loads the full dataset and lets you filter by clade, search for species, sort any column, download the filtered subset as CSV, and plot distributions. All 63,682 records are available through the Data Table tab; source citations for each clade live on the Sources tab.
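
For readers who would rather work offline, the viewer's filter and summary steps can be approximated on the downloaded CSV with the Python standard library. This is a minimal sketch, not the viewer's own code. The column names (`clade`, `species`, `haploid_number`, `citation`) are assumptions based on the Data Table headers, the rows are placeholders rather than real records, and the sketch assumes every haploid number is a plain integer.

```python
import csv
import io
import statistics

# Placeholder rows, NOT real records. Column names are assumed from the
# Data Table headers and may differ in the actual CSV file.
SAMPLE_CSV = """clade,species,haploid_number,citation
CladeA,Genus alpha,10,Placeholder Source A
CladeA,Genus beta,12,Placeholder Source A
CladeB,Genus gamma,7,Placeholder Source B
"""


def load_records(handle):
    """Read CSV rows into dicts, converting haploid_number to int."""
    records = []
    for row in csv.DictReader(handle):
        row["haploid_number"] = int(row["haploid_number"])
        records.append(row)
    return records


def filter_records(records, clade=None, species_query=None):
    """Keep rows matching an exact clade and/or a case-insensitive species substring."""
    kept = []
    for rec in records:
        if clade is not None and rec["clade"] != clade:
            continue
        if species_query is not None and species_query.lower() not in rec["species"].lower():
            continue
        kept.append(rec)
    return kept


def summarize(records):
    """Compute overview-style statistics for a non-empty list of records."""
    counts = [rec["haploid_number"] for rec in records]
    return {
        "records": len(records),
        "unique_species": len({rec["species"] for rec in records}),
        "clades": len({rec["clade"] for rec in records}),
        "median_haploid": statistics.median(counts),
        "haploid_range": (min(counts), max(counts)),
    }


records = load_records(io.StringIO(SAMPLE_CSV))
clade_a = filter_records(records, clade="CladeA")
summary = summarize(clade_a)
```

To run this on the real file, replace `io.StringIO(SAMPLE_CSV)` with an open file handle and adjust the column names to match the CSV header. `summarize` raises `statistics.StatisticsError` on an empty selection, so check that a filter matched something first.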


How to Cite, Please Read

  • Using data from a single clade? Cite the original source for that clade (see the Sources by Clade tab, or the citation column in the data table). Do not cite this database in place of the primary work.
  • Combining data across multiple clades? Cite this database (Copeland et al. 2026) and list the clade-level sources you drew on in your supplementary materials.
  • Downloading the full dataset? The CSV file ships with a citation column for exactly this reason. Please carry that column through your analyses so the original authors can be credited downstream.
Each clade below is backed by a single primary source. If your work uses data from one clade only, please cite that source directly. Those authors generated the counts you are using. Cite the database paper (Copeland et al. 2026) only when your analysis spans multiple clades.
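
The citation rule above can be applied mechanically to any filtered subset, provided the citation column has been carried through. The sketch below is one way to do that, not part of the database tooling. The function name `citations_to_use`, the dict keys `clade` and `citation`, and the placeholder rows are all assumptions made for illustration.

```python
def citations_to_use(rows, database_citation="Copeland et al. 2026"):
    """Decide what to cite for a subset of records.

    One clade: cite that clade's primary source(s) directly.
    Several clades: cite the database paper and list the clade-level
    sources for the supplementary materials.
    """
    clades = {row["clade"] for row in rows}
    sources = sorted({row["citation"] for row in rows})
    if len(clades) <= 1:
        return {"cite": sources, "supplement": []}
    return {"cite": [database_citation], "supplement": sources}


# Placeholder rows, NOT real records; keys are assumed column names.
rows = [
    {"clade": "CladeA", "species": "Genus alpha", "citation": "Placeholder Source A"},
    {"clade": "CladeA", "species": "Genus beta", "citation": "Placeholder Source A"},
    {"clade": "CladeB", "species": "Genus gamma", "citation": "Placeholder Source B"},
]

single_clade = citations_to_use([r for r in rows if r["clade"] == "CladeA"])
multi_clade = citations_to_use(rows)
```

Here `single_clade["cite"]` holds only the primary source for the one clade, while `multi_clade["cite"]` holds the database paper and `multi_clade["supplement"]` lists each clade-level source once.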
Data Table columns: Clade, Species, Haploid Number, Citation.

About the CUREs program

This database is one output of the Biology & AI CURE at Texas A&M, a course-based research program that embeds undergraduates in real research from their first semester. The workflow that produced this dataset was genuinely collaborative between students, AI, and human experts.

Students used AI tools to locate candidate records in the primary literature, then evaluated each one for appropriateness, checking that the source was a credible cytogenetic study and that the count was unambiguous. Faculty and graduate students independently reviewed the curated records, providing a second pass over every entry before it entered the database. On the computation side, students used AI to help write and debug parsing scripts. Over the arc of the course, that work converged on a single script capable of handling every dataset format in the collection. The faculty and lead author independently developed their own analysis scripts and ran them in parallel, confirming that both approaches converged on the same answers. The result is a dataset no single graduate student could have compiled in a reasonable timeframe, built by people who were learning the biology and the tooling at the same time.

Read about the CURE program →
