The CUREs karyotype database.

63,682 chromosome-number records spanning 55 eukaryotic clades, each tied to its primary source. The dataset was assembled over several years by rolling cohorts of undergraduate researchers in our CUREs program, and it ships as open JSON and CSV so other labs and AI agents can build on it without scraping. This page hosts the data behind the 2026 preprint (Copeland et al. 2026).

CUREs: 41 records added this month by undergraduate researchers
Repository: Open JSON and CSV, citations preserved on every row
Preprint: Copeland, McConnell, Barboza et al. 2026, bioRxiv

How to use the data

Every record in this database comes from a primary source: a paper, dissertation, or dataset whose authors did the slow, careful work of generating these chromosome counts. Citations matter. They are how careers get evaluated, how grants get awarded, and how the people whose work we depend on get credit. Please cite responsibly.

Using data from a single clade? Cite the original source for that clade (see the Sources by Clade tab, or the citation column in the data table). Do not cite this database in place of the primary work.
Combining data across multiple clades? Cite this database (Copeland et al. 2026) and list the clade-level sources you drew on in your supplementary materials.
Downloading the full dataset? The CSV file ships with a citation column for exactly this reason. Please carry that column through your analyses so the original authors can be credited downstream.
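
The three rules above can be sketched in a few lines of Python. This is an illustration only, not part of the database tooling: the column names (`Clade`, `Citation`) are assumed to match the data table headers, the function name `citations_to_report` is hypothetical, and the sample rows are placeholders rather than real records.

```python
import csv
import io

# Short form of the database citation, as given on this page.
DATABASE_CITATION = "Copeland et al. 2026"

# Placeholder rows for illustration only; these are NOT real records.
# Column names are assumed to match the data table headers.
SAMPLE_CSV = """Clade,Species,Haploid Number,Citation
Clade A,Species one,10,Source A (placeholder)
Clade A,Species two,12,Source A (placeholder)
Clade B,Species three,7,Source B (placeholder)
"""


def citations_to_report(rows):
    """Return (main_text_citations, supplementary_citations) for the rows used.

    One clade: cite the primary source(s) directly, nothing extra.
    Several clades: cite the database and list the clade-level
    sources in the supplementary materials.
    """
    clades = set()
    sources = set()
    for row in rows:
        clades.add(row["Clade"])
        sources.add(row["Citation"])
    clade_sources = sorted(sources)
    if len(clades) <= 1:
        return clade_sources, []
    return [DATABASE_CITATION], clade_sources


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
single_clade = [row for row in rows if row["Clade"] == "Clade A"]

print(citations_to_report(single_clade))
print(citations_to_report(rows))
```

Because the decision depends only on the `Clade` and `Citation` columns, it keeps working after any filtering or merging step, provided those two columns are carried through.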

How to cite

Copeland, M., McConnell, M., Barboza, A., Abraham, H.M., Alfieri, J., Arackal, S., Bernard, C.E., Bryant, K., Cast, S., Chien, S., Clark, E., Cruz, C.E., Diaz, A.Y., Deiterman, O., Girish, R., Harper, K., Hjelmen, C.E., Thompson, M.J., Koehl, R., Koneru, T., Laird, K., Lee, Y., Lopez, V.R., Murphy, M., Perez, N., Schmalz, S., Sylvester, T., and Blackmon, H. (2026). Dismantling Chromosomal Stasis Across the Eukaryotic Tree of Life. bioRxiv 2026.04.14.718287. https://doi.org/10.64898/2026.04.14.718287

@article{Copeland2026.04.14.718287,
  author    = {Copeland, Megan and McConnell, Meghann and Barboza, Andres and Abraham, Hannah M and Alfieri, James and Arackal, Steven and Bernard, Carrie E and Bryant, Kiedon and Cast, Shelbie and Chien, Sean and Clark, Emily and Cruz, Cassandra E and Diaz, Aileen Y and Deiterman, Olivia and Girish, Riya and Harper, Kaya and Hjelmen, Carl E and Thompson, Michelle J and Koehl, Rachel and Koneru, Tanvi and Laird, Kenzie and Lee, Yoonseo and Lopez, Virginia R and Murphy, Mallory and Perez, Nayeli and Schmalz, Sarah and Sylvester, Terrence and Blackmon, Heath},
  title     = {Dismantling Chromosomal Stasis Across the Eukaryotic Tree of Life},
  year      = {2026},
  doi       = {10.64898/2026.04.14.718287},
  publisher = {Cold Spring Harbor Laboratory},
  journal   = {bioRxiv},
  elocation-id = {2026.04.14.718287},
  url       = {https://www.biorxiv.org/content/early/2026/04/16/2026.04.14.718287}
}

Browse the data

The interactive viewer below loads the full dataset and lets you filter by clade, search for species, sort any column, download the filtered subset as CSV, and plot distributions. All 63,682 records are available through the Data Table tab; source citations for each clade live on the Sources by Clade tab.
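
For readers working from the downloaded CSV instead of the viewer, the same filter-and-export step might look like the sketch below. It assumes the CSV columns are named as in the data table (`Clade`, `Species`, `Haploid Number`, `Citation`) and that haploid numbers parse as integers; the helper names and the sample rows are hypothetical placeholders, not real records.

```python
import csv
import io
from statistics import median

# Assumed column names, taken from the data table headers.
COLUMNS = ("Clade", "Species", "Haploid Number", "Citation")

# Placeholder rows for illustration only; these are NOT real records.
SAMPLE_CSV = """Clade,Species,Haploid Number,Citation
Clade A,Species one,10,Source A (placeholder)
Clade A,Species two,12,Source A (placeholder)
Clade B,Species three,7,Source B (placeholder)
"""


def filter_by_clade(rows, clade):
    """Keep only the rows belonging to one clade."""
    return [row for row in rows if row["Clade"] == clade]


def to_csv(rows):
    """Serialize rows back to CSV text, keeping the Citation column."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def median_haploid(rows):
    """Median haploid number, assuming every value parses as an integer."""
    return median(int(row["Haploid Number"]) for row in rows)


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
subset = filter_by_clade(rows, "Clade A")

print(to_csv(subset))
print(median_haploid(subset))
```

Writing the subset with the full column list, rather than only the columns needed for an analysis, is what keeps the citation attached to every exported row.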

Viewer tabs: Overview, Sources by Clade, Data Table, Plot.

The Overview tab summarizes total records (63,682), unique species, clades, median haploid number, haploid range, and records by clade.

Each clade on the Sources by Clade tab is backed by a single primary source. If your work uses data from one clade only, please cite that source directly. Those authors generated the counts you are using. Cite the database paper (Copeland et al. 2026) only when your analysis spans multiple clades.

Data Table columns: Clade, Species, Haploid Number, Citation. The full dataset is also available as a CSV download.

About the CUREs program

This database is one output of the Biology & AI CURE at Texas A&M, a course-based research program that embeds undergraduates in real research from their first semester. The workflow that produced this dataset was genuinely collaborative among students, AI, and human experts.

Students used AI tools to locate candidate records in the primary literature, then evaluated each one for appropriateness, checking that the source was a credible cytogenetic study and that the count was unambiguous. Faculty and graduate students independently reviewed the curated records, providing a second pass over every entry before it entered the database. On the computation side, students used AI to help write and debug parsing scripts. Over the arc of the course, that work converged on a single script capable of handling every dataset format in the collection. The faculty and lead author independently developed their own analysis scripts and ran them in parallel, confirming that both approaches converged on the same answers. The result is a dataset no single graduate student could have compiled in a reasonable timeframe, built by people who were learning the biology and the tooling at the same time.
