Data
This directory holds the JSON and CSV data that backs coleoguy.github.io. The site fetches these files at runtime, so updating them updates the site without touching any HTML. The files are also the intended open-data export: everything here is CC BY 4.0 (see LICENSE) and the full manifest is in index.json.
If you are an agent or a script, start with index.json. It lists every file with a description, format, record count, the page it backs, and a primary publication when one exists.
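A minimal sketch of that first step, assuming the manifest is a JSON object with a top-level "files" list whose entries carry "file", "format", and "records" keys. Those key names are guesses for illustration, not the documented schema, so check them against the real index.json before relying on them.

```python
import json


def summarize_manifest(manifest):
    """Return one summary line per file entry in a parsed manifest.

    Assumes a top-level "files" list with "file", "format", and "records"
    keys. These names are hypothetical; adjust them to the real index.json.
    """
    lines = []
    for entry in manifest.get("files", []):
        name = entry.get("file", "?")
        fmt = entry.get("format", "?")
        records = entry.get("records", "?")
        lines.append(f"{name} [{fmt}] records={records}")
    return lines


# Stand-in for json.load(open("data/index.json")); the shape is assumed.
sample = json.loads('{"files": [{"file": "team.json", "format": "json", "records": 20}]}')
for line in summarize_manifest(sample):
    print(line)
```

Missing keys fall back to "?" rather than raising, which keeps the sketch usable while the actual field names are being confirmed.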
Files
Lab state
| File | Used by | Records | Refresh |
|---|---|---|---|
| team.json | team.html | 20 | edit by hand when members join or leave |
| alumni.json | lineage.html | 21 | edit by hand when alumni update positions |
| news.json | news.html + homepage feed | 12 | edit by hand |
| lab-status.json | homepage live-activity strip | 4 signals | edit by hand weekly |
| undergrad-papers.json | undergrad-papers.html | 8 | add as undergrad-led papers publish |
| prompts.json | subpages/Prompting.html | 7 | pull from github.com/coleoguy/prompting |
| publications.json | publications.html | 57 (snapshot) | re-snapshot from ORCID (script below) |
| voice.md | agent context: any draft Heath will sign | n/a | regenerate from an expanded lead-authored corpus |
Databases
| File | Used by | Records | Primary publication or note |
|---|---|---|---|
| cures-karyotype-database.csv | cures-karyotype-database.html | 63,542 | Copeland et al. 2026, bioRxiv |
| cures-karyotype-data.json | cures-karyotype-database.html | 63,542 | (JSON form of the CSV above) |
| tau-database.json | tau_database.html | 1,960 | re-export from subpages/tau-data/results.csv |
| epistasis-database.csv + epistasis-database.json | epistasis-database.html | 1,606 | Burch et al. 2024, Evolution |
| epistasis-database-citations.csv | n/a | 128 | primary sources for the 1,606 crosses |
| karyotypes-coleoptera.csv | karyotypes/index.html | 4,959 | Blackmon & Demuth 2015 |
| karyotypes-diptera.csv | karyotypes/index.html | 3,474 | Morelli, Blackmon & Hjelmen |
| karyotypes-amphibia.csv | karyotypes/index.html | 2,124 | Perkins et al. 2019 |
| karyotypes-mammalia.csv | karyotypes/index.html | 1,440 | Blackmon Lab, curated |
| karyotypes-drosophila.csv | karyotypes/index.html | 1,247 | Morelli, Blackmon & Hjelmen |
| karyotypes-polyneoptera.csv | karyotypes/index.html | 823 | Sylvester et al. 2020 |
| karyotypes-coleoptera-citations.csv | n/a | 251 | primary sources for beetle records |
| karyotypes-six-index.json | n/a | n/a | manifest for the six clade CSVs |
Manifest and licensing
| File | Purpose |
|---|---|
| index.json | Full machine-readable manifest of every file in this directory |
| LICENSE | CC BY 4.0 license text and attribution guidance |
Refreshing the publications snapshot
The publications page tries data/publications.json first and falls back to a live ORCID API call. The snapshot keeps the page fast and resilient when ORCID is slow or down. To regenerate:
cd ~/Desktop/GitHub/coleoguy.github.io
python3 << 'EOF'
import json, urllib.request, time

ORCID = "0000-0002-5433-4036"
BASE = f"https://pub.orcid.org/v3.0/{ORCID}"

def fetch(url):
    # Ask the ORCID public API for JSON and parse the response body.
    req = urllib.request.Request(url, headers={'Accept': 'application/json'})
    with urllib.request.urlopen(req, timeout=30) as r:
        return json.loads(r.read())

# Pass 1: one summary per work group, keeping only the allowed types.
data = fetch(f"{BASE}/works")
ALLOWED = {'journal-article', 'book', 'book-chapter', 'preprint'}
works, put_codes = [], []
for g in data.get('group', []):
    # "or" guards cover fields that are present but null or empty.
    s = (g.get('work-summary') or [{}])[0] or {}
    pc = s.get('put-code')
    title = ((s.get('title') or {}).get('title') or {}).get('value', '')
    journal = (s.get('journal-title') or {}).get('value', '')
    year = ((s.get('publication-date') or {}).get('year') or {}).get('value', '')
    typ = s.get('type', '')
    doi = ''; doi_url = ''
    for eid in (s.get('external-ids') or {}).get('external-id') or []:
        if eid.get('external-id-type') == 'doi':
            doi = eid.get('external-id-value', '')
            doi_url = (eid.get('external-id-url') or {}).get('value') or f'https://doi.org/{doi}'
            break
    if not title or typ not in ALLOWED: continue
    works.append({'putCode': pc, 'title': title, 'journal': journal, 'year': year,
                  'doi': doi, 'doiUrl': doi_url, 'type': typ, 'authors': ''})
    put_codes.append(pc)

# Pass 2: fetch full records in batches of 25 to collect author names.
authors = {}
for i in range(0, len(put_codes), 25):
    batch = put_codes[i:i+25]
    b = fetch(f"{BASE}/works/{','.join(str(p) for p in batch)}")
    for entry in b.get('bulk') or []:
        w = entry.get('work') or {}
        contribs = (w.get('contributors') or {}).get('contributor') or []
        names = [(c.get('credit-name') or {}).get('value') for c in contribs
                 if (c.get('contributor-attributes') or {}).get('contributor-role') == 'author']
        authors[w.get('put-code')] = ', '.join(n for n in names if n)
    time.sleep(0.2)  # brief pause between batch requests

for w in works:
    if w['putCode'] in authors: w['authors'] = authors[w['putCode']]

with open('data/publications.json', 'w') as f:
    json.dump({'orcid': ORCID, 'fetched_at': time.strftime('%Y-%m-%dT%H:%M:%SZ', time.gmtime()), 'works': works}, f, indent=2, ensure_ascii=False)
print(f"Wrote {len(works)} works to data/publications.json")
EOF
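The snapshot-first behavior described at the top of this section can be sketched in Python as follows. This is an illustration of the fallback order only, not the site's actual page code. The function name load_publications and the fetch_live callable are hypothetical; the "works" key matches what the regeneration script writes.

```python
import json
import tempfile
from pathlib import Path


def load_publications(snapshot_path, fetch_live):
    """Return (works, source), preferring the local snapshot over a live fetch.

    fetch_live is a hypothetical zero-argument callable standing in for the
    ORCID request; it is called only when the snapshot is missing, unreadable,
    or contains no works.
    """
    try:
        snapshot = json.loads(Path(snapshot_path).read_text(encoding="utf-8"))
        works = snapshot.get("works") or []
        if works:
            return works, "snapshot"
    except (OSError, ValueError, AttributeError):
        pass  # fall through to the live source
    return fetch_live(), "live"


# Demo with a throwaway snapshot; the title is a placeholder, not real data.
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "publications.json"
    good.write_text(json.dumps({"works": [{"title": "placeholder"}]}), encoding="utf-8")
    print(load_publications(good, lambda: [])[1])
    print(load_publications(Path(tmp) / "missing.json", lambda: [])[1])
```

Passing the live fetch in as a callable keeps the sketch testable offline; in practice it would wrap the ORCID request.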
cures-karyotype-database.csv
Chromosome number records collected through Course-based Undergraduate Research Experiences (CUREs) at Texas A&M University. Accompanies the interactive browser at coleoguy.github.io/cures-karyotype-database.html.
- Rows: 63,542
- Clades: 56
- Columns:
  - clade: clade name (e.g. Accipitriformes, Fabaceae)
  - species: species binomial
  - haploid_number: haploid chromosome number (n)
  - citation: source of karyotype data for the clade (one citation per clade)
Citations are clade-level: every species within a clade shares the citation listed in the “Karyotype Source” column of the project tracking sheet. Phylogeny sources and per-clade student leads are available from the lab on request.
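One way to work with the clade-level citations is to count records per clade and flag any clade that carries more than one citation string. The sketch below uses the four column names listed above. The sample rows are placeholders rather than real records, and the helper names are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def clade_summary(rows):
    """Map each clade to (record count, set of distinct citation strings)."""
    counts = defaultdict(int)
    citations = defaultdict(set)
    for row in rows:
        counts[row["clade"]] += 1
        citations[row["clade"]].add(row["citation"])
    return {clade: (counts[clade], citations[clade]) for clade in counts}


def mixed_citation_clades(summary):
    """Clades that break the one-citation-per-clade convention."""
    return sorted(c for c, (_, cites) in summary.items() if len(cites) != 1)


# Placeholder rows, not real records. For the real file, replace `sample` with
# open("data/cures-karyotype-database.csv", newline="", encoding="utf-8").
sample = io.StringIO(
    "clade,species,haploid_number,citation\n"
    "CladeA,Genus alpha,10,Source A\n"
    "CladeA,Genus beta,12,Source A\n"
    "CladeB,Genus gamma,7,Source B\n"
    "CladeB,Genus delta,8,Source C\n"
)
summary = clade_summary(csv.DictReader(sample))
print({clade: n for clade, (n, _) in summary.items()})
print(mixed_citation_clades(summary))
```

On the real CSV, an empty list from mixed_citation_clades would indicate that the file follows the one-citation-per-clade convention.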
Citation
If you use this dataset, please cite:
Copeland, M., McConnell, M., Barboza, A., Abraham, H.M., Alfieri, J., Arackal, S., Bernard, C.E., Bryant, K., Cast, S., Chien, S., Clark, E., Cruz, C.E., Diaz, A.Y., Deiterman, O., Girish, R., Harper, K., Hjelmen, C.E., Thompson, M.J., Koehl, R., Koneru, T., Laird, K., Lee, Y., Lopez, V.R., Murphy, M., Perez, N., Schmalz, S., Sylvester, T., and Blackmon, H. (2026). Dismantling Chromosomal Stasis Across the Eukaryotic Tree of Life. bioRxiv 2026.04.14.718287. https://doi.org/10.64898/2026.04.14.718287
Please also cite the original karyotype sources listed in the citation column.
@article{Copeland2026.04.14.718287,
author = {Copeland, Megan and McConnell, Meghann and Barboza, Andres and Abraham, Hannah M and Alfieri, James and Arackal, Steven and Bernard, Carrie E and Bryant, Kiedon and Cast, Shelbie and Chien, Sean and Clark, Emily and Cruz, Cassandra E and Diaz, Aileen Y and Deiterman, Olivia and Girish, Riya and Harper, Kaya and Hjelmen, Carl E and Thompson, Michelle J and Koehl, Rachel and Koneru, Tanvi and Laird, Kenzie and Lee, Yoonseo and Lopez, Virginia R and Murphy, Mallory and Perez, Nayeli and Schmalz, Sarah and Sylvester, Terrence and Blackmon, Heath},
title = {Dismantling Chromosomal Stasis Across the Eukaryotic Tree of Life},
year = {2026},
doi = {10.64898/2026.04.14.718287},
publisher = {Cold Spring Harbor Laboratory},
journal = {bioRxiv},
elocation-id = {2026.04.14.718287},
url = {https://www.biorxiv.org/content/early/2026/04/16/2026.04.14.718287}
}
epistasis-database.csv
Line-cross datasets from plants and animals analyzed in Burch et al. 2024 with the SAGA2 information-theoretic framework. One row per dataset; see epistasis-database.json for the full schema and the 128 source citations in epistasis-database-citations.csv.
If you use this dataset, please cite:
Burch, B.D., Alexander, E.P., Fu, Y., and Blackmon, H. (2024). Information theoretic line-cross analysis and the evidence for pervasive epistasis. Evolution 78(4): 624-634. https://doi.org/10.1093/evolut/qpae003
Six karyotype databases
The files karyotypes-coleoptera.csv, karyotypes-diptera.csv, karyotypes-amphibia.csv, karyotypes-mammalia.csv, karyotypes-drosophila.csv, and karyotypes-polyneoptera.csv are the raw CSVs behind the clade cards on coleoguy.github.io/karyotypes/. Each CSV has its own column set because the databases were built at different times for different projects; karyotypes-six-index.json lists them in one place with columns and primary publication for each. Coleoptera ships with a separate karyotypes-coleoptera-citations.csv file of primary sources.
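Because the column sets differ, it can help to read each header row before combining files. The sketch below is one possible approach: the helper names are hypothetical, the demo headers are placeholders, and the paths assume the script runs from the repository root.

```python
import csv
import io

# File names follow the pattern used in this directory.
CLADES = ["coleoptera", "diptera", "amphibia", "mammalia", "drosophila", "polyneoptera"]
PATHS = [f"data/karyotypes-{clade}.csv" for clade in CLADES]


def read_header(handle):
    """Return the column names from the first row of an open CSV handle."""
    return next(csv.reader(handle), [])


def shared_columns(headers):
    """Columns present in every header, kept in the order of the first one."""
    if not headers:
        return []
    common = set(headers[0]).intersection(*headers[1:])
    return [col for col in headers[0] if col in common]


# Placeholder headers for illustration; the real column names differ per file.
demo = [
    read_header(io.StringIO("col_a,col_b,col_c\n1,2,3\n")),
    read_header(io.StringIO("col_b,col_a,col_d\n4,5,6\n")),
]
print(shared_columns(demo))
```

With the real files, each entry in PATHS would be opened with newline="" and passed to read_header. The result from shared_columns shows which columns, if any, are safe to use across all six databases.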
voice.md
A portable style calibration of Heath Blackmon’s scientific-writing voice, distilled from seven lead-authored papers (four Genetics / Evolution / J Heredity articles, a sole-authored newsletter piece, a first-author review, and one mentee-led paper as a lab-norm signal). Intended to be loaded as context by any agent drafting prose that Heath will sign: manuscripts, cover letters, grant sections, lab-site pages.
The file groups rules by scale (sentence, paragraph, section, word) and anchors every rule to verbatim passages from the source papers so that an agent pattern-matches on real examples rather than on an abstract summary. It also names explicit anti-patterns: rhetorical moves that are absent from the corpus but that a generic “good scientific writing” guide would wrongly recommend.
Not appropriate for: prose another author is signing, editing someone else’s work, or lab-site prose that needs the compressed declarative mode of the hand-written HTML pages.
Regenerate when the lead-authored corpus expands or when recent writing diverges from the rules in the file.
License
All files in this directory are released under CC BY 4.0. See LICENSE for the full terms and a short note on how to attribute.