Data

This directory holds the JSON and CSV data that backs coleoguy.github.io. The site fetches these files at runtime, so updating them updates the site without touching any HTML. The files are also the intended open-data export: everything here is CC BY 4.0 (see LICENSE) and the full manifest is in index.json.

If you are an agent or a script, start with index.json. It lists every file with a description, format, record count, the page it backs, and a primary publication when one exists.
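
A minimal sketch of that starting point, in Python. The manifest's schema is not reproduced in this README, so the sketch assumes index.json is either a bare list of entries or an object holding them under a `files` key, and that each entry has `file`, `format`, and `records` keys. All of those key names are hypothetical; adjust them to match the real manifest. The path assumes you run it from the repository root.

```python
import json
from pathlib import Path


def manifest_entries(manifest):
    """Return the list of file entries from a parsed manifest.

    Assumption: the manifest is either a bare list of entries or an
    object holding them under a "files" key (hypothetical key name).
    """
    if isinstance(manifest, list):
        return manifest
    return manifest.get("files", [])


def summarize(manifest):
    """Build one readable line per entry, tolerating missing keys."""
    lines = []
    for entry in manifest_entries(manifest):
        name = entry.get("file", "?")        # hypothetical key
        fmt = entry.get("format", "?")       # hypothetical key
        records = entry.get("records", "?")  # hypothetical key
        lines.append(f"{name} [{fmt}] records={records}")
    return lines


if __name__ == "__main__":
    path = Path("data/index.json")
    if path.exists():
        manifest = json.loads(path.read_text(encoding="utf-8"))
        print("\n".join(summarize(manifest)))
```

Missing keys print as `?` rather than raising, so a wrong guess about the schema shows up in the output instead of as a traceback.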

Files

Lab state

| File | Used by | Records | Refresh |
| --- | --- | --- | --- |
| team.json | team.html | 20 | edit by hand when members join or leave |
| alumni.json | lineage.html | 21 | edit by hand when alumni update positions |
| news.json | news.html + homepage feed | 12 | edit by hand |
| lab-status.json | homepage live-activity strip | 4 signals | edit by hand weekly |
| undergrad-papers.json | undergrad-papers.html | 8 | add as undergrad-led papers publish |
| prompts.json | subpages/Prompting.html | 7 | pull from github.com/coleoguy/prompting |
| publications.json | publications.html | 57 (snapshot) | re-snapshot from ORCID (script below) |
| voice.md | agent context: any draft Heath will sign | | regenerate from an expanded lead-authored corpus |

Databases

| File | Used by | Records | Primary publication or note |
| --- | --- | --- | --- |
| cures-karyotype-database.csv | cures-karyotype-database.html | 63,542 | Copeland et al. 2026, bioRxiv |
| cures-karyotype-data.json | cures-karyotype-database.html | 63,542 | (JSON form of the CSV above) |
| tau-database.json | tau_database.html | 1,960 | re-export from subpages/tau-data/results.csv |
| epistasis-database.csv + epistasis-database.json | epistasis-database.html | 1,606 | Burch et al. 2024, Evolution |
| epistasis-database-citations.csv | | 128 | primary sources for the 1,606 crosses |
| karyotypes-coleoptera.csv | karyotypes/index.html | 4,959 | Blackmon & Demuth 2015 |
| karyotypes-diptera.csv | karyotypes/index.html | 3,474 | Morelli, Blackmon & Hjelmen |
| karyotypes-amphibia.csv | karyotypes/index.html | 2,124 | Perkins et al. 2019 |
| karyotypes-mammalia.csv | karyotypes/index.html | 1,440 | Blackmon Lab, curated |
| karyotypes-drosophila.csv | karyotypes/index.html | 1,247 | Morelli, Blackmon & Hjelmen |
| karyotypes-polyneoptera.csv | karyotypes/index.html | 823 | Sylvester et al. 2020 |
| karyotypes-coleoptera-citations.csv | | 251 | primary sources for beetle records |
| karyotypes-six-index.json | | | manifest for the six clade CSVs |

Manifest and licensing

| File | Purpose |
| --- | --- |
| index.json | Full machine-readable manifest of every file in this directory |
| LICENSE | CC BY 4.0 license text and attribution guidance |

Refreshing the publications snapshot

The publications page tries data/publications.json first and falls back to a live ORCID API call. The snapshot keeps the page fast and resilient when ORCID is slow or down. To regenerate:

cd ~/Desktop/GitHub/coleoguy.github.io
python3 << 'EOF'
import json, time, urllib.request

ORCID = "0000-0002-5433-4036"
BASE = f"https://pub.orcid.org/v3.0/{ORCID}"
# Work types kept in the snapshot; everything else is skipped.
ALLOWED = {'journal-article', 'book', 'book-chapter', 'preprint'}

def fetch(url):
    """GET a public ORCID API URL and return the parsed JSON body."""
    req = urllib.request.Request(url, headers={'Accept': 'application/json'})
    with urllib.request.urlopen(req, timeout=30) as r:
        return json.loads(r.read())

# Pass 1: read the work summaries (title, journal, year, type, DOI).
data = fetch(f"{BASE}/works")
works, put_codes = [], []
for g in data.get('group', []):
    # Use the first summary in each group; tolerate a missing or empty list.
    s = (g.get('work-summary') or [{}])[0] or {}
    pc = s.get('put-code')
    title = ((s.get('title') or {}).get('title') or {}).get('value', '')
    journal = (s.get('journal-title') or {}).get('value', '')
    year = ((s.get('publication-date') or {}).get('year') or {}).get('value', '')
    typ = s.get('type', '')
    doi = doi_url = ''
    for eid in (s.get('external-ids') or {}).get('external-id') or []:
        if eid.get('external-id-type') == 'doi':
            doi = eid.get('external-id-value', '')
            # Prefer the URL ORCID supplies; otherwise build one from the DOI.
            doi_url = (eid.get('external-id-url') or {}).get('value') or f'https://doi.org/{doi}'
            break
    if not title or typ not in ALLOWED:
        continue
    works.append({'putCode': pc, 'title': title, 'journal': journal, 'year': year,
                  'doi': doi, 'doiUrl': doi_url, 'type': typ, 'authors': ''})
    put_codes.append(pc)

# Pass 2: fetch full records in batches of 25 to collect author names.
authors = {}
for i in range(0, len(put_codes), 25):
    batch = put_codes[i:i+25]
    b = fetch(f"{BASE}/works/{','.join(str(p) for p in batch)}")
    for entry in b.get('bulk', []):
        w = entry.get('work') or {}
        contribs = (w.get('contributors') or {}).get('contributor') or []
        names = [(c.get('credit-name') or {}).get('value') for c in contribs
                 if (c.get('contributor-attributes') or {}).get('contributor-role') == 'author']
        authors[w.get('put-code')] = ', '.join(n for n in names if n)
    time.sleep(0.2)  # brief pause between batches

# Attach the author string to each work; works without one keep ''.
for w in works:
    w['authors'] = authors.get(w['putCode'], '')

# Write UTF-8 explicitly because ensure_ascii=False emits non-ASCII characters as-is.
with open('data/publications.json', 'w', encoding='utf-8') as f:
    json.dump({'orcid': ORCID,
               'fetched_at': time.strftime('%Y-%m-%dT%H:%M:%SZ', time.gmtime()),
               'works': works}, f, indent=2, ensure_ascii=False)
print(f"Wrote {len(works)} works to data/publications.json")
EOF
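
The snapshot-first, live-fallback behaviour described at the top of this section can be sketched in Python as follows. This illustrates the pattern and is not the site's own code, which runs in the browser. `load_publications` and the `fetch_live` callback are hypothetical names, and treating an empty `works` list as a reason to fall back is an assumption. The `works` key matches what the regeneration script writes.

```python
import json
from pathlib import Path


def load_publications(snapshot_path, fetch_live):
    """Return (works, source), preferring the local snapshot.

    fetch_live is a caller-supplied function (hypothetical) that returns
    a list of works from the live ORCID API. It is called only when the
    snapshot is missing, unreadable, or has no works.
    """
    try:
        snapshot = json.loads(Path(snapshot_path).read_text(encoding="utf-8"))
        works = snapshot.get("works", [])
        if works:
            return works, "snapshot"
    except (OSError, ValueError, AttributeError):
        # Missing file, malformed JSON, or an unexpected top-level type.
        pass
    return fetch_live(), "live"
```

Returning the source alongside the works lets a caller log or display whether it is showing the snapshot or live data.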

cures-karyotype-database.csv

Chromosome number records collected through Course-based Undergraduate Research Experiences (CUREs) at Texas A&M University. Accompanies the interactive browser at coleoguy.github.io/cures-karyotype-database.html.

Citations are clade-level: every species within a clade shares the citation listed in the “Karyotype Source” column of the project tracking sheet. Phylogeny sources and per-clade student leads are available from the lab on request.

Citation

If you use this dataset, please cite:

Copeland, M., McConnell, M., Barboza, A., Abraham, H.M., Alfieri, J., Arackal, S., Bernard, C.E., Bryant, K., Cast, S., Chien, S., Clark, E., Cruz, C.E., Diaz, A.Y., Deiterman, O., Girish, R., Harper, K., Hjelmen, C.E., Thompson, M.J., Koehl, R., Koneru, T., Laird, K., Lee, Y., Lopez, V.R., Murphy, M., Perez, N., Schmalz, S., Sylvester, T., and Blackmon, H. (2026). Dismantling Chromosomal Stasis Across the Eukaryotic Tree of Life. bioRxiv 2026.04.14.718287. https://doi.org/10.64898/2026.04.14.718287

Please also cite the original karyotype sources listed in the citation column.

@article{Copeland2026.04.14.718287,
  author    = {Copeland, Megan and McConnell, Meghann and Barboza, Andres and Abraham, Hannah M and Alfieri, James and Arackal, Steven and Bernard, Carrie E and Bryant, Kiedon and Cast, Shelbie and Chien, Sean and Clark, Emily and Cruz, Cassandra E and Diaz, Aileen Y and Deiterman, Olivia and Girish, Riya and Harper, Kaya and Hjelmen, Carl E and Thompson, Michelle J and Koehl, Rachel and Koneru, Tanvi and Laird, Kenzie and Lee, Yoonseo and Lopez, Virginia R and Murphy, Mallory and Perez, Nayeli and Schmalz, Sarah and Sylvester, Terrence and Blackmon, Heath},
  title     = {Dismantling Chromosomal Stasis Across the Eukaryotic Tree of Life},
  year      = {2026},
  doi       = {10.64898/2026.04.14.718287},
  publisher = {Cold Spring Harbor Laboratory},
  journal   = {bioRxiv},
  elocation-id = {2026.04.14.718287},
  url       = {https://www.biorxiv.org/content/early/2026/04/16/2026.04.14.718287}
}

epistasis-database.csv

Line-cross datasets from plants and animals analyzed in Burch et al. 2024 with the SAGA2 information-theoretic framework. One row per dataset; see epistasis-database.json for the full schema and epistasis-database-citations.csv for the 128 source citations.

If you use this dataset, please cite:

Burch, B.D., Alexander, E.P., Fu, Y., and Blackmon, H. (2024). Information theoretic line-cross analysis and the evidence for pervasive epistasis. Evolution 78(4): 624-634. https://doi.org/10.1093/evolut/qpae003

Six karyotype databases

The files karyotypes-coleoptera.csv, karyotypes-diptera.csv, karyotypes-amphibia.csv, karyotypes-mammalia.csv, karyotypes-drosophila.csv, and karyotypes-polyneoptera.csv are the raw CSVs behind the clade cards on coleoguy.github.io/karyotypes/. Each CSV has its own column set because the databases were built at different times for different projects; karyotypes-six-index.json lists them in one place with columns and primary publication for each. Coleoptera ships with a separate karyotypes-coleoptera-citations.csv file of primary sources.
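
Because the column sets differ, it helps to inspect each header before combining files. The sketch below reads only the first row of each CSV and reports which column names, if any, appear in all of them. It assumes the files are UTF-8 (with or without a byte-order mark), that the first row is a header, and that it is run from the repository root; `read_header` and `shared_columns` are hypothetical helper names.

```python
import csv
from pathlib import Path

CLADE_FILES = [
    "karyotypes-coleoptera.csv",
    "karyotypes-diptera.csv",
    "karyotypes-amphibia.csv",
    "karyotypes-mammalia.csv",
    "karyotypes-drosophila.csv",
    "karyotypes-polyneoptera.csv",
]


def read_header(path):
    """Return the column names from the first row of a CSV file."""
    # utf-8-sig reads plain UTF-8 and also strips a BOM if one is present.
    with open(path, newline="", encoding="utf-8-sig") as f:
        return next(csv.reader(f), [])


def shared_columns(headers):
    """Return the column names present in every header, sorted."""
    sets = [set(cols) for cols in headers.values()]
    return sorted(set.intersection(*sets)) if sets else []


if __name__ == "__main__":
    data_dir = Path("data")
    headers = {name: read_header(data_dir / name)
               for name in CLADE_FILES if (data_dir / name).exists()}
    for name, cols in headers.items():
        print(f"{name}: {len(cols)} columns")
    print("shared by all:", shared_columns(headers))
```

The comparison is by exact column name, so columns that mean the same thing but are spelled differently across files will not show up as shared; karyotypes-six-index.json remains the authoritative description of each file's columns.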

voice.md

A portable style calibration of Heath Blackmon’s scientific-writing voice, distilled from seven lead-authored papers (four Genetics / Evolution / J Heredity articles, a sole-authored newsletter piece, a first-author review, and one mentee-led paper as a lab-norm signal). Intended to be loaded as context by any agent drafting prose that Heath will sign: manuscripts, cover letters, grant sections, lab-site pages.

The file groups rules by scale (sentence, paragraph, section, word) and anchors every rule to verbatim passages from the source papers so that an agent pattern-matches on real examples rather than on an abstract summary. It also names explicit anti-patterns: rhetorical moves that are absent from the corpus but that a generic “good scientific writing” guide would wrongly suggest.

Not appropriate for: prose another author is signing, editing someone else’s work, or lab-site prose that needs the compressed declarative mode of the hand-written HTML pages.

Regenerate when the lead-authored corpus expands or when recent writing diverges from the rules in the file.

License

All files in this directory are released under CC BY 4.0. See LICENSE for the full terms and a short note on how to attribute.
