# Blackmon Lab, Full content snapshot for LLMs

> Plain-text snapshot of the public pages of https://coleoguy.github.io
> Generated for LLM/agent consumption. For machine-readable per-page metadata, see llms.txt.
> Updated: 2026-04-16

This file consolidates the visible text content of every major page on the site so language model agents can cite, summarize, or answer questions without rendering JS. It contains auto-generated outlines and body text only; interactive widgets, plots, and tables are not represented here. For databases (karyotype, tau, epistasis), see their underlying JSON/CSV files in /data/ and /subpages/karyotype-data/.

## Blackmon Lab - Evolutionary Biology at Texas A&M

URL: https://coleoguy.github.io/index.html
Description: The Blackmon Lab studies genome structure evolution across the tree of life using computational, comparative, and experimental approaches at Texas A&M University.

Beetle artwork by Meg McConnell

"Nothing in biology makes sense except in the light of evolution" - Dobzhansky 1973

### Study Systems

**Beetles** Karyotype evolution and sex chromosome systems across Coleoptera - the most species-rich order of animals.

**Tomatoes** Chromosome number variation and its link to adaptation in Solanaceae, including wild and domesticated species.

**Betta Fish** Genetics of aggression, color, and sexual selection - and how domestication shapes phenotypic evolution.

**Chickens** Sex-linked traits and sexual antagonism in a model system for understanding domestication.

**Crabs** The evolution of freshwater invasion and its genomic consequences across Brachyura.

**Mammals** Broad-scale comparative analyses of sex chromosome evolution and karyotype diversity.

### Lab Highlights

### Connect

### Dr. Blackmon's Schedule

## Research - Blackmon Lab

URL: https://coleoguy.github.io/research.html
Description: The Blackmon Lab studies genome evolution across the tree of life, investigating sex chromosomes, chromosome number, and the genetic basis of adaptation using theoretical, comparative, and experimental approaches.

# Research

The Blackmon Lab is in the Department of Biology at Texas A&M University. We study how and why the structure of genomes evolves. Our study organisms range from tomatoes and betta fish to beetles, chickens, and mammals.

We use three general approaches in our research: theoretical population genetics, comparative methods, and genetics/genomics. By integrating these approaches, we aim to understand the forces that shape genome architecture across the tree of life. Whether we are modeling the spread of chromosomal rearrangements, analyzing patterns of karyotype evolution across thousands of species, or mapping the genetic basis of traits in the lab, our goal is to uncover the fundamental rules governing genome evolution.

**Current Research Questions**

1. What evolutionary forces lead to the divergence of sex chromosomes, and what forces act on "old" highly diverged sex chromosomes? [visual guide]
2. Why does chromosome number evolve rapidly in some clades but remain nearly static in others? [intellectual history] [visual guide]
3. Is there an ideal chromosome number, and if so, what determines that value?
4. What determines the fate of mutations that expand the proportion of the genome linked to a sex-determining locus?
5. Are there inherent fitness trade-offs between male and female phenotypes, or can a single genome be fit regardless of sex?
6. Does the importance of epistasis vary across plants and animals? [epistasis database] [visual guide]
7. How does domestication impact organisms? What can we learn about adaptation and radiation from studying domestication?

Generated using Google NotebookLM from our published research papers.
### From the Field

## Publications - Blackmon Lab

URL: https://coleoguy.github.io/publications.html
Description: Peer-reviewed publications from the Blackmon Lab at Texas A&M University, studying genome structure, sex chromosomes, and evolutionary biology.

# Publications

## Team - Blackmon Lab

URL: https://coleoguy.github.io/team.html
Description: Meet the members of the Blackmon Lab at Texas A&M University, including graduate students, research staff, undergraduates, and alumni studying genome evolution.

# Team

**Principal Investigator**

Heath earned his Ph.D. in 2015 from the Demuth lab at UT Arlington, where he studied the evolution of sex chromosomes and karyotypes in Coleoptera. He then completed a postdoc at the University of Minnesota with Emma Goldberg and Yaniv Brandvain. He opened the Blackmon Lab at Texas A&M in the fall of 2017. Curriculum Vitae

**Research Staff**

Founding member of the lab. LT assists with fieldwork and is currently working on morphometrics of Chrysina species.

Research tech responsible for model organism care. Kenzie is also working on discrete trait PCMs and Betta Fish aggression. She is also a champion archer.

**Graduate Students**

Joined 2022 · Biology Ph.D. · Sean is broadly interested in evolutionary biology and Coleoptera. When he is not in the lab he loves fishing. Website

Joined 2022 · Biology Ph.D. · Megan is interested in genome structure and has a background in genomics, bioinformatics, and forensics. Website

Joined 2023 · Genetics Ph.D. · Andres specializes in theoretical evolution and population genetics, with a focus on bioinformatics and genomics. His current research is focused on assembling scarab beetle genomes and developing computational tools to support the conservation of Chondrichthyes (sharks and rays). Website

Joined 2023 · Biology Ph.D. · Kaya's research explores how environmental variation shapes genomic architecture, epigenetic regulation, and phenotypic plasticity across natural populations.
She is particularly interested in how evolutionary processes interact with human-driven environmental change to influence adaptation, resilience, and vulnerability under shifting climatic conditions. Outside the lab, she enjoys running, hiking, scuba diving, and exploring the kinds of dynamic landscapes that continually inspire her work. Website

Joined 2025 · EEB Ph.D. · Shelbie is interested in the evolution and ecology of crustaceans, studying crab freshwater invasion. Google Scholar

Joined 2025 · Biology Ph.D. · Kiedon is interested in using both empirical and theoretical approaches to study the behavioral ecology of fishes and the evolution of mating systems. Google Scholar

**Post-Bacc Students**

Joined 2025 · Meghann is broadly interested in comparative phylogenetic approaches focused on questions about chromosome number transitions to alternative meiosis mechanisms. She is also interested in creating novel model approaches for understanding trait evolution.

**Undergraduate Researchers**

Bella is working on a project that focuses on computational and evolutionary work.

Sarah Schmalz is a senior Biology major and University Honors student interested in the intersection of evolutionary genomics and bioinformatics. Her current research focuses on extending mathematical models for sex-autosome fusions by incorporating fixation weights to better predict genomic architecture.

- Emily Clark
- Olivia Deiterman
- Riya Girish
- Anna Klein
- Rachel Koehl
- Mallory Murphy
- Alex Rathsack

**Alumni**

- Kevin Bolwerk
- Sally Bounds
- Jimena Garcia
- Wyatt Stogsdill
- Runyan Zhou
- Zhaobo Hu - PhD Student
- Jorja Burch - Wiley Publishers
- Emma Lehmberg
- Carl Hjelmen - Asst. Professor, Utah Valley University
- Michelle Jonika
- Priscilla Glenn - Postdoc, Texas A&M
- Jamie Alfieri - Postdoc, UT Austin
- Terrence Sylvester - Postdoc, UT Memphis
- Sarah Ruckman - Ph.D. student, FSU
- Nathan Anderson - Ph.D. student, UW Madison
- Johnathan Lo - Ph.D. student, Berkeley
- Zachary Hoover - Ph.D. student, Texas A&M
- Annabel Perry - Ph.D. student, Harvard
- Max Chin - Ph.D. student, UC Davis
- Kayla Wilhoit - Ph.D. student, Duke

## Resources - Blackmon Lab

URL: https://coleoguy.github.io/resources.html
Description: Software, R packages, databases, teaching materials, and tools from the Blackmon Lab at Texas A&M University for evolutionary biology research.

# Resources

Software, datasets, teaching materials, and tools produced by our lab.

#### Software & R Packages

- chromePlus Markov models for chromosome + binary trait evolution, guide
- evobiR Analysis, simulation & teaching functions, vignette
- SAGA2 Model-averaged line cross analysis (AICc)
- DirectRepeateR Direct repeat annotation in genome assemblies
- micRocounter Fast 2–6mer repeat identification
- All GitHub repos →

#### Databases

- Epistasis Database 1,600+ datasets from 130+ papers on the role of epistasis in trait divergence
- Karyotype Databases 14,000+ cytogenetic records, beetles, flies, amphibians, mammals, Polyneoptera
- Circadian Period (τ) Database 1,634 free-running period measurements across the tree of life
- Tree of Sex Database 30,000+ records on sex determination across the tree of life
- Public Data Sources AVONET, PanTHERIA, GBIF, Open Tree of Life, Dryad & more

#### Visual Guides

- How Sex Chromosomes Evolve From autosomes to X, Y, Z, W, diagrams, interactive charts, and key papers
- Evolution of Genome Structure Fusions, fissions, polyploidy, meiotic drive, and karyotype stasis
- Epistasis & the Shifting Balance Line cross analysis, SAGA, and the Wright–Fisher debate
- Selection in Evolution Natural, sexual, background, and indirect selection, drift vs.
selection
- Coleoptera Genomics Beetle genomes, Stevens elements, dosage compensation, and population genetics
- History of Chromosome Evolution Stebbins to ChromEvol, a century of ideas
- Publication Network Interactive co-authorship graph

#### Interactive Tools

- Population Genetics Simulator Wright-Fisher: drift, selection, mutation, migration, sex chromosomes
- Birth-Death Tree Simulator Same rates, different outcomes, stochasticity of macroevolution
- Phylogenetic Tree Explorer Load Newick, explore traits, multiple layouts
- CaveCrawler Interactive cave fish (Astyanax) bioinformatics
- Research Glossary PCM, chromosome evolution, and pop-gen terms defined

#### AI & Automation

- TraitTrawler AI-powered trait data collection from the literature
- AI Tools for Biologists Curated tools for data-driven biology
- AI Prompting Guide Paper review, qual exam simulation, and more
- All AI Projects →

#### Courses & Teaching

- Phylogenetic Comparative Methods Interactive guide: phylogenies → continuous → discrete → combined
- Phylogenetics 101 3-hour intro for upper undergrads / early grad students
- R for Biologists Tutorials for common ecology & evolution analyses; no prior R needed
- Experimental Design Statistically tractable experiment design using R
- BIOL 682: Communication in Bio Sci Scientific writing, storytelling, peer review & revision
- CURE Data Guide Undergrad research: chromosome number evolution
- Foundations of Evo Bio Seminar, foundational papers of the modern synthesis
- Fall 2025 Journal Club JEB 2024 reading list

#### For Graduate Students

- Grad School Orientation Mentoring, identity, and securing jobs
- Graduate Funding Opportunities Fellowships, grants, and funding sources
- Foundational Resources Data, writing, and career planning tools
- Raspberry Pi Cluster Build 26-node Pi 4B cluster notes
- TA Assignment App Tutorial · Google Form

## AI-Native Biology - Blackmon Lab

URL: https://coleoguy.github.io/ai.html
Description: The Blackmon Lab at
Texas A&M uses AI to accelerate research in evolutionary biology. Autonomous literature agents, AI-assisted theory, and practical curricula for biologists.

# AI in Evolutionary Biology

We use AI tools to accelerate research in evolutionary biology. Our work spans autonomous literature-mining agents, AI-assisted theoretical derivations, and practical curricula for biology students and researchers.

### How we use AI in the lab

We use AI as a research tool, not just a productivity shortcut. That means autonomous pipelines with validated outputs, workflows that scale to problems a single lab couldn't otherwise tackle, and curricula that give students practical skills alongside the critical thinking to evaluate AI outputs.

**Autonomous agents** Software agents that search databases, triage papers, extract structured data, and generate new queries based on what they find. They run unattended and pick up where they left off across sessions.

**AI as lead investigator** We are testing whether a Claude-powered agent can act as principal investigator on a complete research project: forming hypotheses, selecting comparative methods, running analyses, and drafting a manuscript. The goal is to understand where AI reasoning works in biology and where it fails.

**AI-literate biologists** Biology students need practical fluency with AI tools alongside the critical thinking to evaluate their outputs. We have built curricula at the undergraduate and graduate levels at Texas A&M to develop both.

**Data at scale** Cytogenetics data from the past 80 years are scattered across hundreds of journals in multiple languages. AI lets us extract, standardize, and analyze this literature without the usual limits on person-hours.

**Validated outputs** Every AI extraction gets a confidence score. Flagged records are routed to domain experts. Validation is built into the pipeline design, not added as an afterthought.
**Open and reproducible** All databases, code, and workflows are published openly so other comparative biology labs can replicate and build on them.

### Research projects

Active and recent projects using AI tools in the lab.

**TraitTrawler** A general-purpose autonomous literature-mining pipeline that searches PubMed, OpenAlex, bioRxiv, and Crossref, retrieves full-text PDFs through a 12-source cascade, and extracts structured trait data with mandatory double-entry verification before writing validated records to CSV. Generalizes the lab's earlier domain-specific karyotype agent to any trait, any clade. https://github.com/coleoguy/TraitTrawler

**AI as Lead Investigator** We gave a Claude-powered agent a complete research problem in claw evolution and asked it to carry the project from hypothesis to manuscript. The goal is to understand where AI reasoning works in biology and where it breaks down. Manuscript in preparation.

**RateScape** Penalized likelihood method for estimating branch-specific rate scalars on phylogenies. We used AI tools throughout method development, simulation, and manuscript writing. Planned application to pathogen drug-resistance rate estimation (NHGRI PAR-25-228).

**Karyotype Databases** Six interactive karyotype databases covering Coleoptera, Diptera, Amphibia, Mammalia, Drosophila, and Polyneoptera. Over 20,000 records total. All static, downloadable, and machine-readable.

**Population Genetics Simulator** Interactive Wright-Fisher simulator supporting drift, selection, mutation, migration, and bottlenecks. Built entirely through AI-assisted coding as a teaching tool and a test of what a coding agent can produce without manual intervention.

**Sex-Autosome Fusion Theory** Theory paper on sex-autosome fusion fixation probability. We used AI tools for symbolic derivations, analytical cross-checks, and drafting. A useful test case for how AI assistance affects throughput and error rate in theoretical work.
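
The Wright-Fisher model named in the Population Genetics Simulator entry, and the idea of a fixation probability mentioned in the fusion theory entry, can be illustrated with a minimal sketch. This is not the lab's simulator or its fusion model. It assumes a single locus with `two_n` gene copies, simple genic selection with relative fitness `1 + s`, and no mutation or migration. The function names `next_generation` and `fixation_probability` are hypothetical.

```python
import random


def next_generation(count, two_n, s, rng):
    """Advance one Wright-Fisher generation.

    count : current number of copies of the focal allele
    two_n : total number of gene copies in the population
    s     : selection coefficient (relative fitness 1 + s); s = 0 is pure drift
    rng   : a random.Random instance, so runs are reproducible
    """
    p = count / two_n
    # Selection: shift the expected frequency before sampling.
    p_sel = p * (1 + s) / (p * (1 + s) + (1 - p))
    # Drift: draw two_n gene copies independently (binomial sampling).
    return sum(rng.random() < p_sel for _ in range(two_n))


def fixation_probability(two_n, s, replicates, seed=1):
    """Estimate the probability that a single new copy reaches fixation."""
    rng = random.Random(seed)
    fixed = 0
    for _ in range(replicates):
        count = 1  # each replicate starts from one new mutant copy
        while 0 < count < two_n:
            count = next_generation(count, two_n, s, rng)
        fixed += count == two_n
    return fixed / replicates


if __name__ == "__main__":
    # Illustrative parameter values only.
    print("neutral:", fixation_probability(20, 0.0, 2000))
    print("beneficial:", fixation_probability(20, 0.2, 2000))
```

Under neutrality the estimate should sit near the classic expectation of one over the number of gene copies, and a beneficial allele should fix more often. Checking simulated values against that kind of known limit is one simple form of the analytical cross-check described above.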
### Teaching

Courses and curricula at Texas A&M giving biology students hands-on experience with AI tools.

**Biology and AI CURE** Undergraduate research course where each student conducts an original evolutionary biology project using phylogenetic comparative methods and AI-assisted analysis. Spring 2026 cohort.

**AI in Biology Concentration** Formal 10-credit-hour concentration in the Texas A&M Biology BS and PhD programs. Built to develop skills that transfer across tools, not just familiarity with whatever is current.

**AI Tools & Prompting Guides** Practical guides to AI tools and prompting techniques written for biologists. Covers literature review, data analysis, coding assistance, and writing workflows.

### Principles

Some constraints we try to stick to.

Every AI output is verified before it enters a publication, dataset, or decision. The form of verification (computational check, statistical test, or expert review) depends on the task, but the expectation is universal.

Search, extraction, reformatting, summarizing, and consistency checking are good uses of AI. Deciding what question is scientifically worth asking, whether a result makes biological sense, and what to conclude stays with the researcher.

Reproducible science requires knowing not just what AI generated, but what it was asked, with what model, and under what constraints. Prompts, model versions, and pipeline configurations are part of the methods section.

We care as much about documenting where AI reasoning breaks down as demonstrating what it can do. Systematically testing the limits of a tool is as scientifically valuable as deploying it successfully.

A general-purpose AI applied to a specialized scientific problem produces generic results. Effective use requires encoding expert knowledge in prompts, constraints, and validation rules, not assuming the model already carries it.
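
As a rough illustration of the verification and validation-rule principles above, the sketch below routes extracted records either to an accepted set or to expert review. It is not the lab's pipeline. The `Extraction` record, the `review_flags` and `route` functions, the 0.9 confidence threshold, and the upper bound on haploid number are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extraction:
    """One AI-extracted record (hypothetical schema)."""
    species: str
    haploid_number: int
    confidence: float  # assumed 0-1 score attached by the extraction step


def review_flags(rec, min_confidence=0.9, max_haploid=500):
    """Return the reasons a record needs expert review (empty list if none).

    The thresholds are illustrative placeholders, not biological claims.
    """
    flags = []
    if rec.confidence < min_confidence:
        flags.append("low confidence")
    # Domain rule: a haploid count must be a positive, plausible integer.
    if not (1 <= rec.haploid_number <= max_haploid):
        flags.append("implausible haploid number")
    # Domain rule: expect a "Genus species" binomial.
    if len(rec.species.split()) < 2:
        flags.append("species is not a binomial")
    return flags


def route(records, min_confidence=0.9, max_haploid=500):
    """Split records into accepted ones and (record, flags) pairs for review."""
    accepted, needs_review = [], []
    for rec in records:
        flags = review_flags(rec, min_confidence, max_haploid)
        if flags:
            needs_review.append((rec, flags))
        else:
            accepted.append(rec)
    return accepted, needs_review


if __name__ == "__main__":
    # Made-up records with placeholder names; not real data.
    demo = [
        Extraction("Examplus alpha", 10, 0.97),
        Extraction("Examplus beta", 12, 0.55),
        Extraction("Examplus", 0, 0.99),
    ]
    accepted, needs_review = route(demo)
    print(len(accepted), "accepted;", len(needs_review), "flagged for review")
```

Keeping the rules in plain functions makes the thresholds and checks easy to report alongside prompts and model versions in a methods section.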
### Join the lab

We are recruiting graduate students and postdocs interested in using AI tools in evolutionary biology research. Get in touch if that sounds like you.

## Join Us - Blackmon Lab

URL: https://coleoguy.github.io/join.html
Description: Join the Blackmon Lab at Texas A&M University. We seek students with diverse perspectives and enthusiasm for evolutionary biology and computational approaches.

# Joining our Team

We believe strongly in the importance of creating a supportive environment for all lab members to reach their full potential. We are always seeking new students who can bring diverse perspectives, enthusiasm, and dedication into the lab. To achieve this, we have explicit expectations for all lab members, including Dr. Blackmon.

- Attend lab meetings
- Take ownership and responsibility for assigned projects and tasks
- Take care of our space (clean up after yourself)
- Arrive at scheduled meetings on time and prepared
- Participate in lab functions
- Respond to all emails within 24 hours
- Attend weekly seminars and job seminars
- Attend journal club
- Apply for grants/fellowships
- Present research at conferences
- Mentor one or more undergraduates
- Publish 3 papers during PhD
- Read all standard lab readings prior to end of second year
- Devote a minimum of 10 hours a week to lab work
- Contribute to discussions in lab meeting
- Mentor graduate students
- Apply for fellowships/jobs
- Meet with students at scheduled times
- Give feedback on papers/applications within 2 weeks max
- Secure funding for lab

Lab meetings are mandatory for all lab members and occur on Fridays at 11:00 AM in BSBW 425. If we are discussing a paper, all students are expected to come with either questions or comments.

## News - Blackmon Lab

URL: https://coleoguy.github.io/news.html
Description: Latest news, highlights, and updates from the Blackmon Lab at Texas A&M University.
# Lab News

Recent highlights, publications, awards, and updates from the Blackmon Lab.

## Lab Life - Blackmon Lab

URL: https://coleoguy.github.io/gallery.html
Description: Photos and videos from the Blackmon Lab at Texas A&M University. Fieldwork, lab parties, conferences, and the fun side of evolutionary biology.

# Lab Life

Fieldwork, conferences, lab parties, and the fun side of science.

## CURE in Evolutionary Biology and AI: Student Showcase

URL: https://coleoguy.github.io/biolai-cure.html
Description: Meet the students of the BiolAI CURE program and their evolutionary biology research projects at the Blackmon Lab, Texas A&M University.

# Biology and AI CURE Showcase

Meet the Spring 2026 cohort and their independent research projects in evolutionary biology. Every student below is conducting original research using phylogenetic comparative methods and AI-assisted analysis.

### What is a CURE?

A Course-based Undergraduate Research Experience (CURE) replaces traditional cookbook labs with authentic, discovery-driven research. Unlike standard coursework, every student in a CURE tackles a genuine scientific question whose answer is unknown, meaning their work can contribute to real publications and scientific progress. In Dr. Blackmon's Biology and AI CURE, students combine evolutionary biology with AI-powered tools to formulate hypotheses, build phylogenetic datasets, run comparative analyses, and interpret results at a level typically reserved for graduate research. These are not c...

#### Anabiya Ali

Anabiya is exploring whether extreme heat environments push fish toward genetic sex determination by mapping sex-determining mechanisms against temperature extremes across freshwater minnows. She is using data on sex determination mode and 95th-percentile maximum temperatures in Cyprinidae.

#### Khadija Ansari

Khadija is investigating the paradox of chromosomal stability in whales and dolphins: most share 2n=44, yet harbor remarkable micro-scale genomic variation.
She is exploring whether habitat fragmentation drives cryptic chromosomal restructuring in Cetaceans.

#### Steven Arackal

Steven is asking whether investments in touch and vision evolve in lockstep across rodents by measuring skull morphology traits linked to sensory structures. He is collecting tactile and visual trait measurements from Cricetidae skulls.

#### Dharshini Baskaran

Dharshini is investigating whether specialist fruit flies have streamlined their taste receptor toolkit compared to generalists. She is comparing gustatory receptor gene repertoire sizes against host plant breadth across Drosophilidae.

#### Carrie Bernard

Carrie is in her second semester working in Dr. Blackmon's CUREs course. She is testing whether habitat type (lentic ponds versus lotic streams) drives the evolution of sexual size dimorphism in dragonflies and damselflies. She is collecting size dimorphism and habitat data across Odonata.

#### Alexa Burgos

Alexa is investigating whether flash-displaying hidden wing colors in stick insects represents an evolutionary stepping stone from camouflage to full warning coloration. She is mapping deimatic and aposematic traits across Phasmatodea.

#### Ruben Carreno

Ruben is uncovering the evolutionary rules that govern diet transitions (from sap-feeding to seed-eating to predation) across one of the most ecologically diverse insect orders. He is tracing dietary shifts across a phylogeny of Hemiptera.

#### Emily Chew

Emily is exploring whether island butterfly and moth populations show accelerated chromosome evolution compared to their mainland relatives. She is comparing chromosomal change rates across island versus mainland Lepidoptera.

#### Isabella Collins

Isabella is testing whether how a plant disperses its seeds (by wind, water, or animal) shapes the pace of its chromosome evolution. She is collecting dispersal mode, dispersal distance, and genome size data across Brassicaceae.
#### Camille Cordell

Camille is asking whether hawks and eagles with larger geographic ranges experience different rates of chromosome number evolution than range-restricted species. She is mapping haploid chromosome numbers against range size in Accipitriformes.

#### Ishan Dash

Ishan is testing the drift barrier hypothesis by examining whether island-endemic skinks, with their smaller population sizes, show faster chromosomal evolution than mainland relatives. He is compiling chromosome counts and island-versus-mainland status across Scincoidea.

#### Sophia Dong

Sophia is testing whether mating system and sexual dimorphism evolve in concert, as sexual selection theory predicts, across game birds. She is measuring six skeletal dimorphism traits and scoring mating systems in Galliformes.

#### Robin Flanagan

Robin is investigating whether colonizing islands triggers shifts in spider silk properties, linking biogeography to biomaterial evolution. They are collecting tensile strength measurements and island-versus-mainland status across Araneae.

#### Maya Friedman

Maya is testing whether habitat (marine, freshwater, or terrestrial) shapes the pace of chromosome number evolution in turtles. She is analyzing haploid numbers for 141 Testudines species alongside their habitat classifications.

#### Ananya Gudapati

Ananya is exploring whether genome size and wood density evolve together in conifers and their relatives, linking cellular-level genomics to whole-organism traits. He is collecting genome size and wood density measurements across Gymnosperms.

#### Bevin Haynes

Bevin is asking whether having a massive genome speeds up or slows down chromosomal rearrangement and lineage diversification in grasshoppers and crickets. She is compiling genome sizes and chromosome data across Orthoptera.

#### Anaya Hooda

Anaya is testing whether the reduced population sizes of island snakes accelerate their chromosome number evolution compared to continental species.
She is collecting chromosome counts and geographic data across Serpentes.

#### Andrew Jordan

Andrew is testing a new hypothesis: that the tail membrane (uropatagium) of bats co-evolves with echolocation, pushing past the field's focus on wing shape alone. He is measuring uropatagium dimensions and call parameters across Chiroptera.

#### Lynn Khalidi

Lynn is investigating whether arboreal versus terrestrial lifestyles shape body size and litter size evolution in marsupials. She is collecting life history and ecological data across Marsupialia.

#### Ruth Koffi

Ruth is testing the drift barrier hypothesis in beetles by asking whether flightless scarab species, with their smaller, more isolated populations, show faster chromosome evolution. She is mapping winglessness and chromosome data across Scarabaeidae.

#### Amrutha Kosanaum

Amrutha is asking whether the way a frog develops (hatching as a tadpole versus emerging fully formed) predicts the rate of chromosome evolution. She is using AmphiBIO development mode data and chromosome counts across Anura.

#### Sienna Kramer

Sienna is investigating whether sperm competition reshapes both testes investment and sperm energetic design in primates, connecting mating behavior to cellular-level evolution. She is collecting testes size and sperm midpiece length data across Primates.

#### Joy Lee

Joy is testing whether specialist parasitoid wasps, locked into narrow host relationships, experience faster chromosomal evolution than their generalist relatives. She is collecting host specificity and haploid chromosome numbers across Hymenoptera.

#### Virginia R. Lopez

Virginia is asking whether genome size constrains how many ecoregions a plant species can occupy, linking genome architecture to ecological range. She is analyzing genome size and ecoregion counts across the sunflower family Asteraceae.
#### Srija Manapuri

Srija is asking whether genome size predicts rates of chromosome evolution across beetles, one of the most species-rich animal orders on Earth. She is compiling genome and chromosome data across Coleoptera.

#### Ryan Matta

Ryan is testing whether weevil rostrum shape tracks the plant structures they use for egg-laying, supporting an adaptive multi-optimum model of evolution. He is measuring rostrum morphology and oviposition guild across Curculionidae.

#### Janie Menjivar

Janie is exploring the convergent evolution of pelvic suction disks in goby fishes, asking whether substrate type drives repeated adaptation of this attachment structure. She is measuring relative pelvic disk area and habitat type across Gobiidae.

#### Robert Millikan

Robert is asking whether jaw suspension type acts as an evolutionary constraint on morphological diversification in sharks and rays. He is collecting jaw morphology trait data across Chondrichthyes.

#### Olivia Montorello

Olivia is investigating whether genome size predicts the rate of chromosome number evolution in fungi, connecting genome architecture to karyotype dynamics. She is compiling genome and chromosome data across Fungi.

#### Jackson Moore

Jackson is testing whether habitat drives wing size evolution through selection versus drift in flightless ratites and their flying relatives, the tinamous. He is testing his hypothesis on recently extinct taxa (moa) alongside the extant diversity of Palaeognathae, collecting body mass, wing length, and habitat data.

#### Samy Muktevi

Samy is asking whether polyploidy (whole-genome duplication) gives orchids a colonization advantage by expanding their geographic ranges. She is mapping ploidy level against range size across Orchidaceae.

#### Meghana Munduru

Meghana is testing whether reproductive mode shapes genome size evolution in tetras and characins, linking life history strategy to molecular architecture.
She is collecting reproductive and genomic data across Characidae.

#### Soniya Muñoz

Soniya is asking a fascinating evolutionary sequence question: did catfish become nocturnal before or after evolving body armor? She is scoring armor presence and activity pattern across Siluriformes.

#### Anna Brooke Naegle

Anna is exploring whether the boom-and-bust lifecycle of annual killifish, driven by seasonal rainfall, accelerates their chromosome evolution. She is linking precipitation seasonality to chromosome data in Nothobranchiidae.

#### Tewobola Olasehinde

Tewobola is building a computational approach asking whether domestication shifts the shape of the protein-coding landscape in carnivore genomes. They are comparing proteome embedding patterns between domestic and wild Carnivora.

#### Bhakti Patel

Bhakti is tracing the evolutionary history of caffeine biosynthesis to determine whether this iconic chemical defense evolved once or arose independently multiple times in the coffee tribe. She is scoring caffeine production across approximately 50 species of Coffeeae.

#### Samshritha Pochanapeddi

Samshritha is investigating whether genome size influences the rate of chromosome evolution across lilies, connecting molecular-level variation to karyotype dynamics. She is using chromosome counts and genome size data from Kew and NCBI across Liliaceae.

#### Anjalika Sachan

Anjalika is testing whether genome size predicts rates of chromosome evolution in ferns, a group famous for their enormous genomes. She is using the Kew Plant DNA C-values Database and chromosome counts across Pteridophyta.

#### Evelyn Sanchez

Evelyn is investigating whether reliance on the labyrinth organ (an air-breathing adaptation) correlates with body size and chromosome number evolution in gouramis and their relatives. She is collecting trait data across Anabantiformes.

#### Kiara Santiago

Kiara is studying chromosome evolution in geckos, one of the most species-rich lizard families.
She is collecting karyotype data across Gekkonidae.

#### Hamzah Sheikh

Hamzah is testing whether island iguanas and their relatives show faster chromosome evolution than mainland populations, applying the drift barrier framework to reptiles. He is compiling chromosome data across Iguania.

#### Dana Stavinoha

Dana is asking whether the presence of B chromosomes (those mysterious extra genomic passengers) correlates with genome size evolution in nightshades. She is mapping B chromosome presence and genome size across Solanaceae.

#### Ava Tingue

Ava is testing whether genome size predicts the rate of chromosome number evolution in leaf beetles, one of the most diverse herbivore radiations. She is compiling genome and chromosome data across Chrysomelidae.

#### Keagan Tran

Keagan is asking whether the evolution of eusociality breaks the classic flight-fecundity trade-off in cockroaches and termites. They are collecting flight capability and fecundity data across Blattodea.

#### Felix Vasili

Felix is investigating whether a burrowing lifestyle drives the convergent evolution of pectine tooth counts across scorpion lineages. He is compiling pectine morphology and ecology data across Scorpiones.

#### Thomas Vela

Thomas is testing the drift barrier hypothesis by asking whether rodents with small home ranges and low population density show faster chromosome evolution. He is using home range, population density, chromosome count, and genome size data across Muridae.

#### MaryJo Velasquez

Mary-Jo is testing whether paedomorphosis (retaining juvenile traits into adulthood) correlates with accelerated chromosome evolution in salamanders. She is mapping developmental strategy against karyotype change rates across Caudata.

#### Samhita Vemuri

Samhita is exploring whether climate (cold and seasonal versus warm and tropical) predicts genome size across passionflowers and their relatives. She is collecting climate and genomic data across Passifloraceae.
#### Emma Walker Emma is testing whether migratory songbirds show different chromosome evolution rates than sedentary species, linking behavioral ecology to genome architecture. She is collecting migratory status and chromosome numbers across Passeriformes. #### MacKenzie Wilkerson MacKenzie is quantifying elytral melanism in ladybugs to test whether coloration patterns reflect a thermoregulation–habitat trade-off. She is measuring percent black dorsal surface across species in her Coccinellidae phylogeny. #### Aiden Nychka Aiden: project details coming soon. #### Heath Blackmon Heath Blackmon is an associate professor and evolutionary biologist leading the Biology and AI CURE. He is testing the Drift Barrier Hypothesis by determining whether large genomes lead to higher rates of chromosome evolution. He is using a dataset of haploid chromosome numbers and sex chromosome systems spanning all of Metazoa. ## Concentration: AI in Biology | Texas A&M University Biology URL: https://artsci.tamu.edu/biology/academics/undergraduate/concentration-ai-in-biology.html Description: AI in Biology concentration at Texas A&M University. 10 semester hours of AI-focused coursework in Biology BS and PhD programs. # Concentration: AI in Biology Build the skills that define the next generation of biologists. AI is changing what biologists can do: how fast they work, how much data they can handle, and the kinds of questions they can ask. The AI in Biology concentration gives you hands-on experience with the tools at the center of that shift. You'll learn to use AI for real biological problems: analyzing images, working with large datasets, generating and troubleshooting code, synthesizing literature, and designing reproducible research workflows. This concentration is about using AI, not building it. No prior programming or AI experience is required.
Whether you're interested in virology, molecular biology, microbiology, evolution, or anything in between, these skills will strengthen your research and set you apart in graduate school, industry, and beyond. ### BS Courses Complete a minimum of 10 of the 13 available semester hours. Most students can satisfy the concentration without adding to their overall course load. ¹ BIOL 481 may be repeated for credit, but only 1 SH may be applied toward the concentration. Note: Advanced undergraduates may enroll in BIOL 683 with department permission. ### PhD Courses Complete a minimum of 10 of the 13 available semester hours. These courses are designed to integrate directly with your dissertation research. ¹ BIOL 681 may be repeated for credit, but only 1 SH may be applied toward the concentration. Note: Both BIOL 689 sections (AI in Biology and AI Productivity for Researchers) are distinct Special Topics offerings and may both be applied toward the concentration. ### Why Add This Concentration? The tools available to biologists have changed dramatically. AlphaFold reshaped how we think about protein structure. AI-powered image analysis makes studies possible that were unimaginable a decade ago. Large language models are becoming standard tools for reading, writing, and thinking through scientific problems. Researchers who can use these tools effectively move faster, ask bigger questions, and produce stronger work. Every course in this concentration is built around one idea: you don't need to build AI to benefit from it . You'll develop practical, transferable skills - how to choose the right tool for a biological problem, how to use it effectively, and how to critically evaluate what it gives you back. That last part matters. Knowing when an AI is wrong is just as important as knowing how to use it. ### Who Can Join? - Undergraduates: Open to all BS students majoring in Biology, Microbiology, Molecular & Cell Biology, or Zoology. 
Start with BIOL 289 - no prior coding or AI experience needed. - Graduate students: Open to all Biology and Microbiology PhD students. The concentration is designed to complement your dissertation research, not add to your workload. - Not sure where to start? Reach out to the undergraduate or graduate advising office - they can help you map the concentration onto your existing degree plan. ## How Sex Chromosomes Evolve - Blackmon Lab URL: https://coleoguy.github.io/sex-chromosome-evolution.html Description: A visual guide to sex chromosome evolution, from ordinary autosomes to the X, Y, Z, and W chromosomes that determine sex across the tree of life. Covers identification methods, dosage compensation, degeneration, and more. # How Sex Chromosomes Evolve ### What Are Sex Chromosomes? Most chromosomes come in matched pairs, one from each parent. These are autosomes , and both copies carry the same genes. Sex chromosomes are different. They are the chromosomes that differ between males and females of a species and typically carry or are linked to the gene that triggers sex determination. In XY systems (mammals, many insects, some fish), males carry one X and one Y, while females carry two Xs. The Y is typically small and gene-poor. In ZW systems (birds, snakes, butterflies), it is the female who is the heterogametic sex, carrying one Z and one W, while males have two Zs. In X0 systems , the Y has been lost entirely: males have a single X and no partner for it. A crucial insight: sex chromosomes are not a fixed feature of life . They have evolved independently hundreds of times across the tree of life. Every pair of sex chromosomes started as an ordinary pair of autosomes. But sex determination goes far beyond chromosomes. Some organisms use environmental cues (temperature in many reptiles), haplodiploidy (ploidy level determines sex in bees and ants), or vastly more complex genetic architectures. 
Understanding this diversity is key to understanding why sex chromosomes evolve the way they do. ### Diversity of Sex Determination ### Beyond XY and ZW, The Wild Diversity #### UV Sex Chromosomes In some algae and bryophytes (liverworts, mosses), sex is determined in the haploid phase of the life cycle. After meiosis, spores carry either a U chromosome (producing female gametophytes) or a V chromosome (producing male gametophytes). This is fundamentally different from XY/ZW systems because selection on sex-linked genes operates on haploid individuals, there is no "heterozygous shelter" for deleterious mutations. The liverwort Marchantia polymorpha and the green alga Volvox are model systems for studying UV chromosomes. #### Fungal Mating-Type Chromosomes Fungi push the boundaries of what "sex chromosomes" can mean. The mushroom Schizophyllum commune has over 23,000 mating types , controlled by two unlinked loci with hundreds of alleles each. Meanwhile, Cryptococcus neoformans and Ustilago maydis have large mating-type chromosomes with suppressed recombination that show striking parallels to animal and plant sex chromosomes, including degeneration, gene loss, and accumulation of transposable elements. These independently evolved systems demonstrate that the same evolutionary forces shape sex-linked genomic regions across all of life. #### Haplodiploidy In Hymenoptera (ants, bees, wasps) and some other arthropods, there are no sex chromosomes at all . Males develop from unfertilized eggs and are haploid; females develop from fertilized eggs and are diploid. Ploidy is sex determination. This system has profound consequences for the evolution of genome architecture and social behavior, it means that sisters share 75% of their genes, which Hamilton argued was a key driver of eusociality. #### Environmental Sex Determination Many reptiles (most turtles, all crocodilians) determine sex by the temperature experienced during embryonic development. 
Some fish change sex in response to social cues, clownfish are protandrous hermaphrodites, meaning the dominant male in a group changes to female if the breeding female dies. Transitions between genetic and environmental sex determination happen repeatedly across the tree of life, and understanding why organisms switch is an active area of research. ### The Same Forces, Again and Again The remarkable thing about sex-linked genomic regions is how convergent their evolution is. Whether we look at animal X/Y chromosomes, plant sex chromosomes, fungal mating-type regions, or algal U/V chromosomes, we see the same features evolve independently: recombination suppression , degeneration of the heterogametic chromosome , and accumulation of repetitive elements . This convergence tells us that the evolutionary forces shaping these regions are powerful and predictable. ### From Autosomes to Sex Chromosomes Every pair of sex chromosomes began as an ordinary pair of autosomes. The transformation unfolds in stages: Step 1: A sex-determining gene appears. A mutation arises on one copy of an autosome that triggers male or female development. This could be a novel gene, a translocated gene, or a regulatory change. Now one homolog carries a sex-determiner and the other does not, this is the birth of a proto-sex chromosome. Step 2: Sexually antagonistic alleles accumulate. Genes that benefit one sex but harm the other ("sexually antagonistic" genes) are favored near the sex-determiner. A male-benefit allele, for example, is advantageous when linked to the male-determining gene because it will always be in males. Step 3: Recombination is suppressed. Selection favors chromosomal inversions or other rearrangements that prevent the sex-determiner from recombining away from the sexually antagonistic alleles. The non-recombining region expands, sometimes in discrete "evolutionary strata." Step 4: The proto-Y degenerates. 
Without recombination to purge deleterious mutations, the proto-Y chromosome accumulates genetic damage. Genes are lost, repetitive elements invade, and the chromosome physically shrinks. #### Alternative Models: Not Just Sexually Antagonistic Selection The classic model, in which sexually antagonistic selection drives recombination suppression, has been the dominant framework since Rice (1987). But it is not the only game in town. Thomas Lenormand and colleagues have developed models showing that recombination suppression can spread through other mechanisms, including the sheltering of deleterious recessive alleles and regulatory degeneration. In their models, expansion of the non-recombining region may be driven by drift or neutral processes as much as by selection. This is an active and genuinely unresolved debate in the field. ### The Theoretical Foundation ### Why the Y (or W) Degenerates Once recombination stops, a chromosome is on a one-way path toward decay. Several mutually reinforcing processes drive this: Muller's ratchet: In small populations, the class of chromosomes with the fewest deleterious mutations can be lost by drift. Without recombination to recreate the least-loaded class, the ratchet clicks forward: the minimum number of mutations on the chromosome only ever increases. Background selection: Purifying selection against deleterious alleles on the non-recombining chromosome reduces the effective population size of the entire chromosome, making it more vulnerable to drift. Genetic hitchhiking: When a beneficial mutation sweeps to fixation on the Y, it drags along any linked deleterious alleles. Without recombination, the good and the bad travel together. Hill-Robertson interference: This is the general phenomenon in which selection at one locus interferes with selection at linked loci. On a non-recombining chromosome, every gene interferes with every other gene.
The result: gene loss, accumulation of transposable elements, heterochromatinization, and physical shrinkage. The human Y retains only ~55 protein-coding genes, down from the ~800+ on the X. #### The Drift vs. Selection Debate Brian Charlesworth's models emphasize deterministic forces: background selection and hitchhiking are powerful enough to drive degeneration even in large populations. Others argue that drift plays a larger role, especially in species with small effective population sizes. The relative contribution of each force remains one of the genuinely contested questions in sex chromosome biology. ### Testing the Fragile Y ### Finding Sex Chromosomes in Genome Assemblies #### Coverage-Based Identification The simplest and most powerful approach: sequence both males and females, then map their reads to the genome assembly. X-linked scaffolds will have approximately half the read depth in males (who have one X) compared to females (who have two). Y-linked scaffolds will have reads from males but zero coverage from females. Autosomes will show equal coverage in both sexes. #### Expression-Based Identification Compare gene expression (RNA-seq) between males and females across scaffolds. X-linked genes will show characteristic expression patterns that depend on whether dosage compensation exists. Without compensation, X-linked genes are expressed at roughly half the level in males compared to females. With compensation, expression is equalized, but the mechanism of equalization differs among taxa (see the Dosage Compensation section). #### Additional Approaches Heterozygosity: females (XX) can be heterozygous at X-linked loci, while males (XY) are hemizygous, so variant calls from males on X-linked scaffolds will look homozygous. K-mer methods: male-specific k-mers identify Y-linked sequences. Synteny: comparing to related species with known sex chromosomes can identify conserved sex-linked regions.
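The coverage-based logic described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a pipeline used by the lab: the function name `classify_scaffold`, the tolerance `tol`, and the `min_depth` cutoff are hypothetical, depths are assumed to be already normalized so that autosomes have similar depth in both sexes, and an XY system is assumed (for a ZW system the roles of the sexes would be swapped).

```python
import math


def classify_scaffold(male_depth, female_depth, tol=0.4, min_depth=1.0):
    """Classify one scaffold from mean read depth in males and females.

    Assumes an XY system and depths normalized across samples.
    `tol` (in log2 units) and `min_depth` are illustrative thresholds.
    """
    # Too little male coverage to say anything under an XY model.
    if male_depth < min_depth:
        return "unclassified"
    # Reads from males but (near) zero coverage from females: Y-like.
    if female_depth < min_depth:
        return "Y-like"
    log_ratio = math.log2(male_depth / female_depth)
    # Equal coverage in both sexes: log2 ratio near 0.
    if abs(log_ratio) <= tol:
        return "autosomal"
    # Half the depth in males: log2 ratio near -1.
    if abs(log_ratio + 1.0) <= tol:
        return "X-like"
    return "unclassified"


# Hypothetical scaffolds: (male depth, female depth)
scaffolds = {
    "scaf_1": (30.0, 31.0),
    "scaf_2": (15.5, 30.2),
    "scaf_3": (14.0, 0.0),
}
calls = {name: classify_scaffold(m, f) for name, (m, f) in scaffolds.items()}
print(calls)
```

Working on the log2 scale makes the two expected signals symmetric targets (0 for autosomes, -1 for the X), which is why a single tolerance can serve both tests. Real data would also need filtering for repeats and mapping quality, which this sketch omits.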
### Example Plots and Patterns ### Dosage Compensation When one sex has a single copy of a large, gene-rich chromosome and the other sex has two, there is a dosage problem . X-linked genes in males (XY) are expressed from one copy, while autosomal genes are expressed from two. This imbalance disrupts the stoichiometry of protein complexes and regulatory networks. Many organisms have evolved mechanisms to equalize, or compensate, this dosage difference. #### Three Classic Mechanisms #### Compensation Is Far From Universal A major insight from the last two decades of comparative work: dosage compensation is not universal . Many organisms with differentiated sex chromosomes show incomplete or no dosage compensation. Birds (ZW) lack a chromosome-wide compensation mechanism, Z-linked genes are simply expressed at higher levels in males (ZZ) than females (ZW). Snakes show partial compensation that varies across the chromosome. Lepidoptera are similar. This means that sex-biased gene expression is pervasive in these lineages, with profound implications for sexual dimorphism, disease, and adaptation. ### Different Solutions in Different Lineages ### Fusions, Turnovers, and Neo-Sex Chromosomes Sex chromosomes are not static. They undergo turnovers, a new sex-determining gene arises on a different chromosome, and the old sex chromosomes revert to behaving as autosomes. They also undergo fusions, an autosome fuses to an existing sex chromosome, creating a "neo" sex chromosome with both old sex-linked genes and newly sex-linked autosomal genes. #### Neo-Sex Chromosomes When an autosome fuses to a Y chromosome, the fused portion becomes a neo-Y , and its free homolog becomes a neo-X . The neo-Y portion is now non-recombining (at least in the fused region) and will begin to degenerate, giving us a window into the early stages of sex chromosome evolution happening in real time. Species with neo-sex chromosomes are invaluable natural experiments. 
#### What Determines Whether a Fusion Spreads? Not all fusions to sex chromosomes are equal. Fusions involving the pseudoautosomal region (PAR), the small region where the X and Y still recombine, have very different dynamics than fusions to the non-PAR portion. The probability that a fusion becomes established in a population depends on meiotic mechanics, selection, drift, and the specific location of the fusion breakpoint. ### The Mechanics of Sex Chromosome Change ### Why Study Sex Chromosome Evolution? #### Speciation Sex chromosomes play an outsized role in reproductive isolation between species. Haldane's rule, the observation that when hybrids are inviable or sterile, it is the heterogametic sex (XY or ZW) that is affected first, points directly to the special role of sex chromosomes in speciation. The large-X effect (or large-Z effect) means that genes on sex chromosomes contribute disproportionately to hybrid incompatibility. Understanding how sex chromosomes evolve is essential to understanding how species form. #### Human Disease Many genetic disorders are X-linked : hemophilia, Duchenne muscular dystrophy, red-green color blindness, fragile X syndrome. Because males have only one X, they lack a backup copy, recessive mutations on the X are always expressed in males. Understanding why particular genes ended up on the sex chromosomes, and how dosage compensation works (and fails), has direct medical relevance. #### Sexual Dimorphism Sex chromosomes are a genomic reservoir for sexually antagonistic variation, genes that benefit one sex at the cost of the other. The genomic architecture of sex determination shapes how much males and females can differ in morphology, behavior, and physiology. The evolution of sex chromosomes is inextricable from the evolution of sex differences. 
#### Convergent Evolution The same suite of features, recombination suppression, degeneration, dosage compensation, accumulation of repetitive elements, evolves independently in animals, plants, fungi, and algae. This convergence reveals deep rules of genome evolution that transcend any particular lineage. Sex chromosomes are a window into what happens whenever a region of a genome stops recombining. ### Open Questions Some of the major questions driving current research in the Blackmon Lab and the field: #### Explore Further ## Evolution of Genome Structure - Blackmon Lab URL: https://coleoguy.github.io/genome-structure-evolution.html Description: A visual guide to the evolution of gross genome structure, chromosome fusions, fissions, whole-genome duplication, and the forces that shape karyotype evolution across the tree of life. # The Evolution of Genome Structure ### What Is Genome Structure? Every species has a genome, the complete set of DNA encoding its biology. But genomes are not just sequences of bases. They are physically organized into chromosomes : discrete, linear (or circular) packages of DNA bound to proteins. The karyotype of a species describes how many chromosomes it has and what they look like, their number, size, shape, and banding patterns. Chromosome number varies enormously across life. The fern Ophioglossum reticulatum holds the record at 2n = 1,260 . The jack jumper ant Myrmecia pilosula has 2n = 2, a single pair. Most mammals cluster around 2n = 40–60, while most birds sit near 2n = 80. These are not random numbers. They reflect hundreds of millions of years of mutations that fused chromosomes together, split them apart, or duplicated entire genomes. The central questions of genome structure evolution are deceptively simple: Why does chromosome number change? What forces favor fusions over fissions? Why do some clades evolve rapidly while others barely change? 
Answering them requires integrating cytogenetics, population genetics, phylogenetic comparative methods, and genomics. ### Scale of Karyotype Diversity Surprisingly, we still lack good answers to basic questions. We know how chromosomes fuse and split, but we often cannot explain why a particular lineage has the number it does. The field has oscillated between viewing karyotype change as a driver of speciation and viewing it as selectively neutral background noise. ### Chromosome Fusions A fusion joins two chromosomes into one, reducing the chromosome number by one. There are two main types: #### Robertsonian Translocations (Centric Fusions) Two acrocentric chromosomes (with the centromere near one end) fuse at their centromeres, producing a single metacentric chromosome (centromere in the middle). One centromere is typically lost or inactivated. This is the most common type of fusion in mammals. Human chromosome 2, by contrast, is an end-to-end (telomeric) fusion of two ancestral ape chromosomes, which is why humans have 2n = 46 while the other great apes have 2n = 48. #### Tandem (End-to-End) Fusions One chromosome attaches to the end (telomere) of another. Less common than Robertsonian translocations in mammals but important in insects and other groups. The resulting chromosome retains both centromeres initially, though one is usually silenced. The critical challenge: a new fusion starts in a single individual. That individual is heterozygous: it carries one copy of the fused chromosome and two unfused chromosomes. During meiosis, these three chromosomes must pair as a trivalent, which can missegregate. If the heterozygote has reduced fertility (underdominance), how does the fusion ever spread through the population? ### Underdominance and Fixation Russell Lande's 1979 paper was a watershed.
Using diffusion theory, Lande showed that underdominant rearrangements face severe barriers to fixation in large populations, the probability of fixation is exponentially small unless the population is very small or the heterozygote disadvantage is very mild. This result seemed to doom chromosomal speciation theory: if fusions can barely fix, how can they drive speciation? Possible solutions to the fixation problem: ### Chromosome Fissions A fission splits one chromosome into two, increasing the chromosome number. This requires that the broken chromosome somehow acquires a new centromere (neocentromere) and new telomeres at the break points, otherwise the fragments will be lost during cell division. #### How Fissions Work Fissions are mechanistically more challenging than fusions. A fusion simply joins existing chromosomes; a fission must create new functional elements. Neocentromeres can arise from latent centromeric sequences or through epigenetic activation of non-centromeric DNA. Telomeres can be added de novo by telomerase or by recombination-based mechanisms. Despite these hurdles, fissions are clearly common, many lineages show net increases in chromosome number over evolutionary time. #### Fissions and Chromosome Shape A metacentric chromosome (centromere in the middle) can fission at the centromere to produce two acrocentric chromosomes. This is effectively the reverse of a Robertsonian fusion . The interplay between fusions and fissions, combined with changes in centromere position (pericentric inversions), determines the overall shape distribution of karyotypes. ### The Fusion–Fission Balance In most animal groups, fusions appear to be more common than fissions. The net direction of chromosome number change, whether a clade tends to fuse or fission over time, varies enormously and is one of the key observables that models of karyotype evolution try to explain. 
Some patterns are striking: ### Whole-Genome Duplication (Polyploidy) Polyploidy, the duplication of the entire genome, is the most dramatic change in genome structure. In a single event, chromosome number doubles. The new polyploid has twice as many chromosomes as its progenitor, with duplicate copies of every gene. #### Autopolyploidy vs. Allopolyploidy Autopolyploidy results from genome doubling within a single species (e.g., a failure of meiosis produces an unreduced gamete). The resulting organism has four copies of each chromosome. Allopolyploidy combines genome duplication with hybridization: two different species hybridize, and the hybrid undergoes genome doubling. Allopolyploids are often immediately reproductively isolated from both parents, making polyploidy a potential instantaneous speciation mechanism. #### Polyploidy in Plants vs. Animals Polyploidy is common in plants, perhaps 30–70% of flowering plant species are polyploid or have polyploid ancestry. In animals, it is much rarer, largely restricted to parthenogenetic lineages and a few sexually reproducing groups (some fish, frogs, insects). The sex chromosome "poison pill" hypothesis suggests that polyploidy disrupts sex determination in species with differentiated sex chromosomes, acting as a barrier to polyploidy establishment in most animals. ### The Polyploidy Paradox For decades, polyploidy was assumed to be an evolutionary "jackpot", instant gene duplication, instant reproductive isolation, and a burst of evolutionary novelty. Then Itay Mayrose and colleagues dropped a bomb in 2011: using phylogenetic methods, they showed that recently formed polyploid lineages actually diversify more slowly than their diploid relatives. Polyploidy may cause a brief burst of speciation followed by elevated extinction, an "evolutionary dead end" (or at least a speed bump). This result remains contested. 
The methodological assumptions have been questioned, and some plant clades clearly show polyploidy-associated radiations. The truth is likely nuanced: polyploidy may facilitate adaptation in some ecological contexts while being a liability in others. ### Meiotic Drive and Karyotype Evolution Standard population genetics assumes that each allele (or chromosome) has a 50% chance of being transmitted to the next generation through fair Mendelian segregation. Meiotic drive violates this assumption: some chromosomes are preferentially transmitted, gaining a segregation advantage independent of their effects on organismal fitness. #### Female Meiotic Drive In female meiosis (of most animals), only one of four meiotic products becomes the egg; the other three become polar bodies that are discarded. If a chromosome can preferentially orient itself toward the egg pole of the meiotic spindle, it will be transmitted at >50%. This is centromere-mediated drive , and it may be one of the most important forces shaping karyotype evolution. The key insight from de Villena & Sapienza (2001) : asymmetric female meiosis creates an arena for centromeric competition. Centromeres that bind more kinetochore proteins may "win" the competition for the egg pole. This means that the centromere itself is under selection, and changes in chromosome structure that affect centromere behavior (fusions, fissions) may be favored or disfavored by drive, regardless of their effects on the organism. ### Evidence for Meiotic Drive in Karyotype Evolution If meiotic drive influences which chromosome configurations are preferentially transmitted, it should leave detectable signatures in comparative data. The Blackmon Lab tested this directly in mammals. The key finding: in mammals, chromosome number evolution is biased toward fusions in a pattern consistent with female meiotic drive favoring metacentric chromosomes. 
This provides a mechanistic explanation for why mammalian karyotypes tend to evolve toward lower chromosome numbers. ### Rates of Karyotype Evolution Span Seven Orders of Magnitude For decades, a common assumption held that chromosome number is evolutionarily conservative, that most lineages sit near a stable karyotype, punctuated by rare bursts of change. A massive new analysis shatters this view. Copeland et al. (submitted) compiled 63,682 karyotypes across 56 eukaryotic clades, spanning animals, plants, fungi, and more, and estimated rates of dysploidy and polyploidy on each clade's phylogeny using chromePlus. The result: rates of chromosome number evolution vary by seven orders of magnitude across the tree of life. #### Stasis Is Rarer Than You Think Perhaps the most surprising finding is that groups traditionally considered "static" are anything but. All three bird orders analyzed ( Accipitriformes , Passeriformes , and Galliformes ) exceed the global median rate of chromosome number change. The appearance of stasis in birds, most species near 2n ≈ 80, masks genuinely active evolutionary dynamics. #### Ecology Trumps Architecture A longstanding hypothesis predicted that chromosome architecture should dictate rates: holocentric chromosomes (where the centromere spans the entire chromosome) should tolerate rearrangements more readily than monocentric ones. The data say otherwise. Orchidaceae (monocentric) evolve chromosome number ~34× faster than Odonata (holocentric). This pattern, replicated across the dataset, argues that ecology and life history matter more than chromosome structure for setting the tempo of karyotype evolution. #### Rate Variation Is Kingdom-Agnostic Fast and slow clades are found in every kingdom. The slowest clade in the dataset is an animal (Cetacea); the fastest is a plant (Asteraceae). Insects, vertebrates, angiosperms, and fungi all span the full range. 
There is no "animal rate" or "plant rate", the forces shaping karyotype tempo are universal. ### Total Dysploidy Rate Across Eukaryotes Posterior median rates estimated via chromePlus MCMC on trees scaled to unit depth. Dashed line = global median. Data from Copeland et al. (submitted). Clades colored by higher classification. ### Modeling Chromosome Evolution The modern study of karyotype evolution is built on probabilistic models fitted to phylogenies. Instead of simply counting chromosome numbers and drawing arrows, we can now estimate rates of fusion, fission, and polyploidy, test whether these rates differ between lineages, and ask whether traits (like sex determination system or ecology) influence the pace of karyotype change. #### ChromEvol Developed by Itay Mayrose , ChromEvol fits continuous-time Markov models of chromosome number change to a phylogeny with observed tip counts. It estimates rates of ascending dysploidy (gains of individual chromosomes), descending dysploidy (losses), and polyploidy (whole-genome duplication). This was the first framework to bring statistical rigor to the field. #### chromePlus Developed in the Blackmon Lab , chromePlus extends ChromEvol by allowing a binary trait (e.g., sex determination system, life history strategy, ecological niche) to modulate the rates of dysploidy and polyploidy. This lets you ask: does having sex chromosomes speed up or slow down karyotype evolution? Do island species evolve faster than mainland relatives? #### ChromoSSE / BiChroM Rosana Zenil-Ferguson took the next step by coupling chromosome number evolution to diversification dynamics . Her models allow karyotype change to influence speciation and extinction rates (and vice versa), testing whether chromosome evolution is a cause or consequence of clade diversification. 
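The kind of continuous-time Markov model described above can be sketched as follows. This is a simplified illustration, not the ChromEvol or chromePlus implementation: the function names, the cap `max_n`, and the constant rates are assumptions made here, and the real tools estimate such rates by likelihood or MCMC across a whole phylogeny. The sketch builds a rate matrix over haploid chromosome numbers and computes transition probabilities along a single branch using uniformization.

```python
import math


def build_rate_matrix(max_n, gain, loss, poly):
    """Rate matrix over haploid numbers 1..max_n (index i means n = i + 1).

    Transitions that would exceed max_n are dropped, which is a
    truncation assumption of this sketch.
    """
    q = [[0.0] * max_n for _ in range(max_n)]
    for i in range(max_n):
        n = i + 1
        if n + 1 <= max_n:
            q[i][i + 1] += gain        # ascending dysploidy: n -> n + 1
        if n - 1 >= 1:
            q[i][i - 1] += loss        # descending dysploidy: n -> n - 1
        if 2 * n <= max_n:
            q[i][2 * n - 1] += poly    # polyploidy: n -> 2n
        q[i][i] = -sum(q[i])           # each row of a rate matrix sums to zero
    return q


def matmul(a, b):
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def transition_probs(q, t, tol=1e-12, max_terms=1000):
    """P(t) = exp(Qt), computed by uniformization (a Poisson-weighted sum)."""
    size = len(q)
    identity = [[float(i == j) for j in range(size)] for i in range(size)]
    lam = max(-q[i][i] for i in range(size))
    if lam == 0.0 or t == 0.0:
        return identity
    # B = I + Q / lam is a stochastic matrix (rows sum to one).
    b = [[identity[i][j] + q[i][j] / lam for j in range(size)]
         for i in range(size)]
    weight = math.exp(-lam * t)        # Poisson probability of zero events
    power = identity
    probs = [[weight * x for x in row] for row in power]
    total = weight
    for k in range(1, max_terms + 1):
        if 1.0 - total <= tol:
            break
        power = matmul(power, b)
        weight *= lam * t / k          # Poisson probability of k events
        total += weight
        probs = [[probs[i][j] + weight * power[i][j] for j in range(size)]
                 for i in range(size)]
    return probs


q = build_rate_matrix(max_n=12, gain=0.2, loss=0.4, poly=0.05)
p = transition_probs(q, t=1.0)
# p[5][j] is the probability of ending at n = j + 1 when starting at n = 6.
print([round(x, 4) for x in p[5]])
```

Uniformization is used here only to keep the example free of external dependencies; a matrix exponential from a numerical library would serve the same purpose. A trait-dependent model in the spirit of chromePlus would, roughly, keep one such set of rates per trait state plus rates of switching between states.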
### Pushing the Methods Forward ### What We Do Not Know For all the progress of the last two decades, the field of genome structure evolution is remarkably honest about its open questions . Several fundamental problems remain stubbornly unresolved: #### Is Karyotype Change Adaptive? We still do not know whether most chromosome fusions and fissions are selectively favored, neutral, or slightly deleterious . The population genetic models assume underdominance, but the actual fitness effects of structural rearrangements in natural populations are almost never measured directly. It is entirely possible that many rearrangements are effectively neutral once they fix. #### What Is the Role of 3D Genome Organization? Chromosomes are not randomly arranged in the nucleus. They occupy territories , fold into topologically associating domains (TADs), and interact through long-range regulatory elements . Rearrangements that disrupt these structures could have large fitness effects. But we have almost no comparative data on 3D genome organization across species with different karyotypes. #### Why Do Models Fail? Model adequacy is an increasingly recognized problem. Our probabilistic models of chromosome number change are simple Markov processes. They assume constant rates (or at best trait-dependent rates) across entire clades. But chromosome evolution is clearly episodic, context-dependent, and mechanistically complex. Whether our models capture the relevant biology is an open and important question. #### Can We Predict Karyotype Evolution? Given a species' phylogenetic position, ecology, life history, and meiotic system, can we predict its karyotype? Currently, no . This is humbling. It suggests that important forces remain unidentified or that stochastic processes dominate in ways our deterministic models cannot capture. 
### Where the Field Is Going

Several emerging directions offer hope:

Long-read sequencing is finally delivering chromosome-level assemblies across hundreds of species (Earth BioGenome Project, Vertebrate Genomes Project, Darwin Tree of Life). For the first time, we will be able to study structural rearrangements at sequence-level resolution across the tree of life, rather than relying solely on cytogenetic observations.

Centromere biology is undergoing a revolution. New sequencing technologies can now read through the repetitive satellite DNA that constitutes centromeres, allowing us to study centromere evolution directly. Understanding how centromeres evolve is crucial to understanding meiotic drive and its role in karyotype evolution.

Integration with functional genomics (Hi-C, ATAC-seq, and other chromatin-level assays) across species will reveal whether rearrangements disrupt functional genome architecture, providing the missing link between structural change and fitness.

#### Explore Further

## Selection in Evolution - Blackmon Lab

URL: https://coleoguy.github.io/selection.html

Description: A visual guide to selection: natural, artificial, sexual, background, and indirect selection on recombination modifiers. From Darwin to modern population genetics.

# Selection in Evolution

### What Is Selection?

Selection is differential survival and reproduction based on heritable phenotypic variation. It is the only evolutionary force that consistently produces adaptation, the fit between organism and environment that pervades the living world. While mutation generates variation, recombination shuffles it, and drift randomly eliminates it, only selection can build complex adaptations over time.

Darwin's insight was deceptively simple: variation exists, much of it is heritable, and not all individuals reproduce equally. The result is cumulative, directional change in the composition of populations.
Individuals that happen to carry traits that improve their survival or reproductive success leave more offspring, and those offspring inherit the very traits that gave their parents an advantage.

#### The Mathematics of Selection

For a single locus with two alleles under additive (semi-dominant) selection, the change in frequency of the favored allele per generation is approximately:

Δp ≈ sp(1−p)/2

where s is the selection coefficient and p is the allele frequency. The general form, Δp = sp(1−p)[ph + (1−p)(1−h)] / w̄, accounts for dominance (h) and mean fitness (w̄); the additive approximation corresponds to h = 1/2 with w̄ ≈ 1. But real selection is far more complex. The breeder's equation captures the response to selection on a quantitative trait:

R = h²S

where R is the response (change in mean phenotype), h² is heritability, and S is the selection differential. This deceptively simple equation encodes a profound truth: the rate of evolutionary change depends on both the strength of selection and the amount of heritable variation available.

Selection acts on phenotypes, not genotypes directly. The mapping from genotype to phenotype to fitness is where all the complexity lives: pleiotropy, epistasis, genotype-by-environment interactions, and developmental constraints all mediate how genotypic variation translates into fitness differences.

#### Types of Selection

Directional selection favors one phenotypic extreme, shifting the mean. Stabilizing selection favors the mean, reducing variance. Disruptive selection favors both extremes, increasing variance and potentially driving speciation. Frequency-dependent selection changes in direction depending on how common a phenotype is: negative frequency dependence maintains diversity, whereas positive frequency dependence eliminates it. Density-dependent selection varies with population density, often favoring different life-history strategies at high vs. low density.
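The two formulas in "The Mathematics of Selection" above can be explored numerically. The sketch below iterates the general single-locus recursion and evaluates the breeder's equation. One assumption is needed that the text does not spell out: the genotype fitnesses are taken to be 1 + s, 1 + s(1 − h), and 1, chosen because they reproduce the bracketed term in the general form exactly. All function names are illustrative.

```python
def mean_fitness(p, s, h):
    """Population mean fitness under the assumed genotype fitnesses.

    Assumption: favored homozygote 1 + s, heterozygote 1 + s * (1 - h),
    other homozygote 1, which matches the text's general formula.
    """
    q = 1.0 - p
    return p * p * (1.0 + s) + 2.0 * p * q * (1.0 + s * (1.0 - h)) + q * q


def delta_p(p, s, h=0.5):
    """Per-generation change in the favored allele's frequency."""
    q = 1.0 - p
    return s * p * q * (p * h + q * (1.0 - h)) / mean_fitness(p, s, h)


def allele_trajectory(p0, s, h=0.5, generations=100):
    """Iterate the recursion and return the list of allele frequencies."""
    freqs = [p0]
    for _ in range(generations):
        p = freqs[-1]
        freqs.append(p + delta_p(p, s, h))
    return freqs


def breeders_response(h2, selection_differential):
    """Breeder's equation: response R equals heritability times S."""
    return h2 * selection_differential
```

With h = 0.5 and small s, `delta_p` stays close to s·p·(1 − p)/2, and `allele_trajectory` traces the familiar S-shaped rise of a favored allele.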
### The Intellectual Lineage

The theory of selection has been built by generations of thinkers, from Darwin's original verbal argument to the rigorous mathematical framework of population genetics. Each step clarified how selection acts, how it interacts with other forces, and how it produces the patterns we observe in living organisms.

### Drift vs. Selection: The Role of Nₑ

Whether selection or drift dominates at a given locus depends on the effective population size (Nₑ) and the selection coefficient (s). The critical threshold is:

|Nₑs| ≈ 1

When |Nₑs| >> 1, selection dominates: beneficial alleles spread to fixation, deleterious alleles are purged. When |Nₑs| << 1, drift dominates: alleles behave as if effectively neutral, regardless of their actual fitness effects. Their fate becomes a random walk governed by sampling variance in small populations.

This means that what counts as "neutral" depends on population size. In large populations (bacteria with very large Nₑ, insect species with huge census sizes), even very weakly selected alleles (those with s = 10⁻⁷) are visible to selection. In small populations (endangered species, island endemics, organisms that have passed through severe bottlenecks), even moderately deleterious mutations with s = 10⁻³ can drift to fixation as if they were neutral.

#### The Nearly Neutral Theory

Ohta (1973) recognized that most new mutations are not strictly neutral but slightly deleterious. Their evolutionary fate depends critically on Nₑ. As Nₑ decreases, the boundary of effective neutrality expands, and more slightly deleterious mutations drift to fixation. This leads to genomic decay, an accumulation of mildly harmful substitutions in small populations.

#### Implications for Genome Evolution

Small-Nₑ lineages accumulate more slightly deleterious mutations, exhibit weaker codon usage bias, harbor more pseudogenes, and potentially experience faster rates of chromosome rearrangement fixation.
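How the product of effective population size and selection coefficient controls the fate of a new mutation can be checked with the standard diffusion result for fixation probability. This is a minimal sketch under stated assumptions: the effective size equals the census size N, selection is additive with heterozygote advantage s, and the mutation starts as a single copy in a diploid population. The function names are illustrative.

```python
import math


def fixation_probability(n, s):
    """Diffusion approximation for a single new copy among 2N gene copies.

    Assumes Ne = N and additive selection (heterozygote advantage s).
    """
    if s == 0.0:
        return 1.0 / (2.0 * n)          # neutral case: initial frequency
    exponent = -4.0 * n * s
    if exponent > 700.0:
        return 0.0                      # strongly deleterious: avoid overflow
    # (1 - exp(-2s)) / (1 - exp(-4Ns)), written with expm1 for accuracy
    return math.expm1(-2.0 * s) / math.expm1(exponent)


def relative_to_neutral(n, s):
    """Fixation probability divided by the neutral expectation 1/(2N)."""
    return fixation_probability(n, s) * 2.0 * n
```

When N·s is far below 1, `relative_to_neutral` stays near 1 (the allele behaves as if neutral); when N·s is large and positive, `fixation_probability` approaches 2s.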
Lynch (2007) argued powerfully that most of genome architecture (intron proliferation, genome size expansion, mobile element accumulation) is driven not by adaptation but by the inability of selection to prevent genomic bloat in organisms with small effective population sizes.

### The Drift–Selection Balance

Computed from the Kimura (1962) diffusion approximation for a new mutation in a diploid population. For beneficial mutations with large Nₑ, fixation probability approaches 2s.

### Sexual Selection and Sexual Antagonism

Sexual selection arises from variation in mating success rather than survival. Darwin recognized two distinct mechanisms: intrasexual selection (competition among members of one sex, typically males, for access to mates) and intersexual selection (mate choice, typically by females, favoring particular traits in the other sex).

Sexual selection can drive the rapid evolution of extreme traits: peacock tails, beetle horns, bird song complexity, elaborate courtship dances. These traits may reduce survival but increase mating success. The tension between natural and sexual selection produces some of the most dramatic phenotypes in nature.

#### Sexual Antagonism

Sexual antagonism occurs when alleles benefit one sex but harm the other. This conflict is pervasive because the same genome must produce both males and females, yet the optimal phenotype for each sex is often different. Traits that maximize male fitness (large body size, aggression, ornamentation) may reduce female fitness, and vice versa.

Intralocus sexual conflict occurs when a single locus is under opposing selection in males and females. Resolution mechanisms include: (1) sex-limited expression, in which the allele is expressed only in the sex it benefits; (2) gene duplication and divergence, in which each sex uses a different copy; or (3) movement to sex chromosomes, where the allele becomes X- or Y-linked and so spends more or less of its time in one sex.
#### Sex Chromosomes as Resolution

Sexually antagonistic alleles benefit from sex-linkage. A female-beneficial allele gains an advantage by being X-linked (it spends two-thirds of its time in females in an XY system). A male-beneficial allele gains from being Y-linked (exclusively male-transmitted). This creates selection to expand sex-linked regions and is a major force driving the evolution of sex chromosomes, a topic the Blackmon Lab studies extensively.

Intersexual selection can maintain genetic variation through the genic capture model: if female choice targets condition-dependent traits, then many loci throughout the genome (all those that affect organismal condition) become indirect targets of sexual selection.

### Sexual Selection in Theory and Practice

### Background Selection and Linked Selection

Selection at one locus affects allele frequencies at linked loci. This “linked selection” has profound consequences for genome evolution and explains many patterns that cannot be understood by looking at individual loci in isolation.

#### Background Selection

Charlesworth, Morgan, and Charlesworth (1993) formalized background selection: purifying selection against deleterious mutations at linked loci reduces the effective population size (Nₑ) at nearby neutral sites. The stronger the purifying selection and the lower the recombination rate, the greater the reduction. Regions of the genome with little recombination (near centromeres, on sex chromosomes, within inversions) have reduced diversity, reduced efficacy of selection, and faster accumulation of deleterious mutations.

#### Genetic Hitchhiking

Maynard Smith and Haigh (1974) described genetic hitchhiking: when a beneficial mutation sweeps to fixation, it drags along linked neutral (and even slightly deleterious) variants, creating a selective sweep, a region of drastically reduced diversity surrounding the selected site.
Sweeps leave a characteristic molecular signature: reduced heterozygosity, an excess of rare alleles, and elevated linkage disequilibrium. These signatures allow us to detect recent positive selection in genome scans.

#### Hill-Robertson Interference

Hill-Robertson interference occurs when selection acting at one locus interferes with selection at linked loci. Beneficial alleles at one locus may be linked to deleterious alleles at another, preventing either from reaching its optimal frequency. This mutual interference reduces the overall efficacy of selection and is one of the most important theoretical justifications for the evolution of recombination.

#### Consequences

Together, these linked selection effects explain: (1) why diversity correlates with recombination rate across the genome, (2) why non-recombining regions (Y chromosomes, inversions) degenerate, and (3) why recombination itself evolves: any modifier that increases recombination can be favored because it breaks up the negative associations created by Hill-Robertson interference.

### The Papers That Built the Framework

Simulated/illustrative data showing expected diversity patterns; not empirical observations.

### Indirect Selection: Recombination Modifiers

Some of the most consequential selection in genome evolution is indirect: it acts not on the phenotype produced by a mutation, but on the effect that mutation has on linkage relationships among other genes.

#### Inversions and Supergenes

A chromosomal inversion captures a block of genes in a non-recombining unit. If the captured block includes locally adapted alleles or sexually antagonistic alleles, the inversion can be favored by indirect selection, not because the inversion itself is beneficial, but because it preserves beneficial gene combinations that would be broken up by recombination in the standard arrangement. This is the theoretical basis for supergenes, regions of suppressed recombination that maintain co-adapted allele complexes.
Spectacular examples include: Heliconius butterfly wing pattern mimicry (controlled by inversions on a single chromosome), white-throated sparrow behavioral morphs (a massive inversion on chromosome 2), and fire ant social chromosomes (a supergene determines whether colonies have one or multiple queens).

#### Chromosome Fusions

Chromosome fusions change the recombination landscape. A fusion between an autosome and a sex chromosome creates a neo-sex chromosome, instantly altering which genes are sex-linked and changing the selective regime for those genes. The Blackmon Lab has shown that fusions to sex chromosomes are not random: fusions to the non-PAR portions of sex chromosomes (those outside the pseudoautosomal region, or PAR) are favored differently than fusions to the PAR, with important implications for the rate and direction of sex chromosome evolution.

#### Recombination Rate Evolution

The recombination rate itself evolves. Modifiers that increase or decrease recombination can be favored depending on the fitness landscape. In a rugged epistatic landscape, reduced recombination can preserve good gene combinations. In a changing environment, increased recombination generates the novel combinations needed to adapt. The tension between these forces shapes recombination rate variation both within and between species.

### Indirect Selection in Action

Simulated trajectories illustrating frequency-dependent selection dynamics; not empirical data.

### Artificial Selection and Domestication

Artificial selection applies the same evolutionary principles (heritable variation + differential reproduction = evolutionary change), but with human preference replacing ecological fitness as the selective environment. Darwin began the Origin with artificial selection precisely because it made the mechanism of selection tangible and undeniable.
#### Domestication as Evolutionary Experiment

Domesticated species have experienced intense selection on specific traits (yield, behavior, morphology, color) for hundreds to thousands of generations. This creates dramatic phenotypic change while revealing the genetic architecture of adaptation. Key insights from domestication include:

(1) Response can be rapid and dramatic. The differences between a wolf and a Chihuahua, between red junglefowl and a modern broiler chicken, and between wild teosinte and modern maize were all achieved in a few thousand generations or less.

(2) Correlated responses reveal genetic architecture. Selecting for tameness in foxes produced floppy ears, curly tails, and piebald coats: the "domestication syndrome." These correlated responses reveal pleiotropy and genetic correlations that constrain and channel evolutionary change.

(3) Domestication bottlenecks reduce diversity. Most domesticated species have dramatically reduced genetic diversity compared to their wild ancestors, making them vulnerable to novel diseases and environmental changes.

(4) Relaxed selection reveals costs. When natural selection on anti-predator behavior, immune function, or other wild-type traits is relaxed, those traits degrade, showing that they were maintained by ongoing selection in the wild.

#### Lab Study Systems

The Blackmon Lab uses domesticated species as models for understanding how selection reshapes genomes. Betta fish (Betta splendens) have been selectively bred for centuries for coloration, fin morphology, and aggression, creating extreme phenotypic diversity from a single wild ancestor. Chickens, domesticated from red junglefowl, are now the world's most abundant bird and a model for studying rapid adaptation under intense artificial selection.

### Learning from What We've Bred

The Blackmon Lab asks: "How does domestication impact organisms? What can we learn about adaptation and radiation from studying domestication?"
Domesticated species offer tractable systems where the selection history is at least partially known, allowing us to connect genotype to phenotype to fitness in ways that are difficult in wild populations.

Simulated response to truncation selection under the breeder's equation; illustrative, not empirical data.

### Selection and Speciation

Selection drives speciation in multiple ways, each involving the buildup of reproductive isolation between diverging populations.

#### Ecological Speciation

Divergent natural selection in different environments creates reproductive isolation as a byproduct of adaptation. Populations adapting to different ecological niches (different food sources, different habitats, different climates) accumulate genetic differences that reduce hybrid fitness. The key insight is that reproductive isolation is not the target of selection but an incidental consequence of adaptation to different environments.

#### Sexual Selection and Speciation

Divergent mate preferences can drive rapid reproductive isolation even without ecological divergence. Lande's (1981) models showed that Fisherian runaway processes, where female preferences and male ornaments coevolve in a positive feedback loop, can cause rapid, arbitrary divergence in mating signals between populations, leading to prezygotic isolation.

#### Reinforcement

When hybrids are less fit, selection favors increased assortative mating (individuals preferring mates from their own population). This strengthens reproductive barriers in sympatry, a process called reinforcement or the "Wallace effect." Reinforcement can complete the speciation process by converting partial barriers into complete ones.

#### Dobzhansky-Muller Incompatibilities

Independently derived mutations that function well in their home genetic background but interact negatively in hybrids create Dobzhansky-Muller incompatibilities.
This is epistatic selection against hybrid genotypes: allele A from population 1 and allele B from population 2 have never been tested together, and when combined in a hybrid, they fail. These incompatibilities accumulate roughly as the square of divergence time (the "snowball" effect).

#### Haldane's Rule

Haldane's rule: in crosses between species, the heterogametic sex (XY males or ZW females) is more often inviable or infertile. Multiple explanations contribute: the dominance theory (recessive incompatibilities on the X are exposed in hemizygous males), faster-X evolution (hemizygous selection accelerates divergence of X-linked genes), and meiotic drive (selfish genetic elements on sex chromosomes cause hybrid dysfunction).

#### Genomic Islands of Speciation

In the face of gene flow, selection creates "genomic islands of speciation": regions that resist introgression because they contain locally adapted or incompatible alleles, while the rest of the genome freely introgresses. These islands are often associated with inversions or low-recombination regions, connecting speciation to the genome architecture themes discussed above.

### How Species Are Born

### The Big Picture: How Selection Shapes Genomes

The genome is shaped by the interplay of all forms of selection simultaneously. No single force acts in isolation, and the patterns we observe in any genome are the cumulative result of millions of years of these forces interacting with each other and with the nonadaptive process of genetic drift.

#### The Forces in Concert

Natural selection maintains gene function and drives adaptation to the ecological environment. Sexual selection drives rapid divergence between species and sexual dimorphism within them. Background selection erodes diversity in low-recombination regions, reducing the efficacy of selection precisely where it is most needed.
Indirect selection on recombination modifiers shapes genome architecture: recombination rates, chromosome number, and inversion polymorphism. And the balance between drift and selection (Nₑs) determines which of these forces dominates at any given locus in any given species.

#### Unresolved Questions

(1) How much of the genome is under selection? Estimates range from less than 5% (only protein-coding regions) to over 80% (ENCODE functional annotations). The answer depends critically on what we mean by "functional" and on Nₑ.

(2) New mutations vs. standing variation? Does most adaptation come from new mutations that arise after the environment changes, or from pre-existing variation that was previously neutral or even slightly deleterious? The answer matters for predicting evolutionary responses to rapid environmental change.

(3) Evolvability. How do organisms maintain the capacity to respond to future selection while being well-adapted now? Modularity, genetic redundancy, and the structure of genetic networks may all contribute to evolvability, but the degree to which evolvability itself is shaped by selection remains debated.

(4) Selection on genome structure. How does selection on chromosome number, recombination landscape, and genome organization interact with selection on gene content? This is the core question of the Blackmon Lab's research program.

### The State of the Art

#### Explore More

## Epistasis & the Shifting Balance - Blackmon Lab

URL: https://coleoguy.github.io/epistasis-line-cross.html

Description: A visual guide to epistasis, line cross analysis, and the shifting balance theory, from Wright and Fisher to modern quantitative genetics.

# Epistasis & the Shifting Balance

### What Is Epistasis?

Epistasis is the interaction between genes. More precisely, epistasis occurs when the phenotypic effect of an allele at one locus depends on the genotype at another locus.
This is distinct from additive effects, where each allele contributes independently and predictably to the phenotype regardless of the genetic background.

Why does this matter? If genetic architectures are primarily additive, populations will tend to evolve smoothly toward a single optimum, a process well described by Fisher's infinitesimal model. But if epistasis is common, populations can get "trapped" on local fitness peaks. The response to selection depends critically on genetic background, and the adaptive landscape becomes rugged and multi-peaked.

#### Types of Epistasis

Magnitude epistasis: The interaction changes the size of an allele's effect. An allele that increases body size by 2 mm in one genetic background might increase it by only 0.5 mm in another. The direction is the same, but the magnitude differs.

Sign epistasis: The interaction reverses the direction of an allele's effect. An allele that is beneficial in one genetic background becomes deleterious in another. This is more consequential because it means that whether an allele is favored by selection depends on what else is in the genome.

Reciprocal sign epistasis: Both alleles at the interacting loci reverse their effects depending on the other. This is the most important type for evolutionary theory because it is a necessary condition for the existence of multiple fitness peaks (and, in the simplest two-locus, two-allele case, a sufficient one). When reciprocal sign epistasis separates two peaks, there is no path of single mutational steps from one peak to the other that is uphill at every step; the population must cross a valley.

### The Papers That Launched a Field

### Wright vs. Fisher

The most consequential intellectual conflict in 20th-century evolutionary genetics pitted two titans against each other: Ronald A. Fisher and Sewall Wright. Their disagreement was not about whether evolution occurs, but about how it works at the genetic level, and their differing views continue to shape the field today.
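Returning to the categories under "Types of Epistasis" above, the distinctions can be made concrete with a two-locus, two-allele table of trait or fitness values. The sketch below is an illustrative classifier, not a published method; the haploid genotype labels (ab, Ab, aB, AB) and the function name are assumptions made for the example.

```python
def classify_epistasis(w_ab, w_Ab, w_aB, w_AB, tol=1e-12):
    """Classify the interaction between two biallelic loci.

    Inputs are trait or fitness values for the four haploid genotypes.
    Returns 'none', 'magnitude', 'sign', or 'reciprocal sign'.
    """
    a_on_b = w_Ab - w_ab   # effect of allele A in background b
    a_on_B = w_AB - w_aB   # effect of allele A in background B
    b_on_a = w_aB - w_ab   # effect of allele B in background a
    b_on_A = w_AB - w_Ab   # effect of allele B in background A

    # No interaction: each allele has the same effect in both backgrounds.
    if abs(a_on_B - a_on_b) <= tol:
        return "none"

    flips_at_a = a_on_b * a_on_B < 0   # A's effect changes direction
    flips_at_b = b_on_a * b_on_A < 0   # B's effect changes direction
    if flips_at_a and flips_at_b:
        return "reciprocal sign"
    if flips_at_a or flips_at_b:
        return "sign"
    return "magnitude"
```

For instance, values of 10, 12, 13, and 13.5 for ab, Ab, aB, and AB give allele A an effect of 2 in one background and 0.5 in the other, mirroring the body-size example, and are classified as magnitude epistasis.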
#### Fisher's View

Fisher argued that evolution is primarily driven by natural selection acting on additive genetic variance in large, panmictic (freely interbreeding) populations. In this framework, epistasis is statistical noise: it contributes little to the response to selection because recombination breaks up favorable gene combinations every generation. His Fundamental Theorem of Natural Selection states that the rate of increase in fitness equals the additive genetic variance in fitness. Evolution, in Fisher's view, is a smooth, deterministic, hill-climbing process. Drift is negligible in populations o...

#### Wright's View

Wright saw a fundamentally different picture. He argued that real populations are structured into small, partially isolated demes (subpopulations). In small demes, genetic drift is a powerful force that can push populations away from their current fitness peak. Gene interactions (epistasis) create multiple adaptive peaks in the fitness landscape. Wright proposed that drift, migration, and selection work together to allow populations to explore and ultimately find the highest peaks, his famous shifting balance theory.

#### The Three Phases of the Shifting Balance

Phase 1 (Random drift): Within small demes, genetic drift causes random fluctuations in allele frequencies. This exploration of genotype space can move a population off its current local fitness peak and into the domain of attraction of a different peak.

Phase 2 (Mass selection within demes): Once drift has carried a deme into the basin of attraction of a new (possibly higher) peak, ordinary natural selection pushes the population uphill toward that new peak.

Phase 3 (Interdeme selection): Demes sitting on higher peaks have greater average fitness, producing more emigrants. These migrants carry favorable gene combinations to neighboring demes, pulling the entire metapopulation toward the global optimum. This is a form of group selection, mediated by differential migration.
### The Papers

Note: This chart shows simulated data for illustrative purposes.

### Coyne, Barton, Turelli vs. Wade & Goodnight

The Wright-Fisher debate was dramatically revived in the late 1990s, when Jerry Coyne, Nick Barton, and Michael Turelli published a provocative critique arguing that the shifting balance theory was unnecessary and unsupported by evidence. The response from Michael Wade and Charles Goodnight was equally forceful. What followed was one of the great intellectual exchanges in evolutionary biology.

#### The Critique (Coyne, Barton, Turelli 1997)

They attacked each phase of the shifting balance:

Against Phase 1: Random drift is too weak to move populations off fitness peaks in populations of realistic size. The valley-crossing required is implausible unless populations are extremely small.

Against Phase 3: Interdeme selection (the mechanism by which higher-fitness demes spread their gene combinations to lower-fitness demes) has never been convincingly demonstrated in nature. The required conditions are very specific and unlikely to be commonly met.

The bottom line: Simpler models based on mass selection acting on additive variation in large populations can explain observed patterns of adaptation. The shifting balance is "of only minor importance in evolution."

#### The Defense (Wade & Goodnight 1998)

Wade and Goodnight argued that Coyne et al. had set up straw man versions of Wright's theory. Their key points:

Tribolium experiments: Laboratory experiments with flour beetles demonstrated that population structure and epistasis interact to produce evolutionary outcomes that mass selection cannot. When populations were subdivided and subjected to group selection, they responded far more than predicted by additive models alone.

Variance conversion: This is perhaps the most important theoretical insight. When populations go through bottlenecks (genetic drift changes allele frequencies), epistatic variance can be "converted" to additive variance.
This happens because drift changes the genetic background against which other alleles are measured. An allele whose effect was previously masked by interactions becomes exposed when its interacting partners change in frequency. This means drift doesn't just add noise; it fundamentally changes the substrate on which selection acts.

The resolution? The debate remains formally unresolved. But what is not debated is that epistasis is real and common. The question is: how much does it matter for adaptive evolution in nature?

### Four Papers, Two Perspectives

### Measuring Epistasis: Line Cross Analysis

How do we actually detect and measure epistasis? One of the most powerful classical approaches is line cross analysis (LCA).

#### The Basic Idea

Cross two divergent lines (these can be different populations, species, or artificially selected lines that differ in a trait of interest). Then create a series of composite generations: the F1 (first filial generation), F2 (second filial), backcrosses to each parent (BC1 and BC2), and potentially additional generations. Measure the trait in every generation. The pattern of generation means reveals the genetic architecture underlying the trait difference.

#### What the Patterns Tell Us

If inheritance is purely additive: The F1 mean falls exactly at the midparent value (halfway between P1 and P2). The F2 mean equals the F1 mean. Backcross means fall midway between the F1 and the respective parent. Everything is clean, predictable, and linear.

If dominance is present: The F1 deviates from the midparent value (toward one parent or the other). But the F2 and backcross means can still be predicted from the parental and F1 means.

If epistasis is present: These simple expectations break down. F2 and backcross means deviate from the values predicted from the parental and F1 means, often asymmetrically. The specific pattern of deviations reveals which types of epistasis are at play: additive × additive ([aa]), additive × dominance ([ad] and [da]), or dominance × dominance ([dd]).
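The expectations in "What the Patterns Tell Us" above can be turned into a quick numerical check. The sketch below computes expected generation means under an additive-dominance model and then the classical scaling-test quantities (commonly labelled A, B, and C), which are zero when no epistasis is present. The A/B/C labels and the parameters m (midparent), a (additive effect), and d (dominance deviation) are not named in the text above and are introduced here as standard textbook notation; sampling error is ignored, so this is an illustration rather than a statistical test, and it is not how SAGA works.

```python
def expected_means(m, a, d):
    """Expected generation means with additive and dominance effects only."""
    return {
        "P1": m + a,
        "P2": m - a,
        "F1": m + d,
        "F2": m + d / 2.0,
        "BC1": m + a / 2.0 + d / 2.0,   # backcross of F1 to P1
        "BC2": m - a / 2.0 + d / 2.0,   # backcross of F1 to P2
    }


def scaling_tests(means):
    """Return the A, B, and C scaling quantities from generation means.

    All three are expected to be zero in the absence of epistasis.
    """
    return {
        "A": 2.0 * means["BC1"] - means["P1"] - means["F1"],
        "B": 2.0 * means["BC2"] - means["P2"] - means["F1"],
        "C": 4.0 * means["F2"] - 2.0 * means["F1"]
             - means["P1"] - means["P2"],
    }
```

Feeding `expected_means(10, 2, 1)` into `scaling_tests` returns zeros, whereas shifting the F2 mean away from its predicted value produces a nonzero C, the kind of deviation that signals epistasis.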
#### The Traditional Approach

The classical method is the joint-scaling test (Cavalli 1952, Hayman 1958, Mather & Jinks 1982). This involves fitting models of increasing complexity (additive only, additive + dominance, additive + dominance + epistasis) and testing whether each additional parameter significantly improves the fit. The problem: this relies on sequential hypothesis testing, which has well-known issues with statistical power, model selection bias, and the arbitrary nature of p-value thresholds.

### The Joint-Scaling Test

Interactive chart: toggling between additive and epistatic models shows how gene interactions distort generation means from their expected values. In the epistatic model, note how the F2 deviates from the F1, and backcross means are pulled asymmetrically; these are the signatures of epistasis that line cross analysis detects. (Illustrative data; values are theoretical expectations, not from a specific experiment.)

### SAGA: A Better Approach to Line Cross Analysis

The Blackmon Lab developed SAGA (Statistical Analysis of Genetic Architecture) to address the fundamental limitations of the traditional joint-scaling test.

#### The Problem with Sequential Testing

The traditional approach fits a sequence of models (additive, then additive + dominance, then additive + dominance + epistasis) and uses chi-squared tests to determine when to stop adding parameters. This has several well-known problems:

Multiple testing: Each sequential test increases the overall false positive rate. The order in which parameters are tested matters, and different orders can give different conclusions.

All-or-nothing: You either reject or fail to reject each model. There is no way to express uncertainty about which model is best, or to average across models when several are similarly supported.

Lack of parsimony penalty: Chi-squared tests do not inherently penalize model complexity.
A model with more parameters will always fit at least as well, even if the additional parameters are capturing noise rather than signal.

#### The SAGA Solution

SAGA uses an information-theoretic approach based on AICc (corrected Akaike Information Criterion). Instead of asking "Is this model significantly better than the simpler one?" it asks "What is the relative support for each model given the data?" Key advantages:

Simultaneous model comparison: All candidate genetic architecture models are evaluated at once, not sequentially. This includes all possible combinations of additive, dominance, and epistatic effects.

Natural complexity penalty: AICc includes a penalty for the number of parameters, which increases when sample sizes are small. This automatically guards against overfitting.

Model averaging: Instead of choosing a single "best" model, SAGA provides model-averaged parameter estimates weighted by the relative support for each model. This means your estimates of genetic effects incorporate model uncertainty.

Confidence sets: SAGA produces a confidence set of models (e.g., the set of models within 2 AICc units of the best), giving you a clear picture of which architectures are plausible and which are not.

#### SAGA2

The current implementation is the SAGA2 R package, freely available on GitHub. It extends the original SAGA framework to handle sexual dimorphism and genotype-by-environment interactions, allowing researchers to partition genetic architecture into even finer components.

### The Methods Papers

Install: devtools::install_github("coleoguy/SAGA2")

GitHub: github.com/coleoguy/SAGA2

Vignette: saga.pdf

This chart illustrates how SAGA evaluates all candidate models simultaneously using AICc weights. The traditional approach (sequential testing) would select the first "non-significant" model and stop. SAGA instead provides the relative support for each architecture, revealing that multiple models may be similarly supported.
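The information-theoretic quantities described under "The SAGA Solution" above follow standard formulas, sketched here in Python. This is not the SAGA2 package or its API (which is an R package); the function names are illustrative, and the model-averaging rule shown (treating a parameter as zero in models that omit it) is one common convention assumed for the example.

```python
import math


def aicc(log_likelihood, k, n):
    """Corrected AIC for a model with k parameters fitted to n observations."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined unless n > k + 1")
    aic = 2.0 * k - 2.0 * log_likelihood
    return aic + (2.0 * k * (k + 1)) / (n - k - 1)


def akaike_weights(scores):
    """Convert a list of AICc scores into relative model weights."""
    best = min(scores)
    relative = [math.exp(-(score - best) / 2.0) for score in scores]
    total = sum(relative)
    return [r / total for r in relative]


def models_within(scores, delta=2.0):
    """Indices of models within `delta` AICc units of the best model."""
    best = min(scores)
    return [i for i, score in enumerate(scores) if score - best <= delta]


def model_average(estimates, weights):
    """Weighted average of one parameter across models.

    Assumption: models lacking the parameter contribute an estimate of 0.
    """
    return sum(w * est for w, est in zip(weights, estimates))
```

Given AICc scores of 100, 102, and 110, the first two models land in the 2-unit set and share most of the weight, which is the situation where averaging across models is more informative than picking a single winner.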
(Illustrative data, values are simulated to demonstrate the method.) ### Wright Was Right The Blackmon Lab put Wright's ideas to the most comprehensive empirical test ever conducted. In Burch et al. 2024 , the lab analyzed over 1,600 line cross datasets spanning plants and animals, the largest survey of epistasis in the history of genetics. #### Key Findings Epistasis is pervasive. It was detected in the majority of crosses examined. This is not a marginal effect confined to a few unusual systems, it is a widespread feature of genetic architecture across the tree of life. The importance varies, but it is rarely absent. Different taxa and trait categories show different levels of epistasis, but purely additive models were rarely the best-supported architecture. Morphological traits, life history traits, physiological traits, and behavioral traits all showed substantial epistatic contributions. The patterns are consistent with Wright's vision. Complex genetic architectures where gene interactions matter are the norm, not the exception. The additive-variance-centric view of evolution, Fisher's view, captures an important part of the picture, but it is incomplete. #### Recognition This paper was selected for the 2025 Society for the Study of Evolution President's Award for Outstanding Dissertation Paper, one of SSE's most prestigious honors, recognizing the most impactful dissertation-based publications in the field. #### Implications If epistasis is truly pervasive, then several major conclusions follow: The response to selection is context-dependent. An allele that is beneficial in one population may be neutral or harmful in another, depending on the epistatic context. This complicates predictions about evolutionary trajectories. Population structure matters. If the fitness landscape is rugged (multi-peaked), then the size and connectivity of populations affects which peaks can be reached. Wright's emphasis on population structure was prescient. 
Additive variance is not the whole story. Fisher's fundamental theorem, while mathematically elegant, applies only to the additive component of fitness variation. If epistasis converts to additive variance through drift (as Wright and later Goodnight showed), then the interplay between drift and selection becomes crucial, exactly as Wright argued. ### The Largest Survey of Epistasis ### Beyond Line Crosses, Modern Approaches Line cross analysis is powerful but limited to systems where controlled crosses are possible. Modern genomics has opened entirely new windows on epistasis. #### GWAS Epistasis Scans Genome-wide association studies can be extended to test all pairwise (or higher-order) combinations of SNPs for interactions. With n SNPs, there are n(n − 1)/2 (roughly n²/2) pairwise interactions to test, a massive multiple testing burden that requires enormous sample sizes and sophisticated computational methods. Despite these challenges, GWAS epistasis scans have revealed significant gene-gene interactions for complex traits including height, body mass index, and disease risk in humans. #### QTL Mapping In structured experimental crosses (F2 populations, recombinant inbred lines, MAGIC populations), researchers can map epistatic QTL interactions with greater statistical power than GWAS. The advantage is that linkage disequilibrium is controlled by the experimental design. Studies in model organisms have repeatedly found that epistatic QTL are common and can explain substantial fractions of phenotypic variance that additive QTL miss. #### Mutation Accumulation Experiments In these experiments, mutations accumulate in replicate lines under minimal selection, and fitness is then measured in different genetic backgrounds. If the fitness effects of mutations depend on the background they are tested in, epistasis is present. These experiments have provided some of the clearest evidence for widespread epistasis, particularly in microbial systems where large populations and many generations are feasible.
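The multiple testing burden mentioned under GWAS Epistasis Scans can be made concrete with a short calculation. The sketch below counts unordered SNP pairs exactly, n(n − 1)/2, which is close to n²/2 for large n, and then applies a Bonferroni correction. Bonferroni is used here only as a simple illustrative assumption, not as the method any particular study used. The function names and the example SNP count are hypothetical.

```python
import math


def n_pairwise_tests(n_snps: int) -> int:
    """Number of unordered SNP pairs, n(n - 1)/2."""
    return math.comb(n_snps, 2)


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Hypothetical example: a scan over 500,000 SNPs.
pairs = n_pairwise_tests(500_000)
threshold = bonferroni_threshold(pairs)
print(f"{pairs:,} pairwise tests, per-test threshold {threshold:.1e}")
```

Under these assumptions a 500,000-SNP scan implies roughly 1.25 × 10^11 pairwise tests, which is why exhaustive scans demand very large samples or a strategy for reducing the search space.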
#### Theoretical Connections Fisher's geometric model (FGM) predicts specific patterns of epistasis: mutations of large effect should show diminishing returns (negative epistasis), while mutations near an optimum should show increasingly negative interactions. In contrast, Wright's landscape model predicts a mix of positive and negative epistasis depending on the ruggedness of the landscape and where the population sits relative to peaks and valleys. Empirical patterns tend to show more complexity than either model alone predicts. #### Machine Learning Approaches Random forests, gradient boosting, and neural networks can capture non-linear (epistatic) genotype-phenotype relationships without requiring explicit specification of interaction terms. These approaches are increasingly used for phenotype prediction and have shown improved accuracy when epistasis is present. The major challenge is interpretability, a neural network may capture epistatic effects perfectly but tell you nothing about which specific genes are interacting or why. ### Essential Reading ### Why Epistasis Matters #### Speciation Dobzhansky-Muller incompatibilities are, at their core, epistasis. An allele that functions perfectly well in its home genetic background causes problems, reduced fitness, sterility, or inviability, when placed into a hybrid genetic background. This is the genetic basis of reproductive isolation , and therefore speciation. Without epistasis, speciation through the accumulation of genetic incompatibilities would be impossible. #### Disease Many complex diseases involve gene-gene interactions . The "missing heritability" problem, the observation that identified GWAS variants explain only a fraction of the heritability estimated from family studies, may partly reflect undetected epistatic interactions. If the effect of a risk allele depends on variants at other loci, standard additive GWAS approaches will underestimate its contribution. 
Accounting for epistasis could close the gap between observed and predicted heritability for diseases like diabetes, heart disease, and psychiatric disorders. #### Agriculture Heterosis (hybrid vigor), the phenomenon where F1 hybrids outperform both parents, involves non-additive genetic effects. Whether heterosis is primarily due to dominance (masking of deleterious recessives) or epistasis (favorable interactions between alleles from different parents) remains debated, but epistatic contributions are increasingly recognized. Understanding these interactions is crucial for modern breeding programs that seek to predict hybrid performance from parental genotypes. #### Drug Resistance Combinations of resistance mutations can interact in unexpected ways. Some combinations are more than the sum of their parts (positive epistasis, accelerating resistance evolution), while others are less than the sum (negative epistasis, potentially constraining it). Understanding the epistatic landscape of drug resistance is critical for designing combination therapies and predicting the evolutionary trajectory of pathogens and cancers. #### The Fundamental Question Is evolution primarily a smooth, additive, hill-climbing process (Fisher)? Or is it navigation of a rugged landscape requiring drift, population structure, and gene interaction to find global optima (Wright)? A century of accumulating data increasingly suggests: both views capture important truths, but Wright's emphasis on epistasis was more prescient than the field long acknowledged. The Blackmon Lab's comprehensive analysis of line cross data provides perhaps the strongest evidence yet that epistasis is a fundamental and pervasive feature of genetic architecture, not a statistical curios... ### Epistasis Across Biology Explore our research page to learn more about how the Blackmon Lab studies genetic architecture, epistasis, and the forces shaping genome evolution across the tree of life. 
#### Explore Further ## A History of Chromosome Number Evolution - Blackmon Lab URL: https://coleoguy.github.io/chromosome-evolution-history.html Description: From counting to modeling, a century of ideas on karyotype change, tracing the intellectual arc of chromosome number evolution research from Stebbins to modern probabilistic models. # Chromosome Number Evolution Few topics in evolutionary biology have cycled so dramatically in and out of fashion as the evolution of chromosome number. Once the centerpiece of speciation theory, then eclipsed by sequence-level genomics, and now experiencing a vigorous quantitative renaissance, the field traces an arc from cytological description to phylogenetic inference, and the scientists who drove each turn could hardly have been more different from one another. ### The Cytological Foundations The field was born from the microscope. In the early twentieth century, botanists and cytologists recognized that chromosome number varied dramatically across species and, crucially, that polyploidy (whole-genome duplication) was both common and associated with speciation. G. Ledyard Stebbins , in his landmark 1950 synthesis Variation and Evolution in Plants , elevated karyotype change to a central mechanism of plant diversification, framing polyploidy and dysploidy (single-chromosome gains and losses) as paths to reproductive isolation. Stebbins' view dominated for decades; polyploidy wa... In animals, the conversation was different and rougher. M.J.D. White , in Animal Cytology and Evolution (1954, revised 1973), documented the astonishing diversity of animal karyotypes and advanced "stasipatric speciation", the controversial proposal that chromosomal rearrangements could drive speciation even without geographic isolation. 
White's model positioned structural chromosome change as the motor of animal species diversification, a view that generated fierce debate and never reached consensus, but cemented the idea that karyotype mattered beyond plants. The sheer breadth of animal ... ### The Population Genetic Challenge Russell Lande's 1979 theoretical paper in Evolution was a cold-water moment for chromosomal speciation theory. Using population genetic models, Lande showed that underdominant chromosomal rearrangements, which reduce heterozygote fitness, face severe barriers to fixation in large populations, making them poor candidates for common speciation events. This effectively challenged White's stasipatric model on theoretical grounds, and for a time the field retreated. Interest in chromosome-number evolution as a speciation mechanism waned through the 1980s and into the early molecular era, as al... The plant polyploidy literature never collapsed in the same way, sustained by undeniable empirical examples, but even there the question of whether polyploidy accelerated or retarded diversification remained unresolved for decades. ### The Hybrid Speciation Synthesis Loren Rieseberg ( UBC ) revitalized the chromosomal speciation debate in the 1990s–2000s through his extraordinary work on sunflowers. By experimentally recreating natural hybrid species of Helianthus , Rieseberg demonstrated that chromosomal rearrangements accumulate predictably during hybrid speciation and act as barriers to gene flow, not through underdominance alone, but through recombination suppression. His work reframed structural chromosome change as a speciation mechanism that was both common and tractable, winning him a MacArthur Fellowship and reshaping how speciation geneticist... ### The Probabilistic Revolution The modern era of chromosome-number evolution research is largely a story of methods. 
Itay Mayrose (Tel Aviv University) transformed the field by developing likelihood-based probabilistic models of chromosome number change on phylogenies. His ChromEvol framework (2010, 2014) provided the first statistically rigorous way to infer rates of polyploidization and dysploidy across trees, and his provocative 2011 Science paper, showing recently formed polyploids diversify more slowly than diploid relatives, challenged decades of received wisdom about polyploidy as a speciation engine. Rosana Zenil-Ferguson (University of Kentucky) extended this program by coupling chromosome evolution to diversification dynamics. Her BiChroM model lets chromosome number change interact with trait evolution, related models such as ChromoSSE (Freyman & Höhna 2018) link it to speciation and extinction rates, and her work on holocentric lineages like Carex revealed that dysploidy itself can be a driver, not merely a marker, of rapid diversification. Zenil-Ferguson represents the cutting edge: state-dependent diversification models that treat the karyotype as dynamic and causally important. ### Open Questions & The Road Ahead Major tensions remain. The relationship between chromosome number change and diversification rate is contested: Mayrose's polyploidy-slows-diversification result has been challenged on methodological grounds, and the role of dysploidy versus polyploidy varies enormously across clades. In animals, systematic phylogenetic comparative work of the kind Mayrose and Zenil-Ferguson built for plants has lagged; the Blackmon Lab's work on beetles and the karyotype databases represent important inroads. Model adequacy, whether our probabilistic models of chromosome change are biologically realistic... ### What Has This Lab Added to the Picture? The Blackmon Lab at Texas A&M has contributed to several fronts of the animal karyotype evolution literature, an area that, relative to plants, has lagged badly in the probabilistic inference era.
### Comparative Databases at Scale One persistent bottleneck for macroevolutionary analyses of animal karyotypes has been the absence of curated, machine-readable data. The lab addressed this directly with the Coleoptera Karyotype Database (Blackmon & Demuth, 2015), one of the largest and most taxonomically comprehensive cytogenetic datasets for any animal order, followed by amphibian (Perkins et al., 2019) and Diptera databases. Together these now span more than 14,000 records and have enabled comparative analyses at scales not previously possible in animals. An autonomous AI-assisted data collection system is actively ex... ### Connecting Traits to Karyotype Rates The chromePlus R package (available on GitHub) extends the ChromEvol framework by allowing a binary trait, sex determination system, mating strategy, presence of B chromosomes, or any other two-state character, to modulate the rates of dysploidy and polyploidy. This answers a question the earlier methods left untouched: does karyotype evolution run faster or slower depending on the genomic or ecological context of a lineage? The package fits these joint models via MCMC within the diversitree ecosystem, returning posterior distributions over rate parameters and enabling formal Bayes factor... ### Theory of Sex Chromosome Fixation While much of the comparative literature describes patterns of sex chromosome diversity, the lab has also contributed theoretical population genetic work on the mechanisms driving those patterns, specifically, the fixation probability of mutations that expand the non-recombining region of a sex chromosome. This work bridges Lande's population genetic framework (which showed how hard it is for underdominant rearrangements to fix) with the empirical observation that sex chromosome differentiation does, somehow, proceed. Understanding the conditions under which fusions and inversions can spre... 
### Karyotype Evolution in Beetles as a Model System Coleoptera are the most species-rich order of animals and show extraordinary diversity in both chromosome number and sex chromosome systems, making them an ideal system for testing macroevolutionary hypotheses about karyotype change. The lab's empirical and comparative work in beetles has documented patterns of sex chromosome turnover, dysploidy rates, and the relationship between karyotype diversity and clade diversification in ways that complement the plant-centric literature. The current research questions pursued by the lab, including whether there is an ideal chromosome number and wh... #### Explore Further ## Coleoptera Genomics, Blackmon Lab URL: https://coleoguy.github.io/coleoptera-genomics.html Description: A visual guide to beetle genomics, from reference genomes and chromosomal elements to dosage compensation, population genetics, and the most species-rich order on Earth. # Coleoptera Genomics ### Why Beetle Genomes Matter Coleoptera is the most species-rich order of any organism on Earth. With approximately 400,000 described species , beetles account for roughly 25% of all known animal species and about 40% of all insects. No other order, of any kingdom, comes close. Beetles occupy nearly every terrestrial and freshwater habitat on the planet. They include critical pollinators (scarabs, longhorns), decomposers (burying beetles, dung beetles), agricultural pests (Colorado potato beetle, boll weevil, bark beetles), and forest ecosystem engineers (bark beetles that drive succession in conifer forests). Yet genomic resources for Coleoptera have lagged far behind Diptera (flies) and Lepidoptera (butterflies and moths). For years, Tribolium castaneum (the red flour beetle) was essentially the only high-quality beetle reference genome. This meant that the most species-rich order of life was also one of the least genomically characterized. The genomic revolution in Coleoptera is now underway. 
New long-read sequencing technologies, PacBio HiFi and Oxford Nanopore, combined with Hi-C chromosome scaffolding have made chromosome-level beetle assemblies achievable for any species, not just model organisms. Understanding beetle genomes matters for multiple reasons: - Pest management, bark beetles, crop pests, and stored-product beetles cause billions of dollars in damage annually. Genomics enables targeted control strategies. - Conservation, endangered beetle species need genomic baselines for population viability and genetic rescue. - Fundamental biology, why are there so many beetle species? What is different about their genomes? - Comparative genomics, beetles span 325+ million years of evolution, providing a massive comparative framework. #### Key Papers ### Stevens Elements, The Chromosomal Building Blocks Just as Drosophila has Muller elements (A–F), beetles have their own set of conserved chromosomal elements. We call these Stevens elements , named after Nettie Stevens , who discovered sex chromosomes in 1905 while studying the mealworm beetle Tenebrio molitor . Stevens elements represent ancestral linkage groups that have been conserved across beetle evolution despite hundreds of millions of years of divergence. Even when chromosome numbers change dramatically through fusions and fissions, the gene content of these elements tends to stay together as syntenic blocks. The Tribolium castaneum genome (2n = 20, or 10 chromosome pairs) provided the initial framework for defining these elements. Comparative mapping to other beetle species reveals remarkable conservation of synteny blocks, even across species with radically different chromosome numbers. 
#### Why Stevens Elements Matter Understanding these conserved elements is essential for: - Tracing the history of chromosome rearrangements, which fusions and fissions occurred in a lineage - Determining sex chromosome homology, identifying which element became the X or Y in different species - Understanding dosage compensation, are the same genes compensated when different elements become sex-linked? - Reconstructing ancestral karyotypes, what did the beetle ancestor's chromosomes look like? #### Key Papers Note: Data shown are simulated for illustrative purposes and do not represent real synteny analyses. ### Lab-Produced Genomes The Blackmon Lab is actively generating chromosome-level genome assemblies for beetle species chosen to answer specific evolutionary questions. Each genome project targets a species that fills a key gap in our understanding of beetle genome evolution. Our approach combines multiple technologies: - PacBio HiFi sequencing, highly accurate long reads (>99.9% accuracy, 15–20 kb read lengths) for contiguous assembly - Hi-C scaffolding, chromosome conformation capture to order and orient contigs into chromosome-level scaffolds - RNA-seq, transcriptome data for gene annotation and expression analysis Assemblies are evaluated using BUSCO completeness scores (typically >95% for our chromosome-level assemblies) and compared to karyotype data from our comprehensive databases of over 14,000 beetle karyotype records. Each genome below was chosen not just for its biological interest, but for its strategic placement in the beetle tree of life, filling phylogenetic gaps and enabling comparative analyses that would be impossible with genomes from model organisms alone. #### Our Genomes ### Dosage Compensation in Beetles Most beetles have XY sex determination , though some possess more exotic systems: Xyp (parachute sex chromosomes), neo-XY (autosome fused to an ancestral sex chromosome), or X0 (Y chromosome lost entirely). 
Beetles have more sex chromosome system diversity than almost any other order. Dosage compensation addresses a fundamental problem: males have one X chromosome while females have two. This means X-linked genes are at half dosage in the heterogametic sex, potentially halving the expression of hundreds of genes. #### Solutions Across Life #### What About Beetles? The answer is complex and varies across species. Tribolium castaneum shows incomplete dosage compensation, some X-linked genes are compensated (expression ratio near 1:1 between sexes), while others are not (expression ratio near 0.5 in males). This incomplete pattern may be ancestral for beetles. The Blackmon Lab investigates dosage compensation patterns across beetle species using RNA-seq and coverage-based sex chromosome identification . By comparing male and female read depth across the genome, we can identify sex-linked scaffolds without any prior genetic map. #### Key Papers Note: Data shown are simulated for illustrative purposes and do not represent real coverage analyses. Note: Expression ratios shown are simulated for illustrative purposes and do not represent real RNA-seq data. ### Population Size, Gene Flow, and Beetle Diversity Effective population size (Ne) is one of the most important parameters in evolutionary genetics. It determines the relative power of genetic drift versus natural selection, the rate of adaptive evolution, the level of standing genetic variation, and the probability of fixing new mutations. Beetles span an enormous range of population sizes: from widespread agricultural pests with Ne in the millions (Colorado potato beetle, bark beetles during outbreaks) to narrowly endemic montane species with Ne in the hundreds (flightless ground beetles on isolated sky islands). #### Gene Flow and Dispersal Gene flow connects populations and homogenizes allele frequencies. 
In beetles, gene flow depends critically on dispersal ability (flight versus flightlessness), habitat connectivity, and life history traits. Flightless beetle species tend to have: - Smaller effective population sizes - Stronger population genetic structure (higher Fst) - Higher rates of allopatric speciation - Potentially faster chromosome evolution, connecting to the drift barrier hypothesis for karyotype change #### Population Genomic Approaches Modern population genomics uses several complementary approaches: - π (nucleotide diversity) estimates 4Neμ in diploids, the population-scaled mutation rate - Fst between populations measures genetic differentiation - PSMC / SMC++ reconstructs Ne through time from a single genome - Admixture analysis reveals hybridization and introgression #### Key Papers Note: PSMC curves shown are simulated for illustrative purposes and do not represent real demographic reconstructions. ### Beetle Phylogenomics Resolving the beetle tree of life has been one of the great challenges in systematic biology. With approximately 400,000 species across roughly 200 families, the phylogenetic relationships among major beetle lineages remained contentious for decades. Genomic data has transformed beetle systematics. Transcriptomes and whole genomes provide thousands of loci for phylogenetic inference, resolving relationships that were ambiguous with morphology or single genes. #### The Four Beetle Suborders Coleoptera is divided into four suborders, each with a distinctive body plan and ecology: - Adephaga, ground beetles, tiger beetles, diving beetles (~45,000 species). Predatory, with strong mandibles. - Archostemata, reticulated beetles (~40 species). Ancient relicts, the most basal living beetles. - Myxophaga, tiny aquatic beetles (~100 species). Algae feeders in water films. - Polyphaga, the vast majority (~350,000 species). Includes weevils, scarabs, longhorns, ladybugs, fireflies, and nearly every beetle you have ever seen.
#### Key Resolved Relationships Genomic phylogenetics has resolved several long-standing debates: - The placement of weevils (Curculionoidea) deep within Polyphaga - The rapid radiation of phytophagous beetles coinciding with angiosperm diversification - Beetles originated approximately 325 million years ago (Carboniferous) and survived the end-Permian mass extinction - Most modern beetle families diversified during the Cretaceous terrestrial revolution #### Key Papers ### Comparative Genomics Across Beetles With multiple chromosome-level beetle genomes now available, comparative genomics is revealing the forces shaping beetle genome evolution at unprecedented resolution. #### Genome Size Variation Beetles show substantial genome size variation, from compact genomes of approximately 150 Mb in some small species to bloated genomes exceeding 2 Gb in others. What drives this variation? Three major forces contribute: - Transposable element dynamics, TE expansions can dramatically inflate genome size, while efficient deletion mechanisms keep some genomes compact - DNA deletion rates, species with faster rates of DNA loss tend to have smaller genomes - Polyploidy, rare in beetles but documented in some weevil lineages #### Gene Family Evolution Beetles have expanded and contracted specific gene families relative to other insects. Three families are particularly dynamic: - Olfactory receptors (ORs), expanded in bark beetles that must locate host trees and pheromone sources over long distances - Gustatory receptors (GRs), expanded in phytophagous species that must evaluate host plant chemistry - Cytochrome P450s, detoxification genes expanded in species feeding on chemically defended plants #### Synteny and Chromosome Evolution Despite chromosome number variation ranging from 2n = 4 (some bark beetles) to 2n = 72+ (some longhorns), many synteny blocks are conserved across hundreds of millions of years.
Chromosome fusions and fissions reshuffle these blocks without disrupting the genes within them, consistent with the Stevens element framework. #### Repetitive DNA Transposable elements make up a variable fraction of beetle genomes. Some lineages have experienced TE expansions (genome obesity), particularly in species with large genomes and small Ne. Others maintain compact genomes through efficient deletion, especially species with large population sizes where selection against genomic bloat is more effective. #### Key Papers ### The Future of Beetle Genomics Large-scale initiatives are rapidly expanding beetle genomic resources. The Earth BioGenome Project , i5K (5,000 insect genomes), and national genome projects around the world are making beetle genomes available at an accelerating pace. #### Coming Frontiers - Pan-genomes, capturing structural variation within species, moving beyond single reference genomes to understand the full complement of genetic diversity - Population-level resequencing, whole-genome resequencing of hundreds of individuals across hundreds of beetle species, revealing selection pressures and demographic histories - Functional genomics, CRISPR gene editing in non-model beetles, moving from correlation to causation in understanding gene function - Conservation genomics, genetic rescue and monitoring for endangered species, using genomic data to guide management decisions - Metagenomics, beetle-microbiome interactions, particularly the obligate symbioses in bark beetles and grain beetles that enable exploitation of nutritionally poor substrates #### The Big Question Why are there so many beetle species? Is it adaptive radiation into new ecological niches? Key innovations (phytophagy, complete metamorphosis, elytra protecting flight wings)? Geographic opportunity? Or something about their genomes, perhaps high rates of chromosome rearrangement that promote speciation, or gene family expansions that enable rapid adaptation? 
Comparative genomics across the order may finally answer Haldane's famous (possibly apocryphal) quip about "an inordinate fondness for beetles." The answer is likely multifactorial, and it will require exactly the kind of integrative approach our lab takes, combining karyotype databases, genome assemblies, population genomics, and phylogenetic comparative methods. #### The Blackmon Lab's Contribution Our karyotype databases (14,000+ records for beetles alone), genome assemblies, and population genomic studies provide the foundation for answering these questions at genomic scale. By integrating cytogenetic, genomic, and phylogenetic data, we aim to understand not just how beetle genomes evolve, but why they evolve the way they do. #### Key Papers #### Related Resources ## Epistasis Database - Blackmon Lab URL: https://coleoguy.github.io/epistasis-database.html Description: Over 1,600 datasets from 130+ publications quantifying the role of epistasis in trait divergence across the tree of life. # Epistasis Database Quantifying the role of epistasis in trait divergence across the tree of life Cumulative distribution of the proportion of trait divergence explained by epistasis. Ternary plot: each point represents one dataset. Vertices: Additive (top), Dominance (bottom-left), Epistasis (bottom-right). Since the 1930s, scientists have debated the importance of epistatic gene action relative to additive gene action in trait divergence. Previous studies have been limited in the number of datasets for which they can accurately quantify epistatic effects.
To resolve this debate, we have conducted an extensive literature search and generated several datasets, allowing us to quantify the composite genetic effects that underlie trait divergence across the tree of life. This database houses over 1,600 datasets from 130+ publications, allowing viewers to visualize the effect of epistasis on a range of organisms and phenotypes. Use the Visualization tab to explore cumulative distributions of epistatic effects with flexible color-coding and subsetting. The Ternary Plot shows the relative contributions of additive, dominance, and epistatic effects for each dataset. The Data Table tab provides a searchable, sortable view of all records, and each underlying dataset can be downloaded individually. Submitting data: If you are aware of any available records that should be added to the database, please email us and we will incorporate the missing data. Usage rights: Data taken from the database must not be reproduced in published lists, online databases, or other formats, nor redistributed without permission. The information in this database is provided solely for personal and academic use, and must not be used for the purposes of financial gain. Current version: 1.0 · Last updated: 14 July 2023 ## Circadian Period (τ) Database, Blackmon Lab URL: https://coleoguy.github.io/tau_database.html Description: Searchable database of free-running circadian period (τ) measurements across the tree of life. Blackmon Lab, Texas A&M University. # Circadian Period ( τ ) Database Free-running period measurements across the tree of life · Resources The free-running circadian period, τ (tau), is the intrinsic period of an organism's biological clock measured in the absence of external time cues such as light-dark cycles. Under constant conditions, typically constant darkness (DD) or constant light (LL), circadian rhythms persist with a period that reflects the endogenous pacemaker. 
In most organisms τ is close to, but not exactly, 24 hours, and it varies across taxa, life history stages, and genetic backgrounds.

This database compiles free-running period measurements from the primary literature spanning bacteria, fungi, plants, protists, and animals. Records include the mean τ, experimental light condition, species identity, genotype, and source citation. Use the Plots tab to explore distributions across taxa and conditions, or the Data tab to search and filter individual records.

Plots on the page: τ distribution (all records), τ by taxonomic class, τ by light condition, and records by order (top 20), plus a custom plot builder and a data table.

## CUREs Karyotype Database, Blackmon Lab

URL: https://coleoguy.github.io/cures-karyotype-database.html
Description: Interactive database of 63,542 chromosome number records across 56 clades with source citations, collected through Course-based Undergraduate Research Experiences (CUREs) at Texas A&M University.

#### Cite This Database

Copeland, M., McConnell, M., Barboza, A., Abraham, H.M., Alfieri, J., Arackal, S., Bernard, C.E., Bryant, K., Cast, S., Chien, S., Clark, E., Cruz, C.E., Diaz, A.Y., Deiterman, O., Girish, R., Harper, K., Hjelmen, C.E., Thompson, M.J., Koehl, R., Koneru, T., Laird, K., Lee, Y., Lopez, V.R., Murphy, M., Perez, N., Schmalz, S., Sylvester, T., and Blackmon, H. (2026). Dismantling Chromosomal Stasis Across the Eukaryotic Tree of Life. bioRxiv 2026.04.14.718287. https://doi.org/10.64898/2026.04.14.718287

## Phylogenetic Tree Explorer - Blackmon Lab

URL: https://coleoguy.github.io/phylo-explorer.html
Description: Interactive phylogenetic tree viewer with cladogram, phylogram, and radial layouts. Paste your own Newick string and trait data.

### Phylogenetic Tree Explorer

### Load Newick Tree

Upload a tree file or paste a Newick string. Optionally add trait data as CSV (first column = taxon names matching tip labels).
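
As a rough illustration of the two inputs described above, the sketch below pulls tip labels out of a small Newick string and checks them against the first column of a trait CSV. The tree, its branch lengths, the `chromosome_number` column, and the helper names `tip_labels` and `match_traits` are all made up for illustration. The regular expression only handles simple unquoted labels, and this is not the explorer's own parser.

```python
import csv
import io
import re

# Hypothetical example inputs (illustrative values only).
TREE = "((human:6,chimp:6):84,mouse:90);"
TRAITS_CSV = "species,chromosome_number\nhuman,46\nchimp,48\nrat,42\n"


def tip_labels(newick: str) -> list[str]:
    """Return tip labels from a simple Newick string (no quotes or comments)."""
    # A tip label directly follows '(' or ',' and stops at ':', ',', ')' or ';'.
    # Internal node labels follow ')' and are therefore not captured.
    return re.findall(r"[(,]\s*([^():,;\s]+)", newick)


def match_traits(newick: str, csv_text: str):
    """Map taxon name -> trait values and report names that fail to match."""
    tips = set(tip_labels(newick))
    rows = list(csv.reader(io.StringIO(csv_text)))
    records = rows[1:]  # skip the header row
    traits = {row[0]: row[1:] for row in records if row}
    missing = sorted(tips - set(traits))  # tips with no trait row
    extra = sorted(set(traits) - tips)    # trait rows with no matching tip
    return traits, missing, extra


traits, missing, extra = match_traits(TREE, TRAITS_CSV)
print("tips:", tip_labels(TREE))
print("tips without trait data:", missing)
print("trait rows without a tip:", extra)
```

Checking for mismatched names before plotting is a cheap way to catch typos and synonym problems, which are a common reason trait data fail to line up with tip labels.
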
## Publication Network - Blackmon Lab

URL: https://coleoguy.github.io/citation-network.html
Description: Interactive force-directed graph of Blackmon Lab publications, colored by year and connected by shared co-authorship.

## Phylogenetic Comparative Methods - Blackmon Lab

URL: https://coleoguy.github.io/phylo-methods/index.html
Description: Interactive guide to phylogenetic comparative methods. Covers tree reading, trait evolution models, ancestral state reconstruction, and diversification.

# Phylogenetic Comparative Methods

### Phylogenies

Before anything else, you need to know what a phylogeny is and how to read one. Learn to interpret the tree of life, understand branch lengths, and see why topology matters.

### Continuous Traits

Body size, chromosome number, metabolic rate. How do we analyze traits that vary on a continuum? Learn Brownian motion, phylogenetic signal, and phylogenetic generalized least squares.

### Discrete Traits

Wings vs no wings. XY vs X0 sex determination. How do we model traits that come in distinct states? Learn the Mk model and ancestral state reconstruction.

### Discrete + Continuous

What happens when you need both types of data in the same analysis? Learn phylogenetic ANCOVA, threshold models, and how to integrate multiple types of data.

### Birth-Death Simulator

Same rates, wildly different trees. Run paired birth-death simulations and see how stochastic macroevolution produces unpredictable diversity, even when underlying rates are identical.

## Phylogenies - Phylogenetic Comparative Methods

URL: https://coleoguy.github.io/phylo-methods/phylogenies.html
Description: What is a phylogeny? Interactive D3.js visualization exploring tree reading, cladograms vs phylograms, branch lengths, and tree rotation.

# Phylogenies

### What is a phylogeny?

A phylogeny is a branching diagram showing the evolutionary relationships among species. Each tip represents a living (or extinct) species. Each internal node represents a common ancestor.
The branches connecting them represent time or evolutionary change.

Think of the phylogeny as a family tree of life. Just as your grandparents had children who had children, species have common ancestors. The phylogeny is a hypothesis about how all these ancestors and descendants are related.

Below is an interactive phylogeny of 15 vertebrate species. Hover over a tip to see that species highlighted, along with its entire ancestral lineage.

### Reading a phylogeny

A key insight: the horizontal position of species in a phylogeny does not matter. You can rotate branches around any internal node and get a topology that is logically identical.

Below are three different "rotations" of the same six-species tree. They look different, but they show the same relationships.

What matters in a phylogeny is which species are nested within which clades. The visual layout is irrelevant. A and B can be on the left or right, at the top or bottom. What matters is that they share a most recent common ancestor that is not shared with C, D, E, or F.

### Branch lengths

In some phylogenies, the length of branches encodes information about evolutionary time or the amount of change. A long branch means more time has passed, or more genetic change has accumulated. A short branch means the lineages diverged recently.

This distinction is crucial. A cladogram shows topology but ignores branch lengths. A chronogram or phylogram uses branch length to encode meaningful information.

In the chronogram above, the horizontal axis represents time. The branch leading to Fish is the longest because it diverged from the other vertebrates around 500 million years ago. Humans and Chimpanzees share a much more recent common ancestor, so the branches separating them are shorter.

### Why phylogenies matter for statistics

Here is a critical problem that motivates phylogenetic comparative methods: if you study 1000 species of beetles, you do not have 1000 independent observations.
If 500 of those species are close relatives (say, they share a most recent common ancestor that lived 2 million years ago), then those 500 species are not independent. They inherited many traits from their common ancestor.

This is the problem of phylogenetic non-independence. When you analyze traits across species, you must account for the fact that species are related by descent. Otherwise, you violate the assumption of independence that underlies standard statistical tests.

Imagine you want to test whether body size predicts home range across 100 carnivore species. If you use linear regression, you assume the 100 data points are independent. But suppose 50 species are recently evolved lion relatives. They inherited similar body sizes and home ranges from their common ancestor. Now your effective sample size is much smaller than 100.

The phylogeny gives you the structure to correct for this non-independence. By accounting for the shared ancestry among species, phylogenetic comparative methods let you extract the evolutionary signal from the phylogenetic noise.

### How phylogenies are built

Phylogenies are not observed directly. They are inferred from genetic data (DNA sequences), morphological characters (body shape, skeletal features), or a combination of both.

The most common method is maximum likelihood (ML), which finds the tree that makes your data most probable under a statistical model of evolution. Another popular approach is Bayesian inference, which uses Bayes' theorem to calculate a probability distribution over all possible trees. Both provide ways to quantify uncertainty in the tree topology.

For the purposes of comparative methods, the key insight is this: the phylogeny you use is an estimate, not a fact. It comes with uncertainty. Some internal branches may be poorly supported.
Good practice includes checking the support values (bootstrap percentages in ML, posterior probabilities in Bayesian methods) and sometimes using methods that account for topological uncertainty.

## Continuous Trait Evolution - Phylogenetic Comparative Methods

URL: https://coleoguy.github.io/phylo-methods/continuous.html
Description: Brownian motion, phylogenetic signal, and PGLS regression. Interactive simulations for understanding continuous trait evolution on phylogenies.

# Continuous Trait Evolution

### What is a continuous trait?

Continuous traits are characters that vary along a spectrum. Body size, brain mass, metabolic rate, chromosome number, genome size. These are not discrete categories. A species can have a body size of 2.3 kg, or 4.7 kg, or any value in between.

When you measure a continuous trait across species, you get a dataset of real numbers. The question in phylogenetic comparative methods is: how much of the variation in this trait is explained by shared ancestry, and how much is explained by independent evolution in different lineages?

### The problem with naive regression

Suppose you measure body size and home range for 50 species of carnivores. You want to know: do larger animals have larger home ranges? A simple linear regression seems like the obvious approach.

The problem: your 50 species are not independent data points. If 20 of them are close relatives of lions, they inherited similar body sizes and home ranges from their common ancestor. Your regression treats each as an independent observation, inflating the sample size and giving you false confidence in your result.

This is the essence of the phylogenetic comparative problem. We need methods that acknowledge the tree structure.

### Brownian Motion

The foundation for most continuous trait analysis is the Brownian motion model. This is a random walk: at each instant in time, a trait changes by a small, random amount.
The change is drawn from a normal distribution with mean 0 and variance proportional to sigma-squared (the rate of evolution).

Under Brownian motion, the variance of a trait at the tips of the tree is proportional to time. A longer branch accumulates more variance. A shorter branch accumulates less. This is exactly what we expect under random evolution.

### Phylogenetic signal

Not all traits evolve under Brownian motion at the same rate. Some traits show strong phylogenetic signal: closely related species are very similar, and the trait is predictable from phylogeny. Other traits show weak signal: the trait is distributed randomly across the tree, with no obvious relationship to phylogeny.

The most common way to quantify signal is Blomberg's K. Under Brownian motion, the expected value of K is 1. If K > 1, the trait is more clustered by phylogeny than expected under BM (strong signal). If K < 1, the trait is less clustered (weak signal, more convergence or recent change).

### Phylogenetic Generalized Least Squares (PGLS)

When you regress one continuous trait against another (e.g., body size vs. home range), you want to account for the phylogenetic non-independence. This is what PGLS does.

PGLS is a regression method that uses the expected covariance matrix from the phylogeny to weight the data. Under Brownian motion, species that are closer in the tree are expected to be more similar. PGLS exploits this to give you a regression line that reflects the true evolutionary relationships, not just the raw tip data.

### What you actually do in R

In practice, you use the gls function from the nlme package with a correlation structure from ape. The corBrownian function encodes the phylogenetic covariance matrix. The gls function (generalized least squares) uses this matrix to compute regression coefficients that properly account for phylogenetic non-independence.
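
To make the computation behind PGLS concrete, here is a minimal Python sketch of generalized least squares with a Brownian-motion covariance matrix. It is not the R workflow described above. The four-species tree, the trait values, and the helper names (`inverse`, `matmul`, `gls`) are assumptions made up for illustration. The estimator is the standard GLS formula, beta = (XᵀC⁻¹X)⁻¹ XᵀC⁻¹y, where C holds the shared branch lengths between species.

```python
# Hypothetical tree ((A:1,B:1):2,(C:1,D:1):2); total depth 3.
# Under Brownian motion, cov(i, j) is proportional to the branch
# length shared by species i and j from the root.
BM_COV = [
    [3.0, 2.0, 0.0, 0.0],  # A
    [2.0, 3.0, 0.0, 0.0],  # B
    [0.0, 0.0, 3.0, 2.0],  # C
    [0.0, 0.0, 2.0, 3.0],  # D
]
IDENTITY = [[float(i == j) for j in range(4)] for i in range(4)]

# Made-up trait values for species A-D.
body_size = [1.0, 1.2, 3.0, 3.3]
home_range = [2.1, 2.0, 4.2, 4.0]


def matmul(a, b):
    """Plain-Python matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def inverse(m):
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def gls(x, y, cov):
    """Return (intercept, slope) of y ~ x under the given covariance matrix."""
    X = [[1.0, xi] for xi in x]            # design matrix with intercept
    Y = [[yi] for yi in y]
    Xt = [list(col) for col in zip(*X)]    # transpose of X
    XtCi = matmul(Xt, inverse(cov))
    beta = matmul(inverse(matmul(XtCi, X)), matmul(XtCi, Y))
    return beta[0][0], beta[1][0]


print("OLS  (identity covariance):", gls(body_size, home_range, IDENTITY))
print("PGLS (BM covariance):      ", gls(body_size, home_range, BM_COV))
```

With the identity matrix the function reduces to ordinary least squares. Swapping in the phylogenetic covariance down-weights information that is shared between close relatives, which is the correction the section describes.
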
## Discrete Trait Evolution - Phylogenetic Comparative Methods

URL: https://coleoguy.github.io/phylo-methods/discrete.html
Description: Mk model, ancestral state reconstruction, stochastic character mapping, and chromePlus. Interactive guide to discrete trait evolution on phylogenies.

# Discrete Trait Evolution

### Discrete traits in biology

Not all traits vary continuously. Many important characters come in discrete states: a species either has wings or does not, has an XY sex determination system or an X0 system, is herbivorous or carnivorous, or lives on land or in water.

Analyzing these traits requires different models. You cannot use Brownian motion, which assumes continuous change along a random walk. Instead, you need a Markov chain model that describes transitions between discrete states.

### The Mk model

The Mk model (for "Markov model of k states") is the standard framework for discrete trait evolution. Imagine a trait with two states: State A and State B. At any point in time, a lineage can change from A to B, or from B to A, according to certain rates.

These rates are encoded in a matrix called Q. For a two-state system, Q is a 2 × 2 matrix with two free parameters: q01 (rate of change from A to B) and q10 (rate from B to A). The diagonal entries are set so that each row sums to zero. Under a Markov process, the probability of being in each state evolves according to these rates.

### Ancestral state reconstruction

A key question in evolutionary biology: what state did the common ancestor have? If you observe state A in one species and state B in its sister species, did their ancestor have state A, state B, or are both equally plausible?

Ancestral state reconstruction (ASR) uses maximum likelihood to estimate the probability of each state at each internal node of the tree. The reconstruction accounts for the tip states and the model of evolution.

### Stochastic character mapping

ASR gives you point estimates (maximum likelihood) of ancestral states. But there is uncertainty.
The actual history of character evolution could have taken many different paths, all consistent with the tip data.

Stochastic character mapping samples many possible histories from the posterior distribution, conditional on the data and the model. You can then describe what happened: how many transitions occurred, on which branches, and in which direction.

The two example histories shown on the page are both consistent with the observed data. Stochastic character mapping samples many such histories, weighted by their likelihood under the model. This lets you estimate the expected number of transitions and their locations on the tree.

### Testing correlations between discrete traits

Suppose you want to know: does having wings correlate with flying behavior? Or does having an XY sex determination system correlate with having sex-limited traits?

Pagel's test is a phylogenetic comparative method that tests whether two discrete traits evolve independently or are correlated. The test compares two models: one where the traits evolve independently (simpler), and one where they can influence each other's evolution (more complex). A significant result suggests the traits are correlated in their evolution.

### What you actually do in R

The ape package has the ace function for maximum likelihood ancestral state reconstruction. The phytools package has more advanced functions, including stochastic character mapping.

### Chromosome number as a discrete trait: chromePlus

Chromosome number is one of the most dynamic features of eukaryotic genomes. Across species, chromosomes fuse, split, and duplicate, sometimes driving reproductive isolation and speciation. Modeling how chromosome number evolves requires treating it as a special kind of discrete trait where the state space is large (chromosome numbers from 1 to 100+) and the transitions have biological structure: gains of one, losses of one, and whole-genome duplications (polyploidy) all have distinct rates.
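
The structured transitions just described can be written down as a rate matrix. The sketch below builds one for haploid numbers 1 to `max_n` using only the three event types named in the paragraph above (gain of one, loss of one, doubling). The function name `chromosome_q` and its parameter names are hypothetical, the rate values are arbitrary, and this is not chromePlus's API or its full model.

```python
def chromosome_q(max_n, gain, loss, poly):
    """Build a rate matrix Q over chromosome numbers 1..max_n.

    Row and column i correspond to chromosome number n = i + 1.
    Off-diagonal entries are transition rates; each diagonal entry is
    set so that its row sums to zero, as for any Markov rate matrix.
    """
    q = [[0.0] * max_n for _ in range(max_n)]
    for n in range(1, max_n + 1):
        i = n - 1
        if n + 1 <= max_n:
            q[i][i + 1] += gain        # n -> n + 1 (e.g., fission)
        if n - 1 >= 1:
            q[i][i - 1] += loss        # n -> n - 1 (e.g., fusion)
        if 2 * n <= max_n:
            q[i][2 * n - 1] += poly    # n -> 2n (whole-genome duplication)
        q[i][i] = -sum(q[i])           # diagonal is still 0.0 at this point
    return q


# Arbitrary illustrative rates, in events per unit branch length.
Q = chromosome_q(max_n=20, gain=0.05, loss=0.08, poly=0.01)
print("rate 9 -> 18 (doubling):", Q[8][17])
print("rate 18 -> 17 (loss):   ", Q[17][16])
```

Capping the state space at `max_n` is a practical choice. Doubling is simply disallowed from states where 2n would exceed the cap, so the cap should sit comfortably above the largest observed number.
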
For a history of how scientists developed these ideas, from Stebbins and White to Mayrose and Zeni...

chromePlus is an R package developed in this lab that extends the chromEvol framework within the diversitree ecosystem. It fits Markov chain models of chromosome evolution on phylogenies using MCMC, and critically, allows a binary trait (such as sex determination system, mating strategy, or presence of B chromosomes) to influence the rates of chromosome change. This lets you ask: does having sex chromosomes speed up or slow down karyotype evolution?

The model tracks a state space combining chromosome number and binary trait. From any state, five types of events can occur. Each of these rates can be conditioned on the binary trait state, so you can model a world where sex-chromosome-bearing lineages (state B) have faster dysploidy rates than those without (state A).

Figure caption: A polyploidy event in the ancestor of the gold clade doubled the chromosome number. Dysploidy then reduced n=18 → n=17 in one lineage.

In R, chromePlus wraps MCMC inference around these models, returning posterior distributions over rate parameters. You can compare models (e.g., rates equal vs rates free between binary trait states) using Bayes factors from the MCMC output.

### Maximum likelihood vs Bayesian inference

All of the methods on this page require fitting a model to data, estimating rate parameters like q01 and q10. There are two major philosophies for doing this, and both appear in phylogenetic software. Understanding the difference matters because they answer subtly different questions and have very different computational demands.

The likelihood function L(θ | data) describes how probable the observed data are for each possible value of the parameter θ. Maximum likelihood finds the single value of θ that makes the data most probable, that is, the peak of this surface.

This is computationally fast because you only need to find one point.
Tools like ape::ace(method="ML") and corHMM use numerical optimization (for example, quasi-Newton or simplex methods such as Nelder–Mead) to locate the peak quickly, even for large trees.

The cost: you get a point estimate, not a distribution. Standard errors can be approximated from the curvature of the likelihood surface, but this assumes the surface is roughly parabolic near the peak, an assumption that can break down with sparse data.

Bayes' theorem says: posterior ∝ likelihood × prior. Rather than finding the single best θ, Bayesian inference characterizes the full posterior distribution over θ, the probability of each parameter value given the data and your prior beliefs.

This is elegant because it propagates uncertainty naturally. You don't just get a best estimate; you get a credible interval that directly means "there is 95% posterior probability the parameter lies here." Stochastic character mapping is inherently Bayesian for exactly this reason.

The cost: the posterior is rarely tractable analytically. In practice, MCMC (Markov chain Monte Carlo) is used to sample from it, which requires running a chain for tens of thousands of iterations. For complex models on large trees, this can take hours or days.

On the page, sliders let you see how the likelihood surface and prior shape the posterior distribution. The ML estimate is the peak of the likelihood curve; the Bayesian posterior is the product of likelihood × prior (renormalized).

In practice: use ML when you need fast answers or are running many model comparisons (e.g., testing 10 different rate matrices). Use Bayesian methods (MCMC) when you need full uncertainty quantification, want to formally incorporate prior information, or are working with models too complex for numerical optimization (like chromePlus, or joint inference of rates and ancestral states).
There is also a practical numerical reason to prefer MCMC: discrete trait models can have flat likelihood ridges at unrealistically high rate values, where many parameter combinations fit the data nearly equal...

## Discrete + Continuous - Phylogenetic Comparative Methods

URL: https://coleoguy.github.io/phylo-methods/discrete-continuous.html
Description: Pagel's lambda and PGLS-ANCOVA for mixed trait models. Interactive phylogenetic comparative methods guide.

# Discrete + Continuous

### Why you need both

Many evolutionary questions require analyzing both discrete and continuous traits simultaneously. For example: does having an XY sex determination system correlate with chromosome number? Do species with wings have larger body sizes? Does the presence of a trait relate to the rate at which another trait evolves?

These are inherently multivariate problems. You cannot analyze them by looking at each trait independently and ignoring the phylogenetic structure. You need methods that can handle mixed data types while accounting for shared ancestry.

### PGLS with a discrete predictor

The simplest case is when your predictor is discrete (e.g., "has trait X" or "doesn't have trait X") and your response is continuous (e.g., body size). This is essentially a phylogenetic ANCOVA (analysis of covariance).

Below is an interactive plot. Start with the naive regression (OLS), which ignores the tree. Then reveal the phylogeny. Notice how the phylogenetic structure influences the relationship. Finally, show the PGLS line, which corrects for non-independence.

### Phylogenetic ANCOVA: comparing groups

When you test whether a continuous trait differs between two groups (accounting for phylogeny), you are doing a phylogenetic ANCOVA. The comparison is not a simple t-test. Instead, you fit separate regression lines for each group and test whether they differ in intercept, slope, or both.

Standard (non-phylogenetic) ANCOVA: Treats groups as independent. Ignores phylogenetic relationships within groups.
Phylogenetic ANCOVA: Uses the phylogenetic covariance matrix. Accounts for non-independence within groups.

### Pagel's lambda and phylogenetic correlation

In the examples above, we have assumed Brownian motion evolution. But sometimes traits do not show strong phylogenetic signal. One way to model this is with Pagel's lambda, a scaling parameter that ranges from 0 (no phylogenetic signal) to 1 (strong signal under BM).

Lambda transforms the phylogenetic covariance matrix by scaling off-diagonal elements. At lambda = 0, the off-diagonal covariances vanish and all species are treated as independent. At lambda = 1, you recover the standard BM expectation.

Below is a heatmap showing how the correlations between species change as lambda varies. Lambda = 0 (left) indicates no phylogenetic signal. Lambda = 1 (right) indicates Brownian motion expectation. Brighter colors = stronger correlations between species.

### The threshold model

A conceptually important model bridges the gap between discrete and continuous evolution. The threshold model (Felsenstein 2012) proposes that a discrete trait (e.g., the presence or absence of a feature) is determined by an underlying continuous "liability" variable. When liability exceeds a threshold, the discrete trait is expressed.

This model is powerful because it unifies discrete and continuous thinking. The liability evolves continuously (under Brownian motion), but we only observe the discrete outcome. This means that even if a discrete trait appears to have no phylogenetic signal, there could be substantial signal in the underlying liability.

### Putting it all together: a Blackmon lab example

The Blackmon lab studies sex determination systems and sex chromosomes in insects. A great example of a discrete-continuous question: does having an XY sex determination system (discrete) correlate with having more chromosomes (continuous)?

The answer, from the lab's research, is yes. Species with XY systems tend to have higher chromosome numbers. But this relationship is not straightforward.
Different insect lineages show different patterns. Some evolved XY systems independently. Some lost them. Some changed chromosome numbers while keeping the same sex determination system.

Analyzing this requires PGLS with a categorical predictor, accounting for the fact that "XY vs non-XY" transitions have happened multiple times in the phylogeny, and each transition might be associated with changes in chromosome number.

Example: Do insects with XY sex determination have different chromosome numbers than species with other sex systems?

The figure shows a hypothetical example. XY species (gold) have higher chromosome numbers than non-XY species (green). A PGLS analysis would test whether this difference is statistically significant after accounting for phylogenetic non-independence. It would estimate separate intercepts for each group and, if a continuous covariate is included, test whether the slopes differ.

### What you actually do in R

PGLS with a categorical predictor uses the same gls function as before, but with a factor variable as the predictor.

For the threshold model, packages like phytools and geiger provide functions for fitting liability distributions and thresholds. The analysis is more complex, but the idea is to estimate the underlying continuous liability that gives rise to discrete states.

## Birth-Death Tree Simulator - Blackmon Lab

URL: https://coleoguy.github.io/phylo-methods/bd-simulator.html
Description: Interactive birth-death process simulator showing how identical evolutionary rates produce dramatically different phylogenetic trees by chance alone.

# Birth-Death Tree Simulator

### What are you looking at?

Each panel simulates a birth-death process starting from a single lineage. At any moment, each living lineage can speciate (split into two daughters, rate λ) or go extinct (rate μ). The two simulations use identical parameter values, but because speciation and extinction are stochastic events, the resulting trees are completely different every time.
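
The process described above can be sketched as a Gillespie-style simulation. This minimal Python version tracks only the number of living lineages, not the tree itself, and is not the code behind the interactive panels. The function name `simulate_lineage_count` and the parameter values are illustrative assumptions, and the rates are assumed to satisfy λ + μ > 0.

```python
import random


def simulate_lineage_count(lam, mu, end_time, rng):
    """Simulate a birth-death process from one lineage.

    Returns the number of lineages alive at end_time (0 means the whole
    clade went extinct). Assumes lam + mu > 0.
    """
    n, t = 1, 0.0
    while n > 0:
        # Waiting time to the next event anywhere in the clade.
        t += rng.expovariate(n * (lam + mu))
        if t >= end_time:
            break
        # The event is a speciation with probability lam / (lam + mu).
        if rng.random() < lam / (lam + mu):
            n += 1
        else:
            n -= 1
    return n


# Identical rates, different random seeds: the outcomes generally differ.
for seed in range(6):
    survivors = simulate_lineage_count(1.0, 0.5, 5.0, random.Random(seed))
    print(f"seed {seed}: {survivors} lineages at the end")
```

Running this with many seeds gives a feel for the spread of outcomes, including runs that end with zero lineages even though the net diversification rate is positive.
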
Gold branches are lineages that survive to the present. Grey branches went extinct before the end time. Red dots mark extinction events.

This variability is why two clades evolving at the same rates can end up with very different numbers of species, and why detecting shifts in diversification rate requires careful statistical methods: a lot of the apparent signal can be noise.

Try increasing μ toward λ (high turnover) and watch how often complete extinction occurs. Try low μ and high λ for explosive radiations.

The net diversification rate is r = λ − μ. When r is small or negative, trees frequently go extinct entirely. Even with a positive expected trajectory, any single realization can die out by chance (the stochastic extinction problem).

## Likelihood Ridges in Discrete Trait Models - Phylogenetic Comparative Methods

URL: https://coleoguy.github.io/phylo-methods/likelihood-ridges.html
Description: Understanding likelihood ridges in discrete trait Mk models. Why high-rate parameter estimates can mislead maximum likelihood optimization.

# Likelihood Ridges in Discrete Trait Models

### The Core Problem

Maximum likelihood estimation of discrete trait models comes with a well-known identifiability problem. When transition rates become very high relative to tree depth, something surprising happens: trait states get "scrambled" across the phylogeny. Every lineage rapidly visits every state many times, and the phylogenetic signal gets washed out.

In this limit, the expected distribution of tip states converges on the stationary distribution of the Markov chain. This distribution is determined only by the ratio of rates, not their absolute magnitude.

Consider a simple example: a model with q₀₁ = 0.1 and q₁₀ = 0.1 (slow rates) can produce nearly the same distribution of tip states as q₀₁ = 100 and q₁₀ = 100 (very fast rates), if the tip data show a roughly 50/50 split between states.
This creates a likelihood ridge: a long plateau in parameter space extending from biologically realistic rate values (say, 0.001–1 events per unit branch length) out to arbitrarily high values. Along this ridge, the likelihood changes only minimally. The ridge runs along the direction of proportional rate scaling: both rates increase together while maintaining their ratio.

### Why This Matters

- Unrealistic estimates: Maximum likelihood optimizers (like optim() in R) can wander up the ridge and return estimates of q = 50 or q = 500 events per unit branch length when the true value is 0.5. Both fit the data similarly well, but only one is biologically plausible.
- Slow convergence: The optimizer can get stuck in the ridge region, requiring many more function evaluations to converge. This wastes computational time and can trigger premature termination warnings.
- Unbounded confidence intervals: Confidence intervals estimated via likelihood profiles will extend far up the ridge, with the upper bound appearing infinite or unreasonably large.

Conditions that affect ridge severity:

- Tip states are close to a 50/50 distribution (maximum ridging for two-state models)
- Trees are large (more tips = stronger constraint on estimates)
- There is substantial phylogenetic signal (related species share states, consistency across the tree)

### Interactive Likelihood Ridge Visualization

The plot below shows a simulated likelihood surface for a simple two-state equal-rates model (q₀₁ = q₁₀ = q). The x-axis is the rate q on a logarithmic scale; the y-axis is log-likelihood. Use the slider to change the proportion of tips in state 0, and watch how the likelihood ridge forms and shifts.

### Solutions to Ridge Wandering

#### 1. MCMC with Priors

Bayesian MCMC with an exponential or lognormal prior on rates is the most principled solution. A prior naturally penalizes high values that have little data support.
The posterior distribution will concentrate near biologically reasonable values unless the data strongly support high rates.

This is why tools like chromePlus and ChromEvol offer MCMC alongside maximum likelihood. The prior acts as a regularizer, pulling estimates away from the ridge toward plausible values. For example, an exponential prior with mean 0.5 makes q = 100 many orders of magnitude less probable a priori than q = 0.5, even if both fit the likelihood equally well.

#### 2. Multiple Starting Points

Running ML optimization from many random starting points samples different regions of parameter space. If most runs converge near low-rate solutions but a few wander up the ridge, the true MLE is likely in the low-rate cluster.

Compare the converged values: are they stable across runs? If estimates vary wildly, you may be seeing ridge effects.

#### 3. Penalized Likelihood

Adding a regularization term (e.g., an L2 penalty proportional to rate magnitude) creates a unique maximum that concentrates near realistic values. With an L2 penalty, the penalized log-likelihood takes the form log L(q) − λ · Σ qᵢ², summing over the rate parameters qᵢ.

The penalty weight λ controls how much you trust the prior belief that rates should be small. Cross-validation or information criteria can guide the choice of λ. This is less formal than a full Bayesian approach but faster and often sufficient.

#### 4. Rate Bounds

The simplest (if somewhat arbitrary) approach is to set upper bounds on the rate parameter during optimization. For example, you might constrain q ≤ 10 events per unit branch length. This prevents the optimizer from wandering far up the ridge.

The downside is that the bound is ad hoc and can artificially truncate the likelihood if the true value happens to exceed it. Still, if you have strong biological priors on reasonable rate ranges, this is practical.

### chromePlus and ChromEvol: Best Practices

The chromePlus package (developed by the Blackmon Lab) and ChromEvol both allow MCMC inference with user-specified priors precisely because of the ridge problem.
When rates inferred by maximum likelihood seem unreasonably high (e.g., > 1 event per unit branch length), this is a diagnostic sign of ridge wandering.

To protect yourself:

- Always check biological plausibility. Does a rate of 50 transitions per unit branch length make sense given what you know about your organism and trait?
- Compare ML and MCMC estimates. If MCMC (with a reasonable prior) gives q = 0.3 but ML gives q = 50, ridge wandering is likely. The MCMC posterior is more trustworthy in this case.
- Try multiple starting values. Run the optimizer from 10–50 random starting points. If they converge to a tight cluster, you can have confidence in the estimate. If they scatter wildly, investigate further.
- Visualize the likelihood profile. Plot likelihood against rate for your real data. If you see a long, flat plateau, you are on the ridge. A sharp peak indicates identifiability.

See the discrete trait evolution guide for more on model selection and the Mk framework. For the history of chromosome evolution methods, including the development of these tools, consult the methods review. Explore karyotype databases for real data examples.

### Summary

Likelihood ridges are an inherent feature of discrete trait inference: not a bug, but a signature of weak identifiability. When tip states are close to equilibrium proportions, high rates become nearly indistinguishable from low rates. This is a genuine inference problem: the data alone may not constrain rates well.

The solutions (priors, multiple starts, penalization, and bounds) all aim to break the ridge by adding external information or constraints. In practice, Bayesian MCMC with informative priors is the gold standard because it explicitly encodes what you know (or want to assume) about plausible rate ranges. When in doubt, compare results across methods and always sanity-check your rate estimates.
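
The ridge and the effect of a prior can be checked numerically in a deliberately simplified setting. The sketch below assumes a symmetric two-state model (q₀₁ = q₁₀ = q) and data made of independent sister pairs whose tips each sit at distance t from their common ancestor, with the ancestor drawn from the stationary distribution. Under those assumptions the two tips differ with probability (1 − e^(−4qt)) / 2. The pair counts, function names, and prior mean are illustrative choices, not values from the page.

```python
import math


def log_likelihood(q, n_same, n_diff, t=1.0):
    """Log-likelihood of sister-pair data under a symmetric two-state model."""
    p_diff = 0.5 * (1.0 - math.exp(-4.0 * q * t))  # tips in different states
    p_same = 1.0 - p_diff                          # tips in the same state
    # Each specific tip pattern also carries the stationary root weight 1/2.
    return n_same * math.log(0.5 * p_same) + n_diff * math.log(0.5 * p_diff)


def log_exponential_prior(q, mean=0.5):
    """Log density of an exponential prior on q with the given mean."""
    return -math.log(mean) - q / mean


# Hypothetical data: 40 sister pairs, 18 of which differ (close to 50/50).
N_SAME, N_DIFF = 22, 18

print(f"{'q':>8} {'log-lik':>10} {'log-lik + log-prior':>20}")
for q in (0.1, 0.5, 5.0, 50.0, 500.0):
    ll = log_likelihood(q, N_SAME, N_DIFF)
    print(f"{q:>8} {ll:>10.3f} {ll + log_exponential_prior(q):>20.3f}")
```

In this toy setting the log-likelihood barely moves once q is large, which is the flat ridge, while adding the log prior makes very high rates sharply worse than moderate ones.
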