UCE phylogenetics

What it does. Ultraconserved elements (UCEs) are highly conserved genomic regions shared across distantly related taxa. They can be captured with sequence-specific bait sets and sequenced via target enrichment. The phyluce pipeline (Faircloth 2016) handles the workflow from read trimming through locus alignment and prepares the inputs for supermatrix or coalescent-based tree estimation; the tree itself is inferred with an external program such as IQ-TREE. UCE datasets routinely yield hundreds to thousands of loci per taxon, many of them parsimony-informative, making them suitable for resolving both shallow and deep phylogenetic splits. In the lab, UCEs have been used to place newly assembled insect genomes into broad phylogenomic frameworks.

When to use it.

When NOT to use it.

Worked example

The snippet below covers the core phyluce steps, starting from raw demultiplexed reads. It assumes the phyluce conda environment is active.

# ---- 1. Trim reads (illumiprocessor wraps trimmomatic) ----
illumiprocessor \
    --input raw-reads/ \
    --output trimmed-reads/ \
    --config illumiprocessor.conf \
    --cores 8

# ---- 2. Assemble per-sample with SPAdes ----
# assembly.conf: a [samples] section mapping each sample name to its trimmed-read directory
phyluce_assembly_assemblo_spades \
    --conf assembly.conf \
    --output spades-assemblies/ \
    --cores 8

# ---- 3. Find UCE loci in assemblies ----
# probes.fasta: UCE probe set for your taxon group (e.g. Hymenoptera 2.5k)
phyluce_assembly_match_contigs_to_probes \
    --contigs spades-assemblies/contigs \
    --probes probes.fasta \
    --output uce-search-results

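# ---- 4. Extract UCE loci shared across a minimum number of taxa ----
# taxon-set.conf: lists the taxa to include under a [all] group header
# --log-path expects an existing directory, so create it first
mkdir -p log taxon-sets

# --output is a file (the match-count config), not a directory
phyluce_assembly_get_match_counts \
    --locus-db uce-search-results/probe.matches.sqlite \
    --taxon-list-config taxon-set.conf \
    --taxon-group "all" \
    --incomplete-matrix \
    --output taxon-sets/all.conf

# here --incomplete-matrix takes a file name for the missing-locus report
phyluce_assembly_get_fastas_from_match_counts \
    --contigs spades-assemblies/contigs \
    --locus-db uce-search-results/probe.matches.sqlite \
    --match-count-output taxon-sets/all.conf \
    --output taxon-sets/all.fasta \
    --incomplete-matrix taxon-sets/all.incomplete \
    --log-path log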

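# ---- 5. Align UCE loci (mafft + edge-trimming) ----
# --taxa must equal the number of taxa listed in taxon-set.conf (119 here)
# Output stays in nexus, which the downstream phyluce_align_* steps read by default
phyluce_align_seqcap_align \
    --input taxon-sets/all.fasta \
    --output mafft-nexus-edge-trimmed \
    --taxa 119 \
    --aligner mafft \
    --cores 8 \
    --incomplete-matrix \
    --output-format nexus \
    --log-path log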

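# ---- 6. Strip locus names from taxon labels, then filter by taxon occupancy ----
# Labels must match across loci before concatenation. The command name below is
# the one used by phyluce 1.7; check your installed version if it is not found.
phyluce_align_remove_locus_name_from_files \
    --alignments mafft-nexus-edge-trimmed \
    --output mafft-nexus-edge-trimmed-clean \
    --cores 8 \
    --log-path log

# Keep loci present in at least 75% of the 119 taxa
phyluce_align_get_only_loci_with_min_taxa \
    --alignments mafft-nexus-edge-trimmed-clean \
    --taxa 119 \
    --percent 0.75 \
    --output mafft-nexus-min75 \
    --log-path log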

# ---- 7. Build supermatrix and run IQ-TREE ----
phyluce_align_concatenate_alignments \
    --alignments mafft-nexus-min75 \
    --output supermatrix \
    --phylip

iqtree -s supermatrix/supermatrix.phylip \
       -bb 1000 \
       -nt AUTO \
       -pre supermatrix/iqtree
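
The walkthrough above reads two hand-written config files, assembly.conf and taxon-set.conf, and hard-codes the taxon count passed to --taxa. The sketch below shows one way to generate both files and derive the count from the trimmed-reads directory. The three function names are hypothetical helpers, not part of phyluce, and the sketch assumes that illumiprocessor wrote one directory per sample containing a split-adapter-quality-trimmed/ subdirectory and that the sample directory names are the taxon names you want to use.

```shell
# Hypothetical helpers (not part of phyluce). Assumed layout:
#   <reads_dir>/<sample>/split-adapter-quality-trimmed/

# Write an assembly config: a [samples] header, then one
# "name:/absolute/path/" line per sample directory.
write_assembly_conf() {
    local reads_dir="$1" out="$2" d
    {
        echo "[samples]"
        for d in "$reads_dir"/*/; do
            echo "$(basename "$d"):$(cd "$d" && pwd)/split-adapter-quality-trimmed/"
        done
    } > "$out"
}

# Write a taxon-set config: a [group] header (default "all"),
# then one taxon name per line.
write_taxon_set_conf() {
    local reads_dir="$1" out="$2" group="${3:-all}" d
    {
        echo "[$group]"
        for d in "$reads_dir"/*/; do
            basename "$d"
        done
    } > "$out"
}

# Count taxa in a single-group taxon-set config: every line that is
# neither a [group] header nor blank.
count_taxa() {
    grep -vc -e '^\[' -e '^[[:space:]]*$' "$1"
}
```

With these defined, running write_assembly_conf trimmed-reads assembly.conf and write_taxon_set_conf trimmed-reads taxon-set.conf before step 2 produces both files, and --taxa "$(count_taxa taxon-set.conf)" can replace the literal 119 in steps 5 and 6. Note that count_taxa assumes the file holds only one group; with several groups it would count all of them together.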

Gotchas we’ve hit

Key papers that use this method in the lab
