RNA-seq

Current understanding

RNA-seq spans a wide methodological range, from short Illumina reads that count transcripts at high throughput to long, noisy reads from Oxford Nanopore Technologies (ONT) or PacBio SMRT platforms that can resolve full-length isoforms and splice junctions in a single pass. The choice of aligner matters substantially, especially for long-read data, where base-level error rates are high and individual reads span multiple exon boundaries.

For spliced alignment of long, noisy reads, minimap2 stands out on real data. On a mouse cDNA dataset sequenced with R9.4 ONT chemistry, minimap2 achieves 94.0% exact intron accuracy, compared with 83.8% for GMAP and 87.9% for SpAln, while running more than 40× faster than either alternative. That combination of accuracy and speed makes it the practical default for long-read RNA-seq workflows aimed at characterizing gene structure or cataloguing alternative splicing. See 10.1093/bioinformatics/bty191, Finding 1.
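
To make the accuracy figures above concrete, the following is a minimal Python sketch of how an exact-intron percentage could be computed. It assumes that "exact intron accuracy" means the fraction of aligned introns whose start and end both coincide with an annotated intron; the cited paper's evaluation script may differ in detail. The function names (`introns_from_cigar`, `exact_intron_accuracy`) and the toy coordinates are hypothetical and chosen for illustration only.

```python
import re
from typing import Iterable, List, Set, Tuple

# (reference name, 0-based intron start, end-exclusive intron end)
Intron = Tuple[str, int, int]

# One CIGAR operation: a length followed by a SAM operation code.
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def introns_from_cigar(ref: str, pos: int, cigar: str) -> List[Intron]:
    """Return the introns implied by the N operations of one alignment.

    `pos` is the 0-based leftmost reference coordinate of the alignment
    (SAM POS is 1-based, so subtract 1 when reading from a SAM record).
    """
    introns: List[Intron] = []
    ref_pos = pos
    for length, op in _CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "N":  # skipped reference region, i.e. an intron
            introns.append((ref, ref_pos, ref_pos + n))
        if op in "MDN=X":  # operations that consume the reference
            ref_pos += n
    return introns


def exact_intron_accuracy(aligned: Iterable[Intron],
                          annotated: Set[Intron]) -> float:
    """Fraction of aligned introns that exactly match an annotated intron."""
    aligned = list(aligned)
    if not aligned:
        raise ValueError("no aligned introns to evaluate")
    hits = sum(1 for intron in aligned if intron in annotated)
    return hits / len(aligned)


# Toy data (hypothetical): two annotated introns on one chromosome.
annotated = {("chr1", 150, 250), ("chr1", 300, 400)}
aligned = (
    introns_from_cigar("chr1", 100, "50M100N50M100N50M")  # both junctions exact
    + introns_from_cigar("chr1", 100, "48M102N50M")       # donor site off by 2 bp
)
accuracy = exact_intron_accuracy(aligned, annotated)
print(f"{100 * accuracy:.1f}% exact introns")
```

In this toy case two of the three aligned introns match the annotation exactly. The third is off by two bases at one boundary and counts as a miss, which is the kind of error that noisy reads near splice sites tend to produce.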

Whether these advantages extend cleanly to SMRT reads or to future ONT chemistries with lower raw error rates is not yet settled. GMAP and SpAln were designed for low-error inputs, so their accuracy gap relative to minimap2 may narrow as read quality improves, or widen as read lengths increase.

Supporting evidence

Contradictions / open disagreements

The minimap2 benchmark is limited to a single species and a single ONT chemistry. GMAP and SpAln were not run with parameters tuned for high-error reads, so the reported accuracy gap may overstate minimap2’s advantage in a fair comparison. Performance on SMRT data or on newer, higher-accuracy ONT chemistries remains an open question.

Tealc’s citation-neighborhood suggestions
