NeurIPS 2025 · Peer-Reviewed

Equi-mRNA Foundation Model

The first codon-level equivariant mRNA language model. Its SO(2) symmetry-aware architecture outperforms traditional models on expression prediction, stability assessment, and sequence generation.

~10%
Accuracy gain across 6 benchmarks
4.3x
Generative fidelity improvement (FBD)
~28%
Better functional property preservation
Trained on 25M sequences
15M-param model beats 82M-param models
Validated on 6 biological benchmarks
Why Now

The Timing Is Right

AI Co-Scientist Validated

Google's AI Co-Scientist validated the approach but went horizontal, covering many scientific domains. We go vertical on mRNA, where depth matters most.

mRNA Beyond Vaccines

The mRNA pipeline is expanding rapidly into protein replacement, gene editing, and personalized cancer medicine, and each of these needs better sequence optimization.

Foundation Models at Scale

For the first time, language models can learn biological language at scale. Equi-mRNA adds the missing physics — codon symmetry.

The Architecture

SO(2) Equivariant Codon Encoding

mRNA codons encoding the same amino acid are synonymous — they share function but differ in sequence. Traditional models ignore this. Equi-mRNA explicitly encodes these symmetries as cyclic subgroups of SO(2).
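To make the encoding concrete, here is a minimal Python sketch. It assumes the standard genetic code and a hypothetical assignment in which the j-th of an amino acid's k synonymous codons is mapped to the rotation angle 2πj/k, an element of the cyclic subgroup C_k of SO(2). The codon ordering and the helper names (`synonymous_groups`, `codon_rotation_angles`, `rotate`) are illustrative, not Equi-mRNA's published scheme.

```python
import math
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated in UCAG order.
BASES = "UCAG"
AA_TABLE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA_TABLE)}


def synonymous_groups():
    """Group the 64 codons by the amino acid (or stop, '*') they encode."""
    groups = {}
    for codon, aa in CODON_TO_AA.items():
        groups.setdefault(aa, []).append(codon)
    return groups


def codon_rotation_angles():
    """Hypothetical scheme: the j-th of k synonymous codons gets angle 2*pi*j/k,
    i.e. an element of the cyclic subgroup C_k of SO(2)."""
    angles = {}
    for codons in synonymous_groups().values():
        k = len(codons)
        for j, codon in enumerate(codons):
            angles[codon] = 2 * math.pi * j / k
    return angles


def rotate(vec, theta):
    """Apply the SO(2) rotation by theta to a 2D feature vector."""
    c, s = math.cos(theta), math.sin(theta)
    x, y = vec
    return (c * x - s * y, s * x + c * y)


angles = codon_rotation_angles()
print(len(angles))                 # 64 codons, one angle each
print(synonymous_groups()["L"])    # leucine: six codons, so angles step by 60 degrees
```

Applying the generator rotation 2π/k a total of k times returns any feature vector to where it started, which is the defining property of the cyclic subgroup C_k.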

By integrating group-theoretic priors with an auxiliary equivariance loss and symmetry-aware pooling, the model learns biologically grounded representations that capture what matters: function, not just sequence.
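The two training-time ingredients can be illustrated with a toy, pure-Python sketch. The auxiliary loss below measures how far an encoder is from satisfying f(g·x) = g·f(x) over sampled rotations g, and the pooling step averages vector norms, which is one simple rotation-invariant choice. The 2D toy encoders, the norm-based pooling, and all function names are assumptions for illustration; Equi-mRNA's actual loss weighting and pooling operator may differ.

```python
import math


def rotate(v, theta):
    """SO(2) rotation of a 2D vector."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])


def linear(W):
    """Return a toy 2x2 linear encoder x -> W x."""
    return lambda v: (W[0][0] * v[0] + W[0][1] * v[1], W[1][0] * v[0] + W[1][1] * v[1])


def equivariance_loss(encoder, features, thetas):
    """Mean squared gap between encoder(g.x) and g.encoder(x) over all features and angles."""
    total = 0.0
    for v in features:
        for theta in thetas:
            a = encoder(rotate(v, theta))
            b = rotate(encoder(v), theta)
            total += (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    return total / (len(features) * len(thetas))


def invariant_pool(features):
    """Rotation-invariant pooling: mean vector norm (one simple choice among many)."""
    return sum(math.hypot(x, y) for x, y in features) / len(features)


feats = [(1.0, 0.0), (0.5, 2.0), (-1.5, 0.3)]
c6 = [2 * math.pi * j / 6 for j in range(6)]      # rotations from the cyclic subgroup C_6
equivariant = linear([[1.0, -2.0], [2.0, 1.0]])    # scaled rotation: commutes with SO(2)
anisotropic = linear([[1.0, 0.0], [0.0, 3.0]])     # stretches one axis: breaks equivariance
print(equivariance_loss(equivariant, feats, c6))   # ~0 up to floating-point error
print(equivariance_loss(anisotropic, feats, c6))   # clearly positive
```

In training, a term like this would typically be added to the main task loss with some weight, nudging the encoder toward the equivariant regime without hard-coding it.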

Validated: the model learned real biology

Learned codon rotations correlate with GC-content biases (r=0.98, R²=0.97) and tRNA abundance patterns (ρ=−0.69) — confirming the model captures known biological constraints, not just statistical patterns.
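The statistics quoted above are standard. The sketch below shows how one might compute them: per-codon GC content, Pearson's r, and Spearman's ρ (Pearson's r on ranks, with ties averaged). The real inputs (one learned angle per codon, tRNA abundance values) would come from the trained model and external data and are not reproduced here; the function names are illustrative.

```python
import math


def gc_content(seq):
    """Fraction of G or C nucleotides in a codon (or any sequence)."""
    return sum(base in "GC" for base in seq) / len(seq)


def pearson(xs, ys):
    """Pearson correlation coefficient r."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def spearman(xs, ys):
    """Spearman's rho: Pearson's r computed on ranks."""
    return pearson(ranks(xs), ranks(ys))
```

With one learned angle per codon, `pearson(angles, [gc_content(c) for c in codons])` would yield an r of the kind reported above; for a simple linear fit, R² is the square of that r.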

~10%
Accuracy gain across expression, stability, and riboswitch benchmarks
4.3x
More realistic mRNA constructs (Fréchet BioDistance, FBD)
15M
params, outperforming RNA-FM, RNABERT (82M), CodonBERT (82M), and HELM (50M)
25M
protein-coding sequences from 56M RefSeq entries for training
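Fréchet BioDistance is presumably modeled on the Fréchet Inception Distance used for images: fit one Gaussian to embeddings of real sequences and another to embeddings of generated ones, then compute the squared Fréchet (2-Wasserstein) distance ‖μ₁ − μ₂‖² + Tr(Σ₁ + Σ₂ − 2(Σ₁Σ₂)^½). The NumPy sketch below assumes exactly that. Which embedding model FBD uses is not stated here, so `real` and `gen` are simply arrays of precomputed embeddings.

```python
import numpy as np


def _sqrtm_psd(m):
    """Square root of a symmetric positive semi-definite matrix via eigendecomposition."""
    m = (m + m.T) / 2.0                       # symmetrize away round-off
    vals, vecs = np.linalg.eigh(m)
    vals = np.clip(vals, 0.0, None)           # tiny negative eigenvalues are numerical noise
    return (vecs * np.sqrt(vals)) @ vecs.T


def frechet_distance(real, gen):
    """Squared Fréchet distance between Gaussians fitted to two embedding sets.

    real, gen: arrays of shape (n_samples, dim) holding precomputed sequence embeddings.
    """
    mu1, mu2 = real.mean(axis=0), gen.mean(axis=0)
    s1 = np.atleast_2d(np.cov(real, rowvar=False))
    s2 = np.atleast_2d(np.cov(gen, rowvar=False))
    # Tr((S1 S2)^{1/2}) equals Tr((S1^{1/2} S2 S1^{1/2})^{1/2}); the latter is symmetric PSD.
    r1 = _sqrtm_psd(s1)
    cross = _sqrtm_psd(r1 @ s2 @ r1)
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(s1) + np.trace(s2) - 2.0 * np.trace(cross))
```

A lower value means the generated set's embedding statistics sit closer to those of real sequences; identical sets give zero.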

How Equi-mRNA Works

1
Codon-Level Tokenization
Encodes mRNA at the codon level (3-nucleotide units)
2
Synonymous Symmetry Encoding
Maps codon degeneracy to SO(2) cyclic subgroups
3
Equivariance Loss
Auxiliary loss ensures symmetry preservation during training
4
Symmetry-Aware Pooling
Aggregates representations respecting biological structure
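Step 1 is the easiest to pin down in code. Below is a minimal tokenizer sketch, assuming a vocabulary of the 64 codons plus three hypothetical special tokens (`[PAD]`, `[CLS]`, `[UNK]`) and an in-frame coding sequence; Equi-mRNA's actual vocabulary and special tokens may differ.

```python
from itertools import product

# Hypothetical vocabulary: three special tokens, then the 64 codons.
SPECIAL = ["[PAD]", "[CLS]", "[UNK]"]
CODONS = ["".join(c) for c in product("ACGU", repeat=3)]
VOCAB = {tok: i for i, tok in enumerate(SPECIAL + CODONS)}


def tokenize(seq):
    """Split an in-frame coding sequence into codon tokens (3-nt units)."""
    seq = seq.upper().replace("T", "U")        # accept DNA or RNA alphabet
    if len(seq) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def encode(seq):
    """Map codons to integer ids, prepending [CLS]; codons with ambiguous bases map to [UNK]."""
    return [VOCAB["[CLS]"]] + [VOCAB.get(c, VOCAB["[UNK]"]) for c in tokenize(seq)]


print(tokenize("AUGGCCUAA"))   # ['AUG', 'GCC', 'UAA']
```

Reading in steps of three keeps each token aligned with exactly one amino acid (or stop), which is what allows the later steps to reason about synonymous codons token by token.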
Applications
Expression Prediction · Stability Assessment · mRNA Generation · Therapeutics Design
The Problem

The Codon Optimization Crisis

The mRNA therapeutics market is projected to grow from $6B to $20.4B by 2032. But expansion beyond vaccines faces a fundamental computational bottleneck.

10^632
possible mRNA sequences for a single protein
90%
of drug candidates entering clinical trials fail
0
existing tools exploit the genetic code's group structure
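The size of this search space follows directly from the degeneracy of the genetic code: the number of coding sequences for a protein is the product of the number of synonymous codons at each residue, so its logarithm grows linearly with protein length. A short Python sketch, assuming the standard genetic code; the function names are illustrative:

```python
import math
from collections import Counter

# Standard genetic code (NCBI table 1), one letter per codon in UCAG order; '*' = stop.
AA_TABLE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
DEGENERACY = Counter(AA_TABLE)   # e.g. 6 codons for Leu, 1 for Met


def num_sequences(protein, include_stop=False):
    """Exact count of distinct coding sequences for a protein (one-letter codes)."""
    count = math.prod(DEGENERACY[aa] for aa in protein)
    return count * DEGENERACY["*"] if include_stop else count


def log10_num_sequences(protein, include_stop=False):
    """log10 of the same count, convenient for reporting long proteins."""
    total = sum(math.log10(DEGENERACY[aa]) for aa in protein)
    return total + math.log10(DEGENERACY["*"]) if include_stop else total


print(num_sequences("MLS"))        # 1 * 6 * 6 = 36
```

Each residue contributes between log10(1) = 0 (Met, Trp) and log10(6) ≈ 0.78 (Leu, Ser, Arg), which is how proteins of realistic length reach exponents in the hundreds.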

Pain Points Equi-mRNA Solves

1
Multi-objective optimization
Stability vs. expression vs. safety — current methods can't optimize all three simultaneously
2
Ribosomal frameshifting
Modified bases cause +1 frameshifting at "slippery sequences" — a safety-critical codon selection problem
3
Generalization failure
Deep learning models trained on reporter constructs don't generalize to real-world mRNAs
4
Stability-expression tradeoff
The best sequences have neither the highest codon adaptation index (CAI) nor the lowest minimum free energy (MFE); they sit in an intermediate sweet spot that requires principled navigation
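The codon adaptation index (CAI), one side of this tradeoff, has a standard definition: each codon's relative adaptiveness w is its usage count divided by the count of the most-used synonymous codon, and a sequence's CAI is the geometric mean of w over its codons (the classic definition also leaves out Met, Trp, and stop codons). The sketch below implements that definition; the reference counts, the pseudocount for unobserved codons, and the function names are assumptions. Minimum free energy (MFE), the other side, comes from an RNA secondary-structure folding algorithm and is not reimplemented here.

```python
import math
from collections import defaultdict


def relative_adaptiveness(codon_counts, codon_to_aa, pseudocount=0.5):
    """w(c) = count(c) / max count among codons synonymous with c.

    pseudocount (an illustrative choice) keeps unobserved codons from getting w = 0,
    which would make the logarithm in cai() undefined.
    """
    counts = {c: n + pseudocount for c, n in codon_counts.items()}
    best = defaultdict(float)
    for codon, n in counts.items():
        aa = codon_to_aa[codon]
        best[aa] = max(best[aa], n)
    return {c: n / best[codon_to_aa[c]] for c, n in counts.items()}


def cai(codons, weights):
    """Codon adaptation index: geometric mean of w over the sequence's codons.

    Codons without a weight (e.g. Met, Trp, or stop codons left out of the table) are skipped.
    """
    logs = [math.log(weights[c]) for c in codons if c in weights]
    return math.exp(sum(logs) / len(logs))


# Toy reference counts for alanine codons (not real usage data).
ala = {"GCU": 10, "GCC": 40, "GCA": 20, "GCG": 30}
w = relative_adaptiveness(ala, {c: "A" for c in ala}, pseudocount=0.0)
print(cai(["GCC", "GCU"], w))   # ~0.5, the geometric mean of 1.0 and 0.25
```

Maximizing CAI alone would pick the single most-used codon everywhere, which is exactly the extreme the text above says the best sequences avoid.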
Competitive Landscape

Why Equi-mRNA Is Different

All existing approaches treat codon selection as a discrete combinatorial search, a sequence-generation task, or a frequency lookup. None exploits the mathematical group structure of the genetic code itself.

Approach | Method | Limitation | Equi-mRNA Advantage
LinearDesign (Baidu) | Lattice parsing from linguistics | Optimizes only MFE + CAI; no frameshifting awareness | Multi-objective with safety constraints
GEMORNA (Raina Bio) | Transformer-based generative model | Black box; requires retraining per application | Interpretable, symmetry-grounded
RNop (2025) | Deep learning with 4 loss functions | Limited by training distribution (3M sequences) | Physics-informed inductive bias
GenSmart (GenScript) | Frequency tables only | No structure awareness | Holistic multi-property optimization
From Model to Platform

How Agents Use Equi-mRNA

Equi-mRNA is the foundation. Helixir AI wraps it in specialized agents that design, predict, and explain — with every result traced back to its source.

Design Agents

Generate optimized mRNA sequences using Equi-mRNA predictions for codon optimization, UTR design, and stability.

Hypothesis · Design · Stability · Ranking

Critique Agents

Evaluate candidates across binding, stability, and immunogenicity. A judge agent forces explicit tradeoffs.

Critique · Tradeoffs · Iteration · QC
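One common way to make multi-objective tradeoffs explicit is to keep only Pareto-optimal candidates: those that no other candidate beats on every criterion at once. The sketch below is a generic Pareto filter, not Helixir AI's judge agent. The scores are hypothetical, and every objective is treated as higher-is-better, so immunogenicity is assumed to have been inverted into a "safety" score first.

```python
def dominates(a, b, objectives):
    """True if a is at least as good as b on every objective and strictly better on one.
    All objectives are treated as higher-is-better."""
    return (all(a[k] >= b[k] for k in objectives)
            and any(a[k] > b[k] for k in objectives))


def pareto_front(candidates, objectives):
    """Candidates not dominated by any other candidate."""
    return [c for c in candidates
            if not any(dominates(o, c, objectives) for o in candidates if o is not c)]


objectives = ["binding", "stability", "safety"]    # safety = inverted immunogenicity
candidates = [
    {"id": "A", "binding": 0.9, "stability": 0.5, "safety": 0.7},
    {"id": "B", "binding": 0.6, "stability": 0.9, "safety": 0.7},
    {"id": "C", "binding": 0.5, "stability": 0.4, "safety": 0.6},   # worse than A everywhere
]
print([c["id"] for c in pareto_front(candidates, objectives)])      # ['A', 'B']
```

A and B both survive because each wins on a different axis; choosing between them is precisely the explicit tradeoff a judge step would have to make.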

Provenance Engine

Every prediction traced to source data. No hallucinations — just grounded, explainable reasoning. This is what sets us apart from Google AI Co-Scientist.

Audit Trail · Explainability · Grounding

AI drafts, analyzes, recommends · Scientists review, refine, decide · Shared context across all agents

Build on Equi-mRNA

Try the Helixir AI platform or partner with us on a custom mRNA design program.