Motivation
Genomic foundation models are usually trained with masked language modelling (MLM) or next-token prediction (NTP). Both objectives optimise local nucleotide statistics (which base is likely given its surrounding or preceding bases) and can therefore underrepresent the global, long-range regulatory structure that governs gene function. Retraining a large genomic backbone from scratch to fix this is expensive. JEPA-DNA (Larey et al., 2026) asks instead whether a global, higher-order signal can be injected into an existing model without changing its architecture, by adding a latent-prediction objective on top of the familiar token-level loss.
How it works
JEPA-DNA is a model-agnostic, continual-training wrapper around a pretrained genomic model.
- A context encoder embeds a visible view of the sequence; a predictor maps that context to the latent representation of a target view.
- An EMA target encoder embeds the target view to supply stop-gradient targets, defined over global sequence embeddings rather than per-token outputs.
- The original token-level MLM/NTP loss is retained and optimised jointly with the latent objective.
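In JEPA-style training the target encoder is commonly maintained as an exponential moving average (EMA) of the context encoder's weights rather than trained by gradient descent. The sketch below shows that update under stated assumptions: the decay value, the dict-of-lists parameter layout, and the name `ema_update` are illustrative choices, not details reported for JEPA-DNA.

```python
from typing import Dict, List

# Hypothetical parameter layout: parameter name -> flat list of weights.
Params = Dict[str, List[float]]


def ema_update(target: Params, online: Params, decay: float = 0.996) -> Params:
    """Return new target weights: decay * target + (1 - decay) * online.

    `decay` is an assumed value; the source does not state the momentum used.
    The target encoder receives no gradients; it only tracks the online
    (context) encoder through this moving average.
    """
    if not 0.0 <= decay <= 1.0:
        raise ValueError("decay must lie in [0, 1]")
    return {
        name: [decay * t + (1.0 - decay) * o for t, o in zip(target[name], online[name])]
        for name in target
    }
```

Calling `ema_update` once per optimiser step keeps the target encoder a slowly moving copy of the backbone; that slow drift is the usual rationale for treating its outputs as stable stop-gradient targets.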
Masking operates on genomic spans, such as regulatory regions, motifs, or sequence segments. The latent objective grounds the backbone in higher-order organisation while the token loss keeps it anchored to nucleotide identity, making this a hybrid generative-plus-latent scheme.
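Span masking can be pictured as splitting one sequence into a context view with hidden spans plus a list of target spans. This is a minimal illustration, assuming random fixed-length spans on a regular grid and `N` as the mask symbol; JEPA-DNA's spans may instead follow annotated regulatory regions or motifs, and `sample_span_views` is a hypothetical helper.

```python
import random
from typing import List, Tuple


def sample_span_views(
    seq: str,
    n_spans: int,
    span_len: int,
    rng: random.Random,
    mask_char: str = "N",  # assumed single-character mask symbol
) -> Tuple[str, List[Tuple[int, int]]]:
    """Hide `n_spans` non-overlapping spans of length `span_len`.

    Returns (context_view, target_spans): the context view has every target
    position replaced by `mask_char`; target_spans holds (start, end) pairs,
    end exclusive, sorted by start.
    """
    n_slots = len(seq) // span_len  # candidate span slots on a fixed grid
    if n_spans > n_slots:
        raise ValueError("too many spans for this sequence length")
    slots = sorted(rng.sample(range(n_slots), n_spans))
    spans = [(s * span_len, (s + 1) * span_len) for s in slots]
    chars = list(seq)
    for start, end in spans:
        chars[start:end] = mask_char * (end - start)  # hide the target span
    return "".join(chars), spans
```

For example, `sample_span_views("ACGT" * 4, n_spans=2, span_len=4, rng=random.Random(0))` keeps 8 of the 16 bases visible and replaces the two chosen 4-base slots with `N`. Grid-aligned slots are a simplification that guarantees non-overlapping spans without rejection sampling.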
The objective
The total loss combines the existing token-level term with a JEPA latent term over global embeddings:
$$\mathcal{L} = \mathcal{L}_{\text{MLM/NTP}} + \lambda\,\big\lVert\, g_\phi(z_{\text{ctx}}) - \operatorname{sg}[\bar f_\theta(x_{\text{tgt}})]\,\big\rVert_2^2,$$
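where $z_{\text{ctx}}$ is the context encoder's embedding of the visible view, $g_\phi$ is the predictor, $\operatorname{sg}$ is stop-gradient, $\bar f_\theta$ is the EMA target encoder applied to the target view $x_{\text{tgt}}$, and $\lambda$ weights the latent term against the token loss. The token term anchors the model to nucleotide identity and prevents the latent objective from collapsing; the latent term grounds the global representation in higher-order organisation. Because the framework is continual and model-agnostic, it upgrades incumbent backbones in place.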
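The objective can be evaluated on plain vectors as below. This is a sketch, not the paper's code: mean pooling is an assumed way to obtain a global sequence embedding, `mean_pool` and `jepa_dna_loss` are hypothetical names, and because plain Python has no autodiff, stop-gradient appears only as a comment (in PyTorch one would typically call `.detach()` on the target tensor, or wrap it in `jax.lax.stop_gradient` in JAX).

```python
from typing import List, Sequence


def mean_pool(token_embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Average per-token vectors into one global embedding (assumed pooling)."""
    if not token_embeddings:
        raise ValueError("need at least one token embedding")
    n = len(token_embeddings)
    dim = len(token_embeddings[0])
    return [sum(tok[d] for tok in token_embeddings) / n for d in range(dim)]


def jepa_dna_loss(
    token_loss: float,
    predicted: Sequence[float],  # g_phi(z_ctx): predictor output
    target: Sequence[float],     # EMA-encoder embedding of the target view
    lam: float,                  # lambda: weight on the latent term
) -> float:
    """L = token_loss + lam * ||predicted - target||_2^2.

    `target` is treated as a constant (stop-gradient); in an autodiff
    framework it would be detached before this call.
    """
    if len(predicted) != len(target):
        raise ValueError("embedding dimensions differ")
    sq_dist = sum((p - t) ** 2 for p, t in zip(predicted, target))
    return token_loss + lam * sq_dist
```

As a worked case, `jepa_dna_loss(1.25, [1.0, 3.0], mean_pool([[1.0, 2.0], [3.0, 4.0]]), lam=0.5)` returns `1.75`: the pooled target is `[2.0, 3.0]`, the squared distance is `1.0`, and half of it is added to the token loss.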
Key results & what's novel
JEPA-DNA reports consistent improvements across 17 genomic benchmark tasks under both linear-probe and zero-shot evaluation. Because these gains appear in settings that read representation quality directly off the encoder without fine-tuning it, they point to better-structured, more transferable embeddings rather than merely a better-tuned head. The conceptual novelty is the recipe: rather than choosing between a token-level objective and a latent one, it shows the two can be combined as a continual, architecture-agnostic upgrade that grounds a genomic model in global structure without retraining from scratch.
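A linear probe freezes the encoder and fits only a linear classifier on its embeddings, so any accuracy gain must come from the embeddings themselves. The sketch below uses a perceptron as a stand-in for the logistic-regression or similar linear classifier commonly used for probing; `fit_linear_probe`, `probe_accuracy`, and the toy embeddings are hypothetical and are not JEPA-DNA outputs.

```python
from typing import List, Sequence, Tuple


def fit_linear_probe(
    embeddings: Sequence[Sequence[float]],
    labels: Sequence[int],  # each label must be +1 or -1
    max_epochs: int = 100,
) -> Tuple[List[float], float]:
    """Fit a perceptron (w, b) on frozen embeddings.

    The encoder that produced `embeddings` is never updated; only w and b
    are learned, which is what makes this a probe of representation quality.
    """
    w = [0.0] * len(embeddings[0])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(embeddings, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # misclassified or on the boundary
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:  # converged on linearly separable data
            break
    return w, b


def probe_accuracy(
    w: Sequence[float],
    b: float,
    embeddings: Sequence[Sequence[float]],
    labels: Sequence[int],
) -> float:
    """Fraction of examples whose sign(w.x + b) matches the label."""
    correct = sum(
        1
        for x, y in zip(embeddings, labels)
        if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) > 0
    )
    return correct / len(labels)
```

On the toy embeddings `[[0.0, 0.0], [0.0, 1.0], [3.0, 3.0], [3.0, 4.0]]` with labels `[-1, -1, 1, 1]`, the probe separates the classes perfectly; embeddings whose classes are not linearly separable would cap the accuracy no matter how long the probe trains.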
Strengths & limitations
- + Model-agnostic and continual — upgrades existing backbones in place, no architecture changes.
- + Consistent gains over 17 tasks in linear-probe and zero-shot settings.
- + The token loss prevents the latent objective from collapsing.
- − Adds a second objective and an EMA encoder, which increases training cost and introduces a loss-weight hyperparameter ($\lambda$) to balance.
- − Defining informative global target views and span masks is non-trivial.
- − Gains are characterised on benchmark suites; behaviour on very long-range genomic reasoning is still to be probed.