Klindt, LeCun & Balestriero give the cleanest theoretical statement so far of when a JEPA stops being a representation learner and starts being a world model.

The result

Under (i) independent Gaussian latent variables, (ii) stationary additive-noise transitions, and (iii) successful Gaussian regularization (SIGReg, from LeJEPA), a JEPA trained with an alignment objective recovers the world's latent variables linearly, up to rotation. Under these conditions, latent prediction alone is enough to identify the generative factors, and the paper connects this to optimal planning in the recovered latent space.

This is the formal license behind the whole "JEPA = world model" claim. If the embedding is faithful up to a linear map, then planning in latents is planning in the world.
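To make "linear, up to rotation" concrete, here is a minimal toy check in NumPy. It is a sketch under my own assumptions, not the paper's experiment: the "learned" embedding is simulated as a random orthogonal transform of Gaussian latents plus small noise, and `procrustes_alignment` is a hypothetical helper that solves the standard orthogonal Procrustes problem via SVD.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5000, 4

# Assumption (i): independent standard Gaussian latents.
z_true = rng.standard_normal((n, d))

# Simulated "learned" embedding: an unknown orthogonal transform of the
# latents (rotation, possibly with a reflection) plus small noise.
q, _ = np.linalg.qr(rng.standard_normal((d, d)))
z_learned = z_true @ q + 0.05 * rng.standard_normal((n, d))


def procrustes_alignment(source, target):
    """Orthogonal R minimizing ||source @ R - target||_F (orthogonal Procrustes)."""
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt


r = procrustes_alignment(z_learned, z_true)
z_aligned = z_learned @ r

# Fraction of variance in the true latents explained once the rotation is undone.
ss_res = np.sum((z_aligned - z_true) ** 2)
ss_tot = np.sum((z_true - z_true.mean(axis=0)) ** 2)
r2 = 1.0 - ss_res / ss_tot
print(f"R^2 after orthogonal alignment: {r2:.3f}")
```

If an embedding is faithful in the theorem's sense, a single orthogonal matrix should bring it close to the true latents. On real data the true latents are unknown, so a check like this only applies to simulations or to annotated proxies.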

Why I am cautious for biology

Each assumption, stated or implicit, is exactly where a cell pushes back:

  • Independent Gaussian latents — gene programs are correlated, multimodal, and heavy-tailed.
  • Stationary additive-noise transitions — perturbation responses are state-dependent, saturating, and often multiplicative; dose and time bend the dynamics.
  • Single clean latent space — the interesting biology lives across modalities (DNA, RNA, protein, chemistry), which is a fusion problem the theorem does not address.
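The first objection can be turned into a quick diagnostic. The sketch below is illustrative only: `gaussian_independence_report` is a hypothetical helper, and the "cell-like" factors are synthetic (a shared driver plus log-normal scaling), not real expression data. It reports the largest off-diagonal correlation and the largest excess kurtosis, both of which should sit near zero for independent Gaussian factors.

```python
import numpy as np


def gaussian_independence_report(x):
    """Summarize departures from 'independent Gaussian' for an (n_samples, n_factors) array."""
    x = (x - x.mean(axis=0)) / x.std(axis=0)
    corr = np.corrcoef(x, rowvar=False)
    off_diag = corr[~np.eye(corr.shape[0], dtype=bool)]
    excess_kurtosis = (x ** 4).mean(axis=0) - 3.0  # 0 for a Gaussian
    return {
        "max_abs_correlation": float(np.abs(off_diag).max()),
        "max_abs_excess_kurtosis": float(np.abs(excess_kurtosis).max()),
    }


rng = np.random.default_rng(0)
n = 20000

# Reference case: satisfies assumption (i).
ideal = rng.standard_normal((n, 3))

# Synthetic "cell-like" factors: a shared driver induces correlation and
# the exponential gives heavy right tails. Purely illustrative.
shared = rng.standard_normal((n, 1))
cell_like = np.exp(0.8 * rng.standard_normal((n, 3)) + 0.6 * shared)

ideal_report = gaussian_independence_report(ideal)
cell_like_report = gaussian_independence_report(cell_like)
print("ideal:    ", ideal_report)
print("cell-like:", cell_like_report)
```

Running the same report on scores for annotated gene programs is a cheap way to see how far a dataset sits from assumption (i) before leaning on the guarantee.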

What I take from it

Use the theory as design guidance, not a guarantee. Concretely: prefer SIGReg-style isotropic-Gaussian regularization as a default anti-collapse regularizer; probe whether the learned latent dimensions actually align with known biological factors; and use real perturbation and temporal data to break the non-identifiability the theory warns about. The math tells you what a faithful latent looks like; the wet-lab data tells you whether you got one.
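The first two recommendations can be sketched in a few lines of NumPy. Both functions are my own simplified stand-ins, not the LeJEPA implementation. `sketched_gaussianity_penalty` projects embeddings onto random directions and compares each projection's empirical characteristic function with that of a standard normal, which is the rough idea behind a sketched isotropic-Gaussian regularizer. `heldout_probe_r2` fits a linear probe from latents to annotated factors and scores it on held-out rows. The grid of evaluation points, the number of directions, and the synthetic factors are all assumptions.

```python
import numpy as np


def sketched_gaussianity_penalty(z, n_directions=64, seed=0):
    """Simplified stand-in for a SIGReg-style penalty (not the LeJEPA code)."""
    rng = np.random.default_rng(seed)
    directions = rng.standard_normal((z.shape[1], n_directions))
    directions /= np.linalg.norm(directions, axis=0, keepdims=True)
    t = np.linspace(-3.0, 3.0, 17)                # evaluation grid (assumed)
    phase = (z @ directions)[:, :, None] * t      # (n, n_directions, len(t))
    ecf_real = np.cos(phase).mean(axis=0)         # empirical characteristic function
    ecf_imag = np.sin(phase).mean(axis=0)
    target = np.exp(-0.5 * t ** 2)                # characteristic function of N(0, 1)
    return float(np.mean((ecf_real - target) ** 2 + ecf_imag ** 2))


def _with_bias(a):
    """Append a constant column so the probe can fit an intercept."""
    return np.hstack([a, np.ones((len(a), 1))])


def heldout_probe_r2(z, factors, train_fraction=0.8):
    """Fit a linear probe from latents to known factors; return held-out R^2 per factor."""
    n_train = int(train_fraction * len(z))
    w, *_ = np.linalg.lstsq(_with_bias(z[:n_train]), factors[:n_train], rcond=None)
    pred = _with_bias(z[n_train:]) @ w
    test = factors[n_train:]
    ss_res = ((test - pred) ** 2).sum(axis=0)
    ss_tot = ((test - test.mean(axis=0)) ** 2).sum(axis=0)
    return 1.0 - ss_res / ss_tot


rng = np.random.default_rng(1)
z_good = rng.standard_normal((2000, 8))               # isotropic Gaussian embedding
z_collapsed = 0.01 * rng.standard_normal((2000, 8))   # near-constant embedding

# Synthetic annotated factors: two are linear in z_good, one is unrelated noise.
factors = np.column_stack([
    z_good[:, 0] + z_good[:, 1],
    2.0 * z_good[:, 2],
    rng.standard_normal(2000),
])

penalty_good = sketched_gaussianity_penalty(z_good)
penalty_collapsed = sketched_gaussianity_penalty(z_collapsed)
probe_r2 = heldout_probe_r2(z_good, factors)
print(f"penalty (Gaussian):  {penalty_good:.4f}")
print(f"penalty (collapsed): {penalty_collapsed:.4f}")
print("held-out probe R^2 per factor:", np.round(probe_r2, 3))
```

The penalty should be small for the isotropic embedding and large for the collapsed one, and the probe should score high only on factors that are linearly present in the latents. With real cells, the held-out split should respect perturbation or batch structure rather than row order.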