At a glance
Problem: Time-series JEPA embeddings often cluster by dynamical regime without labels; it was unclear why such structure emerges.
Key idea: Analyze the JEPA predictor through Koopman operator theory: latent prediction discovers Koopman-invariant coordinates that separate dynamical regimes.
Modality: Time series (dynamical systems analysis)
Target / masking: Temporal windows; the predictor matches masked future/segment latents under an EMA target.
Builds on: Time-series JEPA (je-temporal); Koopman operator theory.
Used for: Explaining and diagnosing emergent clustering in time-series JEPA embeddings.

Motivation

When JEPAs are trained on time series, their embeddings often cluster by the underlying dynamical regime even without any labels — series governed by the same dynamics land near each other. This is a striking and useful emergent property, but it had only been observed empirically. This analytical work asks why it happens, seeking a principled, dynamical-systems explanation rather than treating the clustering as a fortunate accident, so that one can predict when it should occur.

How it works

Figure: Canonical JEPA schematic for time series. The input is split into a visible context and hidden targets (window-level, block). The context encoder $f_\theta$ embeds what is visible, producing $z_{\text{ctx}}$; the target encoder $\bar f_\theta$ (an EMA copy, gradient stopped) embeds the targets, producing $\bar z$; the predictor $g_\phi$ maps the context to predicted target embeddings $\hat z$; training minimises the latent distance $\lVert \hat z - \operatorname{sg}(\bar z) \rVert^2$.

The study takes a trained time-series JEPA — context encoder $f_\theta$, EMA target encoder $\bar f_\theta$, and predictor $g_\phi$ matching masked future/segment latents, with temporal windows as the masking unit — and analyzes its learned predictor through the lens of Koopman operator theory.

  • A Koopman operator linearizes nonlinear dynamics by acting on observables (functions of the state).
  • The authors show the JEPA predictor approximates such a linear evolution in latent space.
  • The invariants of this evolution (Koopman eigenfunctions and spectral quantities preserved under the dynamics) coincide with the directions along which embeddings separate.
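
For reference, the bullets above can be written in standard Koopman notation. The symbols $F$ (the state-update map), $\psi$ (an observable), $\mathcal{K}$ (the Koopman operator), $\varphi$ and $\lambda$ (an eigenfunction and its eigenvalue), and the matrix $K$ are notation assumed here for illustration and do not appear in the source. The last line restates the second bullet's claim in that notation; it is a paraphrase, not a formula taken from the authors.

$$\begin{aligned}
x_{t+1} &= F(x_t), & (\mathcal{K}\psi)(x) &= \psi\big(F(x)\big), \\
\mathcal{K}\varphi &= \lambda\,\varphi & \Longrightarrow \quad \varphi(x_{t+1}) &= \lambda\,\varphi(x_t), \\
z_t &= f_\theta(x_t), & g_\phi(z_t) &\approx K z_t .
\end{aligned}$$

Note that $\mathcal{K}$ is linear in $\psi$ even when $F$ is nonlinear, which is what makes a linear description of the latent evolution possible in principle.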

Thus the analysis connects the geometry of the embedding to the spectral structure of the generating dynamical system.
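
One way to probe this connection numerically is to fit a linear operator to consecutive latent vectors by least squares (a DMD-style fit) and inspect its eigenvalues. The sketch below is an illustration under stated assumptions, not the authors' procedure: the "latents" are a synthetic 2-D rotation standing in for encoder outputs, and `fit_linear_operator` is a hypothetical helper name.

```python
import numpy as np

def fit_linear_operator(Z):
    """Least-squares K such that Z[t+1] ≈ K @ Z[t] (DMD-style fit)."""
    X, Y = Z[:-1], Z[1:]
    # Rows are latent vectors, so solve X @ A = Y and transpose: K = A.T
    A, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return A.T

# Stand-in latent trajectory (assumption): a noiseless rotation by omega per step
omega = 0.3
R = np.array([[np.cos(omega), -np.sin(omega)],
              [np.sin(omega),  np.cos(omega)]])
Z = np.empty((200, 2))
Z[0] = [1.0, 0.0]
for t in range(199):
    Z[t + 1] = R @ Z[t]

K = fit_linear_operator(Z)
eigvals = np.linalg.eigvals(K)
# For a pure rotation the eigenvalues are exp(+/- i*omega): unit modulus, angle omega
recovered_omega = np.abs(np.angle(eigvals))
```

With real JEPA latents, `Z` would be replaced by the sequence of encoder outputs for one series; how well a single matrix `K` fits them is itself a rough check on how linear the learned latent evolution is.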

The objective

The underlying model is trained with the standard temporal JEPA regression,

$$\mathcal{L} = \big\lVert\, g_\phi(z_{\text{ctx}}, m) - \operatorname{sg}\big[\bar f_\theta(x)_{\text{future}}\big]\,\big\rVert_2^2,$$

where $\operatorname{sg}$ is the stop-gradient and the targets come from the EMA encoder $\bar f_\theta$. The contribution is analytical, not a new loss: the authors argue that minimizing this latent next-step prediction objective implicitly forces $g_\phi$ to act as an approximate linear (Koopman) operator on the encoder's observables, so the encoder is driven to discover Koopman-invariant coordinates of the generating system.
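
The regression objective and the EMA target can be sketched in a few lines of NumPy. This is a minimal illustration, assuming linear maps as stand-ins for $f_\theta$, $\bar f_\theta$, and $g_\phi$; the names `W_ctx`, `W_tgt`, `W_pred`, `jepa_loss`, `ema_update`, and the decay `tau` are hypothetical, and the extra predictor argument $m$ from the equation is left out for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_lat = 16, 4

# Hypothetical linear stand-ins for the three networks
W_ctx = rng.normal(size=(d_lat, d_in)) * 0.1   # context encoder f_theta
W_tgt = W_ctx.copy()                           # EMA target encoder, starts as a copy
W_pred = np.eye(d_lat)                         # predictor g_phi

def jepa_loss(x_ctx, x_future):
    """Squared L2 distance between predicted and target latents."""
    z_ctx = W_ctx @ x_ctx
    z_hat = W_pred @ z_ctx
    # The target is treated as a constant: no gradient would flow through it (sg)
    z_bar = W_tgt @ x_future
    return float(np.sum((z_hat - z_bar) ** 2))

def ema_update(W_target, W_online, tau=0.99):
    """Exponential moving average of the online weights."""
    return tau * W_target + (1.0 - tau) * W_online

x_ctx = rng.normal(size=d_in)
x_future = rng.normal(size=d_in)
loss = jepa_loss(x_ctx, x_future)

# After a (pretend) update of the online encoder, the target trails it slowly
W_ctx_new = W_ctx + 0.01 * rng.normal(size=W_ctx.shape)
W_tgt_new = ema_update(W_tgt, W_ctx_new)
```

In an actual training loop the gradient of `loss` would update only the context encoder and predictor, while the target encoder changes solely through `ema_update`.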

Key results & what's novel

The work gives a dynamical-systems account of representation geometry in time-series JEPAs. Latent next-step prediction implicitly forces the encoder to discover Koopman-invariant coordinates of the generating dynamical system; series sharing a dynamical regime collapse onto the same invariants, producing the emergent clustering. By connecting self-supervised latent prediction to Koopman theory, it explains why and when such clustering should be expected and offers a principled diagnostic for what these models learn — turning an empirical observation into a theoretical prediction.
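
The claim that series from one regime share the same invariants can be illustrated on a toy case. The sketch below is an assumption-laden stand-in, not the paper's experiment: it uses scalar sinusoids with two hypothetical regime frequencies, a two-step delay embedding in place of a learned encoder, and a least-squares linear fit in place of the predictor. The helper names `dominant_frequency` and `make_series` are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def dominant_frequency(series):
    """Fit z[t+1] ≈ K z[t] on delay vectors z_t = (x_t, x_{t+1});
    return the largest |arg| among the eigenvalues of K."""
    Z = np.stack([series[:-1], series[1:]], axis=1)
    A, *_ = np.linalg.lstsq(Z[:-1], Z[1:], rcond=None)
    eigvals = np.linalg.eigvals(A.T)
    return float(np.max(np.abs(np.angle(eigvals))))

def make_series(omega, n=300):
    """Sinusoid with random amplitude and phase but fixed frequency."""
    phase = rng.uniform(0, 2 * np.pi)
    amp = rng.uniform(0.5, 2.0)
    t = np.arange(n)
    return amp * np.cos(omega * t + phase)

regimes = {0: 0.2, 1: 0.9}            # two hypothetical regimes
labels = np.array([0, 1] * 10)        # 10 series per regime
features = np.array(
    [dominant_frequency(make_series(regimes[l])) for l in labels]
)

# Same regime -> same spectral feature, whatever the amplitude or phase
spread_within = max(features[labels == k].std() for k in regimes)
gap_between = abs(features[labels == 0].mean() - features[labels == 1].mean())
```

Here the spectral feature ignores amplitude and phase and depends only on the regime, which is the toy analogue of embeddings grouping by dynamics rather than by surface appearance.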

Strengths & limitations

  • + Principled, dynamical-systems explanation for emergent regime clustering.
  • + Connects JEPA latent prediction to the well-developed Koopman framework.
  • + Offers a diagnostic for when clustering should appear.
  • − Analytical: explains rather than improves a model.
  • − The linear-Koopman approximation holds best for systems well-described by a finite set of observables; strongly chaotic or non-autonomous dynamics may break it.
  • − Conclusions are tied to the specific predictor and masking regime analyzed.
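
The caveat about a finite set of observables can be made concrete with a textbook-style example. In the sketch below, the map and its coefficients `a`, `b`, `c` are chosen for illustration and are not from the source. The dynamics are nonlinear in the raw state but exactly linear once the observable $x_1^2$ is added, because the set $(x_1, x_2, x_1^2)$ is closed under the update.

```python
import numpy as np

a, b, c = 0.9, 0.5, 0.3   # illustrative coefficients (assumption)

def step(x):
    """Nonlinear map: x1' = a*x1, x2' = b*x2 + c*x1**2."""
    return np.array([a * x[0], b * x[1] + c * x[0] ** 2])

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(500, 2))   # sampled states
Y = np.array([step(x) for x in X])      # their one-step images

def lift(S):
    """Observables (x1, x2, x1^2), which evolve linearly under `step`."""
    return np.column_stack([S[:, 0], S[:, 1], S[:, 0] ** 2])

def linear_fit_residual(P, Q):
    """Worst-case error of the best linear map from rows of P to rows of Q."""
    A, *_ = np.linalg.lstsq(P, Q, rcond=None)
    return float(np.max(np.abs(P @ A - Q)))

res_raw = linear_fit_residual(X, Y)                  # raw state: clearly nonzero
res_lifted = linear_fit_residual(lift(X), lift(Y))   # lifted: numerically zero
```

For systems where no such finite closed set exists, any fixed-size latent space can only approximate the linear evolution, which is where the limitation above would be expected to bite.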

Connections & references