S-JEPA (Signal-JEPA) — World Modeling

At a glance

Problem: EEG and BCI datasets differ in electrode montages, channel counts and recording setups, so models trained on one dataset rarely transfer, and labelled EEG is costly.

Key idea: A joint-embedding predictive recipe for multichannel EEG with a dynamic spatial attention that adapts to differing electrode layouts, enabling cross-dataset transfer.

Modality: EEG (multichannel)

Target / masking: Mask spatiotemporal signal segments across channels; a target encoder supplies stop-gradient latent targets.

Builds on: The I-JEPA / V-JEPA latent-prediction recipe, applied to biosignals.

Used for: Cross-dataset EEG representation learning for brain-computer interfaces.

Motivation

A persistent obstacle in EEG and brain-computer interface (BCI) modelling is that datasets differ in electrode montages, channel counts, and recording setups. A model trained on one fixed channel topology rarely transfers to another, and labelled EEG is costly to collect. The goal of S-JEPA (Signal-JEPA, Guetschel et al., 2024) is a self-supervised representation that generalises across heterogeneous EEG sources rather than being locked to a single montage.

How it works

Canonical JEPA schematic for EEG. The input is split into a visible context and hidden targets, both made of channel-time patches (spatiotemporal segments). The context encoder $f_\theta$ embeds the visible patches; the target encoder $\bar f_\theta$ (an EMA copy of the context encoder, with gradients stopped) embeds the targets; the predictor $g_\phi$ predicts the target embeddings from the context embeddings; training minimises the distance between predicted and target embeddings in latent space.
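The "EMA copy" in the schematic can be illustrated with a minimal NumPy sketch. The function name `ema_update`, the dictionary-of-arrays parameter layout, and the momentum values are all assumptions made for illustration; the source does not state the momentum schedule used by S-JEPA.

```python
import numpy as np

def ema_update(target_params, context_params, momentum=0.996):
    """Move each target parameter a small step toward its context counterpart.

    The default momentum is a hypothetical value, not one taken from the source.
    """
    return {
        name: momentum * target_params[name] + (1.0 - momentum) * context_params[name]
        for name in target_params
    }

# Toy parameters: a single weight matrix per encoder.
context = {"w": np.ones((2, 2))}
target = {"w": np.zeros((2, 2))}

# One update step; the target moves 10% of the way toward the context weights.
target = ema_update(target, context, momentum=0.9)
print(target["w"])
```

Because the target encoder is only ever updated this way, no gradient flows into it, which is what "gradient stopped" refers to.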

S-JEPA brings the joint-embedding predictive recipe to multichannel EEG.

A context encoder embeds visible signal segments.
A predictor predicts the latent representations of masked segments.
A target encoder supplies the stop-gradient targets against which the latent prediction loss is computed.
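The three components above operate on a split of the recording into visible and masked channel-time patches. The following is a rough sketch of such a split, assuming non-overlapping time windows and uniform random masking. Both choices, along with the names `patchify` and `split_context_targets` and the `mask_ratio` value, are illustrative assumptions; the source does not specify how segments are sampled.

```python
import numpy as np

def patchify(x, window):
    """Cut a (channels, samples) recording into (channels * windows, window) tokens."""
    n_channels, n_samples = x.shape
    n_windows = n_samples // window
    x = x[:, : n_windows * window]  # drop the incomplete trailing window
    return x.reshape(n_channels * n_windows, window), n_windows

def split_context_targets(n_tokens, mask_ratio=0.5, seed=0):
    """Return sorted index arrays for visible (context) and masked (target) tokens."""
    rng = np.random.default_rng(seed)
    n_masked = int(round(mask_ratio * n_tokens))
    perm = rng.permutation(n_tokens)
    return np.sort(perm[n_masked:]), np.sort(perm[:n_masked])

# Toy recording: 8 channels, 1000 samples, windows of 250 samples.
x = np.random.default_rng(1).standard_normal((8, 1000))
tokens, n_windows = patchify(x, window=250)
visible, masked = split_context_targets(len(tokens), mask_ratio=0.5)

context_tokens = tokens[visible]  # fed to the context encoder
target_tokens = tokens[masked]    # fed to the target encoder only
print(tokens.shape, context_tokens.shape, target_tokens.shape)
```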

The masking unit is a spatiotemporal signal segment, i.e. a channel-time patch. The central architectural contribution is a dynamic spatial attention mechanism that adapts to differing electrode layouts: instead of assuming a fixed channel topology, attention over channels is computed in a montage-aware way, so the same model can process recordings with different electrode configurations and transfer across datasets.
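To make the idea of montage-aware attention concrete, here is one possible sketch, not the mechanism from the paper: attention keys are built from electrode coordinates through a shared projection, and a single query pools over however many channels are present. The function `spatial_attention_pool`, the use of 3D coordinates, and the parameters `w_pos` and `query` are all hypothetical.

```python
import numpy as np

def softmax(a):
    a = a - a.max()  # subtract the max for numerical stability
    e = np.exp(a)
    return e / e.sum()

def spatial_attention_pool(features, positions, w_pos, query):
    """Pool per-channel features into one vector, whatever the channel count.

    features:  (n_channels, d) feature vectors for one time window
    positions: (n_channels, 3) electrode coordinates (assumed available)
    w_pos:     (3, d) shared projection from coordinates to key space
    query:     (d,) shared query vector
    """
    keys = features + positions @ w_pos            # position-aware keys
    scores = keys @ query / np.sqrt(query.shape[0])
    weights = softmax(scores)                      # one weight per channel
    return weights @ features                      # (d,)

rng = np.random.default_rng(0)
d = 16
w_pos = rng.standard_normal((3, d))   # shared across montages
query = rng.standard_normal(d)

# The same parameters handle a 22-channel and a 64-channel montage.
for n_channels in (22, 64):
    feats = rng.standard_normal((n_channels, d))
    pos = rng.standard_normal((n_channels, 3))
    pooled = spatial_attention_pool(feats, pos, w_pos, query)
    print(n_channels, pooled.shape)
```

The point of the sketch is only that no parameter has a shape tied to the number of channels, which is the property that lets one set of weights serve several montages.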

The objective

The training loss is the latent distance over masked spatiotemporal EEG segments:

$$\mathcal{L} = \sum_{k\in\text{mask}} \big\lVert\, g_\phi(z_{\text{ctx}}, m_k) - \operatorname{sg}[\bar f_\theta(x)_k]\,\big\rVert_2^2,$$

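Here $g_\phi$ is the predictor, $\operatorname{sg}$ the stop-gradient operator, and $\bar f_\theta$ the target encoder; $z_{\text{ctx}}$ denotes the context encoder's embedding of the visible segments, and $m_k$ tells the predictor which masked segment $k$ to predict (presumably a positional mask token, as in I-JEPA). Predicting in latent space sidesteps the irreducible high-frequency noise of raw EEG, and the dynamic spatial attention lets the same encoder ingest recordings with different channel sets.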
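As a small worked instance of the loss above, the sketch below sums squared L2 distances between predicted and target embeddings over the masked segments. The function name `latent_prediction_loss` and the toy arrays are illustrative. NumPy has no automatic differentiation, so the stop-gradient is implicit here; in an autodiff framework the targets would be wrapped explicitly (for example with `jax.lax.stop_gradient`, or `.detach()` in PyTorch).

```python
import numpy as np

def latent_prediction_loss(predicted, targets):
    """Sum over masked segments of the squared L2 distance in latent space.

    predicted: (n_masked, d) predictor outputs, one row per masked segment
    targets:   (n_masked, d) target-encoder embeddings, treated as constants
    """
    return float(np.sum((predicted - targets) ** 2))

# Two masked segments with 2-dimensional embeddings.
predicted = np.array([[1.0, 2.0], [0.0, 0.0]])
targets = np.array([[1.0, 0.0], [3.0, 4.0]])

# Segment 0 contributes 0 + 4 = 4, segment 1 contributes 9 + 16 = 25.
print(latent_prediction_loss(predicted, targets))  # 29.0
```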

Key results & what's novel

The key idea is that latent prediction plus montage-agnostic spatial attention produces EEG representations that are portable across studies, a prerequisite for foundation-model-style reuse in neurotechnology. The novelty lies in the spatial-attention design: most EEG models implicitly assume a fixed electrode layout, and S-JEPA removes that assumption, so a model can be pretrained on one corpus and applied to another with a different montage. Cross-dataset transfer is especially valuable because small cohorts can borrow strength from large public EEG corpora.

Strengths & limitations

+ Montage-agnostic: dynamic spatial attention enables cross-dataset transfer across electrode layouts.
+ Augmentation-free latent prediction, well suited to noisy biosignals.
+ Lets small datasets benefit from large unlabelled EEG corpora.
− Tokenisation and masking of spatiotemporal segments require care for EEG.
− Learns a representation, not a dynamics/world model of brain activity.
− Quality depends on the diversity of the pretraining corpora.

Connections & references

Builds on: I-JEPA

Related: Brain-JEPA, EchoJEPA, US-JEPA
