Polymer-JEPA — World Modeling

At a glance

Problem: Polymer property prediction has exceptionally scarce, expensive labels, and molecular graphs carry repeating-unit and connectivity structure that generic chemistry models handle poorly.

Key idea: Pretrain a JEPA on polymer molecular graphs by predicting the latent embeddings of masked connected subgraphs/motifs, so a representation transfers when downstream labels are minimal.

Modality: Molecular graph

Target / masking: Mask connected subgraphs / motifs (monomer units, functional groups); a target encoder supplies latent targets.

Builds on: Graph-JEPA's masked-subgraph latent prediction.

Used for: Polymer property prediction, with the largest gains under very scarce labels.

Motivation

Polymer property prediction lives under a hard constraint: experimental labels are exceptionally scarce and expensive. At the same time polymer molecular graphs carry repeating-unit and connectivity structure that generic chemistry models handle poorly. Polymer-JEPA (Piccoli et al., 2025) targets exactly this regime — it pretrains a representation self-supervised on polymer graphs so that it transfers when the downstream labelled dataset is minimal, the defining condition of early-stage discovery.

How it works

Figure: canonical JEPA schematic for the molecular-graph modality. The input graph is split into a visible context and hidden targets (masked subgraphs or motifs). The context encoder $f_\theta$ embeds what is visible; the target encoder $\bar f_\theta$ (an EMA copy of the context encoder, with gradients stopped) embeds the targets; the predictor $g_\phi$ maps the context embedding to the target embeddings; training minimises the distance between predicted and target embeddings in latent space.

Polymer-JEPA applies the graph latent-prediction recipe to polymer molecular graphs.

A context encoder embeds a visible portion of the graph.
A predictor predicts the latent embedding of masked regions.
A target encoder embeds the masked region to supply stop-gradient latent targets.
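The schematic describes the target encoder as an EMA copy of the context encoder. A minimal sketch of that update follows, with parameters flattened to plain Python floats for illustration. The function name `ema_update` and the momentum value are assumptions made here, not details taken from the paper.

```python
def ema_update(target_params, context_params, momentum=0.996):
    """Move each target parameter a small step toward its context counterpart.

    `momentum` is a hypothetical default chosen for illustration.
    No gradient flows into the target encoder; it changes only through this rule.
    """
    return [
        momentum * t + (1.0 - momentum) * c
        for t, c in zip(target_params, context_params)
    ]


# Toy example: two scalar "parameters" per encoder.
target = [0.0, 1.0]
context = [1.0, 1.0]
updated = ema_update(target, context, momentum=0.9)
```

With a momentum close to 1 the target encoder changes slowly, which is the usual reason given for pairing an EMA target with a stop-gradient in JEPA-style training.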

The masking unit is a connected subgraph or motif, that is, a chemically meaningful fragment such as a monomer unit or a functional group. The model therefore learns to infer the representation of a missing structural motif from its surrounding context, rather than copying local atom features. Masking connected fragments (instead of scattered nodes) forces the model to reason about coherent chemical substructures.
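To make the difference between connected and scattered masking concrete, here is a minimal sketch that grows a connected node set by randomised breadth-first search on a toy adjacency list. This is a stand-in for illustration only: the paper's actual motif definition and sampling procedure are not reproduced here, and `sample_connected_mask`, `is_connected`, and the toy graph are all hypothetical.

```python
import random
from collections import deque


def sample_connected_mask(adjacency, size, rng):
    """Grow a connected set of up to `size` nodes from a random seed node."""
    seed = rng.choice(sorted(adjacency))
    masked = {seed}
    frontier = deque([seed])
    while frontier and len(masked) < size:
        node = frontier.popleft()
        neighbours = [n for n in adjacency[node] if n not in masked]
        rng.shuffle(neighbours)
        for n in neighbours:
            if len(masked) >= size:
                break
            # Every added node touches a node already masked,
            # so the masked set stays connected.
            masked.add(n)
            frontier.append(n)
    return masked


def is_connected(adjacency, nodes):
    """Check that `nodes` induce a connected subgraph of `adjacency`."""
    nodes = set(nodes)
    if not nodes:
        return True
    start = next(iter(nodes))
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for n in adjacency[node]:
            if n in nodes and n not in seen:
                seen.add(n)
                stack.append(n)
    return seen == nodes


# Toy graph: a 6-atom chain (0..5) with one side group (6) attached to atom 2.
graph = {0: [1], 1: [0, 2], 2: [1, 3, 6], 3: [2, 4], 4: [3, 5], 5: [4], 6: [2]}
rng = random.Random(0)
mask = sample_connected_mask(graph, size=3, rng=rng)
context = set(graph) - mask  # nodes the context encoder would see
```

A chemistry-aware pipeline would presumably replace the BFS step with a fragment decomposition so that masks align with monomer units or functional groups.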

The objective

The loss is the latent distance between the predicted and target embeddings of the masked motifs:

$$\mathcal{L} = \sum_{k\in\text{mask}} \big\lVert\, g_\phi(z_{\text{ctx}}, m_k) - \operatorname{sg}[\bar f_\theta(x)_k]\,\big\rVert_2^2,$$

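Here $g_\phi$ is the predictor, $\operatorname{sg}$ the stop-gradient operator, and $\bar f_\theta$ the target encoder; $z_{\text{ctx}}$ is the context encoder's embedding of the visible part of the graph, $m_k$ is the query that identifies masked motif $k$ to the predictor, and $\bar f_\theta(x)_k$ is the target embedding of that motif. As in Graph-JEPA, no contrastive negatives and no graph augmentations are needed; the motif-level masking supplies the structural inductive bias.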
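As a worked instance of the loss above, the following sketch sums squared L2 distances over masked motifs using plain Python lists as embeddings. The function names and the toy numbers are illustrative assumptions. In an autodiff framework the targets would be detached from the graph to realise the stop-gradient; here they are simply constants.

```python
def squared_l2(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def jepa_latent_loss(predicted, targets):
    """Sum of squared L2 distances over masked motifs.

    `predicted[k]` plays the role of g_phi(z_ctx, m_k) and
    `targets[k]` the role of sg[f_bar_theta(x)_k].
    """
    if predicted.keys() != targets.keys():
        raise ValueError("predicted and target motifs must match")
    return sum(squared_l2(predicted[k], targets[k]) for k in predicted)


# Two masked motifs with 2-dimensional embeddings (toy values).
predicted = {0: [1.0, 2.0], 1: [0.0, 0.0]}
targets = {0: [1.0, 0.0], 1: [3.0, 4.0]}
loss = jepa_latent_loss(predicted, targets)  # 4.0 + 25.0 = 29.0
```

Motif 0 contributes $0^2 + 2^2 = 4$ and motif 1 contributes $3^2 + 4^2 = 25$, so the total is 29.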

Key results & what's novel

The key result is regime-specific: JEPA pretraining yields the largest downstream gains when labelled data is very scarce — precisely the early-discovery regime where only a handful of measured examples exist. As the number of labels grows, the advantage over training from scratch narrows. The transferable lesson is therefore about where the value lies: motif-masked latent pretraining on molecular graphs delivers its payoff when experimental data is the binding constraint, offering a concrete recipe for extracting signal from tiny labelled chemistry datasets.

Strengths & limitations

+ Largest gains exactly in the scarce-label regime where inductive bias matters most.
+ Chemically meaningful motif masking; no augmentations or negatives.
+ Specialised to repeating-unit polymer graph structure.
− The advantage narrows as labelled data grows, so the win is regime-dependent.
− Depends on how motifs/connected subgraphs are defined and sampled.
− Learns a static representation; it does not model reactions or dynamics.

Connections & references

Builds on: Graph-JEPA

Related: JEPA-DNA, ProteinJEPA, Cell-JEPA
