Motivation
Many tasks need a representation of an entire graph — a molecule, for instance — rather than of individual nodes. The two standard self-supervised routes both have weaknesses on graphs. Contrastive graph methods depend on hand-crafted augmentations (edge dropping, node masking) whose semantics are unclear for structured data: it is not obvious which perturbations should leave a graph's meaning invariant. Generative reconstruction of graph topology is brittle. Graph-JEPA (Skenderi et al., 2023) brings the augmentation-free, latent-prediction recipe of I-JEPA to graph-level learning.
How it works
A portion of the graph is masked, and the model predicts the embedding of the masked subgraph from the representation of the visible context.
- A context encoder embeds the visible part of the graph.
- A predictor maps the context representation to the predicted latent embedding of the masked subgraph.
- A target encoder embeds the masked subgraph to supply stop-gradient latent targets.
The masking unit is a subgraph, so the model must infer the representation of missing structural neighbourhoods rather than copy local node features. This adapts I-JEPA-style block prediction from a regular pixel grid to non-Euclidean graph topology, with no negatives and no augmentations required.
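The context/target split can be illustrated with a small sketch. It is only an assumption-laden illustration, not the authors' procedure: the breadth-first patch growing, the helper names (`grow_patch`, `partition_into_patches`, `split_context_targets`), and the 6-node cycle graph are all hypothetical stand-ins for whatever subgraph sampler is actually used.

```python
import random
from collections import deque

def grow_patch(adj, seed, size, taken):
    """Grow a connected patch of up to `size` unclaimed nodes from `seed` (BFS)."""
    patch, queue, seen = [], deque([seed]), {seed}
    while queue and len(patch) < size:
        node = queue.popleft()
        patch.append(node)
        for nbr in adj[node]:
            if nbr not in seen and nbr not in taken:
                seen.add(nbr)
                queue.append(nbr)
    return patch

def partition_into_patches(adj, patch_size, rng):
    """Split all nodes into disjoint, connected patches (candidate subgraphs)."""
    taken, patches = set(), []
    nodes = sorted(adj)
    rng.shuffle(nodes)
    for seed in nodes:
        if seed in taken:
            continue
        patch = grow_patch(adj, seed, patch_size, taken)
        taken.update(patch)
        patches.append(patch)
    return patches

def split_context_targets(patches, num_targets, rng):
    """Mask `num_targets` randomly chosen patches; the rest stay visible as context."""
    order = list(range(len(patches)))
    rng.shuffle(order)
    masked = set(order[:num_targets])
    context = [p for i, p in enumerate(patches) if i not in masked]
    targets = [p for i, p in enumerate(patches) if i in masked]
    return context, targets

# Toy graph (assumption): a 6-node cycle given as an adjacency list.
adj = {0: [1, 5], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4, 0]}
rng = random.Random(0)
patches = partition_into_patches(adj, patch_size=2, rng=rng)
context, targets = split_context_targets(patches, num_targets=1, rng=rng)
```

The context encoder would see only the nodes in `context`, while the target encoder would embed each patch in `targets`. Changing `patch_size` or `num_targets` changes how hard the prediction task is.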
The objective
The loss is the latent distance between predicted and target subgraph embeddings:
$$\mathcal{L} = \big\lVert\, g_\phi(z_{\text{ctx}}) - \operatorname{sg}[\bar f_\theta(\text{subgraph})]\,\big\rVert_2^2,$$
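where $z_{\text{ctx}}$ is the context encoder's representation of the visible graph, $g_\phi$ is the predictor, $\operatorname{sg}$ is stop-gradient, and $\bar f_\theta$ is the target encoder. Because of the stop-gradient, the target embedding acts as a fixed regression target and gradients flow only through the predictor and the context encoder. As in the image case, the asymmetric target encoder together with the predictor and subgraph masking provides the learning signal without contrastive negatives or augmentations.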
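A worked numeric sketch of this loss may help. It assumes toy stand-ins that are not from the paper: mean pooling in place of both encoders, a linear predictor, and hand-picked 2-dimensional node features. Stop-gradient is mimicked by treating the target embedding as a constant when forming the gradient.

```python
def mean_pool(node_features, nodes):
    """Toy encoder (assumption): average the feature vectors of the given nodes."""
    dim = len(next(iter(node_features.values())))
    return [sum(node_features[n][d] for n in nodes) / len(nodes) for d in range(dim)]

def predictor(z_ctx, weight, bias):
    """Toy linear predictor g_phi(z) = W z + b."""
    return [sum(w * z for w, z in zip(row, z_ctx)) + b for row, b in zip(weight, bias)]

def latent_loss(pred, target):
    """Squared L2 distance between predicted and target embeddings."""
    return sum((p - t) ** 2 for p, t in zip(pred, target))

def grad_wrt_pred(pred, target):
    """Gradient of the loss w.r.t. the prediction; the target is held constant."""
    return [2.0 * (p - t) for p, t in zip(pred, target)]

# Hypothetical node features; nodes 0-1 are visible, nodes 2-3 are masked.
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [2.0, 2.0], 3: [0.0, 2.0]}
z_ctx = mean_pool(features, [0, 1])      # context embedding
target = mean_pool(features, [2, 3])     # target embedding, treated as sg[...]

weight = [[1.0, 0.0], [0.0, 1.0]]        # identity initialisation
bias = [0.0, 0.0]
pred = predictor(z_ctx, weight, bias)
loss_before = latent_loss(pred, target)

# One gradient step on the bias only: d(pred)/d(bias) is the identity,
# so the bias gradient equals the gradient w.r.t. the prediction.
lr = 0.25
grad = grad_wrt_pred(pred, target)
bias = [b - lr * g for b, g in zip(bias, grad)]
loss_after = latent_loss(predictor(z_ctx, weight, bias), target)
```

Only the predictor parameters move in this sketch; the target side receives no update from the loss, which is the role the stop-gradient plays in the objective above.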
Key results & what's novel
The key contribution is to show that graph-level joint-embedding prediction is a viable SSL objective: predicting masked-subgraph embeddings produces transferable whole-graph representations without negatives or augmentations. This matters because it removes the awkward dependence on graph augmentations whose invariances are ill-defined. As foundational graph-SSL methodology, Graph-JEPA also serves as the basis on which the molecular and polymer JEPAs build.
Strengths & limitations
- + No contrastive negatives and no hand-crafted graph augmentations.
- + Whole-graph representations that transfer to downstream tasks.
- + Cleanly adapts the I-JEPA block-prediction idea to non-Euclidean topology.
- − Performance depends on how subgraphs are sampled and masked.
- − Predicting an expected target embedding can wash out fine structural detail, and the model is not generative: it cannot reconstruct or sample graphs.
- − Learns a static graph representation, with no notion of dynamics.