Encyclopedia
Every method as a short, self-contained card. ★ marks entries central to a biological world model.
Foundations 2
The position papers and energy-based ideas that define what a world model is and why prediction in latent space matters.
Core Architectures 4
The canonical JEPA models for images and video that established the recipe.
The first image JEPA: predict latent representations of target blocks from a single context block, without hand-crafted data augmentations.
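A toy NumPy sketch of the I-JEPA objective: several small target blocks, one large context block with the target patches removed, and an L2 loss in latent space rather than pixel space. Everything concrete here is a hypothetical stand-in. The 8x8 patch grid, the block sizes, the linear "encoders" and the mean-pooling predictor are all illustrative; the published method uses Vision Transformers for the encoders and a narrow ViT predictor.

```python
import numpy as np

GRID = 8                  # hypothetical 8x8 grid of image patches
DIM_IN, DIM_LAT = 16, 8   # hypothetical patch and latent sizes


def sample_block(rng, grid, h, w):
    """Flat indices of a random h x w rectangle of patches."""
    top = rng.integers(0, grid - h + 1)
    left = rng.integers(0, grid - w + 1)
    rows, cols = np.meshgrid(np.arange(top, top + h),
                             np.arange(left, left + w), indexing="ij")
    return (rows * grid + cols).ravel()


def ijepa_loss(patches, w_ctx, w_tgt, w_pred, rng, n_targets=4):
    """Toy I-JEPA objective: mean L2 error between predicted and target latents."""
    # The target encoder embeds the full image; in I-JEPA it is an EMA copy
    # of the context encoder and receives no gradient.
    target_latents = patches @ w_tgt
    target_blocks = [sample_block(rng, GRID, 2, 2) for _ in range(n_targets)]
    # One large context block with every target patch removed (no leakage).
    ctx_idx = np.setdiff1d(sample_block(rng, GRID, 6, 6),
                           np.concatenate(target_blocks))
    ctx_summary = (patches[ctx_idx] @ w_ctx).mean(axis=0)
    losses = []
    for blk in target_blocks:
        # Hypothetical predictor input: pooled context plus a one-hot
        # position for each target patch.
        positions = np.eye(GRID * GRID)[blk]
        pooled = np.tile(ctx_summary, (len(blk), 1))
        pred = np.concatenate([pooled, positions], axis=1) @ w_pred
        losses.append(np.mean((pred - target_latents[blk]) ** 2))
    return float(np.mean(losses))


rng = np.random.default_rng(0)
patches = rng.normal(size=(GRID * GRID, DIM_IN))
w_ctx = rng.normal(size=(DIM_IN, DIM_LAT))
w_tgt = w_ctx.copy()   # the EMA target starts as a copy of the context encoder
w_pred = 0.1 * rng.normal(size=(DIM_LAT + GRID * GRID, DIM_LAT))
loss = ijepa_loss(patches, w_ctx, w_tgt, w_pred, rng)
print(f"toy I-JEPA loss: {loss:.3f}")
```

No augmentation pipeline appears anywhere: the only source of "views" is which patches are masked.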
Self-supervised video representations by predicting masked spatiotemporal features in latent space.
A shared encoder learning optical flow (motion) and content features jointly via multi-task joint-embedding.
V-JEPA 2 extended with dense and deep self-supervision plus multimodal tokenizers for dense features.
Theory & Analysis 12
Why JEPAs work, when they collapse, and what they provably learn.
A theory of JEPAs pinpointing the isotropic Gaussian as the optimal embedding, enforced by SIGReg.
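A minimal sketch in the spirit of SIGReg (Sketched Isotropic Gaussian Regularization). As we understand it, the regularizer projects embeddings onto random 1-D directions and penalizes each projection's deviation from a standard normal with a characteristic-function test (Epps-Pulley). The frequency grid, Gaussian weighting, plain Riemann-sum quadrature and number of directions below are our assumptions, not the paper's exact settings. In training, this would be written in an autodiff framework and minimized alongside the prediction loss.

```python
import numpy as np


def epps_pulley(z, t_max=5.0, n_points=101):
    """Weighted L2 distance between the empirical characteristic function
    of 1-D samples z and that of N(0, 1), scaled by the sample size."""
    t = np.linspace(-t_max, t_max, n_points)
    tz = np.outer(t, z)                        # (n_points, N)
    ecf_re = np.cos(tz).mean(axis=1)
    ecf_im = np.sin(tz).mean(axis=1)
    target = np.exp(-0.5 * t ** 2)             # characteristic function of N(0, 1)
    weight = np.exp(-0.5 * t ** 2)             # down-weight high frequencies
    integrand = ((ecf_re - target) ** 2 + ecf_im ** 2) * weight
    dt = t[1] - t[0]
    return len(z) * float(np.sum(integrand) * dt)


def sigreg_sketch(embeddings, n_directions=64, seed=0):
    """Average the 1-D statistic over random unit directions (the 'sketch')."""
    rng = np.random.default_rng(seed)
    d = embeddings.shape[1]
    dirs = rng.normal(size=(d, n_directions))
    dirs /= np.linalg.norm(dirs, axis=0, keepdims=True)
    projections = embeddings @ dirs            # (N, n_directions)
    return float(np.mean([epps_pulley(projections[:, k])
                          for k in range(n_directions)]))


rng = np.random.default_rng(1)
gaussian = rng.normal(size=(512, 32))                     # already N(0, I)
collapsed = np.tile(rng.normal(size=(1, 32)), (512, 1))   # every sample identical
low_rank = np.zeros((512, 32))
low_rank[:, 0] = rng.normal(size=512)                     # one active dimension
for name, emb in [("gaussian", gaussian), ("collapsed", collapsed),
                  ("low-rank", low_rank)]:
    print(name, round(sigreg_sketch(emb), 2))
```

Both failure modes JEPAs worry about, full collapse and dimensional collapse, show up as a much larger statistic than isotropic Gaussian embeddings.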
A lightweight single-GPU library unifying energy-based JEPAs for images, video, and planning.
JEPAs preferentially encode slowly varying factors, linking latent prediction to slow feature analysis.
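For readers unfamiliar with slow feature analysis, here is a sketch of classical linear SFA, the algorithm this card relates JEPAs to (not the paper's own analysis). SFA whitens the signal, then keeps the whitened directions whose temporal differences have the smallest variance. The two-source toy signal is hypothetical.

```python
import numpy as np


def linear_sfa(x, n_components=1):
    """Linear slow feature analysis on a (T, D) signal.
    Returns the n slowest unit-variance, mutually decorrelated projections."""
    x = x - x.mean(axis=0)
    # 1) Whiten: rotate and rescale so the covariance becomes the identity.
    evals, evecs = np.linalg.eigh(np.cov(x, rowvar=False))
    z = x @ (evecs / np.sqrt(evals))
    # 2) Among whitened directions, pick those whose temporal difference has
    #    the least variance (eigh returns eigenvalues in ascending order).
    dz = np.diff(z, axis=0)
    _, d_evecs = np.linalg.eigh(np.cov(dz, rowvar=False))
    return z @ d_evecs[:, :n_components]


t = np.linspace(0, 4 * np.pi, 2000)
slow, fast = np.sin(t), np.sin(37 * t)
mixed = np.column_stack([slow + fast, slow - 0.5 * fast])   # observed channels
y = linear_sfa(mixed)[:, 0]
corr = abs(np.corrcoef(y, slow)[0, 1])
print(f"|corr(slowest feature, slow source)| = {corr:.3f}")
```

Neither observed channel is slow on its own, yet the slowest feature recovers the slow source up to sign and scale.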
Deep linear self-distillation analysis explaining JEPA's implicit bias against noisy, unpredictable features.
Formally relates the non-contrastive JEPA objective to contrastive self-supervised learning.
Characterizes when and why auxiliary objectives improve JEPA pretraining and representations.
Image World Models generalize I-JEPA by conditioning the latent predictor on input transformations.
A label-free metric predicting linear-probe quality of joint-embedding SSL models from embedding rank.
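The RankMe score itself is short: normalize the singular values of the embedding matrix into a distribution and exponentiate its entropy, giving a smooth "effective rank". A minimal sketch follows; the epsilon and the toy embeddings are illustrative, and what counts as a good rank depends on the model.

```python
import numpy as np


def rankme(embeddings, eps=1e-7):
    """Smooth effective rank of an (N, D) embedding matrix:
    exp(entropy) of the L1-normalized singular values."""
    s = np.linalg.svd(embeddings, compute_uv=False)
    p = s / np.sum(np.abs(s)) + eps
    return float(np.exp(-np.sum(p * np.log(p))))


rng = np.random.default_rng(0)
full = rng.normal(size=(1000, 16))                                  # uses all 16 dims
rank_one = np.outer(rng.normal(size=1000), rng.normal(size=16))     # collapsed to 1 dim
print(f"full: {rankme(full):.2f}, rank-one: {rankme(rank_one):.2f}")
```

No labels are involved, which is what lets the score act as a proxy for linear-probe quality during SSL model selection.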
Variance-Invariance-Covariance regularization: an explicit anti-collapse objective reused across JEPA variants.
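A NumPy sketch of the three VICReg terms for two batches of embeddings from two views. The default weights (25, 25, 1) and the 1e-4 stabilizer follow the VICReg paper's commonly cited settings. Treat this as an illustration rather than a drop-in training loss, since it has no autodiff.

```python
import numpy as np


def vicreg_loss(z_a, z_b, lam=25.0, mu=25.0, nu=1.0, eps=1e-4):
    """Return (total, terms) for embeddings z_a, z_b of shape (N, D)."""
    n, d = z_a.shape
    # Invariance: the two views should map to the same embedding.
    invariance = np.mean((z_a - z_b) ** 2)
    z_a = z_a - z_a.mean(axis=0)
    z_b = z_b - z_b.mean(axis=0)

    def variance_term(z):
        # Hinge keeping each dimension's std above 1 (anti-collapse).
        std = np.sqrt(z.var(axis=0, ddof=1) + eps)
        return np.mean(np.maximum(0.0, 1.0 - std))

    def covariance_term(z):
        # Penalize off-diagonal covariance so dimensions stay decorrelated.
        cov = (z.T @ z) / (n - 1)
        off_diag = cov - np.diag(np.diag(cov))
        return np.sum(off_diag ** 2) / d

    terms = {
        "invariance": float(invariance),
        "variance": float(0.5 * (variance_term(z_a) + variance_term(z_b))),
        "covariance": float(covariance_term(z_a) + covariance_term(z_b)),
    }
    total = lam * terms["invariance"] + mu * terms["variance"] + nu * terms["covariance"]
    return total, terms


rng = np.random.default_rng(0)
z = rng.normal(size=(256, 32))
healthy_total, healthy = vicreg_loss(z, z + 0.1 * rng.normal(size=z.shape))
collapsed_z = np.tile(rng.normal(size=(1, 32)), (256, 1))
collapsed_total, collapsed = vicreg_loss(collapsed_z, collapsed_z)
print("healthy:", healthy)
print("collapsed:", collapsed)
```

A fully collapsed encoder scores a perfect invariance term but is caught by the variance hinge, which is why the objective is reused as an explicit anti-collapse term.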
Analysis of non-contrastive SSL dynamics (the paper that introduced DirectPred) explaining how predictor, stop-gradient and EMA dynamics avoid collapse.
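The DirectPred idea is to set the linear predictor directly from the eigendecomposition of a running feature-correlation matrix instead of training it by gradient descent. A sketch, assuming our reading of the rule: scale each eigenvector by the square root of its eigenvalue, normalized by the largest, plus a small ε. The `rho` and `eps` defaults are hypothetical, and in the full method this matrix would serve as the predictor inside a BYOL-style network.

```python
import numpy as np


class DirectPred:
    """Sets the linear predictor W = U diag(p) U^T from a running feature
    correlation F = U diag(s) U^T, with p_j = sqrt(s_j) / max_k sqrt(s_k) + eps."""

    def __init__(self, dim, rho=0.3, eps=0.1):   # hypothetical defaults
        self.corr = np.zeros((dim, dim))
        self.rho = rho
        self.eps = eps

    def update(self, features):
        """features: (N, dim) online-network outputs for one batch."""
        batch_corr = features.T @ features / features.shape[0]
        # Exponential moving average of the correlation matrix.
        self.corr = self.rho * self.corr + (1 - self.rho) * batch_corr
        s, u = np.linalg.eigh(self.corr)
        s = np.clip(s, 0.0, None)                 # guard tiny negative eigenvalues
        p = np.sqrt(s) / np.sqrt(s.max()) + self.eps
        return (u * p) @ u.T                      # U diag(p) U^T


rng = np.random.default_rng(0)
pred = DirectPred(dim=8)
for _ in range(20):
    features = rng.normal(size=(128, 8)) * np.linspace(0.2, 2.0, 8)
    w_pred = pred.update(features)
eigs = np.linalg.eigvalsh(w_pred)
print("predictor eigenvalues:", np.round(eigs, 3))
```

The predictor shares eigenvectors with the feature correlation, which is the alignment the analysis identifies in trained non-contrastive networks.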
Recasts the JEPA objective in a variational, probabilistic framework over latent representations.
Studies Gaussian embedding distributions as the target for joint-embedding self-supervised learning.
World Models, Robotics & Planning 12
Action-conditioned latent models that predict consequences and plan — the heart of world modeling.
Internet-scale video world model whose action-conditioned variant enables zero-shot robot planning.
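Planning with an action-conditioned latent world model is often done by sampling-based search: roll candidate action sequences through the predictor and keep the ones whose final latent lands closest to the embedding of a goal image. As we understand it, the action-conditioned variant described here uses the cross-entropy method (CEM) with an L1 goal distance. The sketch below substitutes a hypothetical linear map for the learned predictor, and every hyperparameter is illustrative.

```python
import numpy as np


def rollout(z0, actions, dynamics):
    """Apply the latent dynamics step by step; returns the final latent."""
    z = z0
    for a in actions:
        z = dynamics(z, a)
    return z


def cem_plan(z0, z_goal, dynamics, horizon=3, action_dim=2,
             n_samples=256, n_elite=32, n_iters=10, seed=0):
    """Cross-entropy method over action sequences, minimizing the L1
    distance between the predicted final latent and the goal latent."""
    rng = np.random.default_rng(seed)
    mean = np.zeros((horizon, action_dim))
    std = np.ones((horizon, action_dim))
    for _ in range(n_iters):
        samples = mean + std * rng.normal(size=(n_samples, horizon, action_dim))
        costs = np.array([np.abs(rollout(z0, seq, dynamics) - z_goal).sum()
                          for seq in samples])
        elite = samples[np.argsort(costs)[:n_elite]]
        # Refit the sampling distribution to the best sequences.
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mean


A = 0.9 * np.eye(2)                        # hypothetical latent dynamics z' = A z + B a
B = np.array([[1.0, 0.2], [0.0, 1.0]])


def toy_dynamics(z, a):
    return A @ z + B @ a


z0 = np.zeros(2)
z_goal = np.array([1.0, -0.5])             # stands in for an encoded goal image
plan = cem_plan(z0, z_goal, toy_dynamics)
plan_cost = np.abs(rollout(z0, plan, toy_dynamics) - z_goal).sum()
zero_cost = np.abs(rollout(z0, np.zeros_like(plan), toy_dynamics) - z_goal).sum()
print(f"cost with plan: {plan_cost:.4f}, with zero actions: {zero_cost:.4f}")
```

In receding-horizon control only the first planned action would be executed before re-encoding the new observation and replanning.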
An empirical dissection of the design choices that make JEPA world models actually plan well.
A JEPA jointly predicting action and observation latents for sample-efficient policy learning.
Guides action search in a JEPA world model with learned values, sharpening long-horizon planning.
Couples a vision-language-action model with a JEPA latent world model for grounded, predictive control.
Object-centric JEPA with entity-level latent masking for counterfactual reasoning and efficient planning.
Studies which invariances in JEPA representations help versus hurt downstream planning.
Stable, teacher-free action-conditioned latent world model learned end-to-end from pixels via SIGReg.
Hierarchical, multi-timescale planning in latent world models for long-horizon, compute-cheap control.
A standardized, reproducible research stack for training, planning with, and evaluating world models.
Theory of when a JEPA provably recovers latent world variables up to rotation, useful as design guidance.
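"Recovered up to rotation" can be checked empirically with orthogonal Procrustes: find the orthogonal matrix that best maps learned embeddings onto ground-truth latents and report the relative residual. This is a sketch only. The toy data are hypothetical, and whether a given guarantee also permits scaling or translation depends on the paper's assumptions; this check allows rotations only.

```python
import numpy as np


def procrustes_align(learned, true_latents):
    """Orthogonal R minimizing ||learned @ R - true_latents||_F, plus the
    relative residual of that best alignment."""
    u, _, vt = np.linalg.svd(learned.T @ true_latents)
    r = u @ vt
    residual = np.linalg.norm(learned @ r - true_latents) / np.linalg.norm(true_latents)
    return r, residual


rng = np.random.default_rng(0)
true_latents = rng.normal(size=(500, 3))
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))   # random orthogonal matrix
learned = true_latents @ q                     # ground truth up to a rotation
r, res_rotated = procrustes_align(learned, true_latents)
# A non-orthogonal distortion cannot be undone by any rotation.
_, res_distorted = procrustes_align(true_latents @ np.diag([1.0, 0.1, 3.0]), true_latents)
print(f"rotated: {res_rotated:.2e}, distorted: {res_distorted:.2f}")
```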
Pairs dense JEPA latent dynamics with a vision-language 'thinker' for long-horizon semantic guidance.
Biology & Drug Discovery 4
Cells, genomes, proteins — latent world models for the drug-discovery pipeline.
Action-conditioned JEPA building a world model for cells: predict how cell states respond to perturbations.
Masked latent prediction for single-cell transcriptomics; a robust state encoder, not a perturbation model.
Model-agnostic continual training grounding genomic foundation models with a global-embedding JEPA loss.
Latent prediction complements protein language models: JEPA alone collapses, but combining MLM with a masked-position JEPA loss gives the best results.
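A sketch of what a hybrid MLM + masked-position JEPA objective can look like: cross-entropy on the masked tokens plus a latent regression loss at the same masked positions. The L2 distance, the `jepa_weight` parameter, and the source of the target latents (for example an EMA target encoder) are assumptions for illustration, not the card's exact recipe.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def hybrid_loss(logits, token_ids, pred_latents, target_latents, mask, jepa_weight=1.0):
    """logits: (L, V); token_ids: (L,); latents: (L, D); mask: (L,) bool.
    Both terms are computed only at masked positions."""
    logp = log_softmax(logits[mask])
    mlm = -np.mean(logp[np.arange(mask.sum()), token_ids[mask]])
    jepa = np.mean((pred_latents[mask] - target_latents[mask]) ** 2)
    return mlm + jepa_weight * jepa, mlm, jepa


rng = np.random.default_rng(0)
L, V, D = 10, 20, 8                      # hypothetical length, vocabulary, latent size
mask = np.zeros(L, dtype=bool)
mask[[2, 5, 7]] = True
token_ids = rng.integers(0, V, size=L)
logits = np.zeros((L, V))                # uniform predictions
latents = rng.normal(size=(L, D))
total, mlm, jepa = hybrid_loss(logits, token_ids, latents, latents, mask)
print(f"total={total:.3f}, mlm={mlm:.3f}, jepa={jepa:.3f}")
```

The token-level term keeps the representation anchored to observable residues, which is one plausible reason the combination avoids the collapse seen with the latent term alone.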
Graphs & Molecules 2
Latent prediction over graph structure, including molecular and polymer graphs.
Medical Imaging & Biosignals 8
Ultrasound, echo, X-ray, EEG, ECG and brain dynamics as latent predictive foundation models.
Signal-JEPA for EEG and brain-computer interfaces, with spatial attention enabling cross-dataset transfer.
A JEPA foundation model for brain dynamics with gradient positioning and spatiotemporal masking.
JEPA pretraining of latent ECG features boosts downstream ECG classification.
Transfers the JEPA video recipe to EEG, adapting joint-embedding prediction to brain signal analysis.
Multimodal JEPA jointly embedding imaging and clinical data for mechanism-to-endpoint fusion.
A JEPA encoder for chest radiographs via latent prediction over X-ray images.
Latent predictive foundation model for echocardiography video, trained on 18M echocardiograms; improves LVEF and RVSP estimation and transfers zero-shot to pediatric patients.
JEPA for medical ultrasound; masked latent prediction beats pixel reconstruction on low-SNR speckle.
Audio & Speech 6
Spectrogram and waveform JEPAs for general audio, music and speech.
JEPA adapted to audio spectrograms via curriculum masked latent prediction.
Systematic study of masking, encoders, and targets for general-audio JEPA.
Predicts compatibility of musical stems in a shared embedding space.
General audio representation learning following the JEPA latent-prediction recipe.
JEPA on raw waveforms for robust, augmentation-free audio foundation models.
JEPA-style latent prediction for distilling audio knowledge into lip-reading models.
3D & Point Clouds 3
Self-supervised latent prediction over 3D shapes, scenes and point clouds.
Time Series & Tabular 7
Forecasting, anomaly detection and augmentation-free representation for sequences and tables.
Couples a JEPA latent space with prior-fitted networks for in-context forecasting.
Self-supervised trajectory embeddings for similarity search via latent prediction.
Augmentation-free JEPA for tabular data via latent prediction over feature subsets.
Extends joint-embedding self-supervision to temporal structure in time series.
Explains emergent time-series clustering in JEPA embeddings via Koopman invariants.
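Koopman invariants are observables g with g(x_{t+1}) = g(x_t), that is, Koopman eigenfunctions with eigenvalue 1. A standard way to estimate a finite-dimensional Koopman approximation from data is dynamic mode decomposition (DMD). The sketch below applies DMD to a hypothetical linear system with one conserved coordinate and reads off the invariant from the left eigenvector for eigenvalue 1. It does not reproduce how the paper links such invariants to clustering in JEPA embeddings.

```python
import numpy as np


def dmd_operator(trajectories):
    """Least-squares linear operator A with x_{t+1} ~ A x_t, fit on
    snapshot pairs from several (D, T) trajectories (exact DMD, no truncation)."""
    X = np.hstack([traj[:, :-1] for traj in trajectories])
    Y = np.hstack([traj[:, 1:] for traj in trajectories])
    return Y @ np.linalg.pinv(X)


def simulate(M, x0, steps):
    """Iterate x_{t+1} = M x_t; returns a (D, steps + 1) array."""
    traj = [x0]
    for _ in range(steps):
        traj.append(M @ traj[-1])
    return np.column_stack(traj)


# Hypothetical system: coordinate 0 is conserved, coordinates 1-2 spiral inward.
M = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.8, -0.3],
              [0.0, 0.3, 0.8]])
rng = np.random.default_rng(0)
trajectories = [simulate(M, rng.normal(size=3), 20) for _ in range(3)]
A = dmd_operator(trajectories)

# A left eigenvector w with w @ A = w defines g(x) = w @ x, constant along trajectories.
vals, vecs = np.linalg.eig(A.T)
k = np.argmin(np.abs(vals - 1.0))
w = np.real(vecs[:, k])
invariant_along_traj = w @ trajectories[0]
print("eigenvalues:", np.round(vals, 3))
print("spread of invariant along a trajectory:", np.ptp(invariant_along_traj))
```

Trajectories with different values of such an invariant never mix, so embeddings that encode it naturally separate into clusters.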
Multi-resolution JEPA for predicting anomalies in time series.
Multimodal JEPA producing semantic embeddings for sensor time series.
Earth Observation 4
Remote-sensing and satellite JEPAs spanning resolutions and modalities.
JEPA for synthetic aperture radar (SAR) target recognition that predicts gradient-domain features rather than raw pixels.
One JEPA-style Earth-observation model spanning many resolutions, scales, and sensor modalities.
A JEPA tailored to efficient large-scale remote-sensing image retrieval via latent prediction.
Cross-modal predictive alignment extending JEPA to multi-sensor remote-sensing retrieval.
Language & Multimodal 3
JEPA objectives for text, recommendation and text-image systems.
An energy-based JEPA aligning text and image embeddings for multimodal systems.
Adds a JEPA embedding-prediction objective to LLM training alongside next-token prediction.
JEPA for sequential recommendation, predicting masked item representations in language-embedding space.
Generative Modeling 3
Using the JEPA objective for denoising and conditional generation.
Denoising with a JEPA: generative modeling cast as autoregressive denoising in embedding space.
Improves JEPA by injecting diffusion-style noise into the joint-embedding prediction objective.
JEPA-T: text-conditioned joint-embedding prediction for controllable image generation.