JEPA Survey

Joint-Embedding Predictive Architectures

A comprehensive survey of every JEPA variant — from Yann LeCun's foundational paper to the latest world models and robotic applications. Each variant is explained at two levels: accessible (step-by-step with analogies) and PhD-level (with full math, derivations, and analysis). Every article includes detailed training and inference diagrams.

0/18 articles generated · auto-updated

Writing Progress

Written: 5/18 · Accepted: 0 · Pending: 13 · Avg Score: 6.83
Overall 28% written · 0% accepted

Category                 Done   Score
Foundation               2/2    7.1
Vision                   1/2    6.7
Video                    2/2    6.6
Audio                    0/1    --
3D / Spatial             0/2    --
Robotics                 0/1    --
Physics / World Models   0/4    --
Scaling / Theory         0/2    --
Skeleton / Action        0/1    --
Trajectory / Spatial     0/1    --

All JEPA Variants at a Glance

Click any variant to read the full article. Status badge shows current writing progress.

JEPA · In Review
Joint-Embedding Predictive Architecture
Predicts in latent space instead of pixel space — the foundational architecture that launched the JEPA family
H-JEPA · In Review
Hierarchical JEPA
Adds hierarchy — short-term details at lower levels, long-term planning at higher levels
From: JEPA
I-JEPA · In Review
Image JEPA
Masks image blocks and predicts their latent representations from a visible context block, avoiding the pixel reconstruction used by masked autoencoders
From: JEPA
MC-JEPA · In Review
Motion-Content JEPA
Jointly learns content (what) and motion (how, as optical flow) features from video within a shared encoder
From: I-JEPA
Audio-JEPA · In Review
Audio JEPA
Applies JEPA masking to spectrograms — learns semantic audio features without labels
From: I-JEPA
V-JEPA · In Review
Video JEPA
Predicts masked spatio-temporal regions of video in feature space, with no text or labels, to learn visual dynamics at scale
From: I-JEPA
Point-JEPA · In Review
Point Cloud JEPA
Self-supervised learning on point clouds, using a sequencer that orders patch embeddings for efficient context and target selection
From: I-JEPA
3D-JEPA · In Review
3D JEPA
Multi-block sampling on 3D point clouds for richer context-aware representations
From: I-JEPA
ACT-JEPA · In Review
Action-Conditioned JEPA
Action-conditioned latent prediction for robotic manipulation, robust to sensor noise
From: V-JEPA
V-JEPA 2 · In Review
Video JEPA 2
Predicts future physical states caused by actions — a world model that plans before acting
From: V-JEPA
LeJEPA · In Review
Latent-Euclidean JEPA
Replaces all heuristic collapse-prevention tricks with a single provable regularizer (SIGReg)
From: JEPA
Causal-JEPA · In Review
Causal JEPA
Learns true cause-and-effect physics by applying object-level masking
From: LeJEPA, V-JEPA
V-JEPA 2.1 · In Review
Video JEPA 2.1
Dense predictive loss across image and video for spatial grounding
From: V-JEPA 2
LeWorldModel · In Review
Legendre World Model
LeJEPA's energy-based framework compressed into a tiny 15M-parameter world model
From: LeJEPA
ThinkJEPA · In Review
Think JEPA
Dense physical prediction + VLM reasoning for long-term strategic planning
From: V-JEPA 2, LeJEPA
S-JEPA · In Review
Skeletal JEPA
Predicts latent representations of masked skeleton joints using motion-aware spatial masking for action recognition
From: I-JEPA
T-JEPA · In Review
Trajectory JEPA
Self-supervised trajectory similarity via JEPA — predicts missing trajectory segments in latent space without manual augmentation
From: I-JEPA
DSeq-JEPA · In Review
Discriminative Sequential JEPA
Enhances I-JEPA with attention-driven saliency to predict image regions sequentially from most to least discriminative
From: I-JEPA
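
Most of the variants listed above share one mechanism: encode the visible context, predict the embedding of the hidden target, and measure the error in latent space rather than pixel space. The following is a minimal sketch of that idea, not any paper's implementation. The linear "encoders", the mean pooling, the patch sizes, and every function name here are illustrative assumptions. The exponential-moving-average target encoder mirrors what some variants such as I-JEPA use; others, such as LeJEPA, prevent collapse differently.

```python
import random


def linear(weights, x):
    """Apply a weight matrix (a list of rows) to a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def mean_vectors(vectors):
    """Average equal-length vectors elementwise."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def random_matrix(rows, cols, rng):
    """Small random weights; a stand-in for a real trained network."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def latent_prediction_loss(patches, context_idx, target_idx,
                           context_enc, target_enc, predictor):
    """Mean squared error between predicted and actual target embeddings."""
    # Encode only the visible (context) patches and pool them.
    context = mean_vectors([linear(context_enc, patches[i]) for i in context_idx])
    # Predict the target embedding from the context embedding.
    predicted = linear(predictor, context)
    # Encode the hidden (target) patches with the target encoder.
    # In real training no gradient would flow through this branch.
    target = mean_vectors([linear(target_enc, patches[i]) for i in target_idx])
    # The loss compares embeddings, never pixels.
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


def ema_update(target_enc, context_enc, momentum=0.99):
    """Move target weights toward context weights (exponential moving average)."""
    return [[momentum * t + (1.0 - momentum) * c for t, c in zip(t_row, c_row)]
            for t_row, c_row in zip(target_enc, context_enc)]


rng = random.Random(0)
# Eight hypothetical "patches", each a 4-dimensional vector.
patches = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(8)]
context_enc = random_matrix(3, 4, rng)          # maps a patch to a 3-d embedding
target_enc = [row[:] for row in context_enc]    # target starts as a copy
predictor = random_matrix(3, 3, rng)

loss = latent_prediction_loss(patches, [0, 1, 2, 3, 4], [5, 6, 7],
                              context_enc, target_enc, predictor)
target_enc = ema_update(target_enc, context_enc)
print(f"latent prediction loss: {loss:.4f}")
```

In a real system the context encoder and predictor would be trained by gradient descent on this loss, and the masking strategy (image blocks, video tubes, spectrogram regions, skeleton joints, trajectory segments) is what mainly distinguishes the variants.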