At a glance
Problem: Evaluating joint-embedding SSL models usually needs a trained, label-dependent linear probe, which is slow and awkward for model selection.
Key idea: A label-free metric, LiDAR (Linear Discriminant Analysis Rank), that predicts linear-probe quality from the embedding's discriminative rank.
Modality: Evaluation metric (joint-embedding SSL)
Target / masking: Operates on learned embeddings; agnostic to the pretraining objective.
Builds on: Effective-rank diagnostics and linear discriminant analysis.
Used for: Fast model selection, training monitoring and early detection of dimensional collapse.

Motivation

The standard way to judge a self-supervised joint-embedding model is to freeze it and train a supervised linear probe, then read off accuracy. This is slow, requires labels for the target task, and is impractical for sweeping many checkpoints or monitoring training in real time. LiDAR (Thilak et al., 2023) proposes a metric that predicts downstream linear-probing performance directly from the learned representations, without needing labels for the target task, so practitioners can rank models cheaply.

How it works

LiDAR (Linear Discriminant Analysis Rank) measures the effective rank and discriminative structure of the embedding distribution. The idea is that a good representation spreads its variance across many informative, separable directions, while a collapsed one concentrates it on a few. Using a discriminant-analysis-style decomposition into between- and within-cluster scatter, LiDAR computes the effective rank of the resulting matrix: rich, well-spread, class-relevant structure scores highly, and dimensional collapse scores poorly. This makes it sensitive to exactly the failure mode that plagues non-contrastive and JEPA-style methods, and it applies across architectures, including BYOL/DINO-style self-distillation and JEPA.
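The rank intuition can be illustrated with a minimal sketch. It assumes synthetic Gaussian embeddings and uses the plain covariance spectrum, not the LiDAR matrix itself; the helper names `effective_rank` and `covariance_spectrum` are hypothetical and chosen only for this illustration.

```python
import numpy as np

def effective_rank(eigvals):
    """Exponential of the Shannon entropy of the normalised eigenvalue spectrum."""
    lam = np.clip(np.asarray(eigvals, dtype=float), 0.0, None)  # drop tiny negative round-off
    p = lam / lam.sum()
    p = p[p > 0]  # treat 0 * log(0) as 0
    return float(np.exp(-(p * np.log(p)).sum()))

def covariance_spectrum(x):
    """Eigenvalues of the sample covariance of embeddings x with shape (samples, dim)."""
    return np.linalg.eigvalsh(np.cov(x, rowvar=False))

rng = np.random.default_rng(0)
num_samples, dim = 2000, 32

# Variance spread over all 32 directions.
spread = rng.normal(size=(num_samples, dim))
# Variance confined to a 2-dimensional subspace: a toy dimensional collapse.
collapsed = rng.normal(size=(num_samples, 2)) @ rng.normal(size=(2, dim))

rank_spread = effective_rank(covariance_spectrum(spread))
rank_collapsed = effective_rank(covariance_spectrum(collapsed))
print(f"spread: {rank_spread:.1f}, collapsed: {rank_collapsed:.1f}")
```

In this toy setting the spread embeddings score close to the full dimension, while the collapsed ones cannot exceed roughly 2, however many nominal dimensions the embedding has.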

The metric

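LiDAR forms a discriminant matrix from the between-class and within-class scatter matrices, $\Sigma_b$ and $\Sigma_w$, where the classes come from a grouping or pseudo-labelling of the embeddings rather than from target-task labels,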

$$\Sigma_{\text{lda}} = \Sigma_w^{-1/2}\,\Sigma_b\,\Sigma_w^{-1/2},$$

and reports its effective rank: the exponential of the Shannon entropy of the normalised eigenvalue spectrum, $\exp\big(-\sum_i p_i \log p_i\big)$ with $p_i = \lambda_i / \sum_j \lambda_j$, where $\lambda_i$ are the eigenvalues of $\Sigma_{\text{lda}}$. A high effective rank means variance is distributed across many discriminative directions; a low value signals collapse. The score correlates with downstream linear-probe accuracy, so it serves as a fast surrogate.
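The computation above can be sketched in NumPy. This is an illustration under stated assumptions, not a reference implementation: the function name `lidar_score`, the input layout (groups of augmented views acting as surrogate classes), the scatter normalisations, and the small regularisers `delta` and `eps` are all choices made for this example.

```python
import numpy as np

def lidar_score(emb, delta=1e-4, eps=1e-7):
    """Effective rank of Sigma_w^{-1/2} Sigma_b Sigma_w^{-1/2}.

    emb has shape (n_groups, q_views, dim); each group is assumed to hold
    the embeddings of q augmented views of one input (a surrogate class).
    delta and eps are assumed regularisers for numerical stability.
    """
    n, q, d = emb.shape
    group_means = emb.mean(axis=1)         # (n, d)
    grand_mean = group_means.mean(axis=0)  # (d,)

    # Between-group scatter: spread of group means around the grand mean.
    cb = group_means - grand_mean
    sigma_b = cb.T @ cb / n

    # Within-group scatter: spread of views around their own group mean,
    # plus delta * I so the inverse square root exists.
    cw = (emb - group_means[:, None, :]).reshape(n * q, d)
    sigma_w = cw.T @ cw / (n * q) + delta * np.eye(d)

    # Sigma_w^{-1/2} from the eigendecomposition of the symmetric matrix.
    w_vals, w_vecs = np.linalg.eigh(sigma_w)
    inv_sqrt = (w_vecs / np.sqrt(w_vals)) @ w_vecs.T

    sigma_lda = inv_sqrt @ sigma_b @ inv_sqrt
    sigma_lda = (sigma_lda + sigma_lda.T) / 2  # remove round-off asymmetry
    lam = np.clip(np.linalg.eigvalsh(sigma_lda), 0.0, None)

    p = lam / lam.sum() + eps  # eps keeps log(p) finite
    return float(np.exp(-(p * np.log(p)).sum()))

# Toy usage on synthetic data (not real SSL embeddings).
rng = np.random.default_rng(0)
n, q, d = 200, 10, 16
noise = 0.1 * rng.normal(size=(n, q, d))
healthy = rng.normal(size=(n, 1, d)) + noise  # group means fill all d directions
low_dim_means = rng.normal(size=(n, 2)) @ rng.normal(size=(2, d))
collapsed = low_dim_means[:, None, :] + noise  # group means lie in a 2-D subspace

score_healthy = lidar_score(healthy)
score_collapsed = lidar_score(collapsed)
print(f"healthy: {score_healthy:.1f}, collapsed: {score_collapsed:.1f}")
```

Because the eigenvalues are normalised before the entropy is taken, a constant rescaling of either scatter matrix changes the score only through its interaction with `delta`. The grouping, by contrast, does matter: the score reflects how well the chosen groups are separated relative to their internal spread.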

Why it matters

LiDAR gives a fast, low-overhead, label-light diagnostic for the health of a learned embedding space — useful for model selection and for watching training. For world modeling this is valuable because JEPAs are prone to subtle collapse that pixel-space metrics cannot detect; a signal that tracks representation rank and downstream usefulness helps tune anti-collapse mechanisms, compare world-model encoders and catch degenerate solutions early, before an expensive probe is run.

Strengths & limitations

  • + Cheap, fast surrogate for linear-probe accuracy.
  • + Directly sensitive to dimensional collapse.
  • + Architecture-agnostic across joint-embedding methods.
  • - Needs some grouping/pseudo-label structure to define between- vs within-scatter.
  • - A rank-based proxy that can diverge from true accuracy when class structure is non-linear.

Connections & references