At a glance
Problem: A static embedding of cell state cannot answer the question drug discovery actually asks: what happens if we intervene?
Key idea: Make the JEPA predictor action-conditioned, where the action is a biological intervention; predict the post-perturbation cell latent from the pre-state plus the intervention.
Modality: Single-cell state + interventions (genetic / chemical)
Target / masking: Predict the latent of the true post-perturbation state (EMA target); no masking of the input required.
Builds on: V-JEPA 2-AC (action-conditioned world model) and Cell-JEPA (cell state encoder).
Used for: Counterfactual response prediction; intervention search / planning, i.e. a digital cell simulator.

Motivation

Representation learning tells you where a cell is in latent space; it does not tell you where the cell will move under a knockout or a compound. Drug-discovery decisions are counterfactual: target validation, hit and lead selection, and choices of combination and dose all ask how cell state changes after an intervention. BioJEPA-AC takes the action-conditioned world-model idea demonstrated on video and robotics (V-JEPA 2-AC) and points it at the cell, treating an intervention as the action.

How it works

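Figure: BioJEPA-AC schematic. Cell x_pre → encoder f_θ → z_pre; z_pre and intervention a (KO / drug · dose) → predictor g_φ → ẑ_post; measured x_post → EMA encoder f̄_θ → z_post (sg); loss ‖ẑ_post − z_post‖²; latent space with healthy and disease regions.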
BioJEPA-AC treats an intervention (knockout, compound + dose) as the action: $\hat z_{\text{post}}=g_\phi(z_{\text{pre}},a)$. Trained against the EMA embedding of the true post-perturbation state, it becomes a counterfactual simulator. It predicts where a perturbation moves the cell and then searches for the intervention that drives the cell toward a target state.

The model couples a state encoder with an action-conditioned predictor:

  • A state encoder $f_\theta$ maps the pre-perturbation profile $x_{\text{pre}}$ to a latent $z_{\text{pre}}$ (the Cell-JEPA role).
  • An action encoder represents the intervention $a$ — a gene knockout/knockdown, or a compound with dose and time.
  • A predictor $g_\phi$ maps $(z_{\text{pre}}, a)$ to the predicted post-state latent $\hat z_{\text{post}}$.
  • An EMA target encoder embeds the actual measured post-state $x_{\text{post}}$ to supervise the prediction.
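The sketch below is a minimal, untrained example in plain Python that makes the wiring of the state encoder, action encoder and predictor concrete. Everything in it is an illustrative assumption and not the project's code. That includes the hypothetical class name `ToyBioJEPAAC`, the single linear layers with tanh, the action featurisation (one-hot target or compound, log(1 + dose), time in days) and the residual form of the predictor. The EMA target encoder would be a second copy of the state encoder, so it is omitted.

```python
import math
import random


def init_linear(in_dim, out_dim, rng):
    """Random dense layer stored as (weights, bias); weights is out_dim x in_dim."""
    scale = 1.0 / math.sqrt(in_dim)
    weights = [[rng.gauss(0.0, scale) for _ in range(in_dim)] for _ in range(out_dim)]
    return weights, [0.0] * out_dim


def linear(weights, bias, x):
    """y = W x + b on plain Python lists."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


class ToyBioJEPAAC:
    """Hypothetical toy wiring of the trainable parts (random, untrained weights)."""

    def __init__(self, n_genes, n_targets, n_compounds, latent_dim=8, action_dim=4, seed=0):
        rng = random.Random(seed)
        self.n_targets = n_targets
        self.n_compounds = n_compounds
        # State encoder f_theta: expression profile x -> latent z.
        self.enc = init_linear(n_genes, latent_dim, rng)
        # Action encoder (assumed features): [one-hot target | one-hot compound | log(1+dose) | days].
        self.act = init_linear(n_targets + n_compounds + 2, action_dim, rng)
        # Predictor g_phi: concatenated [z_pre, action embedding] -> displacement in latent space.
        self.pred = init_linear(latent_dim + action_dim, latent_dim, rng)

    def encode_state(self, x):
        return [math.tanh(v) for v in linear(*self.enc, x)]

    def encode_action(self, target=None, compound=None, dose=0.0, hours=0.0):
        feats = [0.0] * (self.n_targets + self.n_compounds + 2)
        if target is not None:  # genetic perturbation of one target
            feats[target] = 1.0
        if compound is not None:  # chemical perturbation
            feats[self.n_targets + compound] = 1.0
        feats[-2] = math.log1p(dose)
        feats[-1] = hours / 24.0
        return [math.tanh(v) for v in linear(*self.act, feats)]

    def predict(self, z_pre, a_emb):
        # Residual form (an assumption): z_post_hat = z_pre + g_phi([z_pre, a]).
        delta = linear(*self.pred, z_pre + a_emb)  # list "+" concatenates
        return [z + d for z, d in zip(z_pre, delta)]


# Usage: encode a random pre-state, then predict the effect of two interventions.
rng = random.Random(1)
model = ToyBioJEPAAC(n_genes=20, n_targets=5, n_compounds=3)
x_pre = [rng.random() for _ in range(20)]
z_pre = model.encode_state(x_pre)
a_ko = model.encode_action(target=2)  # knockout of target index 2
a_drug = model.encode_action(compound=1, dose=10.0, hours=24.0)  # compound 1, dose 10, 24 h
z_hat_ko = model.predict(z_pre, a_ko)
z_hat_drug = model.predict(z_pre, a_drug)
```

With the residual form, an intervention that does nothing can be represented by a zero displacement. This is one reasonable inductive bias, but the source does not specify it.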

Once trained, it is used as a simulator: encode a starting state, apply candidate interventions, and search for the one whose predicted latent lands in a desired region (e.g. toward a healthy and away from a disease manifold).
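One way to sketch the simulator-style search is to rank candidate interventions by where their predicted post-state lands. The sketch rests on two assumptions. The first is a hypothetical score: the distance to the centroid of healthy reference latents minus the distance to the centroid of disease latents, where lower is better. The second is a toy stand-in for a trained predictor that shifts the latent by a fixed vector per intervention. The intervention names and coordinates are made up for illustration.

```python
import math


def centroid(points):
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def dist(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def plan(z_start, candidates, predict, healthy, disease):
    """Rank interventions: score = d(z_hat, healthy centroid) - d(z_hat, disease centroid).

    Lower scores land closer to the healthy region and further from the disease region.
    """
    c_healthy, c_disease = centroid(healthy), centroid(disease)
    scored = []
    for a in candidates:
        z_hat = predict(z_start, a)
        scored.append((dist(z_hat, c_healthy) - dist(z_hat, c_disease), a))
    scored.sort(key=lambda t: t[0])
    return scored


# Hypothetical stand-in for a trained g_phi: each intervention shifts the latent by a fixed vector.
SHIFTS = {
    "KO:GENE_A": [0.5, 0.0],
    "KO:GENE_B": [-1.0, 0.2],
    "drug_X@1uM": [-2.0, -1.0],
}


def toy_predict(z, a):
    return [zi + si for zi, si in zip(z, SHIFTS[a])]


healthy = [[0.0, 0.0], [0.2, -0.2]]
disease = [[3.0, 1.0], [2.8, 1.2]]
z_start = [2.5, 1.0]  # a cell sitting near the disease region

ranking = plan(z_start, list(SHIFTS), toy_predict, healthy, disease)
best_score, best_action = ranking[0]
print(best_action)  # → drug_X@1uM
```

Exhaustive ranking suits a small, discrete library of knockouts or compounds. Continuous choices such as dose would instead call for optimising over $a$, for example with sampling-based search.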

The objective

The action-conditioned latent loss mirrors V-JEPA 2-AC, with interventions in place of robot actions:

$$\mathcal{L} = \big\lVert\, g_\phi(z_{\text{pre}}, a) - \operatorname{sg}\big[\bar f_\theta(x_{\text{post}})\big] \,\big\rVert_2^2 \;+\; \lambda\,\mathcal{R}(z),$$

where $\mathcal{R}$ is an anti-collapse regulariser. Planning then solves $\min_{a}\,\lVert \hat z_{\text{post}}(a) - z^* \rVert$ over candidate interventions toward a target state $z^*$.
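The objective can be sketched in plain Python to show which pieces are constants and which are trained. The text leaves two choices open, so both are assumptions here: $\mathcal{R}$ is taken to be a VICReg-style variance hinge on a batch of latents, and the EMA rate `tau` is a placeholder. Without autograd, the stop-gradient is only a convention. In a framework the target would be detached, for example with `.detach()` in PyTorch or `jax.lax.stop_gradient` in JAX.

```python
import math


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def variance_hinge(z_batch, gamma=1.0, eps=1e-4):
    """Assumed anti-collapse R(z): mean over dims of max(0, gamma - std). Needs batch size >= 2."""
    n, d = len(z_batch), len(z_batch[0])
    penalty = 0.0
    for j in range(d):
        col = [z[j] for z in z_batch]
        mean = sum(col) / n
        var = sum((c - mean) ** 2 for c in col) / (n - 1)
        penalty += max(0.0, gamma - math.sqrt(var + eps))
    return penalty / d


def ac_loss(z_hat_batch, z_target_batch, z_batch, lam=1.0):
    """L = mean ||z_hat - sg[z_target]||^2 + lam * R(z).

    z_target_batch comes from the EMA encoder and is treated as a constant (stop-gradient).
    """
    pred = sum(sq_dist(h, t) for h, t in zip(z_hat_batch, z_target_batch)) / len(z_hat_batch)
    return pred + lam * variance_hinge(z_batch)


def ema_update(target_params, online_params, tau=0.99):
    """bar_theta <- tau * bar_theta + (1 - tau) * theta, parameter-wise."""
    return [tau * t + (1.0 - tau) * o for t, o in zip(target_params, online_params)]


z_hat = [[0.0, 1.0], [1.0, 0.0]]        # predictor outputs g_phi(z_pre, a)
z_tgt = [[0.0, 1.0], [0.0, 0.0]]        # EMA embeddings of the measured post-states
z_spread = [[2.0, -2.0], [-2.0, 2.0]]   # healthy, spread-out latents
z_collapsed = [[1.0, 1.0], [1.0, 1.0]]  # collapsed latents: every cell maps to one point
print(ac_loss(z_hat, z_tgt, z_spread, lam=0.5))  # → 0.5
print(ac_loss(z_hat, z_tgt, z_collapsed, lam=0.5))
new_target = ema_update([0.0, 1.0], [1.0, 1.0], tau=0.9)
```

The variance hinge is zero for the spread-out batch and positive for the collapsed one. An anti-collapse term needs exactly this behaviour, because without it the encoder and predictor could trivially agree by mapping every cell to the same point.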

Key results & what's novel

BioJEPA-AC is an open, in-progress project (GPTomics) rather than a finished benchmark paper, so it is best read as a concrete architecture proposal. It is presented as the first explicit attempt to build a V-JEPA 2-style, action-conditioned world model of the cell. Its significance lies in the framing: turning self-supervised cell representations into a controllable, counterfactual simulator of perturbation response. That simulator is the missing piece between a good cell encoder (Cell-JEPA) and decision-making in the drug-discovery pipeline.

Strengths & limitations

  • + Directly answers counterfactual, intervention-level questions; supports planning/search.
  • + Reuses unlabelled state pretraining; interventions slot in as actions.
  • − Needs paired pre/post-perturbation data; evaluation must hold out unseen interventions, targets and contexts, not just random cells.
  • − Causal validity is not automatic: latent prediction can capture correlation rather than causation, and identifiability theory (see the LeJEPA world-model analysis) warns that the required assumptions are strong for biology.
  • − Early-stage; quantitative performance is not yet established.
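Holding out unseen interventions, targets and contexts amounts to a grouped split, where whole groups are held out rather than random cells. The minimal sketch below assumes that records are dicts with hypothetical `intervention` and `cell_line` keys. The grouping key selects which axis of generalisation is tested.

```python
import random


def group_holdout_split(records, key="intervention", test_frac=0.2, seed=0):
    """Hold out whole groups (interventions, targets or contexts), never random cells.

    Every record whose group is held out goes to the test set, so the model is
    scored only on groups it never saw during training.
    """
    groups = sorted({r[key] for r in records})
    rng = random.Random(seed)
    rng.shuffle(groups)
    n_test = max(1, round(test_frac * len(groups)))
    held_out = set(groups[:n_test])
    train = [r for r in records if r[key] not in held_out]
    test = [r for r in records if r[key] in held_out]
    return train, test, held_out


# 50 toy cells: 5 knockouts x 10 cells, measured in two cell lines.
records = [
    {"cell": i, "intervention": f"KO:G{i % 5}", "cell_line": "A" if i < 25 else "B"}
    for i in range(50)
]
train, test, held_out = group_holdout_split(records)  # unseen interventions
train_ctx, test_ctx, held_ctx = group_holdout_split(records, key="cell_line", test_frac=0.5)  # unseen context
```

A random split over cells would put cells from every intervention into both sets. The model could then score well by memorising per-intervention shifts, so such a split cannot show that it generalises to new perturbations.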

Connections & references