Motivation
Foundation-style forecasting aims for zero- and few-shot, in-context behavior across diverse series, but two obstacles stand in the way. First, forecasting over raw values is noise-prone: the model spends capacity tracking high-frequency fluctuation rather than structure. Second, prior-fitted networks (PFNs), transformers trained on synthetic data to perform amortized Bayesian inference, typically predict directly over observations and so operate without a structured latent space. LaT-PFN addresses both: a JEPA (joint-embedding predictive architecture) supplies the structured latent space, and a PFN supplies amortized in-context inference within it.
How it works
LaT-PFN has two coupled components.
- A JEPA-style stage learns a joint embedding for time series. It consists of a context encoder $f_\theta$, a target encoder $\bar f_\theta$ maintained as an exponential moving average (EMA) of the context encoder, and a predictor $g_\phi$ trained to predict the latents of future or masked segments under a representation-space loss. The masking unit is a temporal window.
- On top of this latent space sits a PFN: a transformer trained on synthetic series drawn from a prior to perform Bayesian-style in-context prediction — now forecasting in the learned latent domain rather than over raw values.
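To make the temporal-window masking in the JEPA stage concrete, here is a minimal sketch. It assumes non-overlapping, fixed-length windows and future-only targets; the source specifies neither, and the helper names `split_windows` and `context_target_split` are hypothetical.

```python
from typing import List, Tuple

Window = List[float]


def split_windows(series: List[float], window: int) -> List[Window]:
    """Cut a series into consecutive non-overlapping windows.

    Assumption: any trailing values that do not fill a window are dropped.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    n = len(series) // window
    return [series[i * window:(i + 1) * window] for i in range(n)]


def context_target_split(
    windows: List[Window], n_future: int
) -> Tuple[List[Window], List[Window]]:
    """Treat the last n_future windows as prediction targets, the rest as context."""
    if not 0 < n_future < len(windows):
        raise ValueError("n_future must leave at least one context window")
    return windows[:-n_future], windows[-n_future:]


series = [float(t) for t in range(10)]
windows = split_windows(series, window=3)  # the trailing value 9.0 is dropped
context, target = context_target_split(windows, n_future=1)
```

In this toy setup the context encoder would see `context` and the target encoder would embed `target`; masking interior windows instead of future ones would only change which indices are held out.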
The in-context prediction is conditioned on a context set of related series, or a summary of it. The PFN treats those neighbors as an implicit prior and amortizes inference at forecast time, so no per-series fitting is needed.
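One way to picture how a context set can act as an implicit prior is a single scaled dot-product attention step over latents. The sketch below is a stand-in under stated assumptions, not the model's actual transformer: it uses no learned projections, and the names `in_context_predict`, `ctx_history`, and `ctx_future` are hypothetical.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def in_context_predict(
    query: List[float],
    ctx_history: List[List[float]],
    ctx_future: List[List[float]],
) -> List[float]:
    """Predict a future latent as an attention-weighted average of context futures.

    query:       latent of the series being forecast (its observed history)
    ctx_history: history latents of the related context series (attention keys)
    ctx_future:  future latents of those same series (attention values)
    """
    if not ctx_history or len(ctx_history) != len(ctx_future):
        raise ValueError("context histories and futures must be non-empty and aligned")
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in ctx_history]
    weights = softmax(scores)
    dim = len(ctx_future[0])
    return [sum(w * fut[j] for w, fut in zip(weights, ctx_future)) for j in range(dim)]


query = [1.0, 0.0]
ctx_history = [[1.0, 0.0], [0.0, 1.0]]  # first context series resembles the query
ctx_future = [[2.0], [4.0]]
prediction = in_context_predict(query, ctx_history, ctx_future)
```

The prediction is pulled toward the future of the context series whose history latent is most similar to the query, which is the sense in which neighbors shape the forecast without any gradient step at forecast time.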
The objective
The JEPA stage minimizes a latent regression over masked/future windows,
$$\mathcal{L}_{\text{JEPA}} = \big\lVert\, g_\phi(z_{\text{ctx}}, m) - \operatorname{sg}\big[\bar f_\theta(x)_{\text{future}}\big]\,\big\rVert_2^2,$$
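where $z_{\text{ctx}}$ is the context encoder's output, $m$ identifies the masked or future windows to predict, and $\operatorname{sg}$ is the stop-gradient. The target encoder $\bar f_\theta$ receives no gradient; its parameters are updated as an EMA of the context encoder's. The PFN is trained by the standard prior-fitted objective, the predictive likelihood over synthetic series sampled from a prior, but applied in the learned latent space, so it performs amortized Bayesian in-context forecasting on JEPA embeddings rather than raw observations.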
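The two mechanics of the JEPA objective, the squared-error loss in latent space and the EMA target update, can be sketched in a few lines. This is an illustration under assumptions: parameters and latents are flat lists of floats, the decay value is arbitrary, and the function names are hypothetical. Plain Python has no autodiff, so the stop-gradient appears only as the convention that the target latent is treated as a constant.

```python
from typing import List


def latent_loss(predicted: List[float], target: List[float]) -> float:
    """Squared L2 distance between predicted and target latents.

    The target plays the role of sg[...] in the objective: it is treated as a
    fixed constant, so in a real framework no gradient would flow into the
    target encoder through this term.
    """
    if len(predicted) != len(target):
        raise ValueError("latents must have the same dimension")
    return sum((p - t) ** 2 for p, t in zip(predicted, target))


def ema_update(
    target_params: List[float], online_params: List[float], decay: float = 0.99
) -> List[float]:
    """Move target-encoder parameters toward the context encoder's parameters.

    new_target = decay * target + (1 - decay) * online
    The default decay of 0.99 is an assumed value, not one taken from the source.
    """
    if not 0.0 <= decay <= 1.0:
        raise ValueError("decay must lie in [0, 1]")
    return [decay * t + (1.0 - decay) * o for t, o in zip(target_params, online_params)]


loss = latent_loss([1.0, 2.0], [0.0, 0.0])
new_target = ema_update([0.0, 1.0], [1.0, 1.0], decay=0.9)
```

A decay close to 1 makes the target encoder change slowly, which is the usual reason for an EMA target in joint-embedding methods: it gives the predictor a stable regression target.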
Key results & what's novel
LaT-PFN shows that JEPA representations and prior-fitted in-context learning are complementary. Forecasting in a JEPA-learned latent space denoises the target, because the embedding captures abstract temporal structure instead of raw fluctuation, while the PFN supplies amortized in-context inference that enables zero-shot adaptation by turning related series into an implicit prior. The result is an efficient latent foundation model for time-series forecasting, and a template for combining self-supervised embeddings with amortized Bayesian inference that is not tied to this specific architecture.
Strengths & limitations
- + Denoised forecasting by predicting in a structured latent space.
- + Zero/few-shot in-context adaptation via the PFN's amortized Bayesian inference.
- + A reusable template uniting self-supervised embeddings with prior-fitted networks.
- − PFN quality depends on the synthetic prior matching real series; a misspecified prior degrades the forecasts.
- − Two-stage design (JEPA + PFN) is more complex than end-to-end forecasters.
- − Forecasting in latent space can make the resulting value-space prediction harder to interpret.