ENMA: Tokenwise Autoregression for Generative Neural PDE Operators

1 Sorbonne Université, CNRS, ISIR, F-75005 Paris, France
2 Criteo AI Lab, Paris, France
* Equal contribution.

Spotlight poster at NeurIPS 2025.

Abstract

Solving time-dependent parametric partial differential equations (PDEs) remains a fundamental challenge for neural solvers, particularly when generalizing across a wide range of physical parameters and dynamics. When data is uncertain or incomplete—as is often the case—a natural approach is to turn to generative models. We introduce ENMA, a generative neural operator designed to model spatio-temporal dynamics arising from physical phenomena. ENMA predicts future dynamics in a compressed latent space using a generative masked autoregressive transformer trained with a flow matching loss, enabling tokenwise generation. Irregularly sampled spatial observations are encoded into uniform latent representations via attention mechanisms and further compressed through a spatio-temporal convolutional encoder. This allows ENMA to perform in-context learning at inference time by conditioning on either past states of the target trajectory or auxiliary context trajectories with similar dynamics. The result is a robust and adaptable framework that generalizes to new PDE regimes and supports one-shot surrogate modeling of time-dependent parametric PDEs.

The ENMA framework

We introduce ENMA (illustrated in the figure below), a continuous autoregressive neural operator for modeling time-dependent parametric PDEs, where parameters such as initial conditions, coefficients, and forcing terms may vary across instances. ENMA operates entirely in a continuous latent space and advances both the encoder-decoder pipeline and the generative modeling component crucial to neural PDE solvers. The encoder employs attention mechanisms to process irregular spatio-temporal inputs (i.e., unordered point sets) and maps them onto a structured, grid-aligned latent space. A causal spatio-temporal convolutional encoder then compresses these observations into compact latent tokens spanning multiple states. Generation proceeds in two stages. A causal transformer first predicts future latent states autoregressively. Then, a masked spatial transformer decodes each state at the token level using Flow Matching to model per-token conditional distributions in continuous space—providing a more efficient alternative to full-frame diffusion. Finally, the decoder reconstructs the full physical trajectory from the generated latents.

Figure: Diagram of the ENMA neural solver.
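To make the tokenwise decoding step concrete, the snippet below sketches how one continuous latent token could be sampled by integrating the learned flow. This is a minimal illustration under stated assumptions, not the paper's implementation: the `velocity_field` callable, the forward Euler integrator, and all names stand in for ENMA's per-token Flow Matching head.

        import torch

        @torch.no_grad()
        def sample_token(velocity_field, cond, dim, n_steps=50):
            """Sample one latent token by integrating dx/dt = v(x, t, cond) from t=0 (noise) to t=1."""
            x = torch.randn(dim)                         # start from a standard Gaussian sample
            dt = 1.0 / n_steps
            for i in range(n_steps):
                t = torch.tensor(i * dt)
                x = x + dt * velocity_field(x, t, cond)  # forward Euler step along the learned flow
            return x                                     # approximate sample from p(token | cond)

        # Toy usage with a dummy velocity field that transports noise toward the conditioning vector:
        token = sample_token(lambda x, t, c: c - x, cond=torch.zeros(16), dim=16)

Each masked token would be sampled this way, conditioned on the transformer's embedding of the already-generated tokens and past latent states, rather than denoising an entire frame at once.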

Our contributions are as follows:

  • We introduce ENMA, the first neural operator to perform autoregressive generation over continuous latent tokens for physical systems, enabling accurate and scalable modeling of parametric PDEs while avoiding the limitations of discrete quantization.
  • A masked spatial transformer trained with a Flow Matching objective models per-token conditional distributions, offering a principled and efficient alternative to full-frame diffusion models for generation.
  • ENMA supports probabilistic forecasting via tokenwise sampling and adapts to novel PDE regimes at inference time through temporal or trajectory-based conditioning—without retraining.
  • To handle irregularly sampled inputs and support multi-state tokenization, ENMA leverages attention-based encoding combined with causal temporal convolutions (a sketch of this encoding follows the list).
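The following sketch shows one way such attention-based encoding of irregular inputs can be realized: a fixed set of learned, grid-aligned queries cross-attends to an unordered set of (coordinate, value) observations. Module names, shapes, and hyperparameters here are illustrative assumptions, not ENMA's released architecture.

        import torch
        import torch.nn as nn

        class IrregularGridEncoder(nn.Module):
            """Cross-attention from learned grid queries to scattered observations (hypothetical)."""

            def __init__(self, n_queries=64, d_model=128, coord_dim=2):
                super().__init__()
                self.queries = nn.Parameter(torch.randn(n_queries, d_model))  # grid-aligned latent slots
                self.embed = nn.Linear(coord_dim + 1, d_model)                # lift (x, u(x)) pairs
                self.attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)

            def forward(self, coords, values):
                # coords: (B, N, coord_dim) sensor locations; values: (B, N, 1) observed field values
                kv = self.embed(torch.cat([coords, values], dim=-1))   # (B, N, d_model)
                q = self.queries.expand(coords.size(0), -1, -1)        # (B, n_queries, d_model)
                latent, _ = self.attn(q, kv, kv)                       # aggregate points onto the grid
                return latent                                          # (B, n_queries, d_model)

Because the queries are fixed while the key/value set varies, the output is invariant to the ordering and number of observed points; a causal temporal convolution can then compress consecutive grid-aligned states into compact multi-state tokens.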

Comparison with baselines

We evaluate ENMA against several baselines on multiple datasets to illustrate the benefits of our method. First, we report results for the tokenwise generation process; we then compare our auto-encoding strategy against baseline encoders.

Table: Quantitative comparison of the neural solver with baselines.

Generative Capabilities

ENMA functions as a generative neural operator capable of producing stochastic and physically consistent trajectories. We highlight two core experiments that demonstrate its generative ability: uncertainty quantification and trajectory generation from context. For the first, ENMA performs uncertainty estimation by sampling multiple trajectories from its continuous latent space through stochastic flow matching.

Figure: Uncertainty quantification via sampled trajectories.
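In code, this style of uncertainty estimation reduces to repeated stochastic rollouts from the same conditioning. The sketch below assumes a hypothetical `rollout` callable wrapping latent generation and decoding; it illustrates the idea, not the authors' evaluation script.

        import torch

        def predict_with_uncertainty(rollout, context, n_samples=16):
            """Estimate predictive mean and spread from repeated stochastic rollouts."""
            samples = torch.stack([rollout(context) for _ in range(n_samples)])  # (S, T, H, W)
            return samples.mean(dim=0), samples.std(dim=0)  # pointwise mean and std over samples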

We further assess ENMA’s ability to generate full trajectories without conditioning on the initial state or PDE parameters. Given only a context trajectory, ENMA infers the latent physics and synthesizes coherent spatio-temporal fields.

Figure: Trajectories generated from a context trajectory.
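A minimal sketch of this one-shot, trajectory-conditioned use: encode an auxiliary trajectory governed by the same dynamics, condition generation on its latent states alone, and decode. All three callables are hypothetical stand-ins for ENMA's components, shown only to fix the data flow.

        def one_shot_forecast(encode, generate_latents, decode, context_traj, n_future):
            """Forecast a new trajectory given only an example trajectory with similar dynamics."""
            ctx_latents = encode(context_traj)                # latent states of the context trajectory
            future = generate_latents(ctx_latents, n_future)  # infer dynamics in-context, no retraining
            return decode(future)                             # reconstruct the physical fields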

Visualizations

BibTeX


        @misc{koupaï2025enmatokenwiseautoregressiongenerative,
          title={ENMA: Tokenwise Autoregression for Generative Neural PDE Operators}, 
          author={Armand Kassaï Koupaï and Lise Le Boudec and Louis Serrano and Patrick Gallinari},
          year={2025},
          eprint={2506.06158},
          archivePrefix={arXiv},
          primaryClass={cs.LG},
          url={https://arxiv.org/abs/2506.06158}, 
        }