ReadPaper Blog
Latent Reasoning with Normalizing Flows
The paper proposes NF-CoT, a framework for large language model reasoning that replaces verbose textual chain-of-thought with compact continuous latent thoughts while preserving the autoregressive interface that makes chain-of-thought useful. It uses normalizing flows inside the LLM causal stream so latent thoughts can be sampled left-to-right, assigned exact likelihoods, reused with KV-cache decoding, and optimized with supervised and policy-gradient objectives. The reported result is improved pass rates on code-generation benchmarks such as MBPP, MBPP+, HumanEval, HumanEval+, and LiveCodeBench v6 while reducing intermediate reasoning cost.
Source: Latent Reasoning with Normalizing Flows

Why Latent Reasoning?
The paper begins from the observation that chain-of-thought prompting improves large language model reasoning because it inserts intermediate variables between a prompt and a final answer. Its central problem is that explicit CoT represents those variables as discrete natural-language tokens, forcing every intermediate computation to be serialized, verbalized, and scored as text. The authors argue that this token-level representation is inefficient when the useful reasoning state may be semantic, uncertain, or only partially formed. Latent reasoning is presented as a higher-bandwidth alternative in which intermediate computation happens in compact continuous states before the model commits to answer tokens. The motivation is therefore not to discard chain-of-thought, but to retain its role as a sampled reasoning path while avoiding the verbosity and surface-form dependence of textual rationales.

The Missing Superpower
The paper identifies a key gap in prior latent-reasoning methods: many gain efficiency by moving into continuous space but lose properties that make explicit CoT compatible with standard autoregressive LLMs. Explicit CoT is naturally left-to-right, probabilistic, likelihood-scored, and compatible with ordinary decoding infrastructure such as the KV cache. Deterministic hidden-state methods such as Coconut keep reasoning close to the LLM but do not define a distribution over reasoning paths, while diffusion-style latent methods such as LaDiR introduce stochastic continuous latents but require iterative denoising and lack the same direct likelihood interface. NF-CoT is designed to make continuous thoughts behave more like language tokens from the model’s perspective. The paper frames the challenge as learning latent CoT without giving up native sampling, scoring, and decoding in the causal language-modeling stream.

NF-CoT’s Core Trick
The core method, NF-CoT, models continuous thoughts with normalizing flows so they can be sampled and scored autoregressively. The paper introduces an LLM-facing thought space, denoted as continuous thought tokens u_{1:K}, whose prompt-conditioned distribution is modeled with a causal Gaussian density parameterized by functions of the prompt and previous thoughts. A shallow invertible flow maps encoder-derived continuous CoT targets e_{1:K} into this u-space, preserving information equivalence while making the representation easier for autoregressive generation. Because the transformation is invertible and has a tractable Jacobian, the model can compute exact likelihoods for latent thoughts using the normalizing-flow change-of-variables formula. Architecturally, continuous-thought positions use an NF head to predict Gaussian parameters, while answer positions use the standard LM head, with both heads sharing the same LLM backbone and causal stream.
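The change-of-variables computation described above can be illustrated with a toy example. The sketch below is not the paper's implementation: it assumes scalar thoughts, a single affine autoregressive flow step, and hand-written stand-ins (`shift_and_log_scale`, `causal_gaussian_params`, `prompt_feature`) for what would be learned networks and the LLM's NF head. It only shows how an invertible causal map from encoder targets e to u-space thoughts, combined with a causal Gaussian density over u, gives an exact log-likelihood for e.

```python
import math

def shift_and_log_scale(prefix):
    """Hypothetical flow conditioner; it sees only earlier targets, so the map is causal."""
    total = sum(prefix)
    return 0.5 * total, 0.1 * math.tanh(total)

def flow_forward(e):
    """Map targets e to thoughts u and return log|det du/de|."""
    u, log_det = [], 0.0
    for k, e_k in enumerate(e):
        shift, log_scale = shift_and_log_scale(e[:k])
        u.append((e_k - shift) * math.exp(-log_scale))
        # The Jacobian is triangular, so its log-determinant is the sum
        # of the log diagonal entries, each equal to -log_scale.
        log_det -= log_scale
    return u, log_det

def flow_inverse(u):
    """Recover targets e from thoughts u, one position at a time."""
    e = []
    for u_k in u:
        shift, log_scale = shift_and_log_scale(e)
        e.append(u_k * math.exp(log_scale) + shift)
    return e

def gaussian_log_pdf(x, mean, std):
    return -0.5 * math.log(2 * math.pi) - math.log(std) - 0.5 * ((x - mean) / std) ** 2

def causal_gaussian_params(prompt_feature, u_prefix):
    """Hypothetical stand-in for an NF head: mean and std from the prompt and earlier thoughts."""
    mean = 0.3 * prompt_feature + 0.2 * sum(u_prefix)
    std = 1.0 + 0.1 * len(u_prefix)
    return mean, std

def log_likelihood(e, prompt_feature):
    """Exact log p(e | prompt) = log p(u | prompt) + log|det du/de|."""
    u, log_det = flow_forward(e)
    log_p_u = 0.0
    for k, u_k in enumerate(u):
        mean, std = causal_gaussian_params(prompt_feature, u[:k])
        log_p_u += gaussian_log_pdf(u_k, mean, std)
    return log_p_u + log_det
```

Calling `flow_inverse(flow_forward(e)[0])` should return the original targets up to floating-point error, which is the invertibility property the exact likelihood relies on.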

How Training Works
The training procedure distills explicit chain-of-thought supervision into continuous reasoning targets and then learns latent reasoning and answer generation jointly. The paper first takes a pretrained CoT encoder from a VAE-style continuous CoT setup and freezes it, so that explicit rationales can be converted into continuous targets e_{1:K}. Shallow autoregressive flow blocks then reparameterize these targets into u_{1:K}, and the LLM is trained with a unified supervised objective that combines flow negative log-likelihood for the latent thoughts with standard cross-entropy for the answer tokens. The authors describe a two-stage curriculum: the LLM backbone is initially frozen while the shallow flow blocks and projection layers align the latent space, and all parameters are then trained end-to-end. The method also adds small Gaussian noise to deterministic target codes during flow training, following practices from TarFlow-style scalable normalizing flows to improve robustness.
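As a rough illustration of how these pieces fit together, the sketch below combines a latent negative log-likelihood with answer cross-entropy, perturbs targets with Gaussian noise, and switches the trainable parameter groups between the two stages. Everything named here is an assumption made for illustration: `latent_weight`, the group names in `trainable_groups`, and the scalar inputs are hypothetical, and this summary does not state how the paper weights the two loss terms.

```python
import math
import random

def add_target_noise(targets, noise_std, rng):
    """Perturb deterministic target codes with small Gaussian noise."""
    return [t + rng.gauss(0.0, noise_std) for t in targets]

def answer_cross_entropy(reference_token_probs):
    """Sum of -log p over the probabilities the model assigns to the reference answer tokens."""
    return -sum(math.log(p) for p in reference_token_probs)

def unified_loss(latent_log_likelihood, reference_token_probs, latent_weight=1.0):
    """Flow NLL for the latent thoughts plus cross-entropy for the answer.

    latent_weight is a hypothetical knob; the relative weighting is an assumption.
    """
    flow_nll = -latent_log_likelihood
    return latent_weight * flow_nll + answer_cross_entropy(reference_token_probs)

def trainable_groups(stage):
    """Hypothetical parameter-group names for the two-stage curriculum."""
    if stage == 1:
        # Backbone frozen: only the flow blocks and projections are updated.
        return {"flow_blocks", "projection_layers"}
    if stage == 2:
        # End-to-end training: the backbone is unfrozen as well.
        return {"flow_blocks", "projection_layers", "llm_backbone"}
    raise ValueError("stage must be 1 or 2")

# Usage with made-up numbers: a latent log-likelihood of -2.0 and
# two answer tokens assigned probabilities 0.5 and 0.25.
rng = random.Random(0)
noisy_targets = add_target_noise([1.0, 2.0, 3.0], noise_std=0.05, rng=rng)
loss = unified_loss(-2.0, [0.5, 0.25])
```

In a real system the latent log-likelihood would come from the flow and NF head, and the token probabilities from the LM head; here they are plain numbers so the arithmetic of the objective stays visible.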

What It Buys You
At inference time, NF-CoT samples continuous thoughts directly in u-space from the learned left-to-right autoregressive density, then switches to the LM head to generate the final answer in the same causal pass. The paper emphasizes that this design avoids running the training-time VAE branch and shallow flow blocks during decoding, while preserving compatibility with the original KV cache. In experiments on code-generation benchmarks including MBPP, MBPP+, HumanEval, HumanEval+, and LiveCodeBench v6, the authors report that NF-CoT improves pass rates over explicit-CoT and prior latent-reasoning baselines. They also report that the method substantially reduces intermediate-reasoning cost because the model no longer needs to emit long textual reasoning traces. A further implication is that exact latent likelihoods enable likelihood-based sampling and direct policy-gradient optimization in the continuous reasoning space, extending reinforcement-style refinement beyond discrete token traces.
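The left-to-right sampling step can be sketched in a few lines. This is a toy under stated assumptions, not the authors' decoder: thoughts are scalars, `nf_head` is a hypothetical hand-written function standing in for the LLM's NF head, and `prompt_feature` is a made-up scalar summary of the prompt. The point is only that each thought is drawn from a Gaussian conditioned on the prompt and earlier thoughts, and that the log-probability of the whole latent path accumulates exactly as it would for sampled tokens.

```python
import math
import random

def nf_head(prompt_feature, thoughts_so_far):
    """Hypothetical stand-in for an NF head: Gaussian parameters for the next thought."""
    mean = 0.3 * prompt_feature + 0.2 * sum(thoughts_so_far)
    std = 0.5
    return mean, std

def gaussian_log_pdf(x, mean, std):
    return -0.5 * math.log(2 * math.pi) - math.log(std) - 0.5 * ((x - mean) / std) ** 2

def sample_thoughts(prompt_feature, num_thoughts, rng):
    """Sample thoughts left to right and return them with their total log-probability."""
    thoughts, log_prob = [], 0.0
    for _ in range(num_thoughts):
        mean, std = nf_head(prompt_feature, thoughts)
        u_k = rng.gauss(mean, std)
        log_prob += gaussian_log_pdf(u_k, mean, std)
        thoughts.append(u_k)
    return thoughts, log_prob

def score_thoughts(prompt_feature, thoughts):
    """Re-score a given latent path under the same causal density."""
    log_prob = 0.0
    for k, u_k in enumerate(thoughts):
        mean, std = nf_head(prompt_feature, thoughts[:k])
        log_prob += gaussian_log_pdf(u_k, mean, std)
    return log_prob

# Usage: sample four thoughts, then confirm that re-scoring the same path
# gives the log-probability recorded during sampling.
path, path_log_prob = sample_thoughts(1.0, 4, random.Random(0))
rescored = score_thoughts(1.0, path)
```

The path log-probability returned here is the kind of quantity a policy-gradient estimator would weight by a reward, which is why an exact likelihood over latent thoughts matters for the reinforcement-style refinement mentioned above.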
