Diffusion-based large language models (DLLMs) have emerged as a promising alternative to traditional autoregressive architectures,
offering notable advantages in parallel generation, controllability, and robustness across multiple modalities. Building on
continuous diffusion methods originally developed in computer vision, recent DLLMs adapt the diffusion process to discrete text
through absorbing-state kernels, latent projections, and hybrid architectures.
This survey reviews recent developments in DLLMs, beginning with their foundational concepts, including DDPM, DDIM, and their
early discrete adaptations, such as mask-based, continuous-embedding, and hybrid models. We organize current methods by sampling
strategy, guidance type, noise schedule, and temporal conditioning, and analyze their efficiency, output quality, and fine-tuning behavior.
The paper also highlights key advances: unification of autoregressive and diffusion modeling through hyperschedules, adaptive
correction sampling, and efficient caching mechanisms that improve computational performance. In addition, it explores emerging
applications in natural language tasks, multimodal generation, and reasoning-intensive domains, which demonstrate the versatility of DLLMs.
Furthermore, the paper identifies critical challenges, including adaptive sampling, scalable alignment strategies, deeper
integration with pretrained language models, graph-based diffusion frameworks, and robust evaluation protocols. Finally,
it proposes directions that could define future research in diffusion-based sequence generation.