Diffusion LMs Can Approximate Optimal Infilling Lengths Implicitly

arXiv: https://arxiv.org/abs/2602.00476 Β· GitHub: https://github.com/NiuHechang/Calibrated_Adaptive_Length

This repository is the Hugging Face project page for the paper:

Diffusion LMs Can Approximate Optimal Infilling Lengths Implicitly

We propose CAL (Calibrated Adaptive Length), a training-free framework that enables diffusion language models (DLMs) to approximate optimal infilling lengths.


πŸ”— Code Repository

The official implementation is hosted on GitHub:

πŸ‘‰ https://github.com/NiuHechang/Calibrated_Adaptive_Length

This Hugging Face repository serves only as a landing page linking to the official codebase and paper.


πŸ“„ Paper

Diffusion LMs Can Approximate Optimal Infilling Lengths Implicitly

Hengchang Liu, Zhao Yang, Bing Su

https://arxiv.org/abs/2602.00476


πŸ“ Abstract (from the paper)

Diffusion language models (DLMs) provide a bidirectional generation framework naturally suited for infilling, yet their performance is constrained by the pre-specified infilling length. In this paper, we reveal that DLMs possess an inherent ability to discover the correct infilling length. We identify two key statistical phenomena in the first-step denoising confidence: a local Oracle Peak that emerges near the ground-truth length and a systematic Length Bias that often obscures this signal. By leveraging this signal and calibrating the bias, our training-free method CAL (Calibrated Adaptive Length) enables DLMs to approximate the optimal length through an efficient search before formal decoding. Empirical evaluations demonstrate that CAL improves Pass@1 by up to 47.7% over fixed-length baselines and 40.5% over chat-based adaptive methods in code infilling, while boosting BLEU-2 and ROUGE-L by up to 8.5% and 9.9% in text infilling. These results demonstrate that CAL paves the way for robust DLM infilling without requiring any specialized training.

🧠 Evaluation

The method is evaluated on multiple infilling benchmarks, including:

  • HumanEval-Infilling (Code)
  • ROCStories (Text)
  • CSAbstracts (Text)
  • Yelp Reviews (Text)

For full implementation details and the experimental setup, please refer to the GitHub repository.


πŸ“– Citation

If you find this work useful, please cite:

@misc{liu2026diffusionlmsapproximateoptimal,
      title={Diffusion LMs Can Approximate Optimal Infilling Lengths Implicitly}, 
      author={Hengchang Liu and Zhao Yang and Bing Su},
      year={2026},
      eprint={2602.00476},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2602.00476}, 
}