I-DLM-8B

Introspective Diffusion Language Model (8B) — a diffusion language model converted from Qwen3-8B that matches AR quality while enabling parallel token generation.

[Project Page] [Paper] [Code]

Highlights

  • First DLM to match same-scale AR quality across 15 benchmarks
  • Introspective Strided Decoding (ISD): single-pass generation + verification with p/q acceptance criterion
  • AR-compatible serving via SGLang (paged KV cache, continuous batching, CUDA graphs)
  • 2.9–4.1× higher throughput than prior DLMs at high concurrency
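The exact ISD p/q acceptance criterion is defined in the paper; as a minimal sketch, a speculative-decoding-style rule accepts a drafted token `x` with probability `min(1, p(x)/q(x))`, where `p` is the verifier distribution and `q` the draft distribution. All names below are illustrative, not the repo's code:

```python
import random

def accept_token(token, p, q, u):
    """Speculative-decoding-style p/q test: accept a drafted token
    when u < min(1, p[token] / q[token]), for u ~ Uniform[0, 1)."""
    ratio = p[token] / q[token] if q[token] > 0 else 1.0
    return u < min(1.0, ratio)

# Toy distributions over a 3-token vocabulary.
p = [0.7, 0.2, 0.1]   # verifier (target) distribution
q = [0.5, 0.3, 0.2]   # draft (proposal) distribution

print(accept_token(0, p, q, random.random()))  # p/q >= 1: always True
```

Tokens the verifier likes at least as much as the draft did are always kept; others are kept with probability proportional to the ratio.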

Results

Quality (I-DLM-8B vs baselines)

| Benchmark | I-DLM-8B | Qwen3-8B (AR) | LLaDA-2.1-mini (16B) | SDAR (8B) |
|---|---|---|---|---|
| ARC-C | 95.8 | 95.8 | 90.2 | 91.9 |
| MMLU | 82.4 | 83.5 | 74.5 | 78.6 |
| MMLU-Pro | 73.1 | 75.1 | 64.8 | 56.9 |
| GPQA-D | 55.6 | 58.9 | 46.0 | 40.2 |
| GPQA | 54.9 | 55.4 | 53.3 | — |
| GSM8K | 95.0 | 96.0 | 89.0 | 91.7 |
| MATH-500 | 96.8 | 95.8 | 85.0 | 78.6 |
| MathBench | 89.1 | 93.1 | 84.2 | 76.9 |
| AIME-24 | 69.6 | 73.1 | 43.3 | 10.0 |
| AIME-25 | 60.8 | 65.4 | 43.3 | 10.0 |
| HumanEval | 93.3 | 95.1 | 86.0 | 78.7 |
| MBPP | 92.2 | 93.4 | 82.1 | 72.0 |
| LiveCodeBench-v6 | 45.7 | 50.3 | 30.4 | 16.6 |
| IFEval | 84.7 | 84.7 | 83.2 | 61.4 |

Usage

This model uses a custom architecture (SDARForCausalLM) and requires trust_remote_code=True.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Custom SDARForCausalLM architecture; requires trust_remote_code=True.
model = AutoModelForCausalLM.from_pretrained(
    "yifanyu/I-DLM-8B",
    trust_remote_code=True,
    torch_dtype="auto",  # load in the checkpoint's native dtype (BF16)
)
tokenizer = AutoTokenizer.from_pretrained("yifanyu/I-DLM-8B")
```

For training code and ISD inference, see the GitHub repo.

Method

I-DLM recovers introspective consistency (AR models' inherent self-agreement) through:

  1. Strict causal masking across both masked and clean tokens
  2. Logit shift (Dream shift): hidden state at position i predicts token i+1
  3. All-masked training with auto-balanced loss: CE loss on both noisy and clean token positions, dynamically balanced
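As a toy illustration of the logit shift (item 2), the sketch below computes a shifted cross-entropy in NumPy, where the logits produced at position i are scored against the token at position i+1. The function name and shapes are illustrative, not the repo's code, and the auto-balancing of item 3 is not sketched here:

```python
import numpy as np

def shifted_ce(logits, tokens):
    """Cross-entropy with the Dream-style logit shift:
    logits[i] (from hidden state i) predicts tokens[i + 1]."""
    shifted_logits = logits[:-1]   # positions 0 .. T-2
    targets = tokens[1:]           # positions 1 .. T-1
    # numerically stable log-softmax over the vocabulary axis
    z = shifted_logits - shifted_logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

rng = np.random.default_rng(0)
logits = rng.normal(size=(5, 8))   # T=5 positions, vocab size 8
tokens = rng.integers(0, 8, size=5)
loss = shifted_ce(logits, tokens)
assert loss > 0
```

With this shift the model's causal attention and output head line up exactly as in an AR model, which is what makes AR-style serving (paged KV cache, CUDA graphs) applicable.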

Related Models

| Model | HuggingFace | Description |
|---|---|---|
| I-DLM-8B | yifanyu/I-DLM-8B | Converted from Qwen3-8B |
| I-DLM-32B | yifanyu/I-DLM-32B | Converted from Qwen3-32B |
| I-DLM-8B-LoRA | yifanyu/I-DLM-8B-lora-r128 | Gated LoRA adapter (rank=128) for lossless R-ISD |

Citation

@article{yu2026introspective,
  title={Introspective Diffusion Language Models},
  author={Yu, Yifan and Jian, Yuqing and Wang, Junxiong and Zhou, Zhongzhu
          and Zhuang, Donglin and Fang, Xinyu and Yanamandra, Sri
          and Wu, Xiaoxia and Wu, Qingyang and Song, Shuaiwen Leon
          and Dao, Tri and Athiwaratkun, Ben and Zou, James
          and Lai, Fan and Xu, Chenfeng},
  journal={arXiv preprint arXiv:2604.11035},
  year={2026}
}