I-DLM-32B

Introspective Diffusion Language Model (32B) — a diffusion language model converted from Qwen3-32B that matches autoregressive (AR) quality while enabling parallel token generation.

[Project Page] [Paper] [Code]

Highlights

  • Matches Qwen3-32B quality across 15 benchmarks (knowledge, math, code, instruction following)
  • Introspective Strided Decoding (ISD): single-pass generation + verification with p/q acceptance criterion
  • AR-compatible serving via SGLang (paged KV cache, continuous batching, CUDA graphs)
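ISD's p/q acceptance follows the speculative-sampling pattern: each drafted token in a stride is kept with probability min(1, p/q), where p is the verifier's probability for that token and q is the drafter's, and the first rejection ends the accepted prefix. A minimal sketch of that verification step (illustrative only; the function name and exact procedure are assumptions, not the paper's implementation):

```python
import random

def accepted_prefix(p_probs, q_probs, rng=None):
    """Toy ISD-style verification of one drafted stride.

    p_probs[i] / q_probs[i] are the verifier's (p) and drafter's (q)
    probabilities of the i-th drafted token.  Each token is kept with
    probability min(1, p/q); the first rejection ends the stride, and
    the function returns the length of the accepted prefix.
    """
    rng = rng or random.random
    accepted = 0
    for p_x, q_x in zip(p_probs, q_probs):
        # standard speculative acceptance: keep with prob min(1, p/q)
        if q_x > 0 and rng() < min(1.0, p_x / q_x):
            accepted += 1
        else:
            break  # first rejection ends the accepted prefix
    return accepted
```

In expectation, tokens the verifier likes at least as much as the drafter (p >= q) are always kept, so well-calibrated drafts accept long prefixes per pass.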

Results

Quality (I-DLM-32B vs baselines)

Benchmark           I-DLM-32B   Qwen3-32B (AR)   LLaDA-2.1-flash (100B)
ARC-C               97.0        96.8             91.0
MMLU                85.8        86.0             72.4
MMLU-Pro            79.3        79.8             -
GPQA-D              68.2        68.7             49.5
GSM8K               97.3        97.3             94.5
MATH-500            96.8        96.6             82.4
AIME-24             85.4        85.0             46.7
AIME-25             72.7        72.0             -
MathBench           93.3        93.5             -
HumanEval           95.7        95.7             81.1
MBPP                93.7        93.7             -
LiveCodeBench-v6    57.2        57.7             39.3
IFEval              87.1        87.1             83.0

Usage

This model uses a custom architecture (SDARForCausalLM) and requires trust_remote_code=True.

from transformers import AutoModelForCausalLM, AutoTokenizer

# trust_remote_code=True loads the custom SDARForCausalLM class from the repo
model = AutoModelForCausalLM.from_pretrained(
    "yifanyu/I-DLM-32B",
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto",  # shard the 32B model across available GPUs (requires accelerate)
)
tokenizer = AutoTokenizer.from_pretrained("yifanyu/I-DLM-32B")

For training code and ISD inference, see the GitHub repo.

Method

I-DLM recovers introspective consistency (AR models' inherent self-agreement) through:

  1. Strict causal masking across both masked and clean tokens
  2. Logit shift (Dream shift): hidden state at position i predicts token i+1
  3. All-masked training: CE loss on both noisy and clean token positions

Training loss: L = CE_noisy + alpha * CE_clean
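The shifted, dual-term objective above can be sketched in numpy as follows (a toy version; the helper name, masking convention, and default alpha are illustrative assumptions, and the real implementation lives in the training code):

```python
import numpy as np

def idlm_loss(logits, input_ids, noisy_mask, alpha=1.0):
    """Toy I-DLM training loss: L = CE_noisy + alpha * CE_clean.

    logits:     (T, V) array; per the Dream shift, the hidden state at
                position i predicts token i+1, so logits[i] is scored
                against input_ids[i + 1].
    noisy_mask: (T,) bool; True where the token was masked (noisy),
                False where it was clean.  CE is taken on both groups
                ("all-masked training"), clean term weighted by alpha.
    """
    # logit shift: logits[:-1] predict input_ids[1:]
    shifted_logits = logits[:-1]
    targets = input_ids[1:]
    tgt_noisy = noisy_mask[1:]

    # numerically stable log-softmax cross-entropy per position
    z = shifted_logits - shifted_logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    nll = -log_probs[np.arange(len(targets)), targets]

    ce_noisy = nll[tgt_noisy].mean() if tgt_noisy.any() else 0.0
    ce_clean = nll[~tgt_noisy].mean() if (~tgt_noisy).any() else 0.0
    return ce_noisy + alpha * ce_clean
```

Setting alpha=0 recovers a masked-only diffusion loss; the clean-token term is what trains the model to also score tokens it can already see.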

Related Models

Model           HuggingFace                   Description
I-DLM-8B        yifanyu/I-DLM-8B              Converted from Qwen3-8B
I-DLM-32B       yifanyu/I-DLM-32B             Converted from Qwen3-32B
I-DLM-8B-LoRA   yifanyu/I-DLM-8B-lora-r128    Gated LoRA adapter (rank=128) for lossless R-ISD

Citation

@article{yu2026introspective,
  title={Introspective Diffusion Language Models},
  author={Yu, Yifan and Jian, Yuqing and Wang, Junxiong and Zhou, Zhongzhu
          and Zhuang, Donglin and Fang, Xinyu and Yanamandra, Sri
          and Wu, Xiaoxia and Wu, Qingyang and Song, Shuaiwen Leon
          and Dao, Tri and Athiwaratkun, Ben and Zou, James
          and Lai, Fan and Xu, Chenfeng},
  journal={arXiv preprint arXiv:2604.11035},
  year={2026}
}