---
license: mit
language:
- en
base_model:
- Qwen/Qwen3-8B
pipeline_tag: text-generation
---
<div align="center">
<h1>
TDAR-8B-Thinking
</h1>
<p><strong>Advancing Block Diffusion Language Models for Test-Time Scaling</strong></p>
</div>
<p align="center">
📃 <a href="https://arxiv.org/abs/2602.09555" target="_blank">Paper</a> •
💻 <a href="https://github.com/LuLuLuyi/TDAR" target="_blank">GitHub</a>
</p>
## Model Description
**TDAR-8B-Thinking** is a state-of-the-art Block Diffusion Language Model (BDLM) designed for efficient test-time scaling on complex reasoning tasks. Built on the Qwen3-8B architecture, it achieves a **3.37× speedup** over autoregressive baselines while maintaining superior reasoning quality.
### Key Features
- 🚀 **Bounded Adaptive Confidence Decoding (BACD)**: Dynamically adapts denoising process based on local difficulty signals
- 💡 **Think Coarse, Critic Fine (TCCF)**: Allocates computation based on functional roles in reasoning trajectories
- 📈 **Progressive Block Size Extension**: Trained with gradually increasing block sizes (B=4→64) for optimal efficiency
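As a rough intuition for BACD, each denoising step unmasks the tokens whose confidence clears a threshold that adapts to local difficulty but stays clamped between a lower and upper bound. The sketch below is an illustrative toy (the `bacd_unmask` helper is ours, not part of the released code); the bounds mirror the `dllm_confidence_*` values used in the inference example:

```python
# Illustrative sketch of bounded adaptive confidence decoding (BACD).
# Hypothetical helper; the real decoding lives inside the modified LMDeploy engine.
def bacd_unmask(confidences, lower=0.6, upper=0.9):
    """Pick which masked positions to unmask in one denoising step.

    The threshold adapts to local difficulty (here approximated by the
    mean confidence of the block) but is clamped to [lower, upper], so
    easy blocks never unmask recklessly and hard blocks never stall.
    """
    mean_conf = sum(confidences) / len(confidences)
    threshold = min(max(mean_conf, lower), upper)
    chosen = [i for i, c in enumerate(confidences) if c >= threshold]
    # Always unmask at least the single most confident token to guarantee progress.
    if not chosen:
        chosen = [max(range(len(confidences)), key=lambda i: confidences[i])]
    return chosen
```

With a mixed block, only the confident positions are committed; with a uniformly hard block, the guaranteed-progress fallback keeps decoding moving.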
## Basic Inference
We use **LMDeploy 0.10.2** with modifications for Bounded Adaptive Confidence Decoding support.
**Quick Installation (Inference Only):**
```bash
git clone https://github.com/LuLuLuyi/TDAR.git
cd TDAR
# Install lmdeploy
cd third_party/lmdeploy-0.10.2
pip3 install -e .
```
> **Note**: This is a minimal setup for inference only. For full installation including training and evaluation dependencies, please refer to our comprehensive Installation Guide on [GitHub](https://github.com/LuLuLuyi/TDAR?tab=readme-ov-file#tdar).
The following example shows how to quickly load the model and run inference end-to-end with BACD (Bounded Adaptive Confidence Decoding) for optimal speed-quality trade-off:
```python
from lmdeploy import pipeline, PytorchEngineConfig, GenerationConfig
# Model path
model_path = "lulululuyi/TDAR-8B-Thinking-bs8"
# Configure engine with BACD (Bounded Adaptive Confidence Decoding)
engine_config = PytorchEngineConfig(
    tp=1,
    dp=1,
    dtype="bfloat16",
    max_prefill_token_num=4096,
    cache_max_entry_count=0.8,
    enable_prefix_caching=True,
    session_len=8192,
    # BACD parameters
    dllm_block_length=8,
    dllm_denoising_steps=1,
    dllm_unmasking_strategy="bounded_adaptive_confidence_decoding",
    dllm_confidence_upper_threshold=0.9,
    dllm_confidence_lower_threshold=0.6
)
# Load model
pipe = pipeline(model_path, backend_config=engine_config)
# Prepare prompt
question = "Write $\\frac{3}{20}$ as a decimal."
prompt = f"""<|im_start|>user\n{question} Please reason step by step and put the final answer in \\boxed{{}}.\n<|im_end|>\n<|im_start|>assistant\n<think>"""
# Generation config
gen_config = GenerationConfig(
    top_k=0,
    temperature=1.0,
    top_p=1.0,
    do_sample=True,
    max_new_tokens=4096,
    ignore_eos=False,
    repetition_penalty=1.00
)
# Generate
output = pipe([prompt], gen_config=gen_config)
print(output[0].text)
# Clean up
pipe.close()
```
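Since the prompt instructs the model to put its final answer in `\boxed{}`, you will typically want to pull that answer out of the generated text. A small helper for this (ours, not part of LMDeploy) that walks braces manually so nested expressions like `\boxed{\frac{3}{20}}` are not truncated:

```python
def extract_boxed(text):
    """Return the contents of the last \\boxed{...} in `text`, or None.

    Walks braces manually so nested braces (e.g. \\boxed{\\frac{3}{20}})
    are handled, which a naive regex would cut short.
    """
    start = text.rfind(r"\boxed{")
    if start == -1:
        return None
    i = start + len(r"\boxed{")
    depth = 1
    out = []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                break
        out.append(ch)
        i += 1
    return "".join(out)
```

For the example question above, `extract_boxed(output[0].text)` would yield the model's final decimal answer.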
## Performance
We comprehensively evaluate TDAR on 6 diverse reasoning benchmarks covering mathematical reasoning, code generation, and STEM tasks:
| Method | Math500<br>TPF | Math500<br>ACC | AIME24<br>TPF | AIME24<br>AVG@8 | AIME25<br>TPF | AIME25<br>AVG@8 | AMC23<br>TPF | AMC23<br>AVG@8 | LCB<br>TPF | LCB<br>ACC | GPQA<br>TPF | GPQA<br>ACC | AVG<br>TPF | AVG<br>ACC |
|--------|------|------|------|------|------|------|------|------|------|------|------|------|------|------|
| **Autoregressive LM** |
| Qwen3-8B-Thinking† | 1.00 | 88.2 | 1.00 | 63.3 | 1.00 | 55.8 | 1.00 | 88.8 | 1.00 | 59.5 | 1.00 | 49.0 | 1.00 | 67.4 |
| **Masked Diffusion LM** |
| LLaDA | 3.91 | 41.2 | 3.44 | 6.7 | 3.66 | 0.0 | 4.07 | 12.5 | 2.83 | 4.7 | 3.14 | 17.2 | 3.51 | 13.7 |
| LLaDA-1.5 | 3.97 | 42.2 | 3.34 | 0.0 | 3.68 | 0.0 | 4.01 | 10.0 | 2.86 | 4.3 | 3.01 | 24.2 | 3.48 | 13.5 |
| LLaDA-MoE | 2.70 | 56.6 | 2.89 | 3.3 | 2.71 | 0.0 | 3.16 | 32.5 | 2.05 | 12.9 | 2.18 | 27.8 | 2.62 | 22.2 |
| **Block Diffusion LM** |
| Fast-dLLM-v2 | 2.81 | 59.4 | 2.58 | 0.0 | 2.58 | 0.0 | 2.77 | 25.0 | 1.73 | 6.8 | 2.09 | 28.3 | 2.43 | 19.9 |
| SDAR-8B-Chat | 2.21 | 52.6 | 2.96 | 5.0 | 2.35 | 7.1 | 2.83 | 22.5 | 1.60 | 7.5 | 1.32 | 10.6 | 2.21 | 17.6 |
| DiRL-8B-Instruct | 2.30 | 78.2 | 1.96 | 18.8 | 1.92 | 15.8 | 2.05 | 65.6 | 2.64 | 10.4 | 2.27 | 44.4 | 2.19 | 38.9 |
| TraDo-8B-Instruct | 2.36 | 75.0 | 2.13 | 13.3 | 2.00 | 12.5 | 2.23 | 55.3 | 1.42 | 7.2 | 1.43 | 27.3 | 1.93 | 31.8 |
| TraDo-8B-Thinking | 1.28 | 84.0 | 1.35 | 31.3 | 1.35 | 26.3 | 1.37 | 72.8 | 1.10 | 22.6 | 1.16 | 46.0 | 1.27 | 47.1 |
| TraDo + BACD | 1.33 | 85.0 | 1.44 | 32.9 | 1.44 | 27.5 | 1.45 | 73.8 | 1.15 | 23.3 | 1.18 | 49.5 | 1.33 | 48.7 |
| TraDo + BACD + TCCF | 1.28 | 85.6 | 1.36 | 35.8 | 1.33 | 27.1 | 1.36 | 74.1 | 1.11 | 21.9 | 1.14 | 49.5 | 1.27 | 49.0 |
| **TDAR-8B-Thinking (Ours)** | **1.62** | **81.6** | **4.47** | **34.6** | **4.17** | **30.8** | **5.03** | **69.1** | **1.25** | **40.5** | **1.28** | **46.5** | **2.97** | **50.5** |
| **+ BACD** | **1.88** | **83.4** | **5.07** | **36.3** | **4.73** | **30.4** | **5.59** | **71.3** | **1.46** | **40.1** | **1.49** | **46.0** | **3.37** | **51.2** |
| **+ BACD + TCCF** | **1.75** | **84.0** | **3.04** | **42.9** | **2.79** | **35.8** | **2.68** | **80.0** | **1.32** | **42.6** | **1.39** | **50.0** | **2.16** | **55.9** |
> **Note:** TPF = Tokens Per Forward pass (higher is faster); † indicates models derived from Qwen3-8B-Base with identical continued pre-training (CPT) and supervised fine-tuning (SFT).
### Key Findings
**🏆 State-of-the-Art Performance**
- Achieves **55.9%** average accuracy with BACD + TCCF (best among all 8B BDLMs)
- Outperforms TraDo-8B-Thinking by **+3.4 points** (50.5 vs 47.1) while decoding **2.34× faster** (2.97 vs 1.27 TPF); the gap widens to **+8.8 points** with BACD + TCCF
- Strong results on challenging benchmarks: **42.9** on AIME24, **35.8** on AIME25, **80.0** on AMC23
**⚡ Superior Efficiency**
- **3.37× speedup** with BACD alone (maximum efficiency, 51.2% accuracy)
- **2.16× speedup** with BACD + TCCF (best quality, 55.9% accuracy)
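As a sanity check, the speedup figures quoted here are simply ratios of the average TPF values in the results table (the autoregressive baseline has TPF = 1.00):

```python
# Speedups quoted above are ratios of average tokens-per-forward (TPF)
# values taken from the results table.
def speedup(tpf_model, tpf_baseline):
    return round(tpf_model / tpf_baseline, 2)

speedup(3.37, 1.00)  # TDAR + BACD vs. autoregressive baseline -> 3.37
speedup(2.97, 1.27)  # TDAR base vs. TraDo-8B-Thinking -> 2.34
```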