---
license: mit
language:
- en
base_model:
- Qwen/Qwen3-8B
pipeline_tag: text-generation
---
<div align="center">

<h1>
  TDAR-8B-Thinking
</h1>

<p><strong>Advancing Block Diffusion Language Models for Test-Time Scaling</strong></p>

</div>

<p align="center">
  📃 <a href="https://arxiv.org/abs/2602.09555" target="_blank">Paper</a> • 
  💻 <a href="https://github.com/LuLuLuyi/TDAR" target="_blank">GitHub</a>
</p>


## Model Description

**TDAR-8B-Thinking** is a Block Diffusion Language Model (BDLM) designed for efficient test-time scaling on complex reasoning tasks. Built on the Qwen3-8B architecture, it averages up to **3.37 tokens per forward pass (TPF)** with BACD, a **3.37× speedup** over an autoregressive baseline (1.00 TPF), and reaches the highest average accuracy among the 8B BDLMs compared in the Performance section.

### Key Features


- 🚀 **Bounded Adaptive Confidence Decoding (BACD)**: Dynamically adapts the denoising process based on local difficulty signals
- 💡 **Think Coarse, Critic Fine (TCCF)**: Allocates computation based on functional roles in reasoning trajectories
- 📈 **Progressive Block Size Extension**: Trained with gradually increasing block sizes (B=4→64) for optimal efficiency
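
The exact BACD rule is specified in the paper and implemented in the modified LMDeploy. As a rough illustration of the general idea only, the sketch below commits masked tokens whose confidence clears a threshold that adapts to the block but stays clamped between a lower and an upper bound. The function name, the use of the mean confidence as the adaptive signal, and the fallback to the single most confident token are all assumptions made for this sketch, not the authors' algorithm; the default bounds simply mirror the `0.6` and `0.9` values used in the Basic Inference example.

```python
def select_positions_to_unmask(confidences, lower=0.6, upper=0.9):
    """Pick which still-masked positions to commit in one denoising step.

    confidences: dict mapping position -> model confidence (e.g. max
    token probability) for every position that is still masked.
    Hypothetical helper for illustration; not part of LMDeploy or TDAR.
    """
    if not confidences:
        return []

    # Assumed adaptive signal: the mean confidence over the masked positions.
    mean_conf = sum(confidences.values()) / len(confidences)

    # Keep the adaptive threshold inside [lower, upper].
    threshold = min(max(mean_conf, lower), upper)

    chosen = [pos for pos, conf in confidences.items() if conf >= threshold]

    # Always commit at least one token so decoding makes progress.
    if not chosen:
        chosen = [max(confidences, key=confidences.get)]
    return sorted(chosen)


# An "easy" step commits several tokens at once; a "hard" step commits one.
easy_step = select_positions_to_unmask({0: 0.99, 1: 0.95})
hard_step = select_positions_to_unmask({0: 0.2, 1: 0.4})
print(easy_step, hard_step)
```

Under these assumptions, confident blocks are decoded in fewer forward passes, while uncertain blocks fall back to committing one token at a time.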



## Basic Inference

We use **LMDeploy 0.10.2** with modifications for Bounded Adaptive Confidence Decoding support.

**Quick Installation (Inference Only):**

```bash
git clone https://github.com/LuLuLuyi/TDAR.git
cd TDAR

# Install lmdeploy
cd third_party/lmdeploy-0.10.2
pip3 install -e .
```

> **Note**: This is a minimal setup for inference only. For full installation including training and evaluation dependencies, please refer to our comprehensive Installation Guide on [GitHub](https://github.com/LuLuLuyi/TDAR?tab=readme-ov-file#tdar).


The following example shows how to quickly load the model and run inference end-to-end with BACD (Bounded Adaptive Confidence Decoding) for optimal speed-quality trade-off:

```python
from lmdeploy import pipeline, PytorchEngineConfig, GenerationConfig

# Model path
model_path = "lulululuyi/TDAR-8B-Thinking-bs8"

# Configure engine with BACD (Bounded Adaptive Confidence Decoding)
engine_config = PytorchEngineConfig(
    tp=1,
    dp=1,
    dtype="bfloat16",
    max_prefill_token_num=4096,
    cache_max_entry_count=0.8,
    enable_prefix_caching=True,
    session_len=8192,
    
    # BACD parameters
    dllm_block_length=8,
    dllm_denoising_steps=1,
    dllm_unmasking_strategy="bounded_adaptive_confidence_decoding",
    dllm_confidence_upper_threshold=0.9,
    dllm_confidence_lower_threshold=0.6
)

# Load model
pipe = pipeline(model_path, backend_config=engine_config)

# Prepare prompt
question = "Write $\\frac{3}{20}$ as a decimal."
prompt = f"""<|im_start|>user\n{question}Please reason step by step and put the final answer in \\boxed{{}}.\n<|im_end|>\n<|im_start|>assistant\n<think>"""

# Generation config
gen_config = GenerationConfig(
    top_k=0,
    temperature=1.0,
    top_p=1.0,
    do_sample=True,
    max_new_tokens=4096,
    ignore_eos=False,
    repetition_penalty=1.00
)

# Generate
output = pipe([prompt], gen_config=gen_config)
print(output[0].text)

# Clean up
pipe.close()
```
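
Because the prompt asks the model to put its final answer in `\boxed{}`, downstream code usually needs to pull that answer out of the generated text. The helper below is a minimal sketch of one way to do so; `extract_boxed` is a hypothetical name introduced here, not part of LMDeploy or the TDAR repository, and it assumes the last `\boxed{...}` in the text is the final answer.

```python
def extract_boxed(text):
    """Return the contents of the last \\boxed{...} in text, or None.

    Braces are matched by depth so nested LaTeX such as
    \\boxed{\\frac{3}{20}} is returned whole.
    """
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None

    depth = 1
    content = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(content)
        content.append(ch)
    return None  # braces never closed, e.g. generation was truncated


sample = "3/20 = 15/100, so the answer is \\boxed{0.15}."
print(extract_boxed(sample))
```

With the pipeline above, this would be called as `extract_boxed(output[0].text)`; a `None` result typically means the generation hit `max_new_tokens` before producing an answer.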


## Performance

We evaluate TDAR on six reasoning benchmarks covering mathematical reasoning, code generation, and STEM tasks:

| Method | **Math500** |  | **AIME24** |  | **AIME25** |  | **AMC23** |  | **LCB** |  | **GPQA** |  | **AVG** |  |
|--------|---------|------|--------|------|--------|------|-------|------|-----|------|------|------|---------|------|
|        | TPF | ACC | TPF | AVG@8 | TPF | AVG@8 | TPF | AVG@8 | TPF | ACC | TPF | ACC | TPF | ACC |
| **Autoregressive LM** |
| Qwen3-8B-Thinking† | 1.00 | 88.2 | 1.00 | 63.3 | 1.00 | 55.8 | 1.00 | 88.8 | 1.00 | 59.5 | 1.00 | 49.0 | 1.00 | 67.4 |
| **Masked Diffusion LM** |
| LLaDA | 3.91 | 41.2 | 3.44 | 6.7 | 3.66 | 0.0 | 4.07 | 12.5 | 2.83 | 4.7 | 3.14 | 17.2 | 3.51 | 13.7 |
| LLaDA-1.5 | 3.97 | 42.2 | 3.34 | 0.0 | 3.68 | 0.0 | 4.01 | 10.0 | 2.86 | 4.3 | 3.01 | 24.2 | 3.48 | 13.5 |
| LLaDA-MoE | 2.70 | 56.6 | 2.89 | 3.3 | 2.71 | 0.0 | 3.16 | 32.5 | 2.05 | 12.9 | 2.18 | 27.8 | 2.62 | 22.2 |
| **Block Diffusion LM** |
| Fast-dLLM-v2 | 2.81 | 59.4 | 2.58 | 0.0 | 2.58 | 0.0 | 2.77 | 25.0 | 1.73 | 6.8 | 2.09 | 28.3 | 2.43 | 19.9 |
| SDAR-8B-Chat | 2.21 | 52.6 | 2.96 | 5.0 | 2.35 | 7.1 | 2.83 | 22.5 | 1.60 | 7.5 | 1.32 | 10.6 | 2.21 | 17.6 |
| DiRL-8B-Instruct | 2.30 | 78.2 | 1.96 | 18.8 | 1.92 | 15.8 | 2.05 | 65.6 | 2.64 | 10.4 | 2.27 | 44.4 | 2.19 | 38.9 |
| TraDo-8B-Instruct | 2.36 | 75.0 | 2.13 | 13.3 | 2.00 | 12.5 | 2.23 | 55.3 | 1.42 | 7.2 | 1.43 | 27.3 | 1.93 | 31.8 |
| TraDo-8B-Thinking | 1.28 | 84.0 | 1.35 | 31.3 | 1.35 | 26.3 | 1.37 | 72.8 | 1.10 | 22.6 | 1.16 | 46.0 | 1.27 | 47.1 |
| TraDo + BACD | 1.33 | 85.0 | 1.44 | 32.9 | 1.44 | 27.5 | 1.45 | 73.8 | 1.15 | 23.3 | 1.18 | 49.5 | 1.33 | 48.7 |
| TraDo + BACD + TCCF | 1.28 | 85.6 | 1.36 | 35.8 | 1.33 | 27.1 | 1.36 | 74.1 | 1.11 | 21.9 | 1.14 | 49.5 | 1.27 | 49.0 |
| **TDAR-8B-Thinking (Ours)** | **1.62** | **81.6** | **4.47** | **34.6** | **4.17** | **30.8** | **5.03** | **69.1** | **1.25** | **40.5** | **1.28** | **46.5** | **2.97** | **50.5** |
| **+ BACD** | **1.88** | **83.4** | **5.07** | **36.3** | **4.73** | **30.4** | **5.59** | **71.3** | **1.46** | **40.1** | **1.49** | **46.0** | **3.37** | **51.2** |
| **+ BACD + TCCF** | **1.75** | **84.0** | **3.04** | **42.9** | **2.79** | **35.8** | **2.68** | **80.0** | **1.32** | **42.6** | **1.39** | **50.0** | **2.16** | **55.9** |

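> **Note:** TPF = Tokens Per Forward Pass (higher is faster; an autoregressive model decodes 1.00 token per pass); AVG@8 = accuracy averaged over 8 samples; † indicates models derived from Qwen3-8B-Base with identical continued pre-training (CPT) and supervised fine-tuning (SFT).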
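
The AVG columns appear to be plain unweighted means over the six benchmarks. The short check below recomputes them for the three TDAR rows; the numbers are copied from the table, while the dictionary layout and the `summarize` helper are only illustrative.

```python
from statistics import mean

# (TPF, accuracy) per benchmark, in table order:
# Math500, AIME24, AIME25, AMC23, LCB, GPQA
rows = {
    "TDAR-8B-Thinking": [(1.62, 81.6), (4.47, 34.6), (4.17, 30.8),
                         (5.03, 69.1), (1.25, 40.5), (1.28, 46.5)],
    "+ BACD": [(1.88, 83.4), (5.07, 36.3), (4.73, 30.4),
               (5.59, 71.3), (1.46, 40.1), (1.49, 46.0)],
    "+ BACD + TCCF": [(1.75, 84.0), (3.04, 42.9), (2.79, 35.8),
                      (2.68, 80.0), (1.32, 42.6), (1.39, 50.0)],
}


def summarize(pairs):
    """Return the unweighted mean TPF and mean accuracy over all benchmarks."""
    avg_tpf = mean(tpf for tpf, _ in pairs)
    avg_acc = mean(acc for _, acc in pairs)
    return avg_tpf, avg_acc


for name, pairs in rows.items():
    avg_tpf, avg_acc = summarize(pairs)
    print(f"{name}: avg TPF {avg_tpf:.2f}, avg accuracy {avg_acc:.2f}")
```

The recomputed means agree with the reported AVG values up to rounding, which also shows that the average TPF is dominated by the AIME and AMC benchmarks, where TDAR decodes four to five tokens per pass.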

### Key Findings

**🏆 State-of-the-Art Performance**
- Achieves **55.9%** average accuracy with BACD + TCCF (best among all 8B BDLMs)
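- With BACD + TCCF, outperforms TraDo-8B-Thinking by **+8.8 points** (55.9 vs 47.1) at 1.70× its TPF (2.16 vs 1.27); without either technique, the base model is **2.34× faster** (2.97 TPF vs 1.27 TPF) and still 3.4 points more accurate (50.5 vs 47.1)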
- Strong results on challenging benchmarks: **42.9** on AIME24, **35.8** on AIME25, **80.0** on AMC23

**⚡ Superior Efficiency**
- **3.37× speedup** with BACD alone (maximum efficiency, 51.2% accuracy)
- **2.16× speedup** with BACD + TCCF (best quality, 55.9% accuracy)
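
To make the speed-quality trade-off concrete, the sketch below compares each TDAR configuration against TraDo-8B-Thinking using the AVG values from the table. The `compare` helper and variable names are illustrative only.

```python
# (average TPF, average accuracy) copied from the AVG columns of the table.
TRADO_THINKING = (1.27, 47.1)
configs = {
    "TDAR-8B-Thinking": (2.97, 50.5),
    "+ BACD": (3.37, 51.2),
    "+ BACD + TCCF": (2.16, 55.9),
}


def compare(config, baseline):
    """Return (TPF ratio, accuracy gain in points) of config over baseline."""
    tpf, acc = config
    base_tpf, base_acc = baseline
    return tpf / base_tpf, acc - base_acc


for name, cfg in configs.items():
    ratio, gain = compare(cfg, TRADO_THINKING)
    print(f"{name}: {ratio:.2f}x TPF, {gain:+.1f} points vs TraDo-8B-Thinking")
```

Each configuration is both faster and more accurate than TraDo-8B-Thinking on average; BACD alone maximizes TPF, while adding TCCF trades some of that speed for the largest accuracy gain.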