TDAR-8B-Thinking
Advancing Block Diffusion Language Models for Test-Time Scaling
Model Description
TDAR-8B-Thinking is a Block Diffusion Language Model (BDLM) designed for efficient test-time scaling on complex reasoning tasks. Built on the Qwen3-8B architecture, it reaches up to a 3.37× speedup over autoregressive baselines, measured in tokens per forward pass (TPF), while delivering the highest average accuracy among the 8B block diffusion models evaluated in the Performance section.
Key Features
- Bounded Adaptive Confidence Decoding (BACD): Dynamically adapts the denoising process based on local difficulty signals
- Think Coarse, Critic Fine (TCCF): Allocates computation according to the functional role of each part of the reasoning trajectory
- Progressive Block Size Extension: Trained with gradually increasing block sizes (B=4→64) for optimal efficiency
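To make the idea of confidence-bounded unmasking concrete, here is a minimal sketch of one denoising step inside a block. It is not the authors' BACD rule, which this card does not spell out. The function name `select_positions_to_unmask` is hypothetical, and the adaptive signal (the mean confidence of the still-masked positions) is an assumption chosen only for illustration. The one piece taken from this card is the pair of bounds, which mirror the `dllm_confidence_lower_threshold` and `dllm_confidence_upper_threshold` values used in the inference example.

```python
def select_positions_to_unmask(confidences, masked, lower=0.6, upper=0.9):
    """Choose which masked positions of a block to commit in this step.

    confidences: per-position top-1 probability (floats in [0, 1])
    masked:      per-position flag, True while the position is still masked
    """
    candidates = [i for i, is_masked in enumerate(masked) if is_masked]
    if not candidates:
        return []

    # ASSUMPTION: the "local difficulty signal" is the mean confidence of the
    # masked positions. The real BACD signal may be defined differently.
    mean_conf = sum(confidences[i] for i in candidates) / len(candidates)

    # Clamp the adaptive threshold into [lower, upper].
    threshold = min(upper, max(lower, mean_conf))

    chosen = [i for i in candidates if confidences[i] >= threshold]
    if not chosen:
        # Always commit at least one token so decoding makes progress.
        chosen = [max(candidates, key=lambda i: confidences[i])]
    return chosen


# Easy block: several confident positions are committed in one step.
print(select_positions_to_unmask([0.95, 0.70, 0.50, 0.99], [True] * 4))
# Hard block: nothing clears the lower bound, so only the best position is committed.
print(select_positions_to_unmask([0.30, 0.40], [True, True]))
```

With a scheme of this shape, easy blocks finish in few forward passes and hard blocks fall back to near token-by-token decoding. That is the general trade-off a bounded threshold is meant to control.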
Basic Inference
We use LMDeploy 0.10.2 with modifications for Bounded Adaptive Confidence Decoding support.
Quick Installation (Inference Only):
git clone https://github.com/LuLuLuyi/TDAR.git
cd TDAR
# Install lmdeploy
cd third_party/lmdeploy-0.10.2
pip3 install -e .
Note: This is a minimal setup for inference only. For full installation including training and evaluation dependencies, please refer to our comprehensive Installation Guide on GitHub.
The following example loads the model and runs inference end-to-end with BACD, which offers the best speed-quality trade-off of the decoding configurations reported in the Performance section:
from lmdeploy import pipeline, PytorchEngineConfig, GenerationConfig
# Model path
model_path = "lulululuyi/TDAR-8B-Thinking-bs8"
# Configure engine with BACD (Bounded Adaptive Confidence Decoding)
engine_config = PytorchEngineConfig(
    tp=1,
    dp=1,
    dtype="bfloat16",
    max_prefill_token_num=4096,
    cache_max_entry_count=0.8,
    enable_prefix_caching=True,
    session_len=8192,
    # BACD parameters
    dllm_block_length=8,
    dllm_denoising_steps=1,
    dllm_unmasking_strategy="bounded_adaptive_confidence_decoding",
    dllm_confidence_upper_threshold=0.9,
    dllm_confidence_lower_threshold=0.6
)
# Load model
pipe = pipeline(model_path, backend_config=engine_config)
# Prepare prompt
question = "Write $\\frac{3}{20}$ as a decimal."
prompt = f"""<|im_start|>user\n{question}Please reason step by step and put the final answer in \\boxed{{}}.\n<|im_end|>\n<|im_start|>assistant\n<think>"""
# Generation config
gen_config = GenerationConfig(
    top_k=0,
    temperature=1.0,
    top_p=1.0,
    do_sample=True,
    max_new_tokens=4096,
    ignore_eos=False,
    repetition_penalty=1.00
)
# Generate
output = pipe([prompt], gen_config=gen_config)
print(output[0].text)
# Clean up
pipe.close()
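The prompt above asks the model to put its final answer in \boxed{}. As a minimal sketch, assuming the generated text follows that instruction, the last boxed answer can be pulled out of `output[0].text` with a small brace-matching helper. `extract_boxed` is a hypothetical helper written for this card, not part of LMDeploy or the TDAR repository, and the sample string is made up.

```python
def extract_boxed(text):
    """Return the contents of the last \\boxed{...} in text, or None."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None

    i = start + len(marker)
    depth = 1  # we are inside the opening brace of \boxed{
    chars = []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
        i += 1
    return None  # unbalanced braces, e.g. generation hit max_new_tokens


# Made-up model output, for illustration only.
sample = "3/20 = 15/100, so the answer is \\boxed{0.15}."
print(extract_boxed(sample))
```

Counting braces rather than using a simple regular expression keeps nested LaTeX such as `\boxed{\frac{3}{20}}` intact.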
Performance
We comprehensively evaluate TDAR on 6 diverse reasoning benchmarks covering mathematical reasoning, code generation, and STEM tasks:
| Method | Math500 TPF | Math500 ACC | AIME24 TPF | AIME24 AVG@8 | AIME25 TPF | AIME25 AVG@8 | AMC23 TPF | AMC23 AVG@8 | LCB TPF | LCB ACC | GPQA TPF | GPQA ACC | AVG TPF | AVG ACC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Autoregressive LM | | | | | | | | | | | | | | |
| Qwen3-8B-Thinking† | 1.00 | 88.2 | 1.00 | 63.3 | 1.00 | 55.8 | 1.00 | 88.8 | 1.00 | 59.5 | 1.00 | 49.0 | 1.00 | 67.4 |
| Masked Diffusion LM | | | | | | | | | | | | | | |
| LLaDA | 3.91 | 41.2 | 3.44 | 6.7 | 3.66 | 0.0 | 4.07 | 12.5 | 2.83 | 4.7 | 3.14 | 17.2 | 3.51 | 13.7 |
| LLaDA-1.5 | 3.97 | 42.2 | 3.34 | 0.0 | 3.68 | 0.0 | 4.01 | 10.0 | 2.86 | 4.3 | 3.01 | 24.2 | 3.48 | 13.5 |
| LLaDA-MoE | 2.70 | 56.6 | 2.89 | 3.3 | 2.71 | 0.0 | 3.16 | 32.5 | 2.05 | 12.9 | 2.18 | 27.8 | 2.62 | 22.2 |
| Block Diffusion LM | | | | | | | | | | | | | | |
| Fast-dLLM-v2 | 2.81 | 59.4 | 2.58 | 0.0 | 2.58 | 0.0 | 2.77 | 25.0 | 1.73 | 6.8 | 2.09 | 28.3 | 2.43 | 19.9 |
| SDAR-8B-Chat | 2.21 | 52.6 | 2.96 | 5.0 | 2.35 | 7.1 | 2.83 | 22.5 | 1.60 | 7.5 | 1.32 | 10.6 | 2.21 | 17.6 |
| DiRL-8B-Instruct | 2.30 | 78.2 | 1.96 | 18.8 | 1.92 | 15.8 | 2.05 | 65.6 | 2.64 | 10.4 | 2.27 | 44.4 | 2.19 | 38.9 |
| TraDo-8B-Instruct | 2.36 | 75.0 | 2.13 | 13.3 | 2.00 | 12.5 | 2.23 | 55.3 | 1.42 | 7.2 | 1.43 | 27.3 | 1.93 | 31.8 |
| TraDo-8B-Thinking | 1.28 | 84.0 | 1.35 | 31.3 | 1.35 | 26.3 | 1.37 | 72.8 | 1.10 | 22.6 | 1.16 | 46.0 | 1.27 | 47.1 |
| TraDo + BACD | 1.33 | 85.0 | 1.44 | 32.9 | 1.44 | 27.5 | 1.45 | 73.8 | 1.15 | 23.3 | 1.18 | 49.5 | 1.33 | 48.7 |
| TraDo + BACD + TCCF | 1.28 | 85.6 | 1.36 | 35.8 | 1.33 | 27.1 | 1.36 | 74.1 | 1.11 | 21.9 | 1.14 | 49.5 | 1.27 | 49.0 |
| TDAR-8B-Thinking (Ours) | 1.62 | 81.6 | 4.47 | 34.6 | 4.17 | 30.8 | 5.03 | 69.1 | 1.25 | 40.5 | 1.28 | 46.5 | 2.97 | 50.5 |
| + BACD | 1.88 | 83.4 | 5.07 | 36.3 | 4.73 | 30.4 | 5.59 | 71.3 | 1.46 | 40.1 | 1.49 | 46.0 | 3.37 | 51.2 |
| + BACD + TCCF | 1.75 | 84.0 | 3.04 | 42.9 | 2.79 | 35.8 | 2.68 | 80.0 | 1.32 | 42.6 | 1.39 | 50.0 | 2.16 | 55.9 |
Note: TPF = Tokens Per Forward Pass (higher is faster); † indicates models derived from Qwen3-8B-Base with identical CPT and SFT.
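As a quick aid for reading the table, the sketch below shows how a TPF value follows from its definition and checks that the AVG columns are the unweighted mean over the six benchmarks, using the "+ BACD + TCCF" row. The helper names and the token and forward-pass counts in the first call are hypothetical. The per-benchmark numbers are copied from the table.

```python
def tokens_per_forward(num_generated_tokens, num_forward_passes):
    """TPF: generated tokens divided by model forward passes."""
    return num_generated_tokens / num_forward_passes


def mean(values):
    return sum(values) / len(values)


# Hypothetical counts: 4096 tokens produced in 1024 forward passes.
# An autoregressive model emits one token per pass, so its TPF is 1.00.
print(tokens_per_forward(4096, 1024))

# "+ BACD + TCCF" row: Math500, AIME24, AIME25, AMC23, LCB, GPQA.
tpf = [1.75, 3.04, 2.79, 2.68, 1.32, 1.39]
acc = [84.0, 42.9, 35.8, 80.0, 42.6, 50.0]

avg_tpf = round(mean(tpf), 2)
avg_acc = round(mean(acc), 1)
print(avg_tpf, avg_acc)  # matches the AVG columns: 2.16 and 55.9
```

Because the average is unweighted, the small AIME and AMC sets influence the AVG columns as much as Math500 does.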
Key Findings
State-of-the-Art Performance
- Achieves 55.9% average accuracy with BACD + TCCF (best among all 8B BDLMs)
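- With BACD + TCCF, outperforms TraDo-8B-Thinking by +8.8 points (55.9 vs 47.1) at 1.70× its TPF (2.16 vs 1.27); without either technique, TDAR-8B-Thinking is already 2.34× faster (2.97 vs 1.27 TPF) and +3.4 points more accurate (50.5 vs 47.1)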
- Strong results on challenging benchmarks: 42.9 on AIME24, 35.8 on AIME25, 80.0 on AMC23
Superior Efficiency
- 3.37× speedup with BACD alone (maximum efficiency, 51.2% accuracy)
- 2.16× speedup with BACD + TCCF (best quality, 55.9% accuracy)
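The speed and accuracy comparisons above can be re-derived from the AVG columns of the results table. This short sketch puts each TDAR configuration next to TraDo-8B-Thinking; the dictionary keys and the `compare` helper are illustrative names, while the numbers are taken from the table.

```python
# (average TPF, average accuracy) from the AVG columns of the results table.
rows = {
    "TraDo-8B-Thinking": (1.27, 47.1),
    "TDAR-8B-Thinking": (2.97, 50.5),
    "TDAR + BACD": (3.37, 51.2),
    "TDAR + BACD + TCCF": (2.16, 55.9),
}


def compare(name, baseline="TraDo-8B-Thinking"):
    """Return (TPF ratio, accuracy gain in points) relative to the baseline."""
    tpf, acc = rows[name]
    base_tpf, base_acc = rows[baseline]
    return round(tpf / base_tpf, 2), round(acc - base_acc, 1)


for name in ("TDAR-8B-Thinking", "TDAR + BACD", "TDAR + BACD + TCCF"):
    ratio, gain = compare(name)
    print(f"{name}: {ratio}x TPF, {gain:+} points")
```

Read this way, BACD alone sits at the fast end of the trade-off and BACD + TCCF at the accurate end, with both ahead of the baseline on both axes.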