TDAR-8B-Thinking

Advancing Block Diffusion Language Models for Test-Time Scaling

📃 Paper • 💻 GitHub

Model Description

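TDAR-8B-Thinking is a Block Diffusion Language Model (BDLM) designed for efficient test-time scaling on complex reasoning tasks. Built on the Qwen3-8B architecture, it decodes an average of 3.37 tokens per forward pass with BACD, a 3.37× speedup over autoregressive decoding (1.00 token per forward pass), and reaches up to 55.9% average accuracy with BACD + TCCF, the highest among the 8B BDLMs compared in the Performance section.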

Key Features

  • 🚀 Bounded Adaptive Confidence Decoding (BACD): Dynamically adapts the denoising process based on local difficulty signals
  • 💡 Think Coarse, Critic Fine (TCCF): Allocates computation based on functional roles in reasoning trajectories
  • 📈 Progressive Block Size Extension: Trained with gradually increasing block sizes (B=4→64) for optimal efficiency
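To make the idea of a bounded, adaptive confidence threshold concrete, here is a minimal sketch of a single denoising step. It is an illustration only and not the authors' algorithm: the rule used here (take the mean confidence of the still-masked positions and clamp it between a lower and an upper bound) is an assumption, and `bounded_adaptive_unmask` is a hypothetical helper. The default bounds mirror the 0.6 and 0.9 thresholds used in the inference example in this card.

```python
from typing import List


def bounded_adaptive_unmask(
    confidences: List[float],
    lower: float = 0.6,
    upper: float = 0.9,
) -> List[int]:
    """Pick which masked positions to unmask in one denoising step.

    ASSUMPTION (illustrative, not taken from the paper): the threshold is the
    mean confidence over the still-masked positions, clamped to [lower, upper].
    """
    if not confidences:
        return []
    mean_conf = sum(confidences) / len(confidences)
    # Clamp the adaptive threshold so it never leaves the [lower, upper] band.
    threshold = min(max(mean_conf, lower), upper)
    chosen = [i for i, c in enumerate(confidences) if c >= threshold]
    if not chosen:
        # Guarantee progress: unmask the single most confident position.
        chosen = [max(range(len(confidences)), key=confidences.__getitem__)]
    return chosen


# An "easy" block: every position is confident, so all are unmasked at once.
easy = bounded_adaptive_unmask([0.97, 0.95, 0.92, 0.99])
# A "hard" block: only the position that clears the lower bound is unmasked.
hard = bounded_adaptive_unmask([0.35, 0.72, 0.41, 0.50])
print(easy, hard)
```

Under this assumed rule, easy spans are committed in a single step while hard spans fall back to slower, more cautious decoding, which is the kind of speed-quality trade-off the feature description points at.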

Basic Inference

We use LMDeploy 0.10.2 with modifications for Bounded Adaptive Confidence Decoding support.

Quick Installation (Inference Only):

git clone https://github.com/LuLuLuyi/TDAR.git
cd TDAR

# Install lmdeploy
cd third_party/lmdeploy-0.10.2
pip3 install -e .

Note: This is a minimal setup for inference only. For full installation including training and evaluation dependencies, please refer to our comprehensive Installation Guide on GitHub.

The following example shows how to quickly load the model and run inference end-to-end with BACD (Bounded Adaptive Confidence Decoding) for optimal speed-quality trade-off:

from lmdeploy import pipeline, PytorchEngineConfig, GenerationConfig

# Model path
model_path = "lulululuyi/TDAR-8B-Thinking"

# Configure engine with BACD (Bounded Adaptive Confidence Decoding)
engine_config = PytorchEngineConfig(
    tp=1,
    dp=1,
    dtype="bfloat16",
    max_prefill_token_num=4096,
    cache_max_entry_count=0.8,
    enable_prefix_caching=True,
    session_len=8192,
    
    # BACD parameters
    dllm_block_length=16,
    dllm_denoising_steps=1,
    dllm_unmasking_strategy="bounded_adaptive_confidence_decoding",
    dllm_confidence_upper_threshold=0.9,
    dllm_confidence_lower_threshold=0.6
)

# Load model
pipe = pipeline(model_path, backend_config=engine_config)

# Prepare prompt
question = "Write $\\frac{3}{20}$ as a decimal."
prompt = f"""<|im_start|>user\n{question}Please reason step by step and put the final answer in \\boxed{{}}.\n<|im_end|>\n<|im_start|>assistant\n<think>"""

# Generation config
gen_config = GenerationConfig(
    top_k=0,
    temperature=1.0,
    top_p=1.0,
    do_sample=True,
    max_new_tokens=4096,
    ignore_eos=False,
    repetition_penalty=1.00
)

# Generate
output = pipe([prompt], gen_config=gen_config)
print(output[0].text)

# Clean up
pipe.close()
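Because the prompt asks the model to put its final answer in \boxed{}, a small post-processing step is often useful. The helper below is a sketch and not part of the TDAR or LMDeploy code: `extract_boxed` is a hypothetical name, and it assumes the final answer is the last \boxed{...} in the generated text.

```python
from typing import Optional


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, or None."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1  # we are inside the brace opened by the marker
    chars = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                # Matching close brace found: everything collected is the answer.
                return "".join(chars)
        chars.append(ch)
    return None  # braces never balanced, e.g. output was truncated


# Hypothetical model output, used only to exercise the helper.
sample = "<think>3/20 = 15/100</think>\nThe answer is \\boxed{0.15}."
print(extract_boxed(sample))  # prints 0.15
```

With the pipeline example above, the same helper could be applied to `output[0].text` before `pipe.close()` is called.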

Performance

We evaluate TDAR on six reasoning benchmarks covering mathematical reasoning (Math500, AIME24, AIME25, AMC23), code generation (LCB), and STEM question answering (GPQA):

Method                   | Math500     | AIME24      | AIME25      | AMC23       | LCB         | GPQA        | AVG
                         |  TPF    ACC |  TPF  AVG@8 |  TPF  AVG@8 |  TPF  AVG@8 |  TPF    ACC |  TPF    ACC |  TPF    ACC

Autoregressive LM
Qwen3-8B-Thinking†       | 1.00   88.2 | 1.00   63.3 | 1.00   55.8 | 1.00   88.8 | 1.00   59.5 | 1.00   49.0 | 1.00   67.4

Masked Diffusion LM
LLaDA                    | 3.91   41.2 | 3.44    6.7 | 3.66    0.0 | 4.07   12.5 | 2.83    4.7 | 3.14   17.2 | 3.51   13.7
LLaDA-1.5                | 3.97   42.2 | 3.34    0.0 | 3.68    0.0 | 4.01   10.0 | 2.86    4.3 | 3.01   24.2 | 3.48   13.5
LLaDA-MoE                | 2.70   56.6 | 2.89    3.3 | 2.71    0.0 | 3.16   32.5 | 2.05   12.9 | 2.18   27.8 | 2.62   22.2

Block Diffusion LM
Fast-dLLM-v2             | 2.81   59.4 | 2.58    0.0 | 2.58    0.0 | 2.77   25.0 | 1.73    6.8 | 2.09   28.3 | 2.43   19.9
SDAR-8B-Chat             | 2.21   52.6 | 2.96    5.0 | 2.35    7.1 | 2.83   22.5 | 1.60    7.5 | 1.32   10.6 | 2.21   17.6
DiRL-8B-Instruct         | 2.30   78.2 | 1.96   18.8 | 1.92   15.8 | 2.05   65.6 | 2.64   10.4 | 2.27   44.4 | 2.19   38.9
TraDo-8B-Instruct        | 2.36   75.0 | 2.13   13.3 | 2.00   12.5 | 2.23   55.3 | 1.42    7.2 | 1.43   27.3 | 1.93   31.8
TraDo-8B-Thinking        | 1.28   84.0 | 1.35   31.3 | 1.35   26.3 | 1.37   72.8 | 1.10   22.6 | 1.16   46.0 | 1.27   47.1
TraDo + BACD             | 1.33   85.0 | 1.44   32.9 | 1.44   27.5 | 1.45   73.8 | 1.15   23.3 | 1.18   49.5 | 1.33   48.7
TraDo + BACD + TCCF      | 1.28   85.6 | 1.36   35.8 | 1.33   27.1 | 1.36   74.1 | 1.11   21.9 | 1.14   49.5 | 1.27   49.0
TDAR-8B-Thinking (Ours)  | 1.62   81.6 | 4.47   34.6 | 4.17   30.8 | 5.03   69.1 | 1.25   40.5 | 1.28   46.5 | 2.97   50.5
  + BACD                 | 1.88   83.4 | 5.07   36.3 | 4.73   30.4 | 5.59   71.3 | 1.46   40.1 | 1.49   46.0 | 3.37   51.2
  + BACD + TCCF          | 1.75   84.0 | 3.04   42.9 | 2.79   35.8 | 2.68   80.0 | 1.32   42.6 | 1.39   50.0 | 2.16   55.9

Note: TPF = Tokens Per Forward Pass (higher is faster); † indicates models derived from Qwen3-8B-Base with identical CPT and SFT.
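As a worked illustration of the TPF metric defined in the note, the sketch below computes it from raw counts. The function name and the token and forward-pass counts are hypothetical, chosen only to show the arithmetic. An autoregressive model emits one token per forward pass, so its TPF is 1.00 and a model's TPF doubles as its speedup in forward passes; wall-clock speedup may differ.

```python
def tokens_per_forward(num_generated_tokens: int, num_forward_passes: int) -> float:
    """Average number of tokens committed per model forward pass."""
    if num_forward_passes <= 0:
        raise ValueError("num_forward_passes must be positive")
    return num_generated_tokens / num_forward_passes


# Hypothetical counts for one 1024-token response.
ar_tpf = tokens_per_forward(1024, 1024)  # autoregressive: one token per pass
bd_tpf = tokens_per_forward(1024, 256)   # block diffusion: several tokens per pass
speedup = bd_tpf / ar_tpf                # reduction in forward passes

print(f"AR TPF={ar_tpf:.2f}, block diffusion TPF={bd_tpf:.2f}, speedup={speedup:.2f}x")
```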

Key Findings

🏆 State-of-the-Art Performance

  • Achieves 55.9% average accuracy with BACD + TCCF (best among all 8B BDLMs)
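  • With BACD + TCCF, outperforms TraDo-8B-Thinking by 8.8 points in average accuracy (55.9 vs 47.1) while being 1.70× faster (2.16 TPF vs 1.27 TPF); without either technique, TDAR-8B-Thinking is 2.34× faster (2.97 TPF vs 1.27 TPF) and still 3.4 points more accurate (50.5 vs 47.1)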
  • Strong results on challenging benchmarks: 42.9 on AIME24, 35.8 on AIME25, 80.0 on AMC23

⚡ Superior Efficiency

  • 3.37× speedup with BACD alone (maximum efficiency, 51.2% accuracy)
  • 2.16× speedup with BACD + TCCF (best quality, 55.9% accuracy)
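The comparisons against TraDo-8B-Thinking can be recomputed from the AVG column of the results table. The short sketch below does that arithmetic; the numbers are copied from the table, while the dictionary layout and variable names are only illustrative.

```python
# (average TPF, average accuracy) taken from the AVG column of the table.
results = {
    "TraDo-8B-Thinking": (1.27, 47.1),
    "TDAR-8B-Thinking": (2.97, 50.5),
    "TDAR-8B-Thinking + BACD": (3.37, 51.2),
    "TDAR-8B-Thinking + BACD + TCCF": (2.16, 55.9),
}

baseline = "TraDo-8B-Thinking"
base_tpf, base_acc = results[baseline]

# For each TDAR configuration: (TPF ratio vs baseline, accuracy gain in points).
comparison = {}
for name, (tpf, acc) in results.items():
    if name == baseline:
        continue
    comparison[name] = (round(tpf / base_tpf, 2), round(acc - base_acc, 1))
    ratio, gain = comparison[name]
    print(f"{name}: {ratio:.2f}x TPF, {gain:+.1f} accuracy points vs {baseline}")
```

Each configuration pairs one speed ratio with one accuracy gain, so the fastest setting (BACD alone) and the most accurate setting (BACD + TCCF) are different rows of the table.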