TDAR-8B-Thinking

Advancing Block Diffusion Language Models for Test-Time Scaling

Model Description

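TDAR-8B-Thinking is a state-of-the-art Block Diffusion Language Model (BDLM) designed for efficient test-time scaling on complex reasoning tasks. Built on the Qwen3-8B architecture, it averages 3.37 tokens per forward pass with BACD decoding, a 3.37× speedup over the autoregressive baseline (1.00 tokens per forward pass), and with BACD + TCCF it reaches the highest average accuracy (55.9%) among the 8B BDLMs in the evaluation below.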

Key Features

🚀 Bounded Adaptive Confidence Decoding (BACD): Dynamically adapts the denoising process based on local difficulty signals
💡 Think Coarse, Critic Fine (TCCF): Allocates computation based on functional roles in reasoning trajectories
📈 Progressive Block Size Extension: Trained with gradually increasing block sizes (B=4→64) for optimal efficiency
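
To make the idea behind BACD more concrete, here is a minimal sketch of a confidence-threshold unmasking rule whose threshold is clamped between a lower and an upper bound. This is an illustrative assumption, not the authors' implementation: the function name `select_positions_to_unmask`, the use of the mean confidence as the "local difficulty signal", and the fallback rule are all hypothetical. Only the default bounds (0.6 and 0.9) come from this README, where they appear as `dllm_confidence_lower_threshold` and `dllm_confidence_upper_threshold` in the inference configuration.

```python
from typing import List


def select_positions_to_unmask(
    confidences: List[float],
    masked: List[bool],
    lower: float = 0.6,
    upper: float = 0.9,
) -> List[int]:
    """Pick which masked positions in a block to commit in one denoising step.

    Hypothetical rule (an assumption, not the official BACD algorithm):
    the threshold is the mean confidence over still-masked positions,
    clamped to the interval [lower, upper].
    """
    candidates = [i for i, is_masked in enumerate(masked) if is_masked]
    if not candidates:
        return []

    # Local difficulty signal: mean confidence of the remaining masked tokens.
    mean_conf = sum(confidences[i] for i in candidates) / len(candidates)
    # Bound the adaptive threshold so it is never too strict or too lenient.
    threshold = min(upper, max(lower, mean_conf))

    chosen = [i for i in candidates if confidences[i] >= threshold]
    if not chosen:
        # Guarantee progress: commit the single most confident position.
        chosen = [max(candidates, key=lambda i: confidences[i])]
    return chosen


if __name__ == "__main__":
    all_masked = [True, True, True, True]
    # Easy block: every token is confident, so the whole block is committed.
    print(select_positions_to_unmask([0.99, 0.97, 0.95, 0.98], all_masked))
    # Hard block: nothing clears the lower bound, so one token is committed.
    print(select_positions_to_unmask([0.3, 0.5, 0.2, 0.4], all_masked))
```

Under this sketch, easy blocks finish in few forward passes while hard blocks fall back to near-sequential decoding, which is the kind of speed-quality trade-off the bounds are meant to control.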

Basic Inference

We use LMDeploy 0.10.2 with modifications for Bounded Adaptive Confidence Decoding support.

Quick Installation (Inference Only):

git clone https://github.com/LuLuLuyi/TDAR.git
cd TDAR

# Install lmdeploy
cd third_party/lmdeploy-0.10.2
pip3 install -e .

Note: This is a minimal setup for inference only. For full installation including training and evaluation dependencies, please refer to our comprehensive Installation Guide on GitHub.
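
After installing, it can help to confirm which LMDeploy build Python actually resolves. The sketch below uses only the standard library; it assumes the modified checkout still registers under the distribution name `lmdeploy` and reports version 0.10.2, which this README does not state explicitly. The helper name `installed_version` is hypothetical.

```python
from importlib import metadata
from typing import Optional

# Version named in this README; the modified build is assumed to report it.
EXPECTED_LMDEPLOY_VERSION = "0.10.2"


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("lmdeploy")
    if found is None:
        print("lmdeploy is not installed in this environment")
    elif found != EXPECTED_LMDEPLOY_VERSION:
        print(f"lmdeploy {found} found, expected {EXPECTED_LMDEPLOY_VERSION}")
    else:
        print(f"lmdeploy {found} found")
```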

The following example loads the model and runs end-to-end inference with BACD, which gives a good speed-quality trade-off:

from lmdeploy import pipeline, PytorchEngineConfig, GenerationConfig

# Model path
model_path = "lulululuyi/TDAR-8B-Thinking-bs8"

# Configure engine with BACD (Bounded Adaptive Confidence Decoding)
engine_config = PytorchEngineConfig(
    tp=1,
    dp=1,
    dtype="bfloat16",
    max_prefill_token_num=4096,
    cache_max_entry_count=0.8,
    enable_prefix_caching=True,
    session_len=8192,
    
    # BACD parameters
    dllm_block_length=8,
    dllm_denoising_steps=1,
    dllm_unmasking_strategy="bounded_adaptive_confidence_decoding",
    dllm_confidence_upper_threshold=0.9,
    dllm_confidence_lower_threshold=0.6
)

# Load model
pipe = pipeline(model_path, backend_config=engine_config)

# Prepare prompt
question = "Write $\\frac{3}{20}$ as a decimal."
prompt = f"""<|im_start|>user\n{question}Please reason step by step and put the final answer in \\boxed{{}}.\n<|im_end|>\n<|im_start|>assistant\n<think>"""

# Generation config
gen_config = GenerationConfig(
    top_k=0,
    temperature=1.0,
    top_p=1.0,
    do_sample=True,
    max_new_tokens=4096,
    ignore_eos=False,
    repetition_penalty=1.00
)

# Generate
output = pipe([prompt], gen_config=gen_config)
print(output[0].text)

# Clean up
pipe.close()
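
When running more than one question, the prompt template above can be wrapped in small helpers. The sketch below reproduces the template from the example verbatim and adds a parser for the final `\boxed{...}` answer; the names `build_prompt` and `extract_boxed` are hypothetical and not part of the TDAR or LMDeploy code.

```python
from typing import Optional


def build_prompt(question: str) -> str:
    """Wrap a question in the chat template used in the example above."""
    return (
        "<|im_start|>user\n"
        f"{question}Please reason step by step and put the final answer in \\boxed{{}}.\n"
        "<|im_end|>\n"
        "<|im_start|>assistant\n"
        "<think>"
    )


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, or None if absent."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1
    chars = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
    return None  # braces never closed, e.g. output truncated by max_new_tokens


if __name__ == "__main__":
    questions = ["Write $\\frac{3}{20}$ as a decimal.", "What is $2^{10}$?"]
    prompts = [build_prompt(q) for q in questions]
    # The prompts list can be passed to pipe(prompts, gen_config=gen_config),
    # and extract_boxed applied to each output's .text field.
    print(prompts[0])
```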

Performance

We comprehensively evaluate TDAR on 6 diverse reasoning benchmarks covering mathematical reasoning, code generation, and STEM tasks:

Method	Math500 TPF	Math500 ACC	AIME24 TPF	AIME24 AVG@8	AIME25 TPF	AIME25 AVG@8	AMC23 TPF	AMC23 AVG@8	LCB TPF	LCB ACC	GPQA TPF	GPQA ACC	AVG TPF	AVG ACC
Autoregressive LM
Qwen3-8B-Thinking†	1.00	88.2	1.00	63.3	1.00	55.8	1.00	88.8	1.00	59.5	1.00	49.0	1.00	67.4
Masked Diffusion LM
LLaDA	3.91	41.2	3.44	6.7	3.66	0.0	4.07	12.5	2.83	4.7	3.14	17.2	3.51	13.7
LLaDA-1.5	3.97	42.2	3.34	0.0	3.68	0.0	4.01	10.0	2.86	4.3	3.01	24.2	3.48	13.5
LLaDA-MoE	2.70	56.6	2.89	3.3	2.71	0.0	3.16	32.5	2.05	12.9	2.18	27.8	2.62	22.2
Block Diffusion LM
Fast-dLLM-v2	2.81	59.4	2.58	0.0	2.58	0.0	2.77	25.0	1.73	6.8	2.09	28.3	2.43	19.9
SDAR-8B-Chat	2.21	52.6	2.96	5.0	2.35	7.1	2.83	22.5	1.60	7.5	1.32	10.6	2.21	17.6
DiRL-8B-Instruct	2.30	78.2	1.96	18.8	1.92	15.8	2.05	65.6	2.64	10.4	2.27	44.4	2.19	38.9
TraDo-8B-Instruct	2.36	75.0	2.13	13.3	2.00	12.5	2.23	55.3	1.42	7.2	1.43	27.3	1.93	31.8
TraDo-8B-Thinking	1.28	84.0	1.35	31.3	1.35	26.3	1.37	72.8	1.10	22.6	1.16	46.0	1.27	47.1
TraDo + BACD	1.33	85.0	1.44	32.9	1.44	27.5	1.45	73.8	1.15	23.3	1.18	49.5	1.33	48.7
TraDo + BACD + TCCF	1.28	85.6	1.36	35.8	1.33	27.1	1.36	74.1	1.11	21.9	1.14	49.5	1.27	49.0
TDAR-8B-Thinking (Ours)	1.62	81.6	4.47	34.6	4.17	30.8	5.03	69.1	1.25	40.5	1.28	46.5	2.97	50.5
+ BACD	1.88	83.4	5.07	36.3	4.73	30.4	5.59	71.3	1.46	40.1	1.49	46.0	3.37	51.2
+ BACD + TCCF	1.75	84.0	3.04	42.9	2.79	35.8	2.68	80.0	1.32	42.6	1.39	50.0	2.16	55.9

Note: TPF = Tokens Per Forward Pass (higher is faster); ACC = accuracy (%); AVG@8 = accuracy averaged over 8 samples per problem; LCB = LiveCodeBench; † indicates models derived from Qwen3-8B-Base with identical CPT and SFT.
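
As a reading aid for the table, the sketch below shows how TPF is computed from a decoding trace and checks that the AVG columns are consistent with an unweighted mean over the six benchmarks. The helper names are hypothetical, and the unweighted-mean interpretation is an assumption inferred from the numbers rather than stated by the authors. The values are copied from the TDAR rows above.

```python
from typing import Dict, List


def tokens_per_forward(num_generated_tokens: int, num_forward_passes: int) -> float:
    """TPF: generated tokens divided by model forward passes."""
    if num_forward_passes <= 0:
        raise ValueError("num_forward_passes must be positive")
    return num_generated_tokens / num_forward_passes


def mean(values: List[float]) -> float:
    """Unweighted mean, used here to reproduce the AVG columns."""
    return sum(values) / len(values)


# Per-benchmark values in table order:
# Math500, AIME24, AIME25, AMC23, LCB, GPQA.
ROWS: Dict[str, Dict[str, List[float]]] = {
    "+ BACD": {
        "tpf": [1.88, 5.07, 4.73, 5.59, 1.46, 1.49],
    },
    "+ BACD + TCCF": {
        "tpf": [1.75, 3.04, 2.79, 2.68, 1.32, 1.39],
        "acc": [84.0, 42.9, 35.8, 80.0, 42.6, 50.0],
    },
}

if __name__ == "__main__":
    # An autoregressive model emits one token per forward pass.
    print(tokens_per_forward(512, 512))
    for name, row in ROWS.items():
        print(name, "AVG TPF:", round(mean(row["tpf"]), 2))
```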

Key Findings

🏆 State-of-the-Art Performance

Achieves 55.9% average accuracy with BACD + TCCF (best among all 8B BDLMs)
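Outperforms TraDo-8B-Thinking by +8.8 points with BACD + TCCF (55.9 vs 47.1) while being 1.70× faster (2.16 TPF vs 1.27 TPF); without BACD or TCCF, TDAR-8B-Thinking is 2.34× faster (2.97 TPF vs 1.27 TPF) and +3.4 points more accurate (50.5 vs 47.1)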
Strong results on challenging benchmarks: 42.9 on AIME24, 35.8 on AIME25, 80.0 on AMC23
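
The relative figures in these findings can be recomputed from the AVG columns of the table. The sketch below does so for each TDAR configuration against TraDo-8B-Thinking, which makes it explicit which table row each speedup and accuracy gain belongs to. The function name `compare` is hypothetical; the numbers are copied from the table.

```python
from typing import Dict, Tuple

# (average TPF, average accuracy) from the AVG columns of the table.
AVERAGES: Dict[str, Tuple[float, float]] = {
    "TraDo-8B-Thinking": (1.27, 47.1),
    "TDAR-8B-Thinking": (2.97, 50.5),
    "TDAR-8B-Thinking + BACD": (3.37, 51.2),
    "TDAR-8B-Thinking + BACD + TCCF": (2.16, 55.9),
}


def compare(model: str, baseline: str) -> Tuple[float, float]:
    """Return (TPF ratio, accuracy gain in points) of model over baseline."""
    model_tpf, model_acc = AVERAGES[model]
    base_tpf, base_acc = AVERAGES[baseline]
    return round(model_tpf / base_tpf, 2), round(model_acc - base_acc, 1)


if __name__ == "__main__":
    for name in AVERAGES:
        if name == "TraDo-8B-Thinking":
            continue
        ratio, gain = compare(name, "TraDo-8B-Thinking")
        print(f"{name}: {ratio}x TPF, {gain:+.1f} points")
```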

⚡ Superior Efficiency