Create README.md

60b3049 verified 29 days ago

6.11 kB

	---
	license: mit
	language:
	- en
	base_model:
	- Qwen/Qwen3-8B
	pipeline_tag: text-generation
	---
	<div align="center">

	<h1>
	TDAR-8B-Thinking
	</h1>

	<p><strong>Advancing Block Diffusion Language Models for Test-Time Scaling</strong></p>

	</div>

	<p align="center">
	📃 <a href="https://arxiv.org/abs/2602.09555" target="_blank">Paper</a> •
	💻 <a href="https://github.com/LuLuLuyi/TDAR" target="_blank">GitHub</a>
	</p>


	## Model Description

	TDAR-8B-Thinking is a state-of-the-art Block Diffusion Language Model (BDLM) designed for efficient test-time scaling on complex reasoning tasks. Built on Qwen3-8B architecture, it achieves 3.37× speedup over autoregressive baselines while maintaining superior reasoning quality.

	### Key Features


	- 🚀 Bounded Adaptive Confidence Decoding (BACD): Dynamically adapts denoising process based on local difficulty signals
	- 💡 Think Coarse, Critic Fine (TCCF): Allocates computation based on functional roles in reasoning trajectories
	- 📈 Progressive Block Size Extension: Trained with gradually increasing block sizes (B=4→64) for optimal efficiency



	## Basic Inference

	We use LMDeploy 0.10.2 with modifications for Bounded Adaptive Confidence Decoding support.

	Quick Installation (Inference Only):

	```bash
	git clone https://github.com/LuLuLuyi/TDAR.git
	cd TDAR

	# Install lmdeploy
	cd third_party/lmdeploy-0.10.2
	pip3 install -e .
	```

	> Note: This is a minimal setup for inference only. For full installation including training and evaluation dependencies, please refer to our comprehensive Installation Guide on [GitHub](https://github.com/LuLuLuyi/TDAR?tab=readme-ov-file#tdar).


	The following example shows how to quickly load the model and run inference end-to-end with BACD (Bounded Adaptive Confidence Decoding) for optimal speed-quality trade-off:

	```python
	from lmdeploy import pipeline, PytorchEngineConfig, GenerationConfig

	# Model path
	model_path = "lulululuyi/TDAR-8B-Thinking-bs8"

	# Configure engine with BACD (Bounded Adaptive Confidence Decoding)
	engine_config = PytorchEngineConfig(
	tp=1,
	dp=1,
	dtype="bfloat16",
	max_prefill_token_num=4096,
	cache_max_entry_count=0.8,
	enable_prefix_caching=True,
	session_len=8192,

	# BACD parameters
	dllm_block_length=8,
	dllm_denoising_steps=1,
	dllm_unmasking_strategy="bounded_adaptive_confidence_decoding",
	dllm_confidence_upper_threshold=0.9,
	dllm_confidence_lower_threshold=0.6
	)

	# Load model
	pipe = pipeline(model_path, backend_config=engine_config)

	# Prepare prompt
	question = "Write $\\frac{3}{20}$ as a decimal."
	prompt = f"""<\|im_start\|>user\n{question}Please reason step by step and put the final answer in \\boxed{{}}.\n<\|im_end\|>\n<\|im_start\|>assistant\n<think>"""

	# Generation config
	gen_config = GenerationConfig(
	top_k=0,
	temperature=1.0,
	top_p=1.0,
	do_sample=True,
	max_new_tokens=4096,
	ignore_eos=False,
	repetition_penalty=1.00
	)

	# Generate
	output = pipe([prompt], gen_config=gen_config)
	print(output[0].text)

	# Clean up
	pipe.close()
	```


	## Performance
	We comprehensively evaluate TDAR on 6 diverse reasoning benchmarks covering mathematical reasoning, code generation, and STEM tasks:

	\| Method \| Math500 \| \| AIME24 \| \| AIME25 \| \| AMC23 \| \| LCB \| \| GPQA \| \| AVG \| \|
	\|--------\|---------\|------\|--------\|------\|--------\|------\|-------\|------\|-----\|------\|------\|------\|---------\|------\|
	\| \| TPF \| ACC \| TPF \| AVG@8 \| TPF \| AVG@8 \| TPF \| AVG@8 \| TPF \| ACC \| TPF \| ACC \| TPF \| ACC \|
	\| Autoregressive LM \|
	\| Qwen3-8B-Thinking† \| 1.00 \| 88.2 \| 1.00 \| 63.3 \| 1.00 \| 55.8 \| 1.00 \| 88.8 \| 1.00 \| 59.5 \| 1.00 \| 49.0 \| 1.00 \| 67.4 \|
	\| Masked Diffusion LM \|
	\| LLaDA \| 3.91 \| 41.2 \| 3.44 \| 6.7 \| 3.66 \| 0.0 \| 4.07 \| 12.5 \| 2.83 \| 4.7 \| 3.14 \| 17.2 \| 3.51 \| 13.7 \|
	\| LLaDA-1.5 \| 3.97 \| 42.2 \| 3.34 \| 0.0 \| 3.68 \| 0.0 \| 4.01 \| 10.0 \| 2.86 \| 4.3 \| 3.01 \| 24.2 \| 3.48 \| 13.5 \|
	\| LLaDA-MoE \| 2.70 \| 56.6 \| 2.89 \| 3.3 \| 2.71 \| 0.0 \| 3.16 \| 32.5 \| 2.05 \| 12.9 \| 2.18 \| 27.8 \| 2.62 \| 22.2 \|
	\| Block Diffusion LM \|
	\| Fast-dLLM-v2 \| 2.81 \| 59.4 \| 2.58 \| 0.0 \| 2.58 \| 0.0 \| 2.77 \| 25.0 \| 1.73 \| 6.8 \| 2.09 \| 28.3 \| 2.43 \| 19.9 \|
	\| SDAR-8B-Chat \| 2.21 \| 52.6 \| 2.96 \| 5.0 \| 2.35 \| 7.1 \| 2.83 \| 22.5 \| 1.60 \| 7.5 \| 1.32 \| 10.6 \| 2.21 \| 17.6 \|
	\| DiRL-8B-Instruct \| 2.30 \| 78.2 \| 1.96 \| 18.8 \| 1.92 \| 15.8 \| 2.05 \| 65.6 \| 2.64 \| 10.4 \| 2.27 \| 44.4 \| 2.19 \| 38.9 \|
	\| TraDo-8B-Instruct \| 2.36 \| 75.0 \| 2.13 \| 13.3 \| 2.00 \| 12.5 \| 2.23 \| 55.3 \| 1.42 \| 7.2 \| 1.43 \| 27.3 \| 1.93 \| 31.8 \|
	\| TraDo-8B-Thinking \| 1.28 \| 84.0 \| 1.35 \| 31.3 \| 1.35 \| 26.3 \| 1.37 \| 72.8 \| 1.10 \| 22.6 \| 1.16 \| 46.0 \| 1.27 \| 47.1 \|
	\| TraDo + BACD \| 1.33 \| 85.0 \| 1.44 \| 32.9 \| 1.44 \| 27.5 \| 1.45 \| 73.8 \| 1.15 \| 23.3 \| 1.18 \| 49.5 \| 1.33 \| 48.7 \|
	\| TraDo + BACD + TCCF \| 1.28 \| 85.6 \| 1.36 \| 35.8 \| 1.33 \| 27.1 \| 1.36 \| 74.1 \| 1.11 \| 21.9 \| 1.14 \| 49.5 \| 1.27 \| 49.0 \|
	\| TDAR-8B-thinking (Ours) \| 1.62 \| 81.6 \| 4.47 \| 34.6 \| 4.17 \| 30.8 \| 5.03 \| 69.1 \| 1.25 \| 40.5 \| 1.28 \| 46.5 \| 2.97 \| 50.5 \|
	\| + BACD \| 1.88 \| 83.4 \| 5.07 \| 36.3 \| 4.73 \| 30.4 \| 5.59 \| 71.3 \| 1.46 \| 40.1 \| 1.49 \| 46.0 \| 3.37 \| 51.2 \|
	\| + BACD + TCCF \| 1.75 \| 84.0 \| 3.04 \| 42.9 \| 2.79 \| 35.8 \| 2.68 \| 80.0 \| 1.32 \| 42.6 \| 1.39 \| 50.0 \| 2.16 \| 55.9 \|

	> Note: TPF = Tokens Per Forward Pass (higher is faster); † indicates models derived from Qwen3-8B-Base with identical CPT and SFT.

	### Key Findings

	🏆 State-of-the-Art Performance
	- Achieves 55.9% average accuracy with BACD + TCCF (best among all 8B BDLMs)
	- Outperforms TraDo-8B-Thinking by +8.8 points while being 2.34× faster (2.97 TPF vs 1.27 TPF)
	- Strong results on challenging benchmarks: 42.9 on AIME24, 35.8 on AIME25, 80.0 on AMC23

	⚡ Superior Efficiency
	- 3.37× speedup with BACD alone (maximum efficiency, 51.2% accuracy)
	- 2.16× speedup with BACD + TCCF (best quality, 55.9% accuracy)