---
license: apache-2.0
library_name: peft
base_model: GSAI-ML/LLaDA-8B-Instruct
pipeline_tag: text-generation
tags:
- reinforcement-learning
- diffusion-llm
- block-r1
---
# Block-R1
This repository contains model checkpoints (LoRA adapters) for **Block-R1**, a benchmark for multi-domain reinforcement learning with block-based diffusion large language models (dLLMs).
## Description
Block-R1 is designed to enhance block-based reasoning generation in diffusion LLMs. It investigates the role of block size during reinforcement learning (RL) post-training from a domain-conflict perspective. The benchmark covers diverse domains, including code, mathematics, puzzles, and general knowledge.
Key components include:
- **Block-R1-41K Dataset:** A dataset constructed with optimized training block sizes for multi-domain RL.
- **b1 Method:** A dynamic-size reasoning block method for dLLMs.
- **RL Framework:** Support for multiple RL algorithms for diffusion models, including Diffusion-GRPO, WD1, and GDPO.
## Resources
- **Paper:** [Block-R1: Rethinking the Role of Block Size in Multi-domain Reinforcement Learning for Diffusion Large Language Models](https://huggingface.co/papers/2605.11726)
- **Code:** [GitHub Repository](https://github.com/YanJiangJerry/Block-R1)
- **Dataset:** [Block-R1 Dataset](https://huggingface.co/datasets/dLLM-R1/Block-R1)
## Model Information
These weights are LoRA adapters trained on top of the [LLaDA-8B-Instruct](https://huggingface.co/GSAI-ML/LLaDA-8B-Instruct) backbone. For detailed usage, training, and evaluation scripts, please refer to the official repository.
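As a rough guide, the snippet below is a minimal sketch of how LoRA adapters like these might be loaded with PEFT. It is not taken from the official repository: the adapter path is a placeholder for a checkpoint from this repo, and the surrounding calls assume the standard `transformers`/`peft` APIs. LLaDA ships custom modeling code, so `trust_remote_code=True` is required.

```python
# Minimal loading sketch (assumed usage, not from the official repo).
import torch
from transformers import AutoModel, AutoTokenizer
from peft import PeftModel

base_id = "GSAI-ML/LLaDA-8B-Instruct"
adapter_path = "your-adapter-path"  # placeholder: point at a Block-R1 checkpoint

# LLaDA ships custom modeling code, so trust_remote_code=True is required.
tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
model = AutoModel.from_pretrained(
    base_id, trust_remote_code=True, torch_dtype=torch.bfloat16
)

# Attach the LoRA adapter on top of the frozen base weights.
model = PeftModel.from_pretrained(model, adapter_path)
model.eval()

# Optional: fold the adapter into the base weights for standalone inference.
# model = model.merge_and_unload()
```

Note that LLaDA is a diffusion LLM, so decoding does not go through the standard autoregressive `generate` API; follow the inference and evaluation scripts in the official repository for sampling.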
## Citation
If you use this benchmark or the associated methods, please cite the following work:
```bibtex
@article{jiang2026breakblock,
  title   = {{Break the Block: Dynamic-size Reasoning Blocks for Diffusion Large Language Models via Monotonic Entropy Descent with Reinforcement Learning}},
  author  = {Jiang, Yan and Qiu, Ruihong and Huang, Zi},
  journal = {arXiv preprint arXiv:2605.02263},
  year    = {2026}
}
```