Add model card for Block-R1
#1
by nielsr HF Staff - opened
README.md
ADDED
|
@@ -0,0 +1,46 @@
| 1 |
+
---
|
| 2 |
+
license: apache-2.0
|
| 3 |
+
library_name: peft
|
| 4 |
+
base_model: GSAI-ML/LLaDA-8B-Instruct
|
| 5 |
+
pipeline_tag: text-generation
|
| 6 |
+
tags:
|
| 7 |
+
- reinforcement-learning
|
| 8 |
+
- diffusion-llm
|
| 9 |
+
- block-r1
|
| 10 |
+
---
|
| 11 |
+
|
| 12 |
+
# Block-R1
|
| 13 |
+
|
| 14 |
+
This repository contains model checkpoints (LoRA adapters) for **Block-R1**, a benchmark for multi-domain reinforcement learning with block-based diffusion large language models (dLLMs).
|
| 15 |
+
|
| 16 |
+
## Description
|
| 17 |
+
|
| 18 |
+
Block-R1 is designed to improve block-based reasoning generation in diffusion LLMs. It investigates the role of block size during reinforcement learning (RL) post-training, viewed through conflicts between domains. The benchmark covers diverse domains, including code, mathematics, puzzles, and general knowledge.
|
| 19 |
+
|
| 20 |
+
Key components include:
|
| 21 |
+
- **Block-R1-41K Dataset:** A dataset constructed with optimized training block sizes for multi-domain RL.
|
| 22 |
+
- **b1 Method:** A dynamic-size reasoning block method for dLLMs.
|
| 23 |
+
- **RL Framework:** Support for multiple RL algorithms for diffusion models, including Diffusion-GRPO, WD1, and GDPO.
|
| 24 |
+
|
| 25 |
+
## Resources
|
| 26 |
+
|
| 27 |
+
- **Paper:** [Block-R1: Rethinking the Role of Block Size in Multi-domain Reinforcement Learning for Diffusion Large Language Models](https://huggingface.co/papers/2605.11726)
|
| 28 |
+
- **Code:** [GitHub Repository](https://github.com/YanJiangJerry/Block-R1)
|
| 29 |
+
- **Dataset:** [Block-R1 Dataset](https://huggingface.co/datasets/dLLM-R1/Block-R1)
|
| 30 |
+
|
| 31 |
+
## Model Information
|
| 32 |
+
|
| 33 |
+
These weights are LoRA adapters trained on top of the [LLaDA-8B-Instruct](https://huggingface.co/GSAI-ML/LLaDA-8B-Instruct) backbone. For detailed usage, training, and evaluation scripts, see the official [GitHub repository](https://github.com/YanJiangJerry/Block-R1).
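As a rough illustration of how a LoRA adapter is typically attached to its backbone with `peft`, here is a minimal sketch. It is not the official loading path. The function name `load_block_r1` and the `adapter_id` argument are hypothetical: this card does not name a specific adapter repo or subfolder, so pass the checkpoint you actually want. The sketch assumes the backbone loads through `AutoModel` with `trust_remote_code=True` and that `PeftModel.from_pretrained` accepts it.

```python
BASE_MODEL_ID = "GSAI-ML/LLaDA-8B-Instruct"  # base_model from this card's metadata


def load_block_r1(adapter_id: str, base_id: str = BASE_MODEL_ID):
    """Load the LLaDA backbone and attach a LoRA adapter on top of it.

    `adapter_id` is a placeholder (assumption): a Hub repo ID or local path
    of an adapter checkpoint. This card does not specify one.
    """
    if not adapter_id:
        raise ValueError("adapter_id must be a Hub repo ID or a local path")

    # Heavy imports are deferred so this module can be imported
    # even where torch, transformers, or peft are not installed.
    import torch
    from peft import PeftModel
    from transformers import AutoModel, AutoTokenizer

    # The backbone ships custom modeling code, hence trust_remote_code=True.
    tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
    base = AutoModel.from_pretrained(
        base_id, trust_remote_code=True, torch_dtype=torch.bfloat16
    )

    # Wrap the backbone with the LoRA weights and switch to inference mode.
    model = PeftModel.from_pretrained(base, adapter_id)
    return model.eval(), tokenizer
```

Text generation with a diffusion LLM does not go through the usual autoregressive decoding loop, so sampling is deliberately left out here; use the generation and evaluation scripts in the official repository for that step.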
|
| 34 |
+
|
| 35 |
+
## Citation
|
| 36 |
+
|
| 37 |
+
If you use this benchmark or the associated methods, please cite the following work:
|
| 38 |
+
|
| 39 |
+
```bibtex
|
| 40 |
+
@article{jiang2026breakblock,
|
| 41 |
+
title={{Break the Block: Dynamic-size Reasoning Blocks for Diffusion Large Language Models via Monotonic Entropy Descent with Reinforcement Learning}},
|
| 42 |
+
author={Jiang, Yan and Qiu, Ruihong and Huang, Zi},
|
| 43 |
+
journal={arXiv preprint arXiv:2605.02263},
|
| 44 |
+
year={2026}
|
| 45 |
+
}
|
| 46 |
+
```
|