Add model card for Block-R1
This PR adds a model card for the Block-R1 project. It includes essential metadata such as the base model, library name (PEFT), license, and pipeline tag. The card links to the original paper, the source code on GitHub, and the associated dataset on Hugging Face.
README.md
ADDED
@@ -0,0 +1,46 @@
---
license: apache-2.0
library_name: peft
base_model: GSAI-ML/LLaDA-8B-Instruct
pipeline_tag: text-generation
tags:
- reinforcement-learning
- diffusion-llm
- block-r1
---

# Block-R1

This repository contains model checkpoints (LoRA adapters) for **Block-R1**, a benchmark for multi-domain reinforcement learning with block-based diffusion large language models (dLLMs).

## Description

Block-R1 is designed to enhance block-based reasoning generation in diffusion LLMs. It investigates the role of block size from a domain conflict perspective during reinforcement learning (RL) post-training. The benchmark covers diverse domains including code, mathematics, puzzles, and general knowledge.

Key components include:
- **Block-R1-41K Dataset:** A dataset constructed with optimized training block sizes for multi-domain RL.
- **b1 Method:** A dynamic-size reasoning block method for dLLMs.
- **RL Framework:** Support for multiple RL algorithms for diffusion models such as Diffusion-GRPO, WD1, GDPO, and more.
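
To make the notion of block size concrete, here is a minimal sketch that assumes fixed-size blocks, where the generation window is split into consecutive segments that are denoised left to right. It does not reproduce the b1 dynamic-size method, whose details are in the paper; the function name `block_spans` and its parameters are illustrative and not part of the Block-R1 codebase.

```python
def block_spans(gen_length: int, block_length: int) -> list[tuple[int, int]]:
    """Split a generation window into consecutive fixed-size blocks.

    Returns half-open (start, end) token offsets relative to the start of
    the generated region. Illustrative helper, not the Block-R1 API.
    """
    if block_length <= 0:
        raise ValueError("block_length must be positive")
    if gen_length % block_length != 0:
        # Assumption: the window must divide evenly into blocks.
        raise ValueError("gen_length must be a multiple of block_length")
    return [
        (start, start + block_length)
        for start in range(0, gen_length, block_length)
    ]


if __name__ == "__main__":
    # A 128-token window with block size 32 yields four blocks.
    print(block_spans(128, 32))
```

Setting `block_length` equal to `gen_length` yields a single block covering the whole window, while smaller values produce more blocks that are decoded in order.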

## Resources

- **Paper:** [Block-R1: Rethinking the Role of Block Size in Multi-domain Reinforcement Learning for Diffusion Large Language Models](https://huggingface.co/papers/2605.11726)
- **Code:** [GitHub Repository](https://github.com/YanJiangJerry/Block-R1)
- **Dataset:** [Block-R1 Dataset](https://huggingface.co/datasets/dLLM-R1/Block-R1)

## Model Information

These weights are LoRA adapters trained on top of the [LLaDA-8B-Instruct](https://huggingface.co/GSAI-ML/LLaDA-8B-Instruct) backbone. For detailed usage, training, and evaluation scripts, please refer to the official repository.
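
As a rough sketch of how a LoRA adapter might be attached to the backbone with PEFT, consider the snippet below. It rests on several assumptions: `ADAPTER_ID` is a hypothetical placeholder (this card does not state the adapter repository ID), the helper `load_block_r1` is illustrative, and loading the base model through `AutoModel` with `trust_remote_code=True` is assumed to match the base model's own instructions. The official repository remains the reference for actual usage.

```python
BASE_MODEL_ID = "GSAI-ML/LLaDA-8B-Instruct"  # base model named in this card
ADAPTER_ID = "your-namespace/your-block-r1-adapter"  # hypothetical placeholder


def load_block_r1(adapter_id: str = ADAPTER_ID, base_model_id: str = BASE_MODEL_ID):
    """Load the base model and attach a LoRA adapter (illustrative helper)."""
    # Heavy imports are kept local so the module can be imported cheaply.
    import torch
    from peft import PeftModel
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_model_id, trust_remote_code=True)
    base = AutoModel.from_pretrained(
        base_model_id,
        trust_remote_code=True,  # assumption: the backbone ships custom model code
        torch_dtype=torch.bfloat16,
    )
    # Wrap the frozen backbone with the LoRA weights stored under adapter_id.
    model = PeftModel.from_pretrained(base, adapter_id)
    return tokenizer, model.eval()
```

Calling `load_block_r1("<adapter repo or local path>")` downloads the backbone and the adapter. Sampling from a diffusion LLM typically relies on a custom denoising loop, so generation should follow the scripts in the official repository.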

## Citation

If you use this benchmark or the associated methods, please cite the following work:

```bibtex
@article{jiang2026breakblock,
  title={{Break the Block: Dynamic-size Reasoning Blocks for Diffusion Large Language Models via Monotonic Entropy Descent with Reinforcement Learning}},
  author={Jiang, Yan and Qiu, Ruihong and Huang, Zi},
  journal={arXiv preprint arXiv:2605.02263},
  year={2026}
}
```