---
license: apache-2.0
library_name: peft
base_model: GSAI-ML/LLaDA-8B-Instruct
pipeline_tag: text-generation
tags:
  - reinforcement-learning
  - diffusion-llm
  - block-r1
---

# Block-R1

This repository contains model checkpoints (LoRA adapters) for Block-R1, a benchmark for multi-domain reinforcement learning with block-based diffusion large language models (dLLMs).

## Description

Block-R1 is designed to enhance block-based reasoning generation in diffusion LLMs. It investigates the role of block size from a domain-conflict perspective during reinforcement learning (RL) post-training. The benchmark covers diverse domains, including code, mathematics, puzzles, and general knowledge.

Key components include:

- **Block-R1-41K Dataset**: a dataset constructed with optimized training block sizes for multi-domain RL.
- **b1 Method**: a dynamic-size reasoning block method for dLLMs.
- **RL Framework**: support for multiple RL algorithms for diffusion models, including Diffusion-GRPO, WD1, and GDPO.

## Resources

## Model Information

These weights are LoRA adapters trained on top of the LLaDA-8B-Instruct backbone. For detailed usage, training, and evaluation scripts, please refer to the official repository.
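
A minimal loading sketch follows, assuming the checkpoints use the standard PEFT adapter layout; the adapter id `ORG/Block-R1-ckpts` is a placeholder for this repository's actual id. LLaDA ships custom modeling code, so the base model is loaded with `AutoModel` and `trust_remote_code=True` rather than a causal-LM class.

```python
# Sketch: attach the Block-R1 LoRA adapters to the LLaDA-8B-Instruct backbone.
# Assumptions: standard PEFT layout; "ORG/Block-R1-ckpts" is a placeholder id.
import torch
from transformers import AutoModel, AutoTokenizer
from peft import PeftModel

BASE_ID = "GSAI-ML/LLaDA-8B-Instruct"
ADAPTER_ID = "ORG/Block-R1-ckpts"  # placeholder: replace with this repo's id

# LLaDA requires trust_remote_code=True for its custom diffusion modeling code.
tokenizer = AutoTokenizer.from_pretrained(BASE_ID, trust_remote_code=True)
base_model = AutoModel.from_pretrained(
    BASE_ID, trust_remote_code=True, torch_dtype=torch.bfloat16
)

# Wrap the frozen backbone with the trained LoRA adapters.
model = PeftModel.from_pretrained(base_model, ADAPTER_ID)
model.eval()
```

Note that LLaDA generates text through a diffusion sampling loop rather than `model.generate`; use the sampling scripts from the official repository for inference.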

## Citation

If you use this benchmark or the associated methods, please cite the following work:

```bibtex
@article{jiang2026breakblock,
  title={{Break the Block: Dynamic-size Reasoning Blocks for Diffusion Large Language Models via Monotonic Entropy Descent with Reinforcement Learning}},
  author={Jiang, Yan and Qiu, Ruihong and Huang, Zi},
  journal={arXiv preprint arXiv:2605.02263},
  year={2026}
}
```