---
license: apache-2.0
language:
- en
tags:
- reinforcement-learning
- code-generation
- dllm
- bgpo
- llada
size_categories:
- 8B
base_model:
- GSAI-ML/LLaDA-8B-Instruct
---

# LLaDA-8B-BGPO-code

[![Paper](https://img.shields.io/badge/Paper-arXiv:-red)]()
[![Code](https://img.shields.io/badge/Code-GitHub-blue)](https://github.com/THU-KEG/BGPO)

## Model Description

**LLaDA-8B-BGPO-code** is an 8-billion-parameter diffusion large language model (dLLM), fine-tuned from LLaDA-8B-Instruct with Boundary-Guided Policy Optimization (BGPO) to improve code generation.

## Model Details

- **Model Type**: Diffusion Large Language Model (dLLM)
- **Parameters**: 8 billion
- **Training Method**: Boundary-Guided Policy Optimization (BGPO)
- **Base Model**: LLaDA-8B-Instruct
- **Task**: Code generation
- **Language**: English

## Training Details

- **Training Epochs**: 5 epochs (112 steps per epoch)
- **Total Steps**: 560 steps
- **Response Length**: 512 tokens
- **Train Diffusion Steps**: 512
- **Eval Diffusion Steps**: 512
- **Block Size**: 32
- **Monte Carlo Sample Size ($n_t$)**: 16
- **Learning Rate**: 5e-7
- **Batch Size**: 16
- **Framework**: Built on VeRL (Volcengine Reinforcement Learning)

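The schedule numbers above fit together by simple arithmetic; a quick sketch checking them (the per-block step count assumes LLaDA's semi-autoregressive sampler splits diffusion steps evenly across blocks, which is how the base model's sampler is described):

```python
# Values taken directly from the training details above.
epochs = 5
steps_per_epoch = 112
total_steps = epochs * steps_per_epoch  # 5 * 112 = 560, matching "Total Steps"

response_length = 512  # tokens generated per response
block_size = 32        # tokens decoded per semi-autoregressive block
num_blocks = response_length // block_size  # 16 blocks per response

diffusion_steps = 512  # denoising steps per full response
# Assumption: steps are divided evenly across blocks, as in LLaDA sampling.
steps_per_block = diffusion_steps // num_blocks  # 32 denoising steps per block
```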
## Usage & Limitations

- Primarily designed for code generation tasks.
- Performance may vary on tasks outside code generation.
- Inference with an 8B-parameter model requires substantial GPU memory and compute.
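The card does not include a loading snippet. Below is a minimal sketch using Hugging Face `transformers`; the repo id `linny2002/LLaDA-8B-BGPO-code` is an assumption based on where this card is hosted, and because LLaDA checkpoints ship custom modeling code, `trust_remote_code=True` is required. Generation goes through LLaDA's diffusion sampler (see the BGPO GitHub repo), not vanilla `model.generate`:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Assumed repo id; adjust to wherever the checkpoint is actually hosted.
MODEL_ID = "linny2002/LLaDA-8B-BGPO-code"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModel.from_pretrained(
    MODEL_ID,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,  # ~16 GB of weights for 8B parameters in bf16
)

messages = [{"role": "user", "content": "Write a Python function that reverses a string."}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=False
)
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# Sampling then uses the LLaDA/BGPO diffusion generate loop with the eval
# settings listed above (512 diffusion steps, block size 32, response
# length 512); see https://github.com/THU-KEG/BGPO for the sampler.
```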