	---
	license: apache-2.0
	language:
	- en
	tags:
	- reinforcement-learning
	- code-generation
	- dllm
	- bgpo
	- llada
	size_categories:
	- 8B
	---

	# LLaDA-8B-BGPO-code

	[![Paper](https://img.shields.io/badge/Paper-arXiv:2510.11683-red)](https://arxiv.org/abs/2510.11683)
	[![Code](https://img.shields.io/badge/Code-GitHub-blue)](https://github.com/THU-KEG/BGPO)

	## Model Description

	LLaDA-8B-BGPO-code is an 8-billion-parameter diffusion large language model (dLLM) fine-tuned from LLaDA-8B-Instruct with Boundary-Guided Policy Optimization (BGPO), a reinforcement learning method, to improve code generation.

	## Model Details

	- Model Type: Diffusion Large Language Model (dLLM)
	- Parameters: 8 billion
	- Training Method: Boundary-Guided Policy Optimization (BGPO)
	- Base Model: LLaDA-8B-Instruct
	- Task: Code generation
	- Language: English

	## Training Details

	- Training Epochs: 5 epochs (112 steps per epoch)
	- Total Steps: 560 steps
	- Response Length: 512 tokens
	- Train Diffusion Steps: 512
	- Eval Diffusion Steps: 512
	- Block Size: 32
	- Monte Carlo Sample Size ($n_t$): 16
	- Learning Rate: 5e-7
	- Batch Size: 16
	- Framework: Built on verl (Volcano Engine Reinforcement Learning for LLMs)
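
The settings above imply a few derived quantities that are easy to check by hand. The sketch below works them out under one stated assumption that the model card does not confirm: that decoding is block-wise (semi-autoregressive) and that the diffusion steps are split evenly across blocks. The function name `sampling_schedule` and the constant names are illustrative only.

```python
# Hyperparameters copied from the Training Details list.
EPOCHS = 5
STEPS_PER_EPOCH = 112
RESPONSE_LENGTH = 512   # tokens per response
DIFFUSION_STEPS = 512   # denoising steps (same for train and eval)
BLOCK_SIZE = 32         # tokens per block


def sampling_schedule(response_length, diffusion_steps, block_size):
    """Derive per-block quantities.

    Assumption (not stated in the model card): blocks are decoded one
    after another and the denoising steps are divided evenly among them.
    """
    if response_length % block_size != 0:
        raise ValueError("response_length must be a multiple of block_size")
    num_blocks = response_length // block_size
    if diffusion_steps % num_blocks != 0:
        raise ValueError("diffusion_steps must be a multiple of num_blocks")
    steps_per_block = diffusion_steps // num_blocks
    # Average number of tokens that must be unmasked at each step.
    tokens_per_step = block_size / steps_per_block
    return {
        "num_blocks": num_blocks,
        "steps_per_block": steps_per_block,
        "tokens_per_step": tokens_per_step,
    }


total_steps = EPOCHS * STEPS_PER_EPOCH
schedule = sampling_schedule(RESPONSE_LENGTH, DIFFUSION_STEPS, BLOCK_SIZE)

print(total_steps)  # 560, matching "Total Steps"
print(schedule)
```

Under that assumption, a 512-token response is made of 16 blocks, each block gets 32 denoising steps, and on average one token is unmasked per step.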

	## Usage & Limitations

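	- Designed primarily for code generation tasks.
	- Trained with reinforcement learning specifically for code generation, so performance on other tasks may differ.
	- Inference needs hardware sized for an 8-billion-parameter model: the weights alone take roughly 16 GB in 16-bit precision, before activations.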
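
As a rough guide to the resource requirement, the sketch below estimates weight memory from the parameter count and shows one way the checkpoint might be loaded. Several things here are assumptions rather than facts from this model card: the Hub repo id `THU-KEG/LLaDA-8B-BGPO-code`, the nominal count of exactly 8 billion parameters, and that the repo loads through `transformers` with `trust_remote_code=True` in the same way as the base LLaDA checkpoints. The helper names are illustrative.

```python
MODEL_ID = "THU-KEG/LLaDA-8B-BGPO-code"  # assumed Hub repo id
NUM_PARAMS = 8_000_000_000               # nominal "8 billion" from the card

BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}


def weight_memory_gb(num_params, dtype="bfloat16"):
    """Memory for the weights only, in GB (1 GB = 1e9 bytes).

    Activations and any caches are not included.
    """
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def load_model(model_id=MODEL_ID):
    """Load tokenizer and model.

    Assumption: the repo ships custom modeling code and loads like the
    base LLaDA checkpoints. Imports are local so the memory estimate
    above can be used without torch or transformers installed.
    """
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModel.from_pretrained(
        model_id,
        trust_remote_code=True,
        torch_dtype=torch.bfloat16,
    )
    return tokenizer, model.eval()


print(weight_memory_gb(NUM_PARAMS, "bfloat16"))  # 16.0
print(weight_memory_gb(NUM_PARAMS, "float32"))   # 32.0
```

Calling `load_model()` downloads the full checkpoint, so it is left uncalled here. Diffusion language models are usually sampled with a dedicated iterative unmasking loop rather than standard autoregressive generation; that loop is not shown, and the linked GitHub repository is the place to look for the sampling code.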