---
license: apache-2.0
language:
- en
tags:
- reinforcement-learning
- code-generation
- dllm
- bgpo
- llada
size_categories:
- 8B
base_model:
- GSAI-ML/LLaDA-8B-Instruct
---

# LLaDA-8B-BGPO-code

[![Paper](https://img.shields.io/badge/Paper-arXiv:-red)]()
[![Code](https://img.shields.io/badge/Code-GitHub-blue)](https://github.com/THU-KEG/BGPO)

## Model Description

**LLaDA-8B-BGPO-code** is an 8-billion-parameter diffusion large language model (dLLM), fine-tuned from LLaDA-8B-Instruct with Boundary-Guided Policy Optimization (BGPO) to improve code generation.

## Model Details

- **Model Type**: Diffusion Large Language Model (dLLM)
- **Parameters**: 8 billion
- **Training Method**: Boundary-Guided Policy Optimization (BGPO)
- **Base Model**: LLaDA-8B-Instruct
- **Task**: Code generation
- **Language**: English

## Training Details

- **Training Epochs**: 5 epochs (112 steps per epoch)
- **Total Steps**: 560 steps
- **Response Length**: 512 tokens
- **Train Diffusion Steps**: 512
- **Eval Diffusion Steps**: 512
- **Block Size**: 32
- **Monte Carlo Sample Size ($n_t$)**: 16
- **Learning Rate**: 5e-7
- **Batch Size**: 16
- **Framework**: Built on VeRL (Volcengine Reinforcement Learning)

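The schedule numbers above fit together by simple arithmetic; a quick sketch checking them (the per-block step count assumes LLaDA's semi-autoregressive sampler splits diffusion steps evenly across blocks, which is how the base model's sampler is described):

```python
# Values taken directly from the training details above.
epochs = 5
steps_per_epoch = 112
total_steps = epochs * steps_per_epoch  # 5 * 112 = 560, matching "Total Steps"

response_length = 512  # tokens generated per response
block_size = 32        # tokens decoded per semi-autoregressive block
num_blocks = response_length // block_size  # 16 blocks per response

diffusion_steps = 512  # denoising steps per full response
# Assumption: steps are divided evenly across blocks, as in LLaDA sampling.
steps_per_block = diffusion_steps // num_blocks  # 32 denoising steps per block
```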
## Usage & Limitations

- Primarily designed for code generation tasks.
- Performance may vary on tasks outside code generation.
- Inference with an 8B-parameter model requires substantial GPU memory and compute.
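The card does not include a loading snippet. Below is a minimal sketch using Hugging Face `transformers`; the repo id `linny2002/LLaDA-8B-BGPO-code` is an assumption based on where this card is hosted, and because LLaDA checkpoints ship custom modeling code, `trust_remote_code=True` is required. Generation goes through LLaDA's diffusion sampler (see the BGPO GitHub repo), not vanilla `model.generate`:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Assumed repo id; adjust to wherever the checkpoint is actually hosted.
MODEL_ID = "linny2002/LLaDA-8B-BGPO-code"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModel.from_pretrained(
    MODEL_ID,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,  # ~16 GB of weights for 8B parameters in bf16
)

messages = [{"role": "user", "content": "Write a Python function that reverses a string."}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=False
)
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# Sampling then uses the LLaDA/BGPO diffusion generate loop with the eval
# settings listed above (512 diffusion steps, block size 32, response
# length 512); see https://github.com/THU-KEG/BGPO for the sampler.
```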