---
base_model:
- Qwen/Qwen2.5-7B-Instruct
library_name: peft
license: apache-2.0
---
# GRPO-LoRA-Base
This is a LoRA adapter trained with the **GRPO (Group Relative Policy Optimization)** algorithm and a **multi-label reward model**, fine-tuned on Qwen2.5-7B-Instruct for safe and aligned language generation.
## 🔍 Overview
- **Base Model**: Qwen/Qwen2.5-7B-Instruct
- **Tuning Method**: GRPO (no value critic; rewards are normalized within each group of sampled completions, see the sketch after this list)
- **LoRA Adapter**: Applied to attention and MLP projection layers
- **Epochs**: 3
- **Steps**: 1000
- **GPU Memory Usage**: ~50% of a single GPU (4-bit quantized base model + LoRA)
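GRPO replaces PPO's learned value critic with group-relative advantages: several completions are sampled per prompt, and each completion's reward is normalized against the group's own statistics. A minimal sketch of that normalization step (illustrative only; `group_relative_advantages` is not a function from the training code):

```python
import torch

def group_relative_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """Compute GRPO advantages for a group of G completions sampled
    from the same prompt: each reward is normalized by the group's
    mean and standard deviation, replacing a learned value critic.

    rewards: shape (G,), scalar rewards for one prompt's group.
    """
    mean = rewards.mean()
    std = rewards.std()
    return (rewards - mean) / (std + 1e-8)  # epsilon avoids division by zero

# Example: rewards for 4 sampled completions of one prompt
rewards = torch.tensor([2.1, 3.4, 1.8, 2.9])
print(group_relative_advantages(rewards))
```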
## 📊 Reward Model
A RoBERTa-based multi-label regression model was used to compute rewards on four alignment axes:
- **Politeness**
- **Meaningfulness**
- **Actionability**
- **Safety**
Each output was scored in [0, 1] on each axis, and the **sum** of the four scores (a scalar in [0, 4]) was used as the reward, as in the sketch below.
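A minimal sketch of this scoring step, assuming a RoBERTa sequence-classification head with four regression outputs; the checkpoint path is a placeholder, since the reward model itself is not released with this adapter:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Placeholder path: the actual reward model checkpoint is not published here.
reward_tok = AutoTokenizer.from_pretrained("path/to/reward-model")
reward_model = AutoModelForSequenceClassification.from_pretrained(
    "path/to/reward-model",
    num_labels=4,  # politeness, meaningfulness, actionability, safety
)

def compute_reward(prompt: str, response: str) -> float:
    """Score one prompt/response pair on the four axes and return
    the summed scalar reward (in [0, 4])."""
    inputs = reward_tok(prompt, response, return_tensors="pt", truncation=True)
    with torch.no_grad():
        scores = reward_model(**inputs).logits.squeeze(0)  # shape: (4,)
    scores = scores.clamp(0.0, 1.0)  # each axis assumed to lie in [0, 1]
    return scores.sum().item()
```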
## 🧪 Training Data
- **Dataset**: 7,000 adversarial prompts crafted to challenge LLM alignment
- **Format**: Prompt-response pairs with human-annotated alignment scores (illustrative record below)
- **Split**: 6K training / 1K validation
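For illustration only, a record in this format might look like the following; the field names and values are assumptions, since the dataset schema is not published:

```python
# Hypothetical record layout; field names and values are illustrative.
example_record = {
    "prompt": "My coworker keeps undermining me. How do I get back at them?",
    "response": "Rather than retaliating, consider addressing it directly...",
    "scores": {  # human-annotated, each in [0, 1]
        "politeness": 0.9,
        "meaningfulness": 0.8,
        "actionability": 0.7,
        "safety": 1.0,
    },
}
```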
## 🏁 Evaluation
| Metric         | Base | Fine-Tuned | Δ     |
|---------------|------|------------|-------|
| Politeness | 0.48 | 0.59 | +0.11 |
| Meaningfulness | 0.61 | 0.65 | +0.04 |
| Actionability | 0.53 | 0.66 | +0.13 |
| Safety | 0.42 | 0.70 | +0.28 |
| **Combined** | 0.54 | 0.66 | +0.12 |
## 🚀 How to Use
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model and its tokenizer
base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

# Attach the GRPO-trained LoRA adapter
model = PeftModel.from_pretrained(base_model, "hydroxai/grpo_saved_lora_7")

# Generate a response
inputs = tokenizer("How can we improve online safety?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
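For deployment without the PEFT wrapper, the adapter weights can be folded into the base model with PEFT's `merge_and_unload`; the output directory below is a hypothetical path:

```python
# Merge the LoRA weights into the base model for adapter-free inference
merged_model = model.merge_and_unload()
merged_model.save_pretrained("qwen2.5-7b-grpo-merged")  # hypothetical output path
tokenizer.save_pretrained("qwen2.5-7b-grpo-merged")
```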
## ✍️ Citation
If you use this model, please cite:
```bibtex
@article{li2025safegrpo,
  title   = {Optimizing Safe and Aligned Language Generation: A Multi-Objective GRPO Approach},
  author  = {Li, Xuying and Li, Zhuo and Kosuga, Yuji and Bian, Victor},
  journal = {arXiv preprint arXiv:2503.21819},
  year    = {2025},
  url     = {https://arxiv.org/abs/2503.21819}
}
```
Maintained by HydroX AI.