---
license: apache-2.0
base_model: Qwen/Qwen2.5-0.5B-Instruct
tags:
- grpo
- rl
- support-ticket
- lora
- peft
- trl
---
# Support Ticket GRPO Agent
`Qwen/Qwen2.5-0.5B-Instruct` fine-tuned with GRPO (Group Relative Policy Optimization) and LoRA on a multi-step support-ticket environment.
## Training Setup
- **Algorithm:** GRPO via `trl.GRPOTrainer` + LoRA (PEFT)
- **Base model:** Qwen/Qwen2.5-0.5B-Instruct
- **Dataset:** 1,000 prompts covering 50 support tickets
- **Environment:** [algocore-support-ticket-env](https://algocore-support-ticket-env.hf.space)
- **Group size G:** 2
- **KL beta:** 0.04
- **Final loss:** 0.0008
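
To illustrate what a group size of 2 means for the learning signal, here is a minimal sketch of the group-relative advantage used in GRPO: each completion's reward is normalized against the other completions sampled for the same prompt. The function name, the `eps` value, and the choice of population standard deviation are assumptions for illustration; the exact normalization in `trl.GRPOTrainer` may differ.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize rewards within one group of completions for the same prompt.

    Assumption: population standard deviation plus a small eps for stability.
    Library implementations may use a different estimator or eps.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# With G = 2, the two completions receive opposite-signed advantages.
adv = group_relative_advantages([1.0, 0.0])

# If both completions earn the same reward, every advantage is zero,
# so that prompt contributes no policy-gradient signal.
tied = group_relative_advantages([0.5, 0.5])
```

With only two completions per prompt, any tie in reward yields zero advantage for that prompt, which is one reason larger group sizes are often preferred when compute allows.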
## Results
| Task | Score before GRPO | Score after GRPO | Delta |
|---|---|---|---|
| Task 1 (Classify) | 0.667 | 1.000 | +0.333 |
| Task 2 (Action) | 0.117 | 0.450 | +0.333 |
| Task 3 (Full Resolve) | 0.083 | 0.258 | +0.175 |
| **Overall** | **0.289** | **0.569** | **+0.280** |
![GRPO Training Results](grpo_results.png)
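
The **Overall** row is numerically consistent with an unweighted mean of the three task scores. The sketch below reproduces that arithmetic from the table values; treating the overall score as an unweighted mean is an assumption inferred from the numbers, not something the evaluation code confirms.

```python
# Per-task scores copied from the results table.
before = {"Task 1 (Classify)": 0.667, "Task 2 (Action)": 0.117, "Task 3 (Full Resolve)": 0.083}
after = {"Task 1 (Classify)": 1.000, "Task 2 (Action)": 0.450, "Task 3 (Full Resolve)": 0.258}


def overall(scores):
    """Unweighted mean across tasks (assumed aggregation)."""
    return sum(scores.values()) / len(scores)


overall_before = round(overall(before), 3)
overall_after = round(overall(after), 3)
overall_delta = round(overall(after) - overall(before), 3)

print(f"before={overall_before:.3f} after={overall_after:.3f} delta={overall_delta:+.3f}")
```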