---
license: apache-2.0
base_model: Qwen/Qwen2.5-0.5B-Instruct
tags:
- grpo
- rl
- support-ticket
- lora
- peft
- trl
---

# Support Ticket GRPO Agent

This model is `Qwen/Qwen2.5-0.5B-Instruct` fine-tuned with GRPO (Group Relative Policy Optimization) and LoRA adapters on a multi-step support-ticket environment.

## Training Setup
- **Algorithm:** GRPO via `trl.GRPOTrainer` + LoRA (PEFT)
- **Base model:** Qwen/Qwen2.5-0.5B-Instruct
- **Dataset:** 1000 prompts over 50 support tickets
- **Environment:** [algocore-support-ticket-env](https://algocore-support-ticket-env.hf.space)
- **Group size G:** 2
- **KL beta:** 0.04
- **Final loss:** 0.0008
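
To illustrate what a group size of 2 means for the learning signal, here is a minimal sketch of the group-relative advantage that GRPO is built around: each completion's reward is normalized against the other completions sampled for the same prompt. The function name, the `eps` value, the use of the population standard deviation, and the reward values are assumptions for illustration; this is not the training code used for this model, and library implementations may differ in these details.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize each reward against its own group's mean and spread.

    `eps` (an assumed value) keeps the division finite when every
    completion in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for the two completions of one prompt (G = 2).
advantages = group_relative_advantages([1.0, 0.0])

# When both completions score the same, the group carries no signal.
tied = group_relative_advantages([0.5, 0.5])

print(advantages)
print(tied)
```

With only two completions per group, the two advantages are always equal in magnitude and opposite in sign, or both zero when the rewards tie, so each prompt contributes at most a single pairwise comparison.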

## Results

| Task | Before | After | Delta |
|---|---|---|---|
| Task 1 (Classify) | 0.667 | 1.000 | +0.333 |
| Task 2 (Action) | 0.117 | 0.450 | +0.333 |
| Task 3 (Full Resolve) | 0.083 | 0.258 | +0.175 |
| **Overall** | **0.289** | **0.569** | **+0.280** |

![GRPO Training Results](grpo_results.png)
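
The Delta column and the Overall row in the table above can be reproduced with a few lines of arithmetic. The sketch below assumes that Overall is the unweighted mean of the three task scores; the card does not state this, but the reported numbers agree with it to three decimals. The variable names are illustrative.

```python
from statistics import mean

# (before, after) scores copied from the results table.
results = {
    "Task 1 (Classify)": (0.667, 1.000),
    "Task 2 (Action)": (0.117, 0.450),
    "Task 3 (Full Resolve)": (0.083, 0.258),
}

# Per-task improvement, rounded to the table's three decimals.
deltas = {
    task: round(after - before, 3)
    for task, (before, after) in results.items()
}

# Assumption: "Overall" is the unweighted mean across the three tasks.
overall_before = round(mean(before for before, _ in results.values()), 3)
overall_after = round(mean(after for _, after in results.values()), 3)
overall_delta = round(overall_after - overall_before, 3)

for task, delta in deltas.items():
    print(f"{task}: {delta:+.3f}")
print(f"Overall: {overall_before:.3f} -> {overall_after:.3f} ({overall_delta:+.3f})")
```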