---
license: apache-2.0
base_model: Qwen/Qwen2.5-0.5B-Instruct
tags:
- grpo
- rl
- support-ticket
- lora
- peft
- trl
---

# Support Ticket GRPO Agent

Fine-tuned `Qwen/Qwen2.5-0.5B-Instruct` using GRPO (Group Relative Policy Optimization) + LoRA on a multi-step support ticket environment.

## Training Setup

- **Algorithm:** GRPO via `trl.GRPOTrainer` + LoRA (PEFT)
- **Base model:** Qwen/Qwen2.5-0.5B-Instruct
- **Dataset:** 1000 prompts over 50 support tickets
- **Environment:** [algocore-support-ticket-env](https://algocore-support-ticket-env.hf.space)
- **Group size G:** 2
- **KL beta:** 0.04
- **Final loss:** 0.0008

## Results

| Task | Before | After | Delta |
|---|---|---|---|
| Task 1 (Classify) | 0.667 | 1.000 | +0.333 |
| Task 2 (Action) | 0.117 | 0.450 | +0.333 |
| Task 3 (Full Resolve) | 0.083 | 0.258 | +0.175 |
| **Overall** | **0.289** | **0.569** | **+0.280** |

![GRPO Training Results](grpo_results.png)
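
To illustrate what the group size G = 2 in the training setup means for the learning signal, here is a minimal sketch of the group-relative advantage that GRPO is built around: each completion's reward is normalized against the other completions sampled for the same prompt. The function name `group_relative_advantages`, the `eps` value, the use of the population standard deviation, and the example rewards are all assumptions made for illustration. This is not the code used for this training run, and `trl.GRPOTrainer` may differ in details such as the standard deviation estimator.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize rewards within one group of completions for the same prompt.

    Illustrative helper (hypothetical name): subtract the group mean and
    divide by the group standard deviation, with eps to avoid dividing by zero.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical environment rewards for one prompt with G = 2 completions.
adv = group_relative_advantages([0.2, 0.9])

# If both completions receive the same reward, the group carries no signal.
tied = group_relative_advantages([0.5, 0.5])
```

Under these assumptions, a group of two always yields advantages that are equal in magnitude and opposite in sign, so the better completion is pushed up and the worse one down by the same amount regardless of how large the reward gap is. A tied group yields zero advantage for both completions and therefore contributes no policy gradient for that prompt.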