How to use AlgoCore/support-ticket-grpo-model with PEFT:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the base model, then attach the LoRA adapter weights from this repository.
base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
model = PeftModel.from_pretrained(base_model, "AlgoCore/support-ticket-grpo-model")
```
license: apache-2.0
base_model: Qwen/Qwen2.5-0.5B-Instruct
tags:
- grpo
- rl
- support-ticket
- lora
- peft
- trl

# Support Ticket GRPO Agent

This model is `Qwen/Qwen2.5-0.5B-Instruct` fine-tuned with GRPO (Group Relative Policy Optimization) and LoRA on a multi-step support-ticket environment.

## Training Setup

- **Algorithm:** GRPO via `trl.GRPOTrainer` + LoRA (PEFT)
- **Base model:** Qwen/Qwen2.5-0.5B-Instruct
- **Dataset:** 1000 prompts over 50 support tickets
- **Environment:** [algocore-support-ticket-env](https://algocore-support-ticket-env.hf.space)
- **Group size G:** 2
- **KL beta:** 0.04
- **Final loss:** 0.0008
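
To make the group size setting concrete, here is a minimal sketch of the group-relative advantage that gives GRPO its name. It assumes the commonly used formulation (reward minus the group mean, divided by the group standard deviation plus a small epsilon). The function name, the epsilon value, and the choice of population standard deviation are illustrative assumptions, not details taken from this training run or from the `trl` internals.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize rewards within one group of completions for the same prompt.

    Hypothetical helper for illustration only; eps and the population
    standard deviation are assumptions of this sketch.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# G = 2: two completions sampled for the same prompt, with different rewards.
adv_diff = group_relative_advantages([1.0, 0.0])

# G = 2: both completions earn the same reward, so both advantages are zero.
adv_same = group_relative_advantages([0.5, 0.5])

print(adv_diff)
print(adv_same)
```

Under this formulation, a group of two reduces the advantage to little more than a sign: the better completion is pushed up and the worse one is pushed down by nearly the same amount, whatever the size of the reward gap. When both completions score the same, the advantage term vanishes and only the KL penalty remains for that prompt.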

## Results

| Task | Before | After | Delta |
|---|---|---|---|
| Task 1 (Classify) | 0.667 | 1.000 | +0.333 |
| Task 2 (Action) | 0.117 | 0.450 | +0.333 |
| Task 3 (Full Resolve) | 0.083 | 0.258 | +0.175 |
| **Overall** | **0.289** | **0.569** | **+0.280** |
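
The arithmetic behind the table can be checked with a few lines of Python. This sketch assumes that each Delta is After minus Before and that the Overall row is the unweighted mean of the three task scores. The card does not state this, but the reported numbers are consistent with it.

```python
# Scores copied from the results table: task -> (before, after)
scores = {
    "Task 1 (Classify)": (0.667, 1.000),
    "Task 2 (Action)": (0.117, 0.450),
    "Task 3 (Full Resolve)": (0.083, 0.258),
}

# Per-task improvement, rounded to the table's three decimals.
deltas = {task: round(after - before, 3) for task, (before, after) in scores.items()}

# Assumption: the Overall row is the unweighted mean across the three tasks.
overall_before = round(sum(b for b, _ in scores.values()) / len(scores), 3)
overall_after = round(sum(a for _, a in scores.values()) / len(scores), 3)
overall_delta = round(overall_after - overall_before, 3)

for task, delta in deltas.items():
    print(f"{task}: {delta:+.3f}")
print(f"Overall: {overall_before:.3f} -> {overall_after:.3f} ({overall_delta:+.3f})")
```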