metadata
license: apache-2.0
base_model: Qwen/Qwen2.5-0.5B-Instruct
tags:
  - grpo
  - rl
  - support-ticket
  - lora
  - peft
  - trl

Support Ticket GRPO Agent

This model is Qwen/Qwen2.5-0.5B-Instruct fine-tuned with GRPO (Group Relative Policy Optimization) and LoRA in a multi-step support ticket environment.

Training Setup

  • Algorithm: GRPO via trl.GRPOTrainer + LoRA (PEFT)
  • Base model: Qwen/Qwen2.5-0.5B-Instruct
  • Dataset: 1000 prompts over 50 support tickets
  • Environment: algocore-support-ticket-env
  • Group size G: 2
  • KL beta: 0.04
  • Final loss: 0.0008
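To make the Group size and KL beta settings concrete, here is a minimal pure-Python sketch of the two quantities they control in GRPO: the group-relative advantage and the per-token KL penalty. It is an illustration, not the training code behind this model. The function names, the `eps` value, and the use of the population standard deviation are assumptions; a real implementation such as trl.GRPOTrainer may normalize differently.

```python
import math
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize each completion's reward against its own group.

    Assumption: population standard deviation and a small eps for stability.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def kl_penalty(logp, ref_logp, beta=0.04):
    """Per-token KL term using the estimator exp(d) - d - 1, d = ref_logp - logp.

    The estimator is non-negative and is zero when the policy matches the reference.
    """
    diff = ref_logp - logp
    return beta * (math.exp(diff) - diff - 1.0)


# With G = 2, two completions are sampled per prompt (rewards here are made up).
print(group_relative_advantages([1.0, 0.0]))  # roughly [+1, -1]
print(group_relative_advantages([0.5, 0.5]))  # tied rewards give zero advantage
print(kl_penalty(logp=-1.0, ref_logp=-2.0))   # small positive penalty
```

Under these assumptions, a group of two yields advantages of about +1 and -1 whenever the two rewards differ and 0 when they tie, so each prompt contributes only a pairwise preference signal.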

Results

Task                   Before  After   Delta
Task 1 (Classify)      0.667   1.000   +0.333
Task 2 (Action)        0.117   0.450   +0.333
Task 3 (Full Resolve)  0.083   0.258   +0.175
Overall                0.289   0.569   +0.280
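The card does not say how the Overall row is computed. The short check below recomputes the deltas from the table and shows that the Overall figures are consistent with an unweighted mean of the three task scores. That reading is an inference from the numbers, not something the source states.

```python
# (before, after) scores copied from the results table.
scores = {
    "Task 1 (Classify)": (0.667, 1.000),
    "Task 2 (Action)": (0.117, 0.450),
    "Task 3 (Full Resolve)": (0.083, 0.258),
}

# Per-task improvement, rounded to the table's three decimals.
deltas = {task: round(after - before, 3) for task, (before, after) in scores.items()}

# Assumption: "Overall" is the unweighted mean across the three tasks.
overall_before = round(sum(b for b, _ in scores.values()) / len(scores), 3)
overall_after = round(sum(a for _, a in scores.values()) / len(scores), 3)
overall_delta = round(overall_after - overall_before, 3)

print(deltas)
print(overall_before, overall_after, overall_delta)
```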

[Figure: GRPO Training Results]