AlgoCore/support-ticket-grpo-model

Support Ticket GRPO Agent

Qwen/Qwen2.5-0.5B-Instruct fine-tuned with GRPO (Group Relative Policy Optimization) and LoRA on a multi-step support-ticket environment.
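
As a rough illustration of what "group relative" means, the sketch below normalizes the rewards of completions sampled for the same prompt against their group mean and standard deviation. The function name, the `eps` value, and the use of the population standard deviation are assumptions made for illustration; the exact normalization used during training is not stated in this card.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize rewards within one group of completions for the same prompt.

    Assumption: advantage = (reward - group mean) / (group std + eps),
    with the population standard deviation. Real trainers may differ.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Two completions per prompt, matching a group size of 2.
print(group_relative_advantages([1.0, 0.0]))  # one positive, one negative advantage
print(group_relative_advantages([0.5, 0.5]))  # identical rewards give zero advantage
```

With only two completions per group, the advantages are always equal and opposite, and a prompt whose two completions score the same contributes no learning signal under this formulation.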

Training Setup

Algorithm: GRPO via trl.GRPOTrainer + LoRA (PEFT)
Base model: Qwen/Qwen2.5-0.5B-Instruct
Dataset: 1000 prompts over 50 support tickets
Environment: algocore-support-ticket-env
Group size G: 2
KL beta: 0.04
Final loss: 0.0008
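
A minimal sketch of how a setup like the one listed above might be wired together with `trl` and `peft`. Only the base model name, the group size (`num_generations=2`), and the KL coefficient (`beta=0.04`) come from this card. The LoRA rank and alpha, the output directory, the helper names, and the placeholder reward are hypothetical; the actual reward would come from the support ticket environment, whose API is not described here.

```python
def placeholder_reward(completions, **kwargs):
    """Hypothetical stand-in for the environment score.

    Returns 1.0 for any non-empty completion and 0.0 otherwise.
    """
    return [1.0 if str(c).strip() else 0.0 for c in completions]


def build_trainer(train_dataset, reward_func=placeholder_reward):
    """Build a GRPO trainer; train_dataset is expected to have a "prompt" column."""
    # Imported lazily so this module can be read without trl/peft installed.
    from peft import LoraConfig
    from trl import GRPOConfig, GRPOTrainer

    args = GRPOConfig(
        output_dir="support-ticket-grpo",  # hypothetical output path
        num_generations=2,                 # group size G from this card
        beta=0.04,                         # KL beta from this card
    )
    lora = LoraConfig(
        r=8,                # hypothetical rank, not stated in this card
        lora_alpha=16,      # hypothetical scaling, not stated in this card
        task_type="CAUSAL_LM",
    )
    return GRPOTrainer(
        model="Qwen/Qwen2.5-0.5B-Instruct",
        reward_funcs=reward_func,
        args=args,
        train_dataset=train_dataset,
        peft_config=lora,
    )
```

Calling `build_trainer(dataset).train()` would start training under these assumptions; the remaining hyperparameters are left at library defaults because the card does not report them.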

Results

Task	Before	After	Delta
Task 1 (Classify)	0.667	1.000	+0.333
Task 2 (Action)	0.117	0.450	+0.333
Task 3 (Full Resolve)	0.083	0.258	+0.175
Overall	0.289	0.569	+0.280
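
The Delta column and the Overall row can be re-derived from the per-task scores. The sketch below assumes that Overall is the unweighted mean of the three tasks; the card does not state this, but the reported numbers are consistent with it.

```python
# (before, after) scores copied from the results table.
scores = {
    "Task 1 (Classify)": (0.667, 1.000),
    "Task 2 (Action)": (0.117, 0.450),
    "Task 3 (Full Resolve)": (0.083, 0.258),
}

# Per-task improvement.
deltas = {task: after - before for task, (before, after) in scores.items()}

# Assumption: Overall is the plain average over the three tasks.
overall_before = sum(before for before, _ in scores.values()) / len(scores)
overall_after = sum(after for _, after in scores.values()) / len(scores)

for task, delta in deltas.items():
    print(f"{task}: {delta:+.3f}")
print(f"Overall: {overall_before:.3f} -> {overall_after:.3f} "
      f"({overall_after - overall_before:+.3f})")
```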

Model size: 0.5B params (Safetensors; tensor types F32 and F16)

Model tree for AlgoCore/support-ticket-grpo-model: base model Qwen/Qwen2.5-0.5B; fine-tuned as Qwen/Qwen2.5-0.5B-Instruct; this model is an adapter on top of it.