AlgoCore/support-ticket-grpo-model

	---
	license: apache-2.0
	base_model: Qwen/Qwen2.5-0.5B-Instruct
	tags:
	- grpo
	- rl
	- support-ticket
	- lora
	- peft
	- trl
	---

	# Support Ticket GRPO Agent

	This model is `Qwen/Qwen2.5-0.5B-Instruct` fine-tuned with GRPO (Group Relative Policy Optimization) and LoRA on a multi-step support-ticket environment.

	## Training Setup
	- Algorithm: GRPO via `trl.GRPOTrainer` + LoRA (PEFT)
	- Base model: Qwen/Qwen2.5-0.5B-Instruct
	- Dataset: 1000 prompts over 50 support tickets
	- Environment: [algocore-support-ticket-env](https://algocore-support-ticket-env.hf.space)
	- Group size (G): 2 completions per prompt
	- KL penalty coefficient (beta): 0.04
	- Final loss: 0.0008
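
To illustrate what a group size of 2 implies, here is a minimal sketch of the group-relative advantage that gives GRPO its name. It assumes the common formulation in which each completion's reward is centered and scaled by the statistics of its own group. The function name, the `eps` value, the choice of population standard deviation, and the reward values are all illustrative assumptions; this is not the training code behind this model.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Center and scale rewards within one group of sampled completions."""
    mu = mean(rewards)
    # Assumption: population std. A sample std would only rescale the values.
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for the G = 2 completions sampled for one prompt.
advantages = group_relative_advantages([0.2, 0.9])
print(advantages)  # the lower-reward completion is negative, the other positive

# When both completions score the same, neither one is preferred.
tied = group_relative_advantages([0.5, 0.5])
print(tied)
```

Under these assumptions, G = 2 reduces each prompt to a pairwise comparison: the two advantages are equal in magnitude and opposite in sign, and a tie contributes no advantage signal at all.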

	## Results

	| Task | Before | After | Delta |
	|---|---|---|---|
	| Task 1 (Classify) | 0.667 | 1.000 | +0.333 |
	| Task 2 (Action) | 0.117 | 0.450 | +0.333 |
	| Task 3 (Full Resolve) | 0.083 | 0.258 | +0.175 |
	| Overall | 0.289 | 0.569 | +0.280 |
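
The card does not say how the Overall row is computed. The short check below assumes it is the unweighted mean of the three task scores, which reproduces the reported values to three decimals. The variable names are illustrative.

```python
from statistics import mean

# (before, after) scores copied from the results table.
scores = {
    "Task 1 (Classify)": (0.667, 1.000),
    "Task 2 (Action)": (0.117, 0.450),
    "Task 3 (Full Resolve)": (0.083, 0.258),
}

# Per-task improvement, rounded to the table's precision.
deltas = {task: round(after - before, 3) for task, (before, after) in scores.items()}

# Assumption: "Overall" is the unweighted mean across the three tasks.
overall_before = round(mean(before for before, _ in scores.values()), 3)
overall_after = round(mean(after for _, after in scores.values()), 3)
overall_delta = round(overall_after - overall_before, 3)

print(deltas)
print(overall_before, overall_after, overall_delta)
```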

	![GRPO Training Results](grpo_results.png)