---
language:
- en
license: apache-2.0
base_model: TurkishCodeMan/Qwen2.5-3B-Instruct-grpo-gmail
quantized_by: TurkishCodeMan
tags:
- gguf
- quantized
- tool-calling
- llama.cpp
- 4-bit
---

# Qwen2.5-3B-Instruct GRPO Gmail (Q4_K_M GGUF)

Quantized version of [TurkishCodeMan/Qwen2.5-3B-Instruct-grpo-gmail](https://huggingface.co/TurkishCodeMan/Qwen2.5-3B-Instruct-grpo-gmail).

## 📥 Download & Run

```bash
# Download the Q4_K_M file into the current directory (recommended)
huggingface-cli download TurkishCodeMan/Qwen2.5-3B-Instruct-grpo-gmail-GGUF Qwen2.5-3B-Instruct-grpo-gmail-Q4_K_M.gguf --local-dir .

# Run with GPU (-ngl 99 offloads all layers)
./llama-server -m Qwen2.5-3B-Instruct-grpo-gmail-Q4_K_M.gguf --port 8080 -ngl 99

# Run on CPU
./llama-server -m Qwen2.5-3B-Instruct-grpo-gmail-Q4_K_M.gguf --port 8080
```
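The same download-and-launch steps can also be scripted from Python. The following is a minimal sketch, assuming `huggingface_hub` is installed and a `llama-server` binary sits in the working directory; the helper names `fetch_model` and `server_command` and the `models` directory are illustrative and not part of this repository.

```python
import shlex
from pathlib import Path

REPO_ID = "TurkishCodeMan/Qwen2.5-3B-Instruct-grpo-gmail-GGUF"
FILENAME = "Qwen2.5-3B-Instruct-grpo-gmail-Q4_K_M.gguf"


def fetch_model(local_dir: str = "models") -> Path:
    """Download the GGUF file and return its local path (needs network access)."""
    # Imported lazily so server_command() works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return Path(hf_hub_download(repo_id=REPO_ID, filename=FILENAME, local_dir=local_dir))


def server_command(model_path, port=8080, gpu_layers=0, binary="./llama-server"):
    """Build the llama-server argument list matching the shell commands above."""
    cmd = [binary, "-m", str(model_path), "--port", str(port)]
    if gpu_layers > 0:
        # -ngl sets how many layers are offloaded to the GPU
        cmd += ["-ngl", str(gpu_layers)]
    return cmd


# Show the CPU command line for a file in the current directory (no download happens here).
print(shlex.join(server_command(FILENAME)))
```

To actually start the server, pass the list to `subprocess.Popen(server_command(fetch_model(), gpu_layers=99))`.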

## ⚙️ Quantization Info

- Method: Q4_K_M (llama.cpp 4-bit k-quant, medium variant)
- Size: ~2.3 GB (vs 6.7 GB F16)
- Quality: roughly 95%+ of F16 quality
- Speed: roughly 3-4x faster inference than F16
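
As a rough sanity check, the two sizes above imply a compression ratio and an average bit width per weight. This is only arithmetic on the rounded figures quoted in this README, not a measurement of the actual file.

```python
F16_GB = 6.7  # F16 size quoted above
Q4_GB = 2.3   # Q4_K_M size quoted above

compression = F16_GB / Q4_GB          # how many times smaller the quantized file is
effective_bits = 16 * Q4_GB / F16_GB  # average bits per weight implied by the two sizes

print(f"{compression:.1f}x smaller, ~{effective_bits:.1f} bits per weight")
# prints: 2.9x smaller, ~5.5 bits per weight
```

A value above 4 bits is expected, since Q4_K_M keeps some tensors at higher precision and stores per-block scales alongside the 4-bit weights.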

## 🔗 Related Models

- Unquantized (F16): [TurkishCodeMan/Qwen2.5-3B-Instruct-grpo-gmail](https://huggingface.co/TurkishCodeMan/Qwen2.5-3B-Instruct-grpo-gmail)
- Base model: [unsloth/Qwen2.5-3B-Instruct](https://huggingface.co/unsloth/Qwen2.5-3B-Instruct)

## 🎯 Tool Calling Example

```python
import requests

# Assumes llama-server is already running locally on port 8080
response = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "messages": [
            {"role": "system", "content": "You are a tool-calling assistant."},
            {"role": "user", "content": "Send email to test@gmail.com about meeting tomorrow"},
        ],
        "temperature": 0.0,  # greedy decoding
        "max_tokens": 512,
    },
    timeout=120,
)
response.raise_for_status()  # fail loudly on HTTP errors

print(response.json()["choices"][0]["message"]["content"])
# Example output: {"tool_calls": [{"function": "send_email", "arguments": {"to": ["test@gmail.com"], "subject": "Meeting Tomorrow", "body": "..."}}]}
```
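
The model returns the tool call as a JSON string in the message content, so the caller has to parse it before acting on it. Below is a minimal parsing sketch, assuming the output follows the shape shown in the example above; `parse_tool_calls` is a hypothetical helper, not something shipped with this model.

```python
import json


def parse_tool_calls(content: str) -> list:
    """Return the entries under "tool_calls", or [] if the text is not a tool call."""
    try:
        payload = json.loads(content)
    except json.JSONDecodeError:
        return []  # plain-text answer or malformed JSON
    if not isinstance(payload, dict):
        return []
    calls = payload.get("tool_calls", [])
    if not isinstance(calls, list):
        return []
    # Keep only well-formed entries that name a function
    return [c for c in calls if isinstance(c, dict) and "function" in c]


sample = (
    '{"tool_calls": [{"function": "send_email", "arguments": '
    '{"to": ["test@gmail.com"], "subject": "Meeting Tomorrow", "body": "..."}}]}'
)
for call in parse_tool_calls(sample):
    print(call["function"], call.get("arguments", {}))
```

Returning an empty list instead of raising lets the caller treat non-JSON replies as ordinary text.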

## 📊 Training

- SFT: 300 steps on 57 Gmail examples
- GRPO: 300 steps of reinforcement learning to improve tool-calling accuracy
- Final loss: 0.50 (excellent convergence)
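
For readers unfamiliar with GRPO, its core step is scoring each sampled completion relative to the other completions for the same prompt. The sketch below illustrates that group-relative normalization only; the reward values are made up, and the actual reward function and training code used for this model are not described in this README.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of completions for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations differ on this detail
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards: 1.0 for a correct tool call, 0.0 otherwise
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print([round(a, 3) for a in advantages])
```

Completions that beat the group average get a positive advantage and are reinforced; a group where every completion scores the same yields zero advantage and therefore no learning signal.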

## 🛠️ Supported Tools

`send_email`, `draft_email`, `read_email`, `search_emails`, `delete_email`, `modify_email`, `batch_modify_emails`, `batch_delete_emails`, `list_email_labels`, `create_label`, `update_label`, `delete_label`, `get_or_create_label`, `create_filter`, `list_filters`, `get_filter`, `delete_filter`, `create_filter_from_template`, `download_attachment`
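
Because the model only emits tool names and arguments, the calling application has to map them to real Gmail operations. One possible way to do this is an allowlist plus a handler registry, sketched below; `dispatch` and the stub handler are hypothetical and only show the shape of the wiring.

```python
SUPPORTED_TOOLS = frozenset({
    "send_email", "draft_email", "read_email", "search_emails", "delete_email",
    "modify_email", "batch_modify_emails", "batch_delete_emails",
    "list_email_labels", "create_label", "update_label", "delete_label",
    "get_or_create_label", "create_filter", "list_filters", "get_filter",
    "delete_filter", "create_filter_from_template", "download_attachment",
})


def dispatch(call, handlers):
    """Route one parsed tool call to its handler, rejecting unknown tool names."""
    name = call.get("function")
    if name not in SUPPORTED_TOOLS:
        raise ValueError(f"unsupported tool: {name!r}")
    if name not in handlers:
        raise NotImplementedError(f"no handler registered for {name!r}")
    return handlers[name](**call.get("arguments", {}))


def send_email_stub(to, subject, body):
    """Stand-in for a real Gmail client call (assumption for illustration)."""
    return f"would send {subject!r} to {', '.join(to)}"


result = dispatch(
    {"function": "send_email",
     "arguments": {"to": ["test@gmail.com"], "subject": "Meeting Tomorrow", "body": "..."}},
    handlers={"send_email": send_email_stub},
)
print(result)
```

Checking the name against the allowlist before calling anything keeps a hallucinated tool name from reaching application code.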