---
license: apache-2.0
tags:
  - gguf
  - qwen
  - safety
  - guardrail
  - text-generation
  - tiny-llm
  - llama.cpp
base_model: Qwen/Qwen3Guard-Gen-0.6B
author: geoffmunn
pipeline_tag: text-generation
---

# Qwen3Guard-Gen-0.6B-GGUF

This is a GGUF-quantized version of [Qwen3Guard-Gen-0.6B](https://huggingface.co/Qwen/Qwen3Guard-Gen-0.6B), a tiny yet safety-aligned generative model from Alibaba's Qwen team.

At just ~0.6B parameters, this model is optimized for:

- Ultra-fast inference
- Low-memory environments (phones, Raspberry Pi, embedded)
- Real-time filtering and response generation
- Privacy-first apps where small size matters

⚠️ This is a generative model with built-in safety constraints, designed to refuse harmful requests while running efficiently on-device.
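
For a quick local test, here is a minimal llama-cpp-python sketch. The GGUF file name below is an assumption based on this card's naming; check the repo's file list for the exact name.

```python
# Minimal local chat with the quantized model via llama-cpp-python.
# NOTE: model_path is an assumed file name for the Q4_K_M quant.
from llama_cpp import Llama

llm = Llama(model_path="Qwen3Guard-Gen-0.6B-Q4_K_M.gguf", n_ctx=2048)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "How do I back up my photos safely?"}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```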

## 🛑 What Is Qwen3Guard-Gen-0.6B?

It's a compact, helpful assistant trained to:

- Respond helpfully to simple queries
- Politely decline unsafe ones (e.g., illegal acts, self-harm)
- Avoid generating toxic content
- Run completely offline with minimal resources

Perfect for:

- Mobile AI assistants
- IoT devices
- Edge computing
- Fast pre-filter + response pipelines
- Educational tools on low-end hardware

## 🔗 Relationship to Other Safety Models

Part of the full Qwen3 safety stack:

| Model | Size | Role |
|-------|------|------|
| Qwen3Guard-Gen-0.6B | 🟢 Tiny | Lightweight safe generator |
| Qwen3Guard-Stream-4B/8B | 🟡 Medium/Large | Streaming input filter |
| Qwen3Guard-Gen-4B/8B | 🟡 Large | High-quality safe generation |
| Qwen3-4B-SafeRL | 🟡 Large | Fully aligned ethical agent |

### Recommended Architecture

```
User Input
    ↓
[Optional: Qwen3Guard-Stream-4B] ← pre-filter
    ↓
[Qwen3Guard-Gen-0.6B]
    ↓
Fast, Safe Response
```

Use this when you need speed and privacy over deep reasoning.
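
The diagram above translates into a simple two-stage call pattern. Here is a hedged Python sketch, where `pre_filter` is a hypothetical callback standing in for Qwen3Guard-Stream-4B and the model file name is assumed:

```python
# Two-stage pipeline sketch: optional pre-filter, then the tiny safe generator.
from llama_cpp import Llama

# Assumed file name for the Q4_K_M quant of this repo.
generator = Llama(model_path="Qwen3Guard-Gen-0.6B-Q4_K_M.gguf", n_ctx=2048)

def respond(user_input: str, pre_filter=None) -> str:
    """Return a fast, safety-constrained reply.

    pre_filter is a hypothetical callable (e.g. backed by Qwen3Guard-Stream-4B)
    that returns False for inputs that should be blocked; pass None to skip it
    on constrained devices.
    """
    if pre_filter is not None and not pre_filter(user_input):
        return "Sorry, I can't help with that."
    out = generator.create_chat_completion(
        messages=[{"role": "user", "content": user_input}],
        max_tokens=256,
    )
    return out["choices"][0]["message"]["content"]

print(respond("What's a good way to learn soldering?"))
```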

## Available Quantizations

| Level | Size | RAM Usage | Use Case |
|-------|------|-----------|----------|
| Q2_K | ~0.45 GB | ~0.6 GB | Only for very weak devices |
| Q3_K_S | ~0.52 GB | ~0.7 GB | Minimal viability |
| Q3_K_M | ~0.59 GB | ~0.8 GB | Basic chat on very low-end hardware |
| Q4_K_S | ~0.68 GB | ~0.9 GB | Good for edge devices |
| Q4_K_M | ~0.75 GB | ~1.0 GB | ✅ Best balance for most users |
| Q5_K_S | ~0.73 GB | ~0.95 GB | Slightly faster than Q5_K_M |
| Q5_K_M | ~0.75 GB | ~1.0 GB | ✅✅ Top quality for a tiny model |
| Q6_K | ~0.85 GB | ~1.1 GB | Near-original fidelity |
| Q8_0 | ~1.10 GB | ~1.3 GB | Maximum accuracy (research) |

💡 **Recommendation:** Use **Q4_K_M** or **Q5_K_M** for the best trade-off between speed and safety reliability.
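
To fetch just the recommended quant rather than the whole repo, a huggingface_hub sketch follows. The repo id and file name are assumptions based on this card; verify them against the repo's file list.

```python
# Download a single GGUF file from the Hub.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="geoffmunn/Qwen3Guard-Gen-0.6B",     # assumed repo id
    filename="Qwen3Guard-Gen-0.6B-Q4_K_M.gguf",  # assumed file name
)
print(f"Model saved to {path}")
```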

## Tools That Support It

- **LM Studio** – load and test locally
- **OpenWebUI** – deploy with RAG and tools
- **GPT4All** – private, offline AI chatbot
- Directly via `llama.cpp`, Ollama, or TGI

## Author

👤 **Geoff Munn** (@geoffmunn)

🔗 [Hugging Face Profile](https://huggingface.co/geoffmunn)

## Disclaimer

This is a community conversion for local inference; it is not affiliated with Alibaba Cloud or the Qwen team.