---
license: apache-2.0
tags:
  - gguf
  - qwen
  - safety
  - guardrail
  - text-generation
  - tiny-llm
  - llama.cpp
base_model: Qwen/Qwen3Guard-Gen-0.6B
author: geoffmunn
pipeline_tag: text-generation
---

# Qwen3Guard-Gen-0.6B-GGUF

This is a GGUF-quantized version of [Qwen3Guard-Gen-0.6B](https://huggingface.co/Qwen/Qwen3Guard-Gen-0.6B), a tiny yet safety-aligned generative model from Alibaba's Qwen team.

At just ~0.6B parameters, this model is optimized for:

- Ultra-fast inference
- Low-memory environments (phones, Raspberry Pi, embedded)
- Real-time filtering and response generation
- Privacy-first apps where small size matters

⚠️ This is a generative model with built-in safety constraints, designed to refuse harmful requests while running efficiently on-device.
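
For a quick local test, here is a minimal llama-cpp-python sketch. The GGUF file name below is an assumption based on this card's naming; check the repo's file list for the exact name.

```python
# Minimal local chat with the quantized model via llama-cpp-python.
# NOTE: model_path is an assumed file name for the Q4_K_M quant.
from llama_cpp import Llama

llm = Llama(model_path="Qwen3Guard-Gen-0.6B-Q4_K_M.gguf", n_ctx=2048)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "How do I back up my photos safely?"}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```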

## 🛑 What Is Qwen3Guard-Gen-0.6B?

It's a compact, helpful assistant trained to:

- Respond helpfully to simple queries
- Politely decline unsafe ones (e.g., illegal acts, self-harm)
- Avoid generating toxic content
- Run completely offline with minimal resources

Perfect for:

- Mobile AI assistants
- IoT devices
- Edge computing
- Fast pre-filter + response pipelines
- Educational tools on low-end hardware

## 🔗 Relationship to Other Safety Models

Part of the full Qwen3 safety stack:

| Model | Size | Role |
|-------|------|------|
| Qwen3Guard-Gen-0.6B | 🟢 Tiny | Lightweight safe generator |
| Qwen3Guard-Stream-4B/8B | 🟡 Medium/Large | Streaming input filter |
| Qwen3Guard-Gen-4B/8B | 🟡 Large | High-quality safe generation |
| Qwen3-4B-SafeRL | 🟡 Large | Fully aligned ethical agent |

### Recommended Architecture

```
User Input
    ↓
[Optional: Qwen3Guard-Stream-4B] ← pre-filter
    ↓
[Qwen3Guard-Gen-0.6B]
    ↓
Fast, Safe Response
```

Use this when you need speed and privacy over deep reasoning.
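
The diagram above translates into a simple two-stage call pattern. Here is a hedged Python sketch, where `pre_filter` is a hypothetical callback standing in for Qwen3Guard-Stream-4B and the model file name is assumed:

```python
# Two-stage pipeline sketch: optional pre-filter, then the tiny safe generator.
from llama_cpp import Llama

# Assumed file name for the Q4_K_M quant of this repo.
generator = Llama(model_path="Qwen3Guard-Gen-0.6B-Q4_K_M.gguf", n_ctx=2048)

def respond(user_input: str, pre_filter=None) -> str:
    """Return a fast, safety-constrained reply.

    pre_filter is a hypothetical callable (e.g. backed by Qwen3Guard-Stream-4B)
    that returns False for inputs that should be blocked; pass None to skip it
    on constrained devices.
    """
    if pre_filter is not None and not pre_filter(user_input):
        return "Sorry, I can't help with that."
    out = generator.create_chat_completion(
        messages=[{"role": "user", "content": user_input}],
        max_tokens=256,
    )
    return out["choices"][0]["message"]["content"]

print(respond("What's a good way to learn soldering?"))
```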

## Available Quantizations

| Level | Size | RAM Usage | Use Case |
|-------|------|-----------|----------|
| Q2_K | ~0.45 GB | ~0.6 GB | Only for very weak devices |
| Q3_K_S | ~0.52 GB | ~0.7 GB | Minimal viability |
| Q3_K_M | ~0.59 GB | ~0.8 GB | Basic chat on very low-end hardware |
| Q4_K_S | ~0.68 GB | ~0.9 GB | Good for edge devices |
| Q4_K_M | ~0.75 GB | ~1.0 GB | ✅ Best balance for most users |
| Q5_K_S | ~0.73 GB | ~0.95 GB | Slightly faster than Q5_K_M |
| Q5_K_M | ~0.75 GB | ~1.0 GB | ✅✅ Top quality for a tiny model |
| Q6_K | ~0.85 GB | ~1.1 GB | Near-original fidelity |
| Q8_0 | ~1.10 GB | ~1.3 GB | Maximum accuracy (research) |

💡 **Recommendation:** Use **Q4_K_M** or **Q5_K_M** for the best trade-off between speed and safety reliability.
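
To fetch just the recommended quant rather than the whole repo, a huggingface_hub sketch follows. The repo id and file name are assumptions based on this card; verify them against the repo's file list.

```python
# Download a single GGUF file from the Hub.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="geoffmunn/Qwen3Guard-Gen-0.6B",     # assumed repo id
    filename="Qwen3Guard-Gen-0.6B-Q4_K_M.gguf",  # assumed file name
)
print(f"Model saved to {path}")
```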

## Tools That Support It

- **LM Studio** – load and test locally
- **OpenWebUI** – deploy with RAG and tools
- **GPT4All** – private, offline AI chatbot
- Directly via `llama.cpp`, Ollama, or TGI

## Author

👤 **Geoff Munn** (@geoffmunn)

🔗 [Hugging Face Profile](https://huggingface.co/geoffmunn)

## Disclaimer

This is a community conversion for local inference; it is not affiliated with Alibaba Cloud or the Qwen team.