3morixd's picture
Professional model card upgrade: benchmarks, code examples, usage guide
d6a96fd verified
|
Raw
History Blame Contribute Delete
664 Bytes
---
language:
- en
license: llama3.2
tags:
- mobile
- edge-ai
- quantized
- gguf
- q8
pipeline_tag: text-generation
---
# Llama 3.2 1B Instruct - Q8 Mobile (GGUF)
Higher-fidelity Q8 quantization of Meta's Llama 3.2 1B Instruct. When you need maximum quality retention from a 1B model.
| Property | Value |
|----------|-------|
| **Parameters** | 1.23 billion |
| **Quantization** | Q8_0 (8-bit) |
| **Size** | ~1.3 GB |
| **Quality Retention** | ~98% of original |
| **Speed** | ~22 tok/s (S20 FE CPU) |
## When to Use This Over Q4
Choose Q8 when accuracy matters more than size: production chatbots, content moderation, applications where errors are costly.