---
language:
- en
license: llama3.2
tags:
- mobile
- edge-ai
- quantized
- gguf
- q8
pipeline_tag: text-generation
---

# Llama 3.2 1B Instruct - Q8 Mobile (GGUF)

A higher-fidelity Q8 quantization of Meta's Llama 3.2 1B Instruct. Use it when you need maximum quality retention from a 1B model.

| Property | Value |
|----------|-------|
| **Parameters** | 1.23 billion |
| **Quantization** | Q8_0 (8-bit) |
| **Size** | ~1.3 GB |
| **Quality Retention** | ~98% of original |
| **Speed** | ~22 tok/s (Samsung Galaxy S20 FE, CPU) |
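
As a rough sanity check on the size figure above, the sketch below estimates the file size from the parameter count. It assumes the Q8_0 block layout used by llama.cpp/GGML (32 int8 weights plus one 16-bit scale per block) and treats every parameter as Q8_0. Real GGUF files also carry metadata and may keep a few tensors at higher precision, so treat the result as an approximation. The function name is illustrative only.

```python
# Q8_0 block layout (assumed from llama.cpp/GGML):
# 32 weights stored as int8 (1 byte each) plus one fp16 scale (2 bytes).
BLOCK_WEIGHTS = 32
BLOCK_BYTES = BLOCK_WEIGHTS * 1 + 2  # 34 bytes per block


def q8_0_size_bytes(n_params: float) -> float:
    """Approximate storage if every parameter were stored as Q8_0."""
    return n_params / BLOCK_WEIGHTS * BLOCK_BYTES


# Effective bits per weight once the per-block scale is included.
bits_per_weight = BLOCK_BYTES * 8 / BLOCK_WEIGHTS

# Parameter count taken from the table above.
size_gb = q8_0_size_bytes(1.23e9) / 1e9

print(f"{bits_per_weight} bits/weight, ~{size_gb:.2f} GB")
# prints "8.5 bits/weight, ~1.31 GB"
```

The per-block scale is why "8-bit" works out to about 8.5 bits per weight, which lines up with the ~1.3 GB listed in the table.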

## When to Use This Over Q4

Choose Q8 when accuracy matters more than size, for example in production chatbots, content moderation, and other applications where errors are costly.