---
language:
- en
license: llama3.2
tags:
- mobile
- edge-ai
- quantized
- gguf
- q8
pipeline_tag: text-generation
---

# Llama 3.2 1B Instruct - Q8 Mobile (GGUF)

Higher-fidelity Q8 quantization of Meta's Llama 3.2 1B Instruct. When you need maximum quality retention from a 1B model.

| Property | Value |
|----------|-------|
| **Parameters** | 1.23 billion |
| **Quantization** | Q8_0 (8-bit) |
| **Size** | ~1.3 GB |
| **Quality Retention** | ~98% of original |
| **Speed** | ~22 tok/s (S20 FE CPU) |

## When to Use This Over Q4

Choose Q8 when accuracy matters more than size: production chatbots, content moderation, applications where errors are costly.
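
If size is the deciding factor, the ~1.3 GB figure can be sanity-checked from the parameter count. In the GGML Q8_0 layout, each block of 32 weights is stored as 32 int8 values plus one 16-bit scale, which works out to roughly 8.5 bits per weight. The sketch below is a rough estimate only: it assumes every tensor is stored as Q8_0 and ignores metadata and tokenizer overhead, and the function names are illustrative rather than part of any library. It also turns the quoted ~22 tok/s into an approximate response time.

```python
# Rough size and latency estimates for a Q8_0 GGUF file.
# Assumptions (not from the model card): all parameters are stored
# as Q8_0, and metadata/tokenizer overhead is ignored.

Q8_0_BLOCK_WEIGHTS = 32      # weights per Q8_0 block
Q8_0_BLOCK_BYTES = 32 + 2    # 32 int8 values + one 16-bit scale


def q8_0_size_gb(n_params: float) -> float:
    """Approximate file size in GB (10^9 bytes) for n_params weights."""
    bytes_per_weight = Q8_0_BLOCK_BYTES / Q8_0_BLOCK_WEIGHTS  # 1.0625
    return n_params * bytes_per_weight / 1e9


def generation_seconds(n_tokens: int, tokens_per_second: float) -> float:
    """Approximate time to generate n_tokens at a steady decode rate."""
    return n_tokens / tokens_per_second


if __name__ == "__main__":
    size = q8_0_size_gb(1.23e9)            # parameter count from the table
    secs = generation_seconds(256, 22.0)   # speed from the table
    print(f"Estimated size: {size:.2f} GB")          # Estimated size: 1.31 GB
    print(f"256 tokens at 22 tok/s: {secs:.1f} s")   # 256 tokens at 22 tok/s: 11.6 s
```

The estimate lands close to the ~1.3 GB listed above. The actual file may differ slightly, since some tensors can be kept in other types and the file carries metadata.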