3morixd's picture
Professional model card upgrade: benchmarks, code examples, usage guide
d6a96fd verified
|
Raw
History Blame Contribute Delete
664 Bytes
metadata
language:
  - en
license: llama3.2
tags:
  - mobile
  - edge-ai
  - quantized
  - gguf
  - q8
pipeline_tag: text-generation

Llama 3.2 1B Instruct - Q8 Mobile (GGUF)

Higher-fidelity Q8 quantization of Meta's Llama 3.2 1B Instruct. When you need maximum quality retention from a 1B model.

Property Value
Parameters 1.23 billion
Quantization Q8_0 (8-bit)
Size ~1.3 GB
Quality Retention ~98% of original
Speed ~22 tok/s (S20 FE CPU)

When to Use This Over Q4

Choose Q8 when accuracy matters more than size: production chatbots, content moderation, applications where errors are costly.