Llama Spark - FP8
An FP8-quantized version of Llama Spark, Arcee AI's improved Llama 3.1 8B model. FP8 quantization roughly halves the memory footprint relative to 16-bit weights while preserving model quality.
Model Details
| Detail | Value |
|---|---|
| Base model | arcee-ai/Llama-Spark |
| Architecture | Llama 3.1 (8B parameters) |
| Quantization | FP8 (8-bit floating point) |
| Model size | ~11 GB (3 shards) |
| Format | Safetensors |
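A quick back-of-envelope check of the table's size figure (a sketch; it counts weight storage only, and the parameter count is approximate). On-disk size can exceed the raw FP8 estimate when some tensors, such as embeddings and quantization scales, are stored in higher precision:

```python
def weight_gib(n_params: float, bytes_per_param: float) -> float:
    """Approximate weight storage in GiB (weights only, no overhead)."""
    return n_params * bytes_per_param / 2**30

N = 8.03e9  # Llama 3.1 8B parameter count (approximate)
fp16_gib = weight_gib(N, 2)  # 16-bit weights: 2 bytes per parameter
fp8_gib = weight_gib(N, 1)   # FP8 weights: 1 byte per parameter

print(f"FP16: ~{fp16_gib:.1f} GiB, FP8: ~{fp8_gib:.1f} GiB")
```

FP8 comes out at roughly half the 16-bit footprint, consistent with the ~11 GB checkpoint once higher-precision tensors are included.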
Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "juliensimon/Llama-Spark-fp8", torch_dtype="auto", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("juliensimon/Llama-Spark-fp8")
```
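The snippet above only loads the model and tokenizer. A sketch of running a single chat turn follows; the helper names are illustrative, not part of the model card, and it assumes the Llama 3.1 chat template shipped with the tokenizer:

```python
def chat_messages(prompt: str) -> list:
    """Single-turn message list in the format apply_chat_template expects."""
    return [{"role": "user", "content": prompt}]

def generate_reply(prompt: str, max_new_tokens: int = 256) -> str:
    """Load juliensimon/Llama-Spark-fp8 and generate a reply for one prompt."""
    # transformers is imported lazily so the helper above stays usable without it.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("juliensimon/Llama-Spark-fp8")
    model = AutoModelForCausalLM.from_pretrained(
        "juliensimon/Llama-Spark-fp8", torch_dtype="auto", device_map="auto"
    )
    # Llama 3.1 is instruction-tuned, so format the prompt with the chat template.
    inputs = tokenizer.apply_chat_template(
        chat_messages(prompt), add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    outputs = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
```

Calling `generate_reply("...")` downloads the ~11 GB checkpoint on first use, so run it on a machine with sufficient GPU memory.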