Llama Spark FP8

An FP8-quantized version of Llama Spark, an improved Llama 3.1 8B model from Arcee AI. FP8 stores each weight in 8 bits instead of 16, roughly halving the memory footprint while preserving model quality.
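The memory savings can be sketched with back-of-the-envelope arithmetic. This is only a rough estimate: the parameter count is approximate, and the actual checkpoint (~11 GB) is larger than the raw FP8 weight bytes because some tensors remain in full F32 precision.

```python
def weight_gib(n_params: int, bytes_per_param: int) -> float:
    """Raw weight storage in GiB for a given per-parameter width."""
    return n_params * bytes_per_param / 2**30

N = 8_000_000_000          # ~8B parameters (Llama 3.1 8B)
fp8 = weight_gib(N, 1)     # FP8:  1 byte per weight, ~7.45 GiB
bf16 = weight_gib(N, 2)    # BF16: 2 bytes per weight, ~14.9 GiB
print(f"FP8: {fp8:.2f} GiB vs BF16: {bf16:.2f} GiB")
```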

Video walkthroughs:

Model Details

| Detail | Value |
|---|---|
| Base model | arcee-ai/Llama-Spark |
| Architecture | Llama 3.1 (8B parameters) |
| Quantization | FP8 (8-bit floating point) |
| Tensor types | F8_E4M3, F32 |
| Model size | ~11 GB (3 shards) |
| Format | Safetensors |
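The FP8 format used here is E4M3: 1 sign bit, 4 exponent bits (bias 7), and 3 mantissa bits per value. A minimal decoder sketch, assuming the E4M3FN convention used by PyTorch's `torch.float8_e4m3fn` (the all-ones bit pattern is NaN and there are no infinities, giving a maximum finite value of 448):

```python
def decode_e4m3(byte: int) -> float:
    """Decode one 8-bit E4M3FN value (sketch, not a production codec)."""
    sign = -1.0 if (byte >> 7) & 1 else 1.0
    exp = (byte >> 3) & 0xF   # 4 exponent bits, bias 7
    mant = byte & 0x7         # 3 mantissa bits
    if exp == 0xF and mant == 0x7:
        return float("nan")   # E4M3FN: all-ones is NaN, no infinities
    if exp == 0:
        return sign * (mant / 8) * 2**-6      # subnormal: no implicit 1
    return sign * (1 + mant / 8) * 2**(exp - 7)
```

With so few mantissa bits, most of the dynamic range comes from the exponent, which is why FP8 checkpoints typically keep per-tensor scales in higher precision (consistent with the F32 tensors in this model).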

Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the FP8 checkpoint; device_map="auto" places it on available GPU(s)
model = AutoModelForCausalLM.from_pretrained(
    "juliensimon/Llama-Spark-fp8", torch_dtype="auto", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("juliensimon/Llama-Spark-fp8")

# Generate a short completion
inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```