Llama Spark FP8

An FP8-quantized version of Llama Spark, an improved Llama 3.1 8B model from Arcee AI. FP8 stores each weight in 8 bits instead of 16, roughly halving the memory footprint while preserving model quality.
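The memory savings can be sketched with back-of-the-envelope arithmetic. This is only a rough estimate: the parameter count is approximate, and the actual checkpoint (~11 GB) is larger than the raw FP8 weight bytes because some tensors remain in full F32 precision.

```python
def weight_gib(n_params: int, bytes_per_param: int) -> float:
    """Raw weight storage in GiB for a given per-parameter width."""
    return n_params * bytes_per_param / 2**30

N = 8_000_000_000          # ~8B parameters (Llama 3.1 8B)
fp8 = weight_gib(N, 1)     # FP8:  1 byte per weight, ~7.45 GiB
bf16 = weight_gib(N, 2)    # BF16: 2 bytes per weight, ~14.9 GiB
print(f"FP8: {fp8:.2f} GiB vs BF16: {bf16:.2f} GiB")
```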

Video walkthroughs:

Model Details

| Detail | Value |
|---|---|
| Base model | arcee-ai/Llama-Spark |
| Architecture | Llama 3.1 (8B parameters) |
| Quantization | FP8 (8-bit floating point) |
| Tensor types | F8_E4M3, F32 |
| Model size | ~11 GB (3 shards) |
| Format | Safetensors |
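The FP8 format used here is E4M3: 1 sign bit, 4 exponent bits (bias 7), and 3 mantissa bits per value. A minimal decoder sketch, assuming the E4M3FN convention used by PyTorch's `torch.float8_e4m3fn` (the all-ones bit pattern is NaN and there are no infinities, giving a maximum finite value of 448):

```python
def decode_e4m3(byte: int) -> float:
    """Decode one 8-bit E4M3FN value (sketch, not a production codec)."""
    sign = -1.0 if (byte >> 7) & 1 else 1.0
    exp = (byte >> 3) & 0xF   # 4 exponent bits, bias 7
    mant = byte & 0x7         # 3 mantissa bits
    if exp == 0xF and mant == 0x7:
        return float("nan")   # E4M3FN: all-ones is NaN, no infinities
    if exp == 0:
        return sign * (mant / 8) * 2**-6      # subnormal: no implicit 1
    return sign * (1 + mant / 8) * 2**(exp - 7)
```

With so few mantissa bits, most of the dynamic range comes from the exponent, which is why FP8 checkpoints typically keep per-tensor scales in higher precision (consistent with the F32 tensors in this model).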

Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the FP8 checkpoint; device_map="auto" places it on available GPU(s)
model = AutoModelForCausalLM.from_pretrained(
    "juliensimon/Llama-Spark-fp8", torch_dtype="auto", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("juliensimon/Llama-Spark-fp8")

# Generate a short completion
inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```