Llama 3.1 8B Instruct (4-bit Quantized)

This is a 4-bit quantized version of Meta-Llama-3.1-8B-Instruct. It was quantized with bitsandbytes/AutoGPTQ and is optimized for serving with vLLM.
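
The exact quantization recipe is not stated in this card. As a rough, hypothetical sketch, a 4-bit checkpoint like this is commonly produced with a bitsandbytes NF4 configuration; every setting below is an assumption, not this card's confirmed recipe:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Assumed settings: NF4 4-bit weights with bfloat16 compute (not confirmed by this card).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
model.save_pretrained("Llama-3.1-8B-Instruct-4bit")  # then upload to the Hub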

Usage (Transformers)

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("your-username/Llama-3.1-8B-Instruct-4bit")
model = AutoModelForCausalLM.from_pretrained(
    "your-username/Llama-3.1-8B-Instruct-4bit",
    device_map="auto",  # spread layers across available GPUs/CPU automatically
)
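
Once loaded, a standard chat-style generation call works as usual. This is a minimal sketch; the prompt is just an example:

messages = [{"role": "user", "content": "Explain 4-bit quantization in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))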
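
Usage (vLLM)

Since this checkpoint is described as optimized for vLLM, here is a minimal offline-inference sketch. The quantization argument is an assumption and must match the method actually used to quantize the weights:

from vllm import LLM, SamplingParams

# quantization="bitsandbytes" is an assumption; use "gptq" for AutoGPTQ checkpoints.
llm = LLM(model="your-username/Llama-3.1-8B-Instruct-4bit", quantization="bitsandbytes")
params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Explain 4-bit quantization in one sentence."], params)
print(outputs[0].outputs[0].text)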

Model details

Format: Safetensors
Model size: 8B params
Tensor types: F16, F32, U8