Llama 3.1 8B Instruct (4-bit Quantized)
This is a 4-bit quantized version of Meta-Llama-3.1-8B-Instruct. The weights were quantized to 4 bits (bitsandbytes / AutoGPTQ) to cut memory use, and the checkpoint is packaged for serving with vLLM as well as loading through Transformers.
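At 4-bit precision the weights take roughly a quarter of the memory of the fp16 original. A back-of-the-envelope estimate (illustrative only; real usage adds quantization scales, the KV cache, and activations):

```python
# Rough weight-memory estimate for an 8B-parameter model.
# The parameter count below is approximate, for illustration.
params = 8_030_000_000  # ~8B parameters

fp16_gb = params * 2 / 1024**3    # 2 bytes per weight
int4_gb = params * 0.5 / 1024**3  # 0.5 bytes per weight

print(f"fp16 weights: ~{fp16_gb:.1f} GiB")
print(f"4-bit weights: ~{int4_gb:.1f} GiB")
```

This is why the 4-bit variant fits comfortably on a single consumer GPU, while the fp16 original generally does not.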
Usage (Transformers)
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("your-username/Llama-3.1-8B-Instruct-4bit")
model = AutoModelForCausalLM.from_pretrained(
    "your-username/Llama-3.1-8B-Instruct-4bit",
    device_map="auto",
)
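For chat-style generation, pass your messages through tokenizer.apply_chat_template before calling model.generate. As an aid to debugging prompts, here is a minimal sketch of the core Llama 3.1 instruct format (role headers and <|eot_id|> terminators) in plain Python; note the official template may also inject a default system preamble, so prefer apply_chat_template in real code:

```python
# Sketch of the Llama 3.1 instruct prompt layout; illustrative only,
# use tokenizer.apply_chat_template for actual inference.
def build_prompt(messages):
    """messages: list of {"role": ..., "content": ...} dicts."""
    prompt = "<|begin_of_text|>"
    for m in messages:
        prompt += (
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n"
            f"{m['content']}<|eot_id|>"
        )
    # Open an assistant turn so the model generates the reply.
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt

prompt = build_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 4-bit quantization?"},
])
print(prompt)
```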