mistral-7b-instruct-v0.3-bnb-4bit

This repository contains a quantized 4-bit version of Mistral-7B-Instruct-v0.3.

Model Details

  • Developed by: Original weights by Mistral AI; quantized with Unsloth by 3amthoughts Lab
  • Model Type: Large Language Model (LLM) - Causal Decoder-only.
  • Language(s): English
  • License: Apache-2.0
  • Quantization: 4-bit NormalFloat (NF4) using BitsAndBytes.

Usage

You can use this model with the transformers library. Ensure you have bitsandbytes and accelerate installed.

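from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "3amthoughts/mistral-7b-instruct-v0.3-bnb-4bit"

# NF4, as listed under Model Details. The checkpoint is already quantized,
# so the settings stored with it may take precedence over these.
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4")

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # let accelerate place the weights on the available device(s)
)

text = "Explain quantum physics to a five year old."
# Move the inputs to wherever the model was placed instead of hard-coding "cuda".
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))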

About 4-bit Quantization

The weights are quantized to 4 bits to reduce VRAM usage while preserving as much output quality as possible. This lets the 7B-parameter model run on consumer GPUs with roughly 6 to 8 GB of VRAM.
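
As a rough sanity check on the VRAM figure, the weight storage can be estimated with simple arithmetic. The sketch below is a back-of-the-envelope calculation, not a measurement: it assumes a nominal 7 billion parameters (the exact count is somewhat higher) and treats every weight as stored at the stated bit width. The helper name weight_memory_gib is hypothetical and introduced only for illustration.

```python
def weight_memory_gib(n_params: float, bits_per_param: float) -> float:
    """Approximate memory needed to hold the weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 2**30


# Assumption: nominal "7B" parameter count, used only for this estimate.
N_PARAMS = 7e9

fp16_gib = weight_memory_gib(N_PARAMS, 16)  # 16-bit baseline
nf4_gib = weight_memory_gib(N_PARAMS, 4)    # 4-bit quantized weights

print(f"16-bit weights: {fp16_gib:.1f} GiB")
print(f"4-bit weights:  {nf4_gib:.1f} GiB")
```

Under these assumptions the weights come to about 13 GiB at 16 bits and a little over 3 GiB at 4 bits. The rest of the 6 to 8 GB budget is headroom for layers that are typically left unquantized, the quantization constants, activations, the KV cache during generation, and the CUDA context.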
