# SpinBit 4-Bit INT8: Mistral 7B
This model is compressed using SpinBit Ultra technology.
- Base Model: mistralai/Mistral-7B-v0.1
- Method: Triple-Tap Compression (4-bit, 16-color palette)
- Quantization: weights encoded as 4-bit Base-27 codes with per-block INT8 scale factors (alphas)
- Size: 3.33 GB
- Quality: Perplexity 14.53 (WikiText-2), Factual Accuracy 62.5%
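A palette-based 4-bit scheme with per-block INT8 scales can be sketched as follows. This is an illustrative assumption, not the actual Triple-Tap format: block size, palette values, and the scale encoding below are all hypothetical. The idea is that each weight stores a 4-bit index into a 16-entry palette (two indices per byte), and each block of weights shares one signed INT8 alpha that sets its magnitude.

```python
# Hypothetical sketch of palette + per-block INT8 alpha dequantization.
# BLOCK_SIZE, PALETTE, and alpha_step are illustrative assumptions,
# not the documented Triple-Tap layout.

BLOCK_SIZE = 64  # assumed number of weights sharing one alpha

# Assumed 16-entry palette: a symmetric grid normalized to [-1, 1]
PALETTE = [(i - 7.5) / 7.5 for i in range(16)]

def unpack_nibbles(packed: bytes) -> list[int]:
    """Split each byte into two 4-bit palette indices (low nibble first)."""
    indices = []
    for b in packed:
        indices.append(b & 0x0F)  # low nibble
        indices.append(b >> 4)    # high nibble
    return indices

def dequantize(packed: bytes, alphas: list[int],
               alpha_step: float = 1 / 64) -> list[float]:
    """Reconstruct weights as w = PALETTE[index] * (alpha * alpha_step).

    `alphas` holds one signed INT8 code per BLOCK_SIZE weights;
    `alpha_step` maps that code to a real scale (assumed fixed-point).
    """
    indices = unpack_nibbles(packed)
    weights = []
    for i, code in enumerate(indices):
        alpha = alphas[i // BLOCK_SIZE] * alpha_step
        weights.append(PALETTE[code] * alpha)
    return weights
```

At roughly 4 bits per weight plus small per-block scales, a 7B-parameter model lands near 3.5 GB, consistent with the 3.33 GB file size quoted above.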
## Usage

```python
# Requires the loader script (loader_tripletap_4bit_int8.py) from this repository
from loader_tripletap_4bit_int8 import TripleTap4BitInt8Loader
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# 1. Load the base model structure in fp16
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    torch_dtype=torch.float16,
    device_map="cuda",
)

# 2. Load the compressed weights into the model
loader = TripleTap4BitInt8Loader("mistral_7b_4bit_int8.safetensors")
loader.load_into_model(model)

# 3. Run inference with the standard transformers generation API
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```