SpinBit 4-Bit INT8: Mistral 7B

This model is compressed using SpinBit Ultra technology.

  • Base Model: mistralai/Mistral-7B-v0.1
  • Method: Triple-Tap Compression (4-bit, 16-color palette)
  • Quantization: Weights encoded in 4-bit Base-27 + Per-Block INT8 Alphas
  • Size: 3.33 GB
  • Quality: Perplexity 14.53 (WikiText-2), Factual Accuracy 62.5%
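
The exact SpinBit container format is not documented here, but the scheme described above (a 16-entry weight palette indexed by 4-bit codes, scaled by one INT8 alpha per block) can be sketched as follows. All names (`PALETTE`, `BLOCK`), the palette values, and the block size are illustrative assumptions, not the real format:

```python
import numpy as np

# Assumed 16-entry palette and block size -- illustrative only,
# not the actual SpinBit palette.
PALETTE = np.linspace(-1.0, 1.0, 16, dtype=np.float32)
BLOCK = 64  # weights per block (assumed)

def dequantize(indices: np.ndarray, alphas: np.ndarray) -> np.ndarray:
    """Reconstruct float weights from 4-bit palette indices.

    indices: uint8 values in [0, 15], shape (n_blocks, BLOCK)
    alphas:  one INT8 scale per block, shape (n_blocks,)
    """
    scales = alphas.astype(np.float32) / 127.0   # map INT8 alpha to a float scale
    return PALETTE[indices] * scales[:, None]    # palette lookup, then per-block scale

# Tiny round-trip demo on a single 4-weight block:
idx = np.array([[0, 15, 8, 7]], dtype=np.uint8)
alpha = np.array([127], dtype=np.int8)
w = dequantize(idx, alpha)
```

The per-block alpha lets each block recover its own dynamic range, which is why a shared 16-value palette can cover weights of very different magnitudes.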

Usage

# Import the loader script (download loader_tripletap_4bit_int8.py from this repo first)
from loader_tripletap_4bit_int8 import TripleTap4BitInt8Loader
from transformers import AutoModelForCausalLM
import torch

# 1. Load the base model in fp16 (its weights will be replaced below)
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    torch_dtype=torch.float16,
    device_map="cuda"
)

# 2. Load compressed weights
loader = TripleTap4BitInt8Loader("mistral_7b_4bit_int8.safetensors")
loader.load_into_model(model)

# 3. Run inference with standard transformers generation (prompt is illustrative)
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))