SpinBit 4-Bit INT8: Mistral 7B

This model is compressed using SpinBit Ultra technology.

  • Base Model: mistralai/Mistral-7B-v0.1
  • Method: Triple-Tap Compression (4-bit, 16-color palette)
  • Quantization: Weights encoded in 4-bit Base-27 + Per-Block INT8 Alphas
  • Size: 3.33 GB
  • Quality: Perplexity 14.53 (WikiText-2), Factual Accuracy 62.5%
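
The exact SpinBit container format is not documented here, but the scheme described above (a 16-entry weight palette indexed by 4-bit codes, scaled by one INT8 alpha per block) can be sketched as follows. All names (`PALETTE`, `BLOCK`), the palette values, and the block size are illustrative assumptions, not the real format:

```python
import numpy as np

# Assumed 16-entry palette and block size -- illustrative only,
# not the actual SpinBit palette.
PALETTE = np.linspace(-1.0, 1.0, 16, dtype=np.float32)
BLOCK = 64  # weights per block (assumed)

def dequantize(indices: np.ndarray, alphas: np.ndarray) -> np.ndarray:
    """Reconstruct float weights from 4-bit palette indices.

    indices: uint8 values in [0, 15], shape (n_blocks, BLOCK)
    alphas:  one INT8 scale per block, shape (n_blocks,)
    """
    scales = alphas.astype(np.float32) / 127.0   # map INT8 alpha to a float scale
    return PALETTE[indices] * scales[:, None]    # palette lookup, then per-block scale

# Tiny round-trip demo on a single 4-weight block:
idx = np.array([[0, 15, 8, 7]], dtype=np.uint8)
alpha = np.array([127], dtype=np.int8)
w = dequantize(idx, alpha)
```

The per-block alpha lets each block recover its own dynamic range, which is why a shared 16-value palette can cover weights of very different magnitudes.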

Usage

# Import the loader script (download loader_tripletap_4bit_int8.py from this repo first)
from loader_tripletap_4bit_int8 import TripleTap4BitInt8Loader
from transformers import AutoModelForCausalLM
import torch

# 1. Load the base model in fp16 (its weights will be replaced below)
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    torch_dtype=torch.float16,
    device_map="cuda"
)

# 2. Load compressed weights
loader = TripleTap4BitInt8Loader("mistral_7b_4bit_int8.safetensors")
loader.load_into_model(model)

# 3. Run inference with standard transformers generation (prompt is illustrative)
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))