Mistral 7B — fraQtl Weight-Compressed

Full model weight compression with fraQtl

| Metric | Value |
|---|---|
| Original | mistralai/Mistral-7B-v0.1 |
| Weight compression | 4.4x on MLP projections |
| PPL delta | +0.43 to +0.48 (run-to-run variance ±0.07) |
| File size | 14.5 GB (fp16 stub; packed INT3 version coming) |
| KV cache | Additional 3.5x with runtime compression |

What This Is

This model's MLP weights (68% of parameters) have been compressed using the fraQtl eigenbasis — the same V Theorem that powers our KV cache compression, applied to weight matrices.

The model file is currently stored as fp16 (same size as original). A packed INT3 version (~8.5 GB) is coming soon.
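The packed INT3 layout has not been published, so as a hypothetical sketch only: a straightforward way to store 3-bit quantization codes is to bit-pack them 8 codes per 3 bytes. The helper names below (`pack_int3`, `unpack_int3`) are illustrative and not part of the fraqtl package.

```python
import numpy as np

def pack_int3(codes: np.ndarray) -> bytes:
    """Pack an array of 3-bit codes (values 0-7) into a byte stream,
    8 codes per 3 bytes. Hypothetical layout, not the fraQtl format."""
    # Expand each code to 8 bits, keep only its 3 low bits (MSB-first).
    bits = np.unpackbits(codes.astype(np.uint8)[:, None], axis=1)[:, -3:]
    return np.packbits(bits.ravel()).tobytes()

def unpack_int3(buf: bytes, n: int) -> np.ndarray:
    """Recover the first n 3-bit codes from a packed byte stream."""
    bits = np.unpackbits(np.frombuffer(buf, dtype=np.uint8))[: 3 * n]
    # packbits places each 3-bit group in the top bits of a byte,
    # so shifting right by 5 yields the original code.
    return (np.packbits(bits.reshape(n, 3), axis=1) >> 5).ravel()
```

At 3 bits per element this is a 5.3x reduction versus fp16 for the packed tensors themselves; the ~8.5 GB total reflects that only the MLP projections (68% of parameters) are compressed.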

Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "fraQtl/Mistral-7B-fraqtl", torch_dtype="float16", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("fraQtl/Mistral-7B-fraqtl")

# Optional: enable KV cache compression for additional memory savings
# pip install fraqtl
# import fraqtl
# fraqtl.enable_cache_compression(model, k=16, bits=3)

# Move inputs to wherever device_map placed the model, and pass the
# attention mask along with the input ids.
inputs = tokenizer("The future of AI is", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Combined Savings (weight + KV cache)

| Component | FP16 | With fraQtl |
|---|---|---|
| Model weights | 14.5 GB | ~8.5 GB (packed, coming) |
| KV cache (32K) | 4.3 GB | 1.2 GB |
| Total GPU | 18.8 GB | ~9.7 GB |

This puts Mistral 7B within the 10 GB budget of a consumer RTX 3080; at fp16, the same workload previously required an A100-class GPU.
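The 4.3 GB KV cache figure can be sanity-checked from Mistral-7B's published architecture (32 layers, 8 KV heads under GQA, head dimension 128, fp16):

```python
# Back-of-envelope KV cache size for Mistral-7B at a 32K context.
layers, kv_heads, head_dim = 32, 8, 128
seq_len, bytes_per_elem = 32_768, 2  # fp16

# K and V each store layers * kv_heads * head_dim values per token.
per_token = 2 * layers * kv_heads * head_dim * bytes_per_elem  # bytes
total_gb = per_token * seq_len / 1e9

print(f"{per_token / 1024:.0f} KiB per token, {total_gb:.1f} GB at 32K")
# → 128 KiB per token, 4.3 GB at 32K
```

A 3.5x runtime compression of that 4.3 GB is consistent with the ~1.2 GB entry in the table.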

Technical Details

  • MLP projections (gate, up, down) compressed via eigenbasis-guided GPTQ
  • Eigenbasis from input covariance (X^T X) per layer
  • Lloyd-Max quantization for INT2 sacrifice tiers
  • Sequential calibration across 32 layers
  • V/K projections preserved at full precision (compressed at runtime via KV cache hook)
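The first two bullets can be sketched in a few lines of NumPy. This is a minimal illustration only: a symmetric uniform grid stands in for the Lloyd-Max quantizer, and GPTQ's sequential error compensation is omitted.

```python
import numpy as np

def eigenbasis_from_activations(X: np.ndarray) -> np.ndarray:
    """Eigenbasis of the input covariance X^T X, where X holds
    calibration activations ([n_samples, d_in]). Columns of Q are
    eigenvectors sorted by descending eigenvalue."""
    eigvals, Q = np.linalg.eigh(X.T @ X)  # eigh returns ascending order
    return Q[:, ::-1]

def quantize_in_eigenbasis(W: np.ndarray, Q: np.ndarray, bits: int = 3) -> np.ndarray:
    """Rotate W ([d_in, d_out]) into the eigenbasis, quantize each
    column on a symmetric uniform grid (a stand-in for Lloyd-Max),
    and rotate back. Illustrative, not the fraQtl implementation."""
    W_rot = Q.T @ W
    qmax = 2 ** (bits - 1) - 1                       # e.g. 3 for INT3
    scale = np.abs(W_rot).max(axis=0, keepdims=True) / qmax
    scale = np.maximum(scale, 1e-12)                 # guard all-zero columns
    codes = np.clip(np.round(W_rot / scale), -qmax, qmax)
    return Q @ (codes * scale)
```

Because Q is orthogonal, quantization error introduced in the rotated space carries over unchanged in Frobenius norm, which is why concentrating energy in a few eigen-directions helps at low bit widths.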

Note on Quality

Run-to-run variance: ±0.07 PPL (CUDA non-determinism in eigenbasis computation). Honest range: +0.35 to +0.55 PPL. Seed stability experiment (C26) in progress to tighten error bars.

This base model is not optimized for instruction following, so demo quality with chat-style prompts will be limited; an instruction-tuned model is a better fit for demos. Weight compression of Mistral-7B-Instruct is coming.


fraqtl.ai | contact@fraqtl.ai | Patent pending. Paper: arXiv:2604.11501
