# Mistral 7B — fraQtl Weight-Compressed

Full model weight compression with fraQtl.
| Metric | Value |
|---|---|
| Original | mistralai/Mistral-7B-v0.1 |
| Weight compression | 4.4x on MLP projections |
| PPL delta | +0.43 to +0.48 (run-to-run variance ±0.07) |
| File size | 14.5 GB (fp16 stub — packed INT3 version coming) |
| KV cache | Additional 3.5x with runtime compression |
## What This Is
This model's MLP weights (68% of parameters) have been compressed using the fraQtl eigenbasis — the same V Theorem that powers our KV cache compression, applied to weight matrices.
The model file is currently stored as fp16 (same size as original). A packed INT3 version (~8.5 GB) is coming soon.
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "fraQtl/Mistral-7B-fraqtl",
    torch_dtype="float16",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("fraQtl/Mistral-7B-fraqtl")

# Optional: enable KV cache compression for additional memory savings
# pip install fraqtl
# import fraqtl
# fraqtl.enable_cache_compression(model, k=16, bits=3)

inputs = tokenizer("The future of AI is", return_tensors="pt").input_ids.to(model.device)
output = model.generate(inputs, max_new_tokens=100)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
## Combined Savings (Weight + KV Cache)
| Component | FP16 | With fraQtl |
|---|---|---|
| Model weights | 14.5 GB | ~8.5 GB (packed, coming) |
| KV cache (32K) | 4.3 GB | 1.2 GB |
| Total GPU | 18.8 GB | ~9.7 GB |
This fits Mistral 7B on a consumer RTX 3080 (10 GB), where the uncompressed fp16 model previously required an A100-class GPU.
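The 4.3 GB fp16 figure for a 32K-token KV cache can be checked against Mistral 7B's public architecture (32 layers, 8 grouped-query KV heads of dimension 128 — these numbers come from the model's config, not from this card):

```python
# KV cache size for Mistral 7B at a 32K context in fp16.
# Architecture constants are from the public Mistral 7B config.
layers, kv_heads, head_dim = 32, 8, 128
seq_len, bytes_fp16 = 32_768, 2
tensors = 2  # one K cache and one V cache per layer

cache_bytes = tensors * bytes_fp16 * layers * kv_heads * head_dim * seq_len
print(f"fp16 KV cache:       {cache_bytes / 1e9:.1f} GB")  # → 4.3 GB
print(f"with 3.5x compression: {cache_bytes / 3.5 / 1e9:.1f} GB")  # → 1.2 GB
```

Both values round to the table entries above.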
## Technical Details
- MLP projections (gate, up, down) compressed via eigenbasis-guided GPTQ
- Eigenbasis from input covariance (X^T X) per layer
- Lloyd-Max quantization for INT2 sacrifice tiers
- Sequential calibration across 32 layers
- V/K projections preserved at full precision (compressed at runtime via KV cache hook)
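The eigenbasis step above can be sketched in a few lines. This is illustrative only: fraQtl's actual pipeline (GPTQ integration, Lloyd-Max tiers, sequential calibration) is not public, and the uniform quantizer below is a stand-in. The idea is to eigendecompose the calibration input covariance X^T X, rotate the weight matrix into that basis, quantize there, and rotate back:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy shapes: a d_in -> d_out linear layer with a calibration batch X.
# Illustrative sketch only, not fraQtl's implementation.
n, d_in, d_out = 256, 64, 32
X = rng.standard_normal((n, d_in))
W = rng.standard_normal((d_in, d_out))

# 1. Eigenbasis of the input covariance X^T X (computed per layer).
cov = X.T @ X
eigvals, Q = np.linalg.eigh(cov)  # columns of Q are eigenvectors

def quantize(w, bits=3):
    """Uniform symmetric quantizer (stand-in for GPTQ / Lloyd-Max)."""
    levels = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / levels
    return np.round(w / scale) * scale

# 2. Rotate weights into the eigenbasis, quantize, rotate back.
W_hat = Q @ quantize(Q.T @ W)

# 3. Error is measured on calibration outputs, not raw weights.
err = np.linalg.norm(X @ W - X @ W_hat) / np.linalg.norm(X @ W)
print(f"relative output error: {err:.3f}")
```

The point of the rotation is that quantization error is spent where the input covariance says it matters least; the actual bit allocation across eigendirections is part of the proprietary pipeline.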
## Note on Quality
Measured PPL delta is +0.43 to +0.48, with run-to-run variance of ±0.07 PPL (CUDA non-determinism in the eigenbasis computation); the honest range is therefore +0.35 to +0.55 PPL. A seed stability experiment (C26) is in progress to tighten these error bars.
This base model is not optimized for instruction following, so expect weak demo quality on chat-style prompts. A weight-compressed Mistral-7B-Instruct is coming.
fraqtl.ai | contact@fraqtl.ai | Patent pending. Paper: arXiv:2604.11501