---
tags:
- fraqtl
- kv-cache-optimized
- inference
license: other
---
# Mistral 7B — fraQtl KV Cache Optimized
**KV cache optimized with [fraQtl](https://fraqtl.ai)** — 3.5x less KV cache memory during inference.
> **Note:** The model file size is the same as the original (~14GB). The optimization modifies V projection weights so that at inference time, the KV cache uses 3.5x less GPU memory. The savings happen at runtime, not at download.

| Metric | Value |
|--------|-------|
| Original | [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) |
| File size | Same as original (~14GB) |
| KV cache memory | **3.5x less at runtime** |
| PPL before | 10.4690 |
| PPL after | 10.6908 |
| Delta | +0.222 (weight-level) |
| Config | k=64, INT3 |
## How It Works
The V projection weights are rotated into an eigenbasis that separates important V-cache directions from noise. At inference, the KV cache concentrates information in fewer dimensions and uses 3.5x less memory.

**Our runtime compression (the real product) achieves a PPL delta of +0.01** on the same model. Contact us for integration.
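The idea of rotating into an eigenbasis can be illustrated with a toy 2-D example. This is only a sketch of the general principle (a PCA-style rotation followed by truncation), not fraQtl's actual algorithm; the data and the names `values`, `to_eigenbasis`, and `kept_fraction` are hypothetical and chosen for illustration.

```python
import math

# Toy 2-D "value vectors" that lie close to one direction (illustrative data only).
values = [(1.0, 0.52), (2.0, 0.97), (-1.0, -0.49),
          (-2.0, -1.03), (0.5, 0.27), (-0.5, -0.24)]

# Second-moment matrix [[a, b], [b, c]] of the vectors.
a = sum(x * x for x, _ in values)
b = sum(x * y for x, y in values)
c = sum(y * y for _, y in values)

# Angle of the dominant eigenvector of a symmetric 2x2 matrix.
theta = 0.5 * math.atan2(2.0 * b, a - c)
e1 = (math.cos(theta), math.sin(theta))    # high-energy direction
e2 = (-math.sin(theta), math.cos(theta))   # low-energy direction


def to_eigenbasis(v):
    """Coordinates of v along e1 and e2 (an orthogonal rotation)."""
    return (v[0] * e1[0] + v[1] * e1[1], v[0] * e2[0] + v[1] * e2[1])


rotated = [to_eigenbasis(v) for v in values]

# Energy (sum of squares) carried by each rotated coordinate.
energy = [sum(r[i] ** 2 for r in rotated) for i in range(2)]
kept_fraction = energy[0] / (energy[0] + energy[1])

# Keep only the first coordinate (1 of 2 dimensions) and map back.
reconstructed = [(r[0] * e1[0], r[0] * e1[1]) for r in rotated]
max_error = max(math.dist(v, w) for v, w in zip(values, reconstructed))

print(f"energy in kept dimension: {kept_fraction:.4f}, max error: {max_error:.4f}")
```

In this toy setting almost all of the energy lands in the first rotated coordinate, so storing only that coordinate loses little. The real model works with 128-dimensional heads, and the `k=64, INT3` config above presumably plays the role of the truncation and quantization settings, though the card does not spell this out.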
## Usage
```python
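from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("fraQtl/Mistral-7B-compressed")
tokenizer = AutoTokenizer.from_pretrained("fraQtl/Mistral-7B-compressed")
# KV cache uses 3.5x less memory during inference.
inputs = tokenizer("Explain how photosynthesis works in simple terms:", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)  # 64 is an arbitrary example length
print(tokenizer.decode(outputs[0], skip_special_tokens=True))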
```
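To put the 3.5x figure in concrete terms, the following sketch estimates the KV cache size for one sequence. It uses the published Mistral-7B-v0.1 architecture (32 layers, 8 key/value heads, head dimension 128) and assumes a 16-bit cache that grows with the full sequence length, ignoring sliding-window attention. It also takes the card's 3.5x reduction at face value as applying to the whole cache. The helper name `kv_cache_bytes` is hypothetical.

```python
# Architecture values for mistralai/Mistral-7B-v0.1 (from its published config).
NUM_LAYERS = 32
NUM_KV_HEADS = 8       # grouped-query attention
HEAD_DIM = 128
BYTES_PER_VALUE = 2    # assumption: fp16/bf16 cache entries

REDUCTION = 3.5        # figure quoted in this model card


def kv_cache_bytes(num_tokens: int, batch_size: int = 1) -> int:
    """Bytes held by an uncompressed KV cache (keys and values, all layers)."""
    per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE
    return per_token * num_tokens * batch_size


baseline = kv_cache_bytes(num_tokens=8192)
reduced = baseline / REDUCTION
print(f"baseline: {baseline / 2**30:.2f} GiB, at 3.5x: {reduced / 2**30:.2f} GiB")
```

Under these assumptions each token costs 128 KiB of cache, so an 8,192-token sequence holds 1 GiB before any reduction.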
## Generation Samples
**Prompt:** Explain how photosynthesis works in simple terms:

**Output:** Photosynthesis is the process by which plants use energy from sunlight to make their own food. Plants need carbon dioxide, water, and light to make their own food...

**Prompt:** The three most important breakthroughs in physics during the 20th century were

**Output:** The three most important breakthroughs in physics during the 20th century were the theory of relativity, quantum mechanics, and string theory...
## Runtime Compression (the full product)
| Method | PPL Delta | How |
|--------|-----------|-----|
| This download (weight-level) | +0.222 | Modified weights, download and use |
| Runtime cache compression | **+0.01** | fraQtl applied during inference |
Runtime compression degrades perplexity about 30x less than this weight-level download. Available for production deployment.
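For readers who want to interpret the weight-level numbers, the short calculation below derives the absolute and relative perplexity change from the before/after figures quoted in this card. It relies only on the standard definition of perplexity as the exponential of the mean negative log-likelihood per token; the variable names are illustrative.

```python
import math

ppl_before = 10.4690  # original model, as quoted in this card
ppl_after = 10.6908   # weight-level optimized model, as quoted in this card

# Absolute and relative change in perplexity.
delta = ppl_after - ppl_before
relative = delta / ppl_before

# Perplexity is exp(mean negative log-likelihood per token),
# so the gap in nats per token is the difference of the logs.
nll_gap = math.log(ppl_after) - math.log(ppl_before)

print(f"delta: {delta:+.4f} PPL ({relative:.2%}), {nll_gap:.4f} nats/token")
```

The absolute delta rounds to the +0.222 shown in the tables, which corresponds to a relative perplexity increase of roughly 2%.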
---
[fraqtl.ai](https://fraqtl.ai) | contact@fraqtl.ai | Patent pending. [Paper: arXiv:2604.11501](https://arxiv.org/abs/2604.11501)