---
base_model: mistralai/Leanstral-2603
library_name: transformers
tags:
- rotorquant
- kv-cache-quantization
- leanstral
- lean4
- formal-proofs
- theorem-proving
- quantized
- mistral
- moe
license: apache-2.0
---
# Leanstral-RotorQuant
**KV-cache quantized [Leanstral-2603](https://huggingface.co/mistralai/Leanstral-2603) using [RotorQuant](https://github.com/scrya-com/rotorquant) for high-throughput Lean 4 formal proof generation.**
Leanstral is the first open-source AI agent purpose-built for Lean 4 formal proofs -- generating both executable code and machine-checkable mathematical proofs. This variant applies RotorQuant KV-cache quantization, delivering **5.3x faster prefill** and **28% faster decode** compared to TurboQuant while preserving full BF16 model weights.
## Overview
This repository provides the **RotorQuant KV-cache-only** configuration of Leanstral-2603. The model weights remain at full precision; only the KV cache is quantized during inference using RotorQuant's rotation-aware quantization scheme.
| Spec | Value |
|------|-------|
| Base model | [mistralai/Leanstral-2603](https://huggingface.co/mistralai/Leanstral-2603) |
| Architecture | Mistral MoE (~119B parameters, 7 consolidated shards) |
| Compression | RotorQuant KV-cache quantization |
| Weight precision | BF16 (unmodified) |
| KV-cache precision | Mixed-precision quantized |
| Prefill speedup | 5.3x vs TurboQuant |
| Decode speedup | 28% vs TurboQuant |
| License | Apache 2.0 |
| Use case | Lean 4 formal verification, theorem proving, mathematical proofs |
## Quickstart
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from rotorquant import RotorQuantCache  # cache class name per the RotorQuant repo

model_id = "majentik/Leanstral-RotorQuant"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype="auto",
)

# Enable RotorQuant KV-cache quantization
cache = RotorQuantCache(model)

prompt = "Prove that for all natural numbers n, n + 0 = n in Lean 4:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    past_key_values=cache,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## What is RotorQuant?
[RotorQuant](https://github.com/scrya-com/rotorquant) is an advanced KV-cache quantization method that leverages rotation-aware quantization to achieve superior throughput compared to standard KV-cache compression. By exploiting the rotary positional embedding structure, RotorQuant achieves:
- **5.3x faster prefill** -- critical for long Lean 4 proof contexts
- **28% faster decode** -- faster token-by-token proof generation
- Equivalent memory savings to TurboQuant with better computational efficiency
This makes RotorQuant the preferred choice for interactive theorem proving sessions where latency matters.
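To give an intuition for why rotation helps, here is a minimal NumPy sketch of rotation-aware quantization (not RotorQuant's actual implementation): applying an orthonormal Hadamard rotation before int4 quantization spreads outlier channels across all dimensions, so a single quantization scale loses far less information.

```python
import numpy as np

def hadamard(n):
    # Sylvester construction of an orthonormal n x n Hadamard matrix (n a power of 2)
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def quantize_int4(x):
    # Symmetric per-vector int4 quantization: map to [-7, 7] and round
    scale = np.abs(x).max() / 7.0
    q = np.clip(np.round(x / scale), -7, 7)
    return q, scale

def rotated_quantize(v, H):
    # Rotate first: spreads outlier energy across all channels,
    # flattening the distribution before a single scale is chosen
    return quantize_int4(H @ v)

def rotated_dequantize(q, scale, H):
    # Dequantize, then invert the orthonormal rotation
    return H.T @ (q * scale)

rng = np.random.default_rng(0)
head_dim = 64
H = hadamard(head_dim)
v = rng.normal(size=head_dim)
v[3] = 25.0  # inject one outlier channel, as seen in real KV activations

q_plain, s_plain = quantize_int4(v)
err_plain = np.linalg.norm(v - q_plain * s_plain)

q_rot, s_rot = rotated_quantize(v, H)
err_rot = np.linalg.norm(v - rotated_dequantize(q_rot, s_rot, H))

print(f"plain int4 error:   {err_plain:.3f}")
print(f"rotated int4 error: {err_rot:.3f}")
```

Because the Hadamard transform commutes cheaply with rotary positional embeddings, this style of rotation can be fused into the attention path with little overhead, which is where the prefill and decode speedups come from.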
## Memory Estimates
| Component | Estimate |
|-----------|----------|
| Model weights (BF16) | ~238 GB |
| KV-cache savings | 2-4x reduction vs FP16 KV cache |
| Recommended VRAM | 4x A100 80GB or equivalent |
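As a back-of-envelope check on the KV-cache savings row, the sketch below computes KV-cache size from architecture parameters. The layer/head counts are illustrative placeholders, not Leanstral's actual config (which is not stated here).

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_elem):
    # 2 tensors (K and V) per layer; one head_dim vector per token per KV head
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

# Illustrative architecture numbers (NOT Leanstral's actual config)
cfg = dict(n_layers=60, n_kv_heads=8, head_dim=128)

fp16 = kv_cache_bytes(32_768, bytes_per_elem=2, **cfg)    # FP16 = 2 bytes/element
int4 = kv_cache_bytes(32_768, bytes_per_elem=0.5, **cfg)  # 4-bit = 0.5 bytes/element

print(f"FP16 KV cache @ 32k tokens:  {fp16 / 2**30:.1f} GiB")
print(f"4-bit KV cache @ 32k tokens: {int4 / 2**30:.1f} GiB ({fp16 / int4:.0f}x smaller)")
```

Quantization scales and zero-points add a small overhead on top of the 4-bit payload, which is why quoted savings land in the 2-4x range rather than exactly 4x.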
## Lean 4 Use Case
Leanstral excels at:
- **Formal verification** -- generating machine-checkable proofs of mathematical theorems
- **Theorem proving** -- interactive and automated proof search in Lean 4
- **Code generation** -- writing verified Lean 4 programs with correctness guarantees
- **Proof repair** -- fixing incomplete or broken proof scripts
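For illustration, the theorem used in the Quickstart prompt has a one-line Lean 4 proof; this is a hand-written example of the kind of machine-checkable output Leanstral targets, not actual model output:

```lean
-- n + 0 = n holds definitionally for Nat, so rfl closes the goal
theorem my_add_zero (n : Nat) : n + 0 = n := rfl
```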
## See Also
- [mistralai/Leanstral-2603](https://huggingface.co/mistralai/Leanstral-2603) -- Base model
- [majentik/Leanstral-TurboQuant](https://huggingface.co/majentik/Leanstral-TurboQuant) -- TurboQuant KV-cache variant
- [majentik/Leanstral-RotorQuant-MLX-4bit](https://huggingface.co/majentik/Leanstral-RotorQuant-MLX-4bit) -- MLX 4-bit + RotorQuant
- [majentik/Leanstral-RotorQuant-MLX-2bit](https://huggingface.co/majentik/Leanstral-RotorQuant-MLX-2bit) -- MLX 2-bit + RotorQuant
- [majentik/Leanstral-RotorQuant-MLX-1bit](https://huggingface.co/majentik/Leanstral-RotorQuant-MLX-1bit) -- MLX 1-bit + RotorQuant
- [RotorQuant repository](https://github.com/scrya-com/rotorquant)