---
base_model: mistralai/Leanstral-2603
library_name: transformers
tags:
  - rotorquant
  - kv-cache-quantization
  - leanstral
  - lean4
  - formal-proofs
  - theorem-proving
  - quantized
  - mistral
  - moe
license: apache-2.0
---

# Leanstral-RotorQuant

**KV-cache quantized [Leanstral-2603](https://huggingface.co/mistralai/Leanstral-2603) using [RotorQuant](https://github.com/scrya-com/rotorquant) for high-throughput Lean 4 formal proof generation.**

Leanstral is the first open-source AI agent purpose-built for Lean 4, generating both executable code and machine-checkable mathematical proofs. This variant applies RotorQuant KV-cache quantization, delivering **5.3x faster prefill** and **28% faster decode** than TurboQuant while keeping the model weights at full BF16 precision.

## Overview

This repository provides the **RotorQuant KV-cache-only** configuration of Leanstral-2603. The model weights remain at full precision; only the KV cache is quantized during inference using RotorQuant's rotation-aware quantization scheme.

| Spec | Value |
|------|-------|
| Base model | [mistralai/Leanstral-2603](https://huggingface.co/mistralai/Leanstral-2603) |
| Architecture | Mistral MoE (~119B parameters, 7 consolidated shards) |
| Compression | RotorQuant KV-cache quantization |
| Weight precision | BF16 (unmodified) |
| KV-cache precision | Mixed-precision quantized |
| Prefill speedup | 5.3x vs TurboQuant |
| Decode speedup | 28% vs TurboQuant |
| License | Apache 2.0 |
| Use case | Lean 4 formal verification, theorem proving, mathematical proofs |

## Quickstart

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from rotorquant import RotorQuantCache  # class name assumed from the RotorQuant repo

model_id = "majentik/Leanstral-RotorQuant"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype="auto",
)

# Enable RotorQuant KV-cache quantization
cache = RotorQuantCache(model)

prompt = "Prove that for all natural numbers n, n + 0 = n in Lean 4:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    past_key_values=cache,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## What is RotorQuant?

[RotorQuant](https://github.com/scrya-com/rotorquant) is a KV-cache quantization method that applies a rotation-aware scheme to the keys and values, yielding higher throughput than standard KV-cache compression. By exploiting the rotary positional embedding (RoPE) structure, RotorQuant achieves:

- **5.3x faster prefill** -- critical for long Lean 4 proof contexts
- **28% faster decode** -- faster token-by-token proof generation
- Equivalent memory savings to TurboQuant with better computational efficiency

This makes RotorQuant the preferred choice for interactive theorem proving sessions where latency matters.
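The core intuition behind rotation-aware quantization can be shown in a few lines: multiplying activations by an orthogonal matrix spreads outlier channels across all dimensions, which tightens per-token quantization scales and reduces error. This is a toy NumPy sketch of that idea under a random rotation, not RotorQuant's actual kernels or its RoPE-aware rotation choice:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "key" vectors with a few outlier channels, as often seen in KV caches.
keys = rng.normal(size=(64, 128)).astype(np.float32)
keys[:, :4] *= 20.0  # outlier channels dominate each token's dynamic range

def quantize_int8(x):
    # Symmetric per-token int8 quantization, returned in dequantized form.
    scale = np.abs(x).max(axis=-1, keepdims=True) / 127.0
    return np.round(x / scale).clip(-127, 127) * scale

# Random orthogonal rotation (stand-in for a rotation-aware scheme).
Q, _ = np.linalg.qr(rng.normal(size=(128, 128)))
Q = Q.astype(np.float32)

err_plain = np.abs(keys - quantize_int8(keys)).mean()
# Quantize in the rotated basis, then rotate back before comparing.
err_rotated = np.abs(keys - quantize_int8(keys @ Q) @ Q.T).mean()

print(f"per-token int8 error, plain:   {err_plain:.4f}")
print(f"per-token int8 error, rotated: {err_rotated:.4f}")
```

Because the rotation is orthogonal, it preserves vector norms and is exactly invertible, so the only change is that the quantizer sees better-conditioned inputs.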

## Memory Estimates

| Component | Estimate |
|-----------|----------|
| Model weights (BF16) | ~238 GB |
| KV-cache savings | 2-4x reduction vs FP16 KV cache |
| Recommended VRAM | 4x A100 80GB or equivalent |
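The KV-cache figures above follow from the usual sizing formula. The sketch below computes it with illustrative placeholder values (layer count, KV-head count, and head dimension are assumptions, not the published Leanstral-2603 configuration):

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_elem):
    # K and V tensors per layer, each of shape (seq_len, n_kv_heads, head_dim).
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

# Illustrative values only -- check config.json for the real architecture.
cfg = dict(n_layers=88, n_kv_heads=8, head_dim=128)

fp16 = kv_cache_bytes(32_768, **cfg, bytes_per_elem=2)    # FP16 baseline
q4 = kv_cache_bytes(32_768, **cfg, bytes_per_elem=0.5)    # ~4-bit equivalent

print(f"FP16 KV cache @ 32k tokens:   {fp16 / 2**30:.1f} GiB")
print(f"~4-bit KV cache @ 32k tokens: {q4 / 2**30:.1f} GiB ({fp16 / q4:.0f}x smaller)")
```

Note that quantized caches also store per-group scales (and sometimes zero points), so real savings land somewhat below the raw bit-width ratio; the 2-4x range in the table reflects that overhead.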

## Lean 4 Use Case

Leanstral excels at:
- **Formal verification** -- generating machine-checkable proofs of mathematical theorems
- **Theorem proving** -- interactive and automated proof search in Lean 4
- **Code generation** -- writing verified Lean 4 programs with correctness guarantees
- **Proof repair** -- fixing incomplete or broken proof scripts
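The Quickstart prompt above asks for a proof that `n + 0 = n`. In core Lean 4 this statement holds definitionally, so the kind of output the model targets can be as short as `rfl`; an explicit induction proof of the same fact is shown for comparison (theorem names here are arbitrary):

```lean
-- Holds by definition: Nat.add recurses on its second argument.
theorem add_zero_nat (n : Nat) : n + 0 = n := rfl

-- The same statement proved by explicit induction.
theorem add_zero_nat' (n : Nat) : n + 0 = n := by
  induction n with
  | zero => rfl
  | succ k ih => rfl
```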

## See Also

- [mistralai/Leanstral-2603](https://huggingface.co/mistralai/Leanstral-2603) -- Base model
- [majentik/Leanstral-TurboQuant](https://huggingface.co/majentik/Leanstral-TurboQuant) -- TurboQuant KV-cache variant
- [majentik/Leanstral-RotorQuant-MLX-4bit](https://huggingface.co/majentik/Leanstral-RotorQuant-MLX-4bit) -- MLX 4-bit + RotorQuant
- [majentik/Leanstral-RotorQuant-MLX-2bit](https://huggingface.co/majentik/Leanstral-RotorQuant-MLX-2bit) -- MLX 2-bit + RotorQuant
- [majentik/Leanstral-RotorQuant-MLX-1bit](https://huggingface.co/majentik/Leanstral-RotorQuant-MLX-1bit) -- MLX 1-bit + RotorQuant
- [RotorQuant repository](https://github.com/scrya-com/rotorquant)