---
base_model: mistralai/Leanstral-2603
library_name: transformers
tags:
- rotorquant
- kv-cache-quantization
- leanstral
- lean4
- formal-proofs
- theorem-proving
- quantized
- mistral
- moe
license: apache-2.0
---

# Leanstral-RotorQuant

**KV-cache quantized [Leanstral-2603](https://huggingface.co/mistralai/Leanstral-2603) using [RotorQuant](https://github.com/scrya-com/rotorquant) for high-throughput Lean 4 formal proof generation.**

Leanstral is the first open-source AI agent purpose-built for Lean 4 formal proofs -- generating both executable code and machine-checkable mathematical proofs. This variant applies RotorQuant KV-cache quantization, delivering **5.3x faster prefill** and **28% faster decode** compared to TurboQuant while preserving full BF16 model weights.

## Overview

This repository provides the **RotorQuant KV-cache-only** configuration of Leanstral-2603. The model weights remain at full precision; only the KV cache is quantized during inference using RotorQuant's rotation-aware quantization scheme.

| Spec | Value |
|------|-------|
| Base model | [mistralai/Leanstral-2603](https://huggingface.co/mistralai/Leanstral-2603) |
| Architecture | Mistral MoE (~119B parameters, 7 consolidated shards) |
| Compression | RotorQuant KV-cache quantization |
| Weight precision | BF16 (unmodified) |
| KV-cache precision | Mixed-precision quantized |
| Prefill speedup | 5.3x vs TurboQuant |
| Decode speedup | 28% vs TurboQuant |
| License | Apache 2.0 |
| Use case | Lean 4 formal verification, theorem proving, mathematical proofs |

## Quickstart

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
# RotorQuant's quantized KV cache. Note: this comes from the rotorquant
# package, not turboquant; see the RotorQuant repository for the exact
# class name and constructor signature.
from rotorquant import RotorQuantCache

model_id = "majentik/Leanstral-RotorQuant"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype="auto",
)

# Enable RotorQuant KV-cache quantization
cache = RotorQuantCache(model)

prompt = "Prove that for all natural numbers n, n + 0 = n in Lean 4:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    past_key_values=cache,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
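
For reference, the statement in the example prompt has a one-line proof in Lean 4. The snippet below is hand-written illustrative Lean, not model output: `n + 0 = n` holds definitionally for `Nat`, while the symmetric `0 + n = n` genuinely needs induction, which is the kind of proof search Leanstral is built for.

```lean
-- n + 0 = n is definitional: Nat.add recurses on its second argument
theorem add_zero' (n : Nat) : n + 0 = n := rfl

-- 0 + n = n is not definitional and needs induction
theorem zero_add' (n : Nat) : 0 + n = n := by
  induction n with
  | zero => rfl
  | succ k ih => rw [Nat.add_succ, ih]
```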

## What is RotorQuant?

[RotorQuant](https://github.com/scrya-com/rotorquant) is an advanced KV-cache quantization method that leverages rotation-aware quantization to achieve superior throughput compared to standard KV-cache compression. By exploiting the rotary positional embedding structure, RotorQuant achieves:

- **5.3x faster prefill** -- critical for long Lean 4 proof contexts
- **28% faster decode** -- faster token-by-token proof generation
- Equivalent memory savings to TurboQuant with better computational efficiency

This makes RotorQuant the preferred choice for interactive theorem proving sessions where latency matters.
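
RotorQuant's actual implementation lives in its repository; as a minimal toy illustration (my own sketch, not RotorQuant's API) of why rotating before quantizing helps, the code below applies an orthogonal Hadamard rotation to a key vector with one outlier channel before absmax int8 quantization. The rotation spreads the outlier across channels, shrinking the quantization step and the reconstruction error.

```python
import numpy as np

def hadamard(n):
    # Sylvester construction; n must be a power of two
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)  # orthogonal: H @ H.T == I

def quantize_int8(x):
    # Per-vector absmax int8 quantization, returned dequantized
    scale = np.abs(x).max() / 127.0
    q = np.round(x / scale).clip(-127, 127)
    return q * scale

def quantize_rotated(x, H):
    # Rotate, quantize, rotate back; the rotation is fixed and shared,
    # so at inference time it can be fused into adjacent projections
    return H.T @ quantize_int8(H @ x)

# A key vector with a single large outlier channel
k = np.array([10.0, 0.12, -0.2, 0.05, 0.15, -0.1, 0.08, -0.05])
H = hadamard(8)
err_plain = np.linalg.norm(k - quantize_int8(k))
err_rot = np.linalg.norm(k - quantize_rotated(k, H))
print(err_plain, err_rot)  # rotation spreads the outlier, shrinking error
```

The same idea applies per attention head at full scale; real schemes like RotorQuant additionally account for the rotary-embedding structure mentioned above.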

## Memory Estimates

| Component | Estimate |
|-----------|----------|
| Model weights (BF16) | ~238 GB |
| KV-cache savings | 2-4x reduction vs FP16 KV cache |
| Recommended VRAM | 4x A100 80GB or equivalent |
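
KV-cache size grows linearly with context length, so the 2-4x cache reduction matters most for long proof contexts. A back-of-envelope calculation, using hypothetical layer/head dimensions chosen purely for illustration (not Leanstral's actual config):

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_elem):
    # K and V each store n_layers * n_kv_heads * head_dim values per token
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

# Hypothetical dimensions for illustration only
fp16 = kv_cache_bytes(32_768, 60, 8, 128, 2)    # FP16: 2 bytes/element
int4 = kv_cache_bytes(32_768, 60, 8, 128, 0.5)  # 4-bit: 0.5 bytes/element
print(f"FP16 KV cache: {fp16 / 2**30:.1f} GiB, 4-bit: {int4 / 2**30:.1f} GiB")
# → FP16 KV cache: 7.5 GiB, 4-bit: 1.9 GiB
```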

## Lean 4 Use Case

Leanstral excels at:
- **Formal verification** -- generating machine-checkable proofs of mathematical theorems
- **Theorem proving** -- interactive and automated proof search in Lean 4
- **Code generation** -- writing verified Lean 4 programs with correctness guarantees
- **Proof repair** -- fixing incomplete or broken proof scripts

## See Also

- [mistralai/Leanstral-2603](https://huggingface.co/mistralai/Leanstral-2603) -- Base model
- [majentik/Leanstral-TurboQuant](https://huggingface.co/majentik/Leanstral-TurboQuant) -- TurboQuant KV-cache variant
- [majentik/Leanstral-RotorQuant-MLX-4bit](https://huggingface.co/majentik/Leanstral-RotorQuant-MLX-4bit) -- MLX 4-bit + RotorQuant
- [majentik/Leanstral-RotorQuant-MLX-2bit](https://huggingface.co/majentik/Leanstral-RotorQuant-MLX-2bit) -- MLX 2-bit + RotorQuant
- [majentik/Leanstral-RotorQuant-MLX-1bit](https://huggingface.co/majentik/Leanstral-RotorQuant-MLX-1bit) -- MLX 1-bit + RotorQuant
- [RotorQuant repository](https://github.com/scrya-com/rotorquant)