majentik committed on
Commit 642eaf9 · verified · Parent: af6ecc8

Add model card (weights pending mlx_lm mistral3 architecture support)

Files changed (1): README.md (+88 lines, added)
---
base_model: mistralai/Leanstral-2603
library_name: mlx
tags:
- turboquant
- kv-cache-quantization
- mlx
- 2-bit
- weight-quantization
- leanstral
- lean4
- formal-proofs
- theorem-proving
- quantized
- apple-silicon
- mistral
- moe
license: apache-2.0
---

# Leanstral-TurboQuant-MLX-2bit

**2-bit MLX weight-quantized [Leanstral-2603](https://huggingface.co/mistralai/Leanstral-2603) with [TurboQuant](https://arxiv.org/abs/2504.19874) KV-cache quantization for Lean 4 formal proof generation on Apple Silicon.**

Leanstral is the first open-source AI agent purpose-built for Lean 4 formal proofs -- generating both executable code and machine-checkable mathematical proofs. This variant combines **dual compression**: 2-bit MLX weight quantization for aggressive model size reduction plus TurboQuant KV-cache quantization for efficient long-context inference.

## Overview

This repository provides a **dual-compressed** configuration: MLX 2-bit weight quantization aggressively reduces the static memory footprint, while TurboQuant compresses the KV cache at runtime. This enables running Leanstral on Apple Silicon machines with moderate unified memory.

| Spec | Value |
|------|-------|
| Base model | [mistralai/Leanstral-2603](https://huggingface.co/mistralai/Leanstral-2603) |
| Architecture | Mistral MoE (~119B parameters, 7 consolidated shards) |
| Weight quantization | 2-bit (MLX) |
| KV-cache quantization | TurboQuant |
| Weight memory | ~30 GB |
| Runtime | MLX (Apple Silicon) |
| License | Apache 2.0 |
| Use case | Lean 4 formal verification, theorem proving, mathematical proofs |

## Quickstart

> **Note:** Per the commit message, the weights are pending `mlx_lm` support for the mistral3 architecture; the snippet below assumes they are available in this repository.

```python
from mlx_lm import load, generate

model, tokenizer = load("majentik/Leanstral-TurboQuant-MLX-2bit")

prompt = "Prove that for all natural numbers n, n + 0 = n in Lean 4:"
response = generate(
    model,
    tokenizer,
    prompt=prompt,
    max_tokens=512,
)
print(response)
```

## What is TurboQuant?

TurboQuant ([arXiv:2504.19874](https://arxiv.org/abs/2504.19874)) is a KV-cache quantization method that compresses the key-value cache used during autoregressive generation. By quantizing the KV cache to lower precision, TurboQuant reduces memory consumption proportionally to context length. Combined with MLX 2-bit weight quantization, this aggressive dual compression makes it possible to fit Leanstral's ~119B parameters into roughly 30 GB of weight memory.
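
For the general idea only -- this is a minimal, generic uniform KV-cache quantizer sketched in NumPy, **not** TurboQuant's actual algorithm (which is described in the paper), and the function names are illustrative:

```python
import numpy as np

def quantize_kv(x, bits=2):
    # Uniform asymmetric quantization along the last axis (head_dim),
    # with one (scale, zero-point) pair per token per head.
    levels = 2**bits - 1
    lo = x.min(axis=-1, keepdims=True)
    hi = x.max(axis=-1, keepdims=True)
    scale = (hi - lo) / levels
    scale = np.where(scale == 0, 1.0, scale)  # avoid divide-by-zero
    q = np.round((x - lo) / scale).astype(np.uint8)
    return q, scale, lo

def dequantize_kv(q, scale, lo):
    return q * scale + lo

rng = np.random.default_rng(0)
# (batch, heads, seq_len, head_dim) -- shapes chosen arbitrarily for the demo
keys = rng.standard_normal((4, 8, 16, 64))
q, scale, lo = quantize_kv(keys, bits=2)
recon = dequantize_kv(q, scale, lo)
err = np.abs(recon - keys).max()
print(f"max abs reconstruction error: {err:.3f}")
```

Note that a real kernel would bit-pack four 2-bit codes per byte to realize the memory saving; storing codes in `uint8`, as here, only demonstrates the round-trip.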

> **Note:** 2-bit weight quantization is lossy. Expect some degradation in proof quality compared to the 4-bit variant. For critical formal verification work, prefer the 4-bit or full-precision variants.

## Memory Estimates

| Component | Estimate |
|-----------|----------|
| Model weights (2-bit) | ~30 GB |
| KV-cache | Compressed at runtime by TurboQuant; savings grow with context length |
| Recommended hardware | MacBook Pro M2/M3/M4 Max (64 GB+) or Mac Studio |
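
The ~30 GB weight figure can be sanity-checked with back-of-envelope arithmetic -- ~119B parameters at 2 bits each, ignoring quantization scales/biases and any tensors kept at higher precision:

```python
params = 119e9          # ~119B parameters (from the Overview table)
bits_per_weight = 2     # 2-bit MLX quantization
gb = params * bits_per_weight / 8 / 1e9  # decimal gigabytes
print(f"{gb:.2f} GB")   # 29.75 GB, consistent with the ~30 GB estimate
```

Real quantized checkpoints carry some overhead beyond this, so treat it as a lower bound.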

## Lean 4 Use Case

Leanstral excels at:
- **Formal verification** -- generating machine-checkable proofs of mathematical theorems
- **Theorem proving** -- interactive and automated proof search in Lean 4
- **Code generation** -- writing verified Lean 4 programs with correctness guarantees
- **Proof repair** -- fixing incomplete or broken proof scripts
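
For a flavor of the target domain, the quickstart prompt asks for a proof that `n + 0 = n`; a hand-written Lean 4 proof (illustrative, not model output) is definitional:

```lean
-- `n + 0 = n` holds by definition of `Nat` addition
-- (recursion on the second argument), so `rfl` closes the goal.
theorem n_add_zero (n : Nat) : n + 0 = n := rfl
```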

## See Also

- [mistralai/Leanstral-2603](https://huggingface.co/mistralai/Leanstral-2603) -- Base model
- [majentik/Leanstral-TurboQuant](https://huggingface.co/majentik/Leanstral-TurboQuant) -- Full-precision weights + TurboQuant KV cache
- [majentik/Leanstral-TurboQuant-MLX-4bit](https://huggingface.co/majentik/Leanstral-TurboQuant-MLX-4bit) -- MLX 4-bit + TurboQuant
- [majentik/Leanstral-TurboQuant-MLX-1bit](https://huggingface.co/majentik/Leanstral-TurboQuant-MLX-1bit) -- MLX 1-bit + TurboQuant
- [majentik/Leanstral-RotorQuant-MLX-2bit](https://huggingface.co/majentik/Leanstral-RotorQuant-MLX-2bit) -- MLX 2-bit + RotorQuant (faster prefill/decode)
- [TurboQuant paper](https://arxiv.org/abs/2504.19874)