majentik committed on
Commit 642eaf9 · verified · Parent: af6ecc8

Add model card (weights pending mlx_lm mistral3 architecture support)

Files changed (1): README.md (+88 lines, added)
---
base_model: mistralai/Leanstral-2603
library_name: mlx
tags:
- turboquant
- kv-cache-quantization
- mlx
- 2-bit
- weight-quantization
- leanstral
- lean4
- formal-proofs
- theorem-proving
- quantized
- apple-silicon
- mistral
- moe
license: apache-2.0
---

# Leanstral-TurboQuant-MLX-2bit

**2-bit MLX weight-quantized [Leanstral-2603](https://huggingface.co/mistralai/Leanstral-2603) with [TurboQuant](https://arxiv.org/abs/2504.19874) KV-cache quantization for Lean 4 formal proof generation on Apple Silicon.**

Leanstral is the first open-source AI agent purpose-built for Lean 4 formal proofs -- generating both executable code and machine-checkable mathematical proofs. This variant combines **dual compression**: 2-bit MLX weight quantization for aggressive model size reduction plus TurboQuant KV-cache quantization for efficient long-context inference.

## Overview

This repository provides a **dual-compressed** configuration: MLX 2-bit weight quantization aggressively reduces the static memory footprint, while TurboQuant compresses the KV cache at runtime. This enables running Leanstral on Apple Silicon machines with moderate unified memory.

| Spec | Value |
|------|-------|
| Base model | [mistralai/Leanstral-2603](https://huggingface.co/mistralai/Leanstral-2603) |
| Architecture | Mistral MoE (~119B parameters, 7 consolidated shards) |
| Weight quantization | 2-bit (MLX) |
| KV-cache quantization | TurboQuant |
| Weight memory | ~30 GB |
| Runtime | MLX (Apple Silicon) |
| License | Apache 2.0 |
| Use case | Lean 4 formal verification, theorem proving, mathematical proofs |

## Quickstart

> **Note:** Per the commit message, the weights are pending `mlx_lm` support for the mistral3 architecture; the snippet below assumes they are available in this repository.

```python
from mlx_lm import load, generate

model, tokenizer = load("majentik/Leanstral-TurboQuant-MLX-2bit")

prompt = "Prove that for all natural numbers n, n + 0 = n in Lean 4:"
response = generate(
    model,
    tokenizer,
    prompt=prompt,
    max_tokens=512,
)
print(response)
```

## What is TurboQuant?

TurboQuant ([arXiv:2504.19874](https://arxiv.org/abs/2504.19874)) is a KV-cache quantization method that compresses the key-value cache used during autoregressive generation. By quantizing the KV cache to lower precision, TurboQuant reduces memory consumption proportionally to context length. Combined with MLX 2-bit weight quantization, this aggressive dual compression makes it possible to fit Leanstral's ~119B parameters into roughly 30 GB of weight memory.
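
For the general idea only -- this is a minimal, generic uniform KV-cache quantizer sketched in NumPy, **not** TurboQuant's actual algorithm (which is described in the paper), and the function names are illustrative:

```python
import numpy as np

def quantize_kv(x, bits=2):
    # Uniform asymmetric quantization along the last axis (head_dim),
    # with one (scale, zero-point) pair per token per head.
    levels = 2**bits - 1
    lo = x.min(axis=-1, keepdims=True)
    hi = x.max(axis=-1, keepdims=True)
    scale = (hi - lo) / levels
    scale = np.where(scale == 0, 1.0, scale)  # avoid divide-by-zero
    q = np.round((x - lo) / scale).astype(np.uint8)
    return q, scale, lo

def dequantize_kv(q, scale, lo):
    return q * scale + lo

rng = np.random.default_rng(0)
# (batch, heads, seq_len, head_dim) -- shapes chosen arbitrarily for the demo
keys = rng.standard_normal((4, 8, 16, 64))
q, scale, lo = quantize_kv(keys, bits=2)
recon = dequantize_kv(q, scale, lo)
err = np.abs(recon - keys).max()
print(f"max abs reconstruction error: {err:.3f}")
```

Note that a real kernel would bit-pack four 2-bit codes per byte to realize the memory saving; storing codes in `uint8`, as here, only demonstrates the round-trip.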

> **Note:** 2-bit weight quantization is lossy. Expect some degradation in proof quality compared to the 4-bit variant. For critical formal verification work, prefer the 4-bit or full-precision variants.

## Memory Estimates

| Component | Estimate |
|-----------|----------|
| Model weights (2-bit) | ~30 GB |
| KV-cache | Compressed at runtime by TurboQuant; savings grow with context length |
| Recommended hardware | MacBook Pro M2/M3/M4 Max (64 GB+) or Mac Studio |
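
The ~30 GB weight figure can be sanity-checked with back-of-envelope arithmetic -- ~119B parameters at 2 bits each, ignoring quantization scales/biases and any tensors kept at higher precision:

```python
params = 119e9          # ~119B parameters (from the Overview table)
bits_per_weight = 2     # 2-bit MLX quantization
gb = params * bits_per_weight / 8 / 1e9  # decimal gigabytes
print(f"{gb:.2f} GB")   # 29.75 GB, consistent with the ~30 GB estimate
```

Real quantized checkpoints carry some overhead beyond this, so treat it as a lower bound.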

## Lean 4 Use Case

Leanstral excels at:
- **Formal verification** -- generating machine-checkable proofs of mathematical theorems
- **Theorem proving** -- interactive and automated proof search in Lean 4
- **Code generation** -- writing verified Lean 4 programs with correctness guarantees
- **Proof repair** -- fixing incomplete or broken proof scripts
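
For a flavor of the target domain, the quickstart prompt asks for a proof that `n + 0 = n`; a hand-written Lean 4 proof (illustrative, not model output) is definitional:

```lean
-- `n + 0 = n` holds by definition of `Nat` addition
-- (recursion on the second argument), so `rfl` closes the goal.
theorem n_add_zero (n : Nat) : n + 0 = n := rfl
```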

## See Also

- [mistralai/Leanstral-2603](https://huggingface.co/mistralai/Leanstral-2603) -- Base model
- [majentik/Leanstral-TurboQuant](https://huggingface.co/majentik/Leanstral-TurboQuant) -- Full-precision weights + TurboQuant KV cache
- [majentik/Leanstral-TurboQuant-MLX-4bit](https://huggingface.co/majentik/Leanstral-TurboQuant-MLX-4bit) -- MLX 4-bit + TurboQuant
- [majentik/Leanstral-TurboQuant-MLX-1bit](https://huggingface.co/majentik/Leanstral-TurboQuant-MLX-1bit) -- MLX 1-bit + TurboQuant
- [majentik/Leanstral-RotorQuant-MLX-2bit](https://huggingface.co/majentik/Leanstral-RotorQuant-MLX-2bit) -- MLX 2-bit + RotorQuant (faster prefill/decode)
- [TurboQuant paper](https://arxiv.org/abs/2504.19874)