Add model card (weights pending mlx_lm mistral3 architecture support)

Browse files

Files changed (1) hide show

README.md +94 -0

README.md ADDED Viewed

	@@ -0,0 +1,94 @@

+---
+base_model: mistralai/Leanstral-2603
+library_name: mlx
+tags:
+  - rotorquant
+  - kv-cache-quantization
+  - mlx
+  - 2-bit
+  - weight-quantization
+  - leanstral
+  - lean4
+  - formal-proofs
+  - theorem-proving
+  - quantized
+  - apple-silicon
+  - mistral
+  - moe
+license: apache-2.0
+---
+# Leanstral-RotorQuant-MLX-2bit
+**2-bit MLX weight-quantized [Leanstral-2603](https://huggingface.co/mistralai/Leanstral-2603) with [RotorQuant](https://github.com/scrya-com/rotorquant) KV-cache quantization for high-throughput Lean 4 formal proof generation on Apple Silicon.**
+Leanstral is the first open-source AI agent purpose-built for Lean 4 formal proofs -- generating both executable code and machine-checkable mathematical proofs. This variant combines **dual compression**: 2-bit MLX weight quantization for aggressive model size reduction plus RotorQuant KV-cache quantization, delivering **5.3x faster prefill** and **28% faster decode** compared to TurboQuant equivalents.
+## Overview
+This repository provides an aggressively compressed configuration with RotorQuant's superior throughput: MLX 2-bit weight quantization minimizes the static memory footprint, while RotorQuant's rotation-aware KV-cache compression delivers faster prefill and decode than TurboQuant.
+| Spec | Value |
+|------|-------|
+| Base model | [mistralai/Leanstral-2603](https://huggingface.co/mistralai/Leanstral-2603) |
+| Architecture | Mistral MoE (~119B parameters, 7 consolidated shards) |
+| Weight quantization | 2-bit (MLX) |
+| KV-cache quantization | RotorQuant |
+| Weight memory | ~30 GB |
+| Prefill speedup | 5.3x vs TurboQuant |
+| Decode speedup | 28% vs TurboQuant |
+| Runtime | MLX (Apple Silicon) |
+| License | Apache 2.0 |
+| Use case | Lean 4 formal verification, theorem proving, mathematical proofs |
+## Quickstart
+```python
+from mlx_lm import load, generate
+model, tokenizer = load("majentik/Leanstral-RotorQuant-MLX-2bit")
+prompt = "Prove that for all natural numbers n, n + 0 = n in Lean 4:"
+response = generate(
+    model,
+    tokenizer,
+    prompt=prompt,
+    max_tokens=512,
+)
+print(response)
+```
+## What is RotorQuant?
+[RotorQuant](https://github.com/scrya-com/rotorquant) is an advanced KV-cache quantization method that leverages rotation-aware quantization to achieve superior throughput compared to standard KV-cache compression. By exploiting the rotary positional embedding structure, RotorQuant achieves:
+- **5.3x faster prefill** -- critical for long Lean 4 proof contexts
+- **28% faster decode** -- faster token-by-token proof generation
+- Equivalent memory savings to TurboQuant with better computational efficiency
+> **Note:** 2-bit weight quantization is lossy. Expect some degradation in proof quality compared to the 4-bit variant. For critical formal verification work, prefer the 4-bit or full-precision variants.
+## Memory Estimates
+| Component | Estimate |
+|-----------|----------|
+| Model weights (2-bit) | ~30 GB |
+| KV-cache | Reduced via RotorQuant |
+| Recommended hardware | MacBook Pro M2/M3/M4 Max (64 GB+) or Mac Studio |
+## Lean 4 Use Case
+Leanstral excels at:
+- **Formal verification** -- generating machine-checkable proofs of mathematical theorems
+- **Theorem proving** -- interactive and automated proof search in Lean 4
+- **Code generation** -- writing verified Lean 4 programs with correctness guarantees
+- **Proof repair** -- fixing incomplete or broken proof scripts
+## See Also
+- [mistralai/Leanstral-2603](https://huggingface.co/mistralai/Leanstral-2603) -- Base model
+- [majentik/Leanstral-RotorQuant](https://huggingface.co/majentik/Leanstral-RotorQuant) -- Full-precision weights + RotorQuant KV cache
+- [majentik/Leanstral-RotorQuant-MLX-4bit](https://huggingface.co/majentik/Leanstral-RotorQuant-MLX-4bit) -- MLX 4-bit + RotorQuant
+- [majentik/Leanstral-RotorQuant-MLX-1bit](https://huggingface.co/majentik/Leanstral-RotorQuant-MLX-1bit) -- MLX 1-bit + RotorQuant
+- [majentik/Leanstral-TurboQuant-MLX-2bit](https://huggingface.co/majentik/Leanstral-TurboQuant-MLX-2bit) -- MLX 2-bit + TurboQuant
+- [RotorQuant repository](https://github.com/scrya-com/rotorquant)