---
base_model: mistralai/Leanstral-2603
library_name: transformers
tags:
- rotorquant
- kv-cache-quantization
- leanstral
- lean4
- formal-proofs
- theorem-proving
- quantized
- mistral
- moe
license: apache-2.0
---

# Leanstral-RotorQuant

**KV-cache quantized [Leanstral-2603](https://huggingface.co/mistralai/Leanstral-2603) using [RotorQuant](https://github.com/scrya-com/rotorquant) for high-throughput Lean 4 formal proof generation.**

Leanstral is the first open-source AI agent purpose-built for Lean 4 formal proofs, generating both executable code and machine-checkable mathematical proofs. This variant applies RotorQuant KV-cache quantization, delivering **5.3x faster prefill** and **28% faster decode** than TurboQuant while preserving the full-precision BF16 model weights.

## Overview

This repository provides the **RotorQuant KV-cache-only** configuration of Leanstral-2603. The model weights remain at full precision; only the KV cache is quantized during inference, using RotorQuant's rotation-aware quantization scheme.
| Spec | Value |
|------|-------|
| Base model | [mistralai/Leanstral-2603](https://huggingface.co/mistralai/Leanstral-2603) |
| Architecture | Mistral MoE (~119B parameters, 7 consolidated shards) |
| Compression | RotorQuant KV-cache quantization |
| Weight precision | BF16 (unmodified) |
| KV-cache precision | Mixed-precision quantized |
| Prefill speedup | 5.3x vs TurboQuant |
| Decode speedup | 28% vs TurboQuant |
| License | Apache 2.0 |
| Use case | Lean 4 formal verification, theorem proving, mathematical proofs |

## Quickstart

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
# Cache class name assumed from the RotorQuant repository; the original card
# imported `IsoQuantCache` from `turboquant`, which appears to be copy-paste
# residue from the TurboQuant variant of this model.
from rotorquant import RotorQuantCache

model_id = "majentik/Leanstral-RotorQuant"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype="auto",
)

# Enable RotorQuant KV-cache quantization
cache = RotorQuantCache(model)

prompt = "Prove that for all natural numbers n, n + 0 = n in Lean 4:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    past_key_values=cache,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## What is RotorQuant?

[RotorQuant](https://github.com/scrya-com/rotorquant) is a KV-cache quantization method that uses rotation-aware quantization to achieve higher throughput than standard KV-cache compression. By exploiting the rotary positional embedding structure, RotorQuant achieves:

- **5.3x faster prefill** -- critical for long Lean 4 proof contexts
- **28% faster decode** -- faster token-by-token proof generation
- Memory savings equivalent to TurboQuant, with better computational efficiency

This makes RotorQuant the preferred choice for interactive theorem-proving sessions where latency matters.
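The core idea behind rotation-aware quantization can be illustrated with a minimal NumPy sketch (an illustration of the general technique, not RotorQuant's actual implementation): applying an orthogonal Hadamard rotation before quantizing spreads outlier channels across all dimensions, so a single int8 scale wastes far less precision. The dimensions and outlier magnitude below are made up for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

def hadamard(n):
    # Sylvester construction of an orthogonal Hadamard matrix; n must be a power of two
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def quantize_int8(x):
    # Symmetric per-tensor int8 quantization
    scale = np.abs(x).max() / 127.0
    return np.round(x / scale).astype(np.int8), scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

head_dim = 64
keys = rng.standard_normal((16, head_dim)).astype(np.float32)
keys[:, 0] *= 20.0  # inject an outlier channel, typical of KV activations

# Naive quantization: the outlier channel inflates the scale for everyone
q, s = quantize_int8(keys)
err_plain = np.abs(dequantize(q, s) - keys).mean()

# Rotate first: the orthogonal transform spreads the outlier's energy evenly
R = hadamard(head_dim)
q_r, s_r = quantize_int8(keys @ R)
# Undo the rotation after dequantization (R is orthogonal, so R.T inverts it)
recovered = dequantize(q_r, s_r) @ R.T
err_rotated = np.abs(recovered - keys).mean()

print(err_rotated < err_plain)  # rotation reduces mean quantization error
```

Because the rotation is orthogonal, it preserves attention scores exactly in infinite precision; all that changes is how kindly the values sit in the int8 grid.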
## Memory Estimates

| Component | Estimate |
|-----------|----------|
| Model weights (BF16) | ~238 GB |
| KV-cache savings | 2-4x reduction vs FP16 KV cache |
| Recommended VRAM | 4x A100 80GB or equivalent |

## Lean 4 Use Case

Leanstral excels at:

- **Formal verification** -- generating machine-checkable proofs of mathematical theorems
- **Theorem proving** -- interactive and automated proof search in Lean 4
- **Code generation** -- writing verified Lean 4 programs with correctness guarantees
- **Proof repair** -- fixing incomplete or broken proof scripts

## See Also

- [mistralai/Leanstral-2603](https://huggingface.co/mistralai/Leanstral-2603) -- Base model
- [majentik/Leanstral-TurboQuant](https://huggingface.co/majentik/Leanstral-TurboQuant) -- TurboQuant KV-cache variant
- [majentik/Leanstral-RotorQuant-MLX-4bit](https://huggingface.co/majentik/Leanstral-RotorQuant-MLX-4bit) -- MLX 4-bit + RotorQuant
- [majentik/Leanstral-RotorQuant-MLX-2bit](https://huggingface.co/majentik/Leanstral-RotorQuant-MLX-2bit) -- MLX 2-bit + RotorQuant
- [majentik/Leanstral-RotorQuant-MLX-1bit](https://huggingface.co/majentik/Leanstral-RotorQuant-MLX-1bit) -- MLX 1-bit + RotorQuant
- [RotorQuant repository](https://github.com/scrya-com/rotorquant)
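## Example

For reference, the quickstart prompt ("for all natural numbers n, n + 0 = n") has a one-line Lean 4 proof, since `Nat.add` recurses on its second argument and the goal holds definitionally (the theorem name below is arbitrary):

```lean
-- n + 0 = n holds by definitional unfolding of Nat.add, so rfl closes the goal
example (n : Nat) : n + 0 = n := rfl

-- the same fact is also available as the core library lemma Nat.add_zero
theorem demo_add_zero (n : Nat) : n + 0 = n := Nat.add_zero n
```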