---
base_model: mistralai/Leanstral-2603
library_name: transformers
tags:
- rotorquant
- kv-cache-quantization
- leanstral
- lean4
- formal-proofs
- theorem-proving
- quantized
- mistral
- moe
license: apache-2.0
---

# Leanstral-RotorQuant

**KV-cache quantized [Leanstral-2603](https://huggingface.co/mistralai/Leanstral-2603) using [RotorQuant](https://github.com/scrya-com/rotorquant) for high-throughput Lean 4 formal proof generation.**

Leanstral is the first open-source AI agent purpose-built for Lean 4 formal proofs -- generating both executable code and machine-checkable mathematical proofs. This variant applies RotorQuant KV-cache quantization, delivering **5.3x faster prefill** and **28% faster decode** than TurboQuant while preserving full BF16 model weights.

## Overview

This repository provides the **RotorQuant KV-cache-only** configuration of Leanstral-2603. The model weights remain at full precision; only the KV cache is quantized during inference using RotorQuant's rotation-aware quantization scheme.
| Spec | Value |
|------|-------|
| Base model | [mistralai/Leanstral-2603](https://huggingface.co/mistralai/Leanstral-2603) |
| Architecture | Mistral MoE (~119B parameters, 7 consolidated shards) |
| Compression | RotorQuant KV-cache quantization |
| Weight precision | BF16 (unmodified) |
| KV-cache precision | Mixed-precision quantized |
| Prefill speedup | 5.3x vs TurboQuant |
| Decode speedup | 28% vs TurboQuant |
| License | Apache 2.0 |
| Use case | Lean 4 formal verification, theorem proving, mathematical proofs |

## Quickstart

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
# NOTE: cache class name assumed from the RotorQuant repo; check its README for the exact API
from rotorquant import RotorQuantCache

model_id = "majentik/Leanstral-RotorQuant"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype="auto",
)

# Enable RotorQuant KV-cache quantization
cache = RotorQuantCache(model)

prompt = "Prove that for all natural numbers n, n + 0 = n in Lean 4:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    past_key_values=cache,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## What is RotorQuant?

[RotorQuant](https://github.com/scrya-com/rotorquant) is a KV-cache quantization method that uses rotation-aware quantization to achieve higher throughput than standard KV-cache compression. By exploiting the rotary positional embedding structure, RotorQuant achieves:

- **5.3x faster prefill** -- critical for long Lean 4 proof contexts
- **28% faster decode** -- faster token-by-token proof generation
- Memory savings equivalent to TurboQuant, with better computational efficiency

This makes RotorQuant the preferred choice for interactive theorem proving sessions where latency matters.
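The core idea behind rotation-aware quantization in general -- rotate key/value vectors with an orthogonal transform so that outlier channels are spread across all dimensions before low-bit rounding -- can be sketched in a few lines of NumPy. This is a toy illustration of the technique, not RotorQuant's actual algorithm; the vector, bit width, and Hadamard rotation are chosen here purely for demonstration:

```python
import numpy as np

def quantize_absmax(v, bits=4):
    """Symmetric absmax quantization: round to signed ints, then dequantize."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(v).max() / qmax
    q = np.clip(np.round(v / scale), -qmax, qmax)
    return q * scale

def hadamard(n):
    """Sylvester construction of an orthonormal n x n Hadamard rotation (n a power of 2)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

# A key vector with one outlier channel that dominates the quantization scale
x = np.array([10.0, 0.7, -0.6, 0.5, -0.4, 0.3, -0.2, 0.1])

# Direct 4-bit quantization: the outlier inflates the scale, crushing small channels
err_direct = np.linalg.norm(quantize_absmax(x) - x)

# Rotation-aware: rotate, quantize in the rotated basis, rotate back
H = hadamard(8)
x_hat = H.T @ quantize_absmax(H @ x)
err_rotated = np.linalg.norm(x_hat - x)

print(f"direct error:  {err_direct:.3f}")
print(f"rotated error: {err_rotated:.3f}")
```

Because the rotation is orthonormal, it preserves attention inner products up to quantization error, while the flattened value distribution after rotation makes low-bit rounding far less lossy.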

## Memory Estimates

| Component | Estimate |
|-----------|----------|
| Model weights (BF16) | ~238 GB |
| KV-cache savings | 2-4x reduction vs FP16 KV cache |
| Recommended VRAM | 4x A100 80GB or equivalent |
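To see what a 2-4x KV-cache reduction means at long proof contexts, the cache size can be computed from the attention shape. The hyperparameters below (layers, KV heads, head dim, context) are illustrative placeholders, not the published Leanstral-2603 configuration:

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_value):
    # K and V tensors per layer, each of shape [n_kv_heads, seq_len, head_dim]
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value

# Illustrative hyperparameters -- NOT the actual model config
layers, kv_heads, head_dim = 60, 8, 128
ctx = 32_768

fp16 = kv_cache_bytes(ctx, layers, kv_heads, head_dim, 2)    # 16-bit cache
int4 = kv_cache_bytes(ctx, layers, kv_heads, head_dim, 0.5)  # 4-bit cache
print(f"FP16 KV cache:  {fp16 / 2**30:.2f} GiB")
print(f"4-bit KV cache: {int4 / 2**30:.2f} GiB ({fp16 / int4:.0f}x smaller)")
```

At these assumed sizes, a 4-bit cache frees several GiB per 32k-token context, which compounds quickly when serving many concurrent proof-search sessions.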

## Lean 4 Use Case

Leanstral excels at:

- **Formal verification** -- generating machine-checkable proofs of mathematical theorems
- **Theorem proving** -- interactive and automated proof search in Lean 4
- **Code generation** -- writing verified Lean 4 programs with correctness guarantees
- **Proof repair** -- fixing incomplete or broken proof scripts
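For the quickstart prompt above, a correct completion is a one-line Lean 4 proof, since `n + 0 = n` holds definitionally for `Nat` (the theorem name here is arbitrary, chosen to avoid clashing with the standard `add_zero`):

```lean
-- `Nat` addition recurses on its second argument, so `n + 0` reduces to `n`
-- and `rfl` closes the goal by definitional equality.
theorem add_zero' (n : Nat) : n + 0 = n := rfl
```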

## See Also

- [mistralai/Leanstral-2603](https://huggingface.co/mistralai/Leanstral-2603) -- Base model
- [majentik/Leanstral-TurboQuant](https://huggingface.co/majentik/Leanstral-TurboQuant) -- TurboQuant KV-cache variant
- [majentik/Leanstral-RotorQuant-MLX-4bit](https://huggingface.co/majentik/Leanstral-RotorQuant-MLX-4bit) -- MLX 4-bit + RotorQuant
- [majentik/Leanstral-RotorQuant-MLX-2bit](https://huggingface.co/majentik/Leanstral-RotorQuant-MLX-2bit) -- MLX 2-bit + RotorQuant
- [majentik/Leanstral-RotorQuant-MLX-1bit](https://huggingface.co/majentik/Leanstral-RotorQuant-MLX-1bit) -- MLX 1-bit + RotorQuant
- [RotorQuant repository](https://github.com/scrya-com/rotorquant)
|