Leanstral-TurboQuant

KV-cache quantized Leanstral-2603 using TurboQuant for efficient Lean 4 formal proof generation.

Leanstral is the first open-source AI agent purpose-built for Lean 4 formal proofs -- generating both executable code and machine-checkable mathematical proofs. This variant applies TurboQuant KV-cache quantization to drastically reduce memory consumption during inference while preserving the full BF16 model weights.

Overview

This repository provides the TurboQuant KV-cache-only configuration of Leanstral-2603. The model weights remain at full precision; only the KV cache is quantized during inference, reducing peak memory usage for long-context Lean 4 proof generation sessions.

Spec                 Value
-------------------  ----------------------------------------------------------------
Base model           mistralai/Leanstral-2603
Architecture         Mistral MoE (~119B parameters, 7 consolidated shards)
Compression          TurboQuant KV-cache quantization
Weight precision     BF16 (unmodified)
KV-cache precision   Mixed-precision quantized
License              Apache 2.0
Use case             Lean 4 formal verification, theorem proving, mathematical proofs

Quickstart

from transformers import AutoModelForCausalLM, AutoTokenizer
from turboquant import TurboQuantCache

model_id = "majentik/Leanstral-TurboQuant"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype="auto",
)

# Enable TurboQuant KV-cache quantization
cache = TurboQuantCache(model)

prompt = "Prove that for all natural numbers n, n + 0 = n in Lean 4:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    past_key_values=cache,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

What is TurboQuant?

TurboQuant (arXiv: 2504.19874) is a KV-cache quantization method that compresses the key-value cache used during autoregressive generation. Because the KV cache grows linearly with context length, quantizing it to lower precision cuts the memory component that dominates long generations, without modifying the model weights themselves. This is particularly beneficial for Lean 4 proof generation, where proofs can span thousands of tokens.
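To make the idea concrete, here is a minimal sketch of KV-cache quantization in NumPy: per-channel symmetric 8-bit rounding of a key tensor, then dequantization at attention time. This is an illustrative simplification, not TurboQuant's actual algorithm (which uses a more sophisticated mixed-precision scheme); the shapes and function names are assumptions.

```python
import numpy as np

def quantize_kv(kv: np.ndarray):
    """Per-channel symmetric int8 quantization of a KV tensor.
    Illustrative only -- TurboQuant's real scheme is more involved."""
    # One scale per head-dim channel, computed over the sequence axis.
    scale = np.abs(kv).max(axis=0, keepdims=True) / 127.0
    scale = np.where(scale == 0, 1.0, scale)  # guard against all-zero channels
    q = np.clip(np.round(kv / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_kv(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    # Recover an approximate float tensor for the attention computation.
    return q.astype(np.float32) * scale

keys = np.random.randn(1024, 128).astype(np.float32)  # [seq_len, head_dim]
q, scale = quantize_kv(keys)
recovered = dequantize_kv(q, scale)

# int8 storage is 4x smaller than float32 (2x smaller than FP16/BF16),
# at the cost of a small per-element reconstruction error.
print(keys.nbytes / q.nbytes)          # 4.0
print(np.abs(keys - recovered).max())  # small rounding error
```

The scales are the only extra state kept alongside the int8 buffers, so the overhead is one float per channel rather than per element.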

Memory Estimates

Component              Estimate
---------------------  -----------------------------------------
Model weights (BF16)   ~238 GB
KV-cache savings       2-4x reduction vs. an FP16 KV cache
Recommended VRAM       4x A100 80 GB or equivalent
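The weight figure follows directly from the parameter count (~119B parameters at 2 bytes each in BF16). The sketch below also estimates KV-cache size at a long context; the layer/head/context numbers are illustrative assumptions for the arithmetic, not the actual Leanstral-2603 configuration.

```python
# Back-of-envelope memory math. The layer/head/context values below are
# ILLUSTRATIVE assumptions, not the real Leanstral-2603 config.
params = 119e9
weight_gb = params * 2 / 1e9  # BF16 = 2 bytes per parameter
print(f"weights: ~{weight_gb:.0f} GB")  # ~238 GB

layers, kv_heads, head_dim, ctx = 60, 8, 128, 32_768
# K and V tensors (factor 2), 2 bytes per element in FP16.
fp16_kv = 2 * layers * kv_heads * head_dim * ctx * 2
int8_kv = fp16_kv // 2  # ~2x smaller at 8 bits; 4-bit would be ~4x
print(f"FP16 KV cache @ 32k ctx:  {fp16_kv / 1e9:.1f} GB")
print(f"8-bit KV cache @ 32k ctx: {int8_kv / 1e9:.1f} GB")
```

Under these assumed dimensions, the FP16 KV cache alone adds roughly 8 GB at a 32k context, which is the component the 2-4x reduction targets.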

Lean 4 Use Case

Leanstral excels at:

  • Formal verification -- generating machine-checkable proofs of mathematical theorems
  • Theorem proving -- interactive and automated proof search in Lean 4
  • Code generation -- writing verified Lean 4 programs with correctness guarantees
  • Proof repair -- fixing incomplete or broken proof scripts

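As a concrete target, the quickstart prompt asks for a proof that n + 0 = n. A minimal Lean 4 answer (theorem names here are for illustration) looks like:

```lean
-- n + 0 = n holds definitionally, since Nat.add recurses on its
-- second argument, so `rfl` closes the goal.
theorem my_add_zero (n : Nat) : n + 0 = n := rfl

-- The same statement via the standard-library lemma:
example (n : Nat) : n + 0 = n := Nat.add_zero n
```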