Leanstral-TurboQuant-MLX-2bit

2-bit MLX weight-quantized Leanstral-2603 with TurboQuant KV-cache quantization for Lean 4 formal proof generation on Apple Silicon.

Leanstral is the first open-source AI agent purpose-built for Lean 4 formal proofs -- generating both executable code and machine-checkable mathematical proofs. This variant combines dual compression: 2-bit MLX weight quantization for aggressive model size reduction plus TurboQuant KV-cache quantization for efficient long-context inference.

Overview

This repository provides a dual-compressed configuration: MLX 2-bit weight quantization aggressively reduces the static memory footprint, while TurboQuant compresses the KV cache at runtime. This enables running Leanstral on Apple Silicon machines with moderate unified memory.

Spec Value
Base model mistralai/Leanstral-2603
Architecture Mistral MoE (~119B parameters, 7 consolidated shards)
Weight quantization 2-bit (MLX)
KV-cache quantization TurboQuant
Weight memory ~30 GB
Runtime MLX (Apple Silicon)
License Apache 2.0
Use case Lean 4 formal verification, theorem proving, mathematical proofs

Quickstart

from mlx_lm import load, generate

model, tokenizer = load("majentik/Leanstral-TurboQuant-MLX-2bit")

prompt = "Prove that for all natural numbers n, n + 0 = n in Lean 4:"
response = generate(
    model,
    tokenizer,
    prompt=prompt,
    max_tokens=512,
)
print(response)

What is TurboQuant?

TurboQuant (arXiv: 2504.19874) is a KV-cache quantization method that compresses the key-value cache used during autoregressive generation. By quantizing the KV cache to lower precision, TurboQuant reduces memory consumption proportionally to context length. Combined with MLX 2-bit weight quantization, this aggressive dual compression makes it possible to fit Leanstral's ~119B parameter model into ~30 GB of unified memory.

Note: 2-bit weight quantization is lossy. Expect some degradation in proof quality compared to the 4-bit variant. For critical formal verification work, prefer the 4-bit or full-precision variants.

Memory Estimates

Component Estimate
Model weights (2-bit) ~30 GB
KV-cache Reduced via TurboQuant
Recommended hardware MacBook Pro M2/M3/M4 Max (64 GB+) or Mac Studio

Lean 4 Use Case

Leanstral excels at:

  • Formal verification -- generating machine-checkable proofs of mathematical theorems
  • Theorem proving -- interactive and automated proof search in Lean 4
  • Code generation -- writing verified Lean 4 programs with correctness guarantees
  • Proof repair -- fixing incomplete or broken proof scripts

See Also

Downloads last month
28
Safetensors
Model size
23B params
Tensor type
BF16
·
U32
·
MLX
Hardware compatibility
Log In to add your hardware

2-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for majentik/Leanstral-TurboQuant-MLX-2bit

Quantized
(8)
this model

Paper for majentik/Leanstral-TurboQuant-MLX-2bit