QED-Nano SFT
Model Summary
QED-Nano SFT is a 4B parameter mathematical reasoning model fine-tuned on olympiad-level proof problems. This is the supervised fine-tuning (SFT) checkpoint, trained via knowledge distillation from deepseek-ai/DeepSeek-Math-V2 (685B parameters). The model generates mathematical proofs with explicit chain-of-thought reasoning enclosed in `<think>` tags.
This model serves as the foundation for the full QED-Nano pipeline, which includes subsequent reinforcement learning and reasoning cache extensions. For the complete trained model, see lm-provers/QED-Nano.
Key features
- Reasoning model optimized for theorem proving
- Supervised fine-tuning checkpoint: Foundation for the full QED-Nano model (SFT + RL + Reasoning Cache)
- Distilled from DeepSeek-Math-V2 (685B): Knowledge distillation to 4B parameters
- Fully open: open weights + full training details including public data mixture and training configs
For more details refer to our blog post: https://huggingface.co/spaces/lm-provers/qed-nano-blogpost
How to use
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "lm-provers/QED-Nano-SFT"
device = "cuda"  # for GPU usage or "cpu" for CPU usage

# Load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).to(device)

# Prepare the model input (note the escaped backslash in the LaTeX command)
prompt = "Generate a rigorous proof to the following question: is $\\sqrt{2}$ rational or irrational?"
messages_think = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages_think,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate the output
generated_ids = model.generate(**model_inputs, max_new_tokens=32768)

# Decode only the newly generated tokens
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):]
print(tokenizer.decode(output_ids, skip_special_tokens=True))
```
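As noted in the summary, the model encloses its chain-of-thought in `<think>` tags. A minimal, hypothetical helper for separating the reasoning trace from the final proof — the tag format is the only detail taken from this card; the helper itself is not part of the released code:

```python
def split_reasoning(output_text: str) -> tuple[str, str]:
    """Split a completion into (reasoning, final_answer) using <think> tags.

    If the closing tag is missing (e.g. generation was truncated),
    the whole tagged text is treated as reasoning.
    """
    open_tag, close_tag = "<think>", "</think>"
    start = output_text.find(open_tag)
    end = output_text.find(close_tag)
    if start == -1:
        # No reasoning block: everything is the answer
        return "", output_text.strip()
    if end == -1:
        # Opened but never closed: truncated reasoning, no answer
        return output_text[start + len(open_tag):].strip(), ""
    reasoning = output_text[start + len(open_tag):end].strip()
    answer = output_text[end + len(close_tag):].strip()
    return reasoning, answer

reasoning, proof = split_reasoning(
    "<think>Assume sqrt(2) = p/q in lowest terms...</think>Therefore sqrt(2) is irrational."
)
```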
We recommend setting `temperature=0.6` and `top_p=0.95` in the sampling parameters.
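With `transformers`, these sampling parameters only take effect when sampling is enabled, so pass `do_sample=True` alongside them. A sketch of the resulting generation arguments:

```python
# Recommended sampling configuration for QED-Nano-SFT
gen_kwargs = {
    "do_sample": True,     # required for temperature/top_p to take effect
    "temperature": 0.6,
    "top_p": 0.95,
    "max_new_tokens": 32768,
}
# usage: generated_ids = model.generate(**model_inputs, **gen_kwargs)
```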
vLLM and SGLang
You can deploy the model with vLLM or SGLang to serve an OpenAI-compatible API.
SGLang
```shell
python -m sglang.launch_server --model-path lm-provers/QED-Nano-SFT
```
vLLM
```shell
vllm serve lm-provers/QED-Nano-SFT
```
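Once a server is running, requests follow the OpenAI chat-completions format. A sketch of the request body — the endpoint assumes vLLM's default local port (SGLang uses a different default), and the example prompt is illustrative:

```python
import json

# Assumption: vLLM serving locally on its default port
endpoint = "http://localhost:8000/v1/chat/completions"

payload = {
    "model": "lm-provers/QED-Nano-SFT",
    "messages": [
        {"role": "user", "content": "Prove that there are infinitely many primes."}
    ],
    "temperature": 0.6,
    "top_p": 0.95,
    "max_tokens": 32768,
}
body = json.dumps(payload)
# Send with any HTTP client, e.g.:
#   curl -X POST <endpoint> -H "Content-Type: application/json" -d '<body>'
```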
Evaluation
In this section, we report the evaluation results of QED-Nano on IMO-ProofBench, ProofBench, and IMO-AnswerBench. All evaluations except those on IMO-AnswerBench are reported as avg@3 unless stated otherwise.
| Model | IMO-ProofBench | ProofBench | IMO-AnswerBench |
|---|---|---|---|
| Qwen3-4B-Thinking-2507 | 20.4 (2.6) | 19.5 (0.9) | 55.8 |
| QED-Nano-SFT | 39.5 (2.9) | 33.3 (0.5) | 57.5 |
| QED-Nano | 40.0 (0.6) | 44.9 (3.4) | 67.5 |
| QED-Nano (Agent) | 54.0 (3.7) | 54.4 (2.4) | - |
| Qwen3-30B-A3B-Thinking-2507 | 27.6 (1.0) | 26.1 (2.4) | 67.0 |
| Qwen3-235B-A22B-Thinking-2507 | 34.1 (0.7) | 33.7 (1.1) | 70.5 |
| Nomos-1 | 40.3 (3.5) | 28.3 (3.9) | 49.0 |
| GPT-OSS-20B | 38.3 (1.2) | 38.4 (3.9) | 61.5 |
| GPT-OSS-120B | 43.1 (3.2) | 47.5 (1.7) | 70.5 |
| DeepSeek-Math-V2 | 57.9 (2.0) | 60.6 (0.1) | 75.8 |
| Gemini 3 Pro | 58.7 (2.9) | 66.7 (3.1) | 83.2 |
Note: This is the SFT-only checkpoint. Performance improves significantly with RL training and reasoning cache. See the full lm-provers/QED-Nano model for complete results.
Training
Model
- Architecture: Transformer decoder
- Precision: bfloat16
- Base Model: Qwen/Qwen3-4B-Thinking-2507
- Parameters: 4 billion
- Training Data: lm-provers/FineProofs-SFT (4,300 unique problems)
- Teacher Model: deepseek-ai/DeepSeek-Math-V2 (685B)
Training Hyperparameters
- Optimization Steps: 620
- Global Batch Size: 32
- Learning Rate Schedule: Cosine with peak at 3×10⁻⁵
- Max Sequence Length: 45,056 tokens
- Training Configuration: Unique problems only (4.3k samples outperformed 7.7k with duplicates)
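These numbers are internally consistent: 620 optimization steps at a global batch size of 32 amount to roughly 4.6 passes over the 4,300 unique problems.

```python
steps = 620
global_batch_size = 32
unique_problems = 4_300

samples_seen = steps * global_batch_size    # total training samples processed
epochs = samples_seen / unique_problems     # approximate passes over the data
print(samples_seen, round(epochs, 2))
```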
Software & hardware
- GPUs: 1 node with 8 H100
- Training time: 5 hours
- Training Framework: TRL (Transformer Reinforcement Learning)
- Evaluation framework: CMU-AIRe/QED-Nano
Training script
You can reproduce this training using the TRL library:
```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-4B-Thinking-2507",
    torch_dtype="auto",
    attn_implementation="kernels-community/vllm-flash-attn3",
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B-Thinking-2507")

# Load the SFT dataset
dataset = load_dataset("lm-provers/FineProofs-SFT", split="train")

# Training configuration
# (global batch size = 2 per device x 2 accumulation steps x 8 GPUs = 32)
training_args = SFTConfig(
    output_dir="./qed-nano-sft",
    max_steps=620,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=2,
    learning_rate=3.0e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    max_seq_length=45056,
    bf16=True,
    gradient_checkpointing=True,
    logging_steps=1,
    save_steps=62,
    push_to_hub=True,
)

# Train
trainer = SFTTrainer(model=model, tokenizer=tokenizer, args=training_args, train_dataset=dataset)
trainer.train()
```
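The hyperparameters above note that 4.3k unique problems outperformed 7.7k samples with duplicates. A minimal, hypothetical dedup pass over a list of rows — the `problem` field name is an assumption about the dataset schema, not taken from this card:

```python
def dedupe_by_problem(rows: list[dict]) -> list[dict]:
    """Keep only the first occurrence of each problem statement."""
    seen: set[str] = set()
    unique = []
    for row in rows:
        key = row["problem"].strip()  # assumed field name
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

rows = [
    {"problem": "Prove sqrt(2) is irrational.", "solution": "A"},
    {"problem": "Prove sqrt(2) is irrational.", "solution": "B"},  # duplicate
    {"problem": "Show there are infinitely many primes.", "solution": "C"},
]
unique_rows = dedupe_by_problem(rows)
```

With Hugging Face `datasets`, the same idea can be applied via `dataset.filter` over a running set of seen keys.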
Limitations
QED-Nano SFT is a supervised fine-tuning checkpoint with several important limitations:
Critical Limitations
Length Explosion: This SFT-only model suffers from uncontrolled reasoning length growth. It frequently generates responses exceeding 100k tokens, often meandering or repeating steps without converging to a solution. This is a known limitation of pure supervised learning on long-form reasoning tasks.
Correctness Issues: While the model produces fluent mathematical text, it may generate incorrect proofs. The SFT stage does not directly optimize for correctness—only for imitating the teacher's reasoning style.
No Self-Correction: Unlike the full QED-Nano model with RL training, this checkpoint lacks the ability to verify and refine its solutions.
Other Limitations
- Domain-specific: Designed specifically for theorem proving. Using it as a general assistant will likely produce nonsense outside this domain.
- Language: English only
- Text-Only: Cannot handle problems with diagrams or visual elements
- Teacher Errors: May have inherited some errors from the teacher model (DeepSeek-Math-V2)
Recommended Mitigations
- Use the full lm-provers/QED-Nano model for production use
- Apply length penalties or early stopping to control generation length
- Combine with external verification tools or judges
- Consider RL fine-tuning to address correctness and length issues
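As a cheap guard against the length explosion described above, decoding can be stopped when the tail of the output starts repeating. A hypothetical n-gram repetition check over generated token ids — not part of the released pipeline, and the thresholds are illustrative:

```python
def is_repeating(token_ids: list[int], ngram: int = 32, window: int = 512) -> bool:
    """Return True if the last `ngram` tokens already appeared earlier
    within the last `window` tokens -- a crude signal that generation
    is looping and can be cut off early."""
    if len(token_ids) < ngram * 2:
        return False
    tail = token_ids[-ngram:]
    history = token_ids[-window:-ngram]
    # Scan the recent history for an exact copy of the tail n-gram
    for i in range(len(history) - ngram + 1):
        if history[i:i + ngram] == tail:
            return True
    return False
```

In `transformers`, a check like this could back a custom `StoppingCriteria` so the loop test runs once per decoding step.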
These models should be used as assistive tools rather than definitive sources of information. Users should always verify important information and critically evaluate any generated content.
License
Citation
If you use this model in your research, please cite:
```bibtex
@misc{qed_nano_sft_2026,
  title={QED-Nano SFT: Distilling Olympiad Mathematical Reasoning to 4B Parameters},
  author={Edward Beeching and Jasper Dekoninck and Jia Li and Yuxiao Qu and Amrith Setlur and Ian Wu and Aviral Kumar and Lewis Tunstall},
  year={2026},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/lm-provers/QED-Nano-SFT}}
}
```
Please also cite the training dataset:
```bibtex
@dataset{fineproofs_sft_dataset_2026,
  title={FineProofs SFT Dataset: Mathematical Olympiad Problems with Chain-of-Thought Reasoning},
  author={Edward Beeching and Jasper Dekoninck and Jia Li and Yuxiao Qu and Amrith Setlur and Ian Wu and Aviral Kumar and Lewis Tunstall},
  year={2026},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/datasets/lm-provers/FineProofs-SFT}}
}
```