gemma-2-9b-proof-lora

A LoRA adapter fine-tuned on top of google/gemma-2-9b-it for formal mathematical proof writing. Given a theorem or mathematical statement, the model produces a rigorous, step-by-step proof with clearly labelled structure (Given → Definitions → Proof → QED).

Out of the box, the base model tends to write vague, hand-wavy math; after fine-tuning, it writes structured, logically explicit proofs.


Model Details

  • Developed by: Kunj Shah (@kunjcr2)
  • Model type: Causal LM with a LoRA adapter (PEFT)
  • Base model: google/gemma-2-9b-it
  • Language: English
  • Finetuned from: google/gemma-2-9b-it
  • Training logs: W&B Run

What Changed vs Base Model

Before fine-tuning, Gemma-2-9B-IT produces informal, conversational math responses when asked to prove something: it gestures at the right idea but skips logical steps and has no consistent structure.

After fine-tuning, the model writes proofs with:

  • Explicit logical steps with clear connectives ("therefore", "it follows that", "by definition")
  • Proper proof structure (Given, Proof, QED)
  • No skipped justifications
  • Correct proof termination

This is a style and structure shift, not a reasoning capability shift. The model already knew the math; SFT made the reasoning visible and explicit.


Training Details

Dataset

  • Source: open-web-math/open-web-math, 6.3M documents of high-quality mathematical text (forums, lecture notes, arXiv, math blogs) with LaTeX preserved
  • Filtered to: ~120,000 proof-shaped samples containing both theorem keywords (Theorem, Lemma, Proposition) and proof markers (Proof., QED, \square)
  • Format: Gemma-2-IT chat format with completion-only loss, so the model is supervised only on the proof, not the prompt (see the sketch below)
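
The exact filtering script isn't published; the following is a minimal sketch of the keyword filter described above (the helper name and exact regexes are illustrative):

import re

# Illustrative reconstruction of the "proof-shaped" filter described above.
THEOREM_RE = re.compile(r"\b(Theorem|Lemma|Proposition)\b")
PROOF_RE = re.compile(r"Proof\.|QED|\\square")

def looks_like_proof(text: str) -> bool:
    # Keep documents containing both a theorem keyword and a proof marker.
    return bool(THEOREM_RE.search(text)) and bool(PROOF_RE.search(text))

For the completion-only loss, a collator that masks everything before the <start_of_turn>model marker (e.g. TRL's DataCollatorForCompletionOnlyLM) is the usual approach; the card doesn't name the training stack, so treat that as an assumption.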

Hyperparameters

  • Epochs: 3
  • Effective batch size: 32 (per-device batch 2 × grad accum 16)
  • Learning rate: 1.5e-4
  • LR scheduler: cosine
  • Warmup ratio: 0.05
  • Weight decay: 0.01
  • Max sequence length: 1536
  • Optimizer: Paged AdamW 32-bit
  • Precision: bf16 + tf32
  • Hardware: NVIDIA A100-SXM4-80GB
  • Training time: 9.59 hours
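
The full training script isn't published; as a rough sketch, the settings above map onto standard Hugging Face TrainingArguments like this (values are taken from the table, everything else is illustrative):

from transformers import TrainingArguments

# Sketch only: maps the hyperparameter table onto TrainingArguments.
# Truncation to the 1536-token max sequence length would be handled
# at tokenization time.
args = TrainingArguments(
    output_dir="gemma-2-9b-proof-lora",
    num_train_epochs=3,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=16,  # effective batch size: 2 x 16 = 32
    learning_rate=1.5e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    weight_decay=0.01,
    optim="paged_adamw_32bit",
    bf16=True,
    tf32=True,
    group_by_length=True,  # source of the loss oscillations noted below
)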

LoRA Config

  • Rank (r): 32
  • Alpha: 64
  • Dropout: 0.05
  • Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
  • Trainable params: 108,036,096 (1.15% of total)
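
These settings correspond directly to a PEFT LoraConfig; a sketch (only the values shown above are taken from the card):

from peft import LoraConfig

# LoRA configuration matching the table above.
lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)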

Results

  • Final train loss: 0.383
  • Final train mean token accuracy: 0.886
  • Best eval loss: 0.846 (step 1000)
  • Final eval loss: 0.856
  • Final eval mean token accuracy: ~0.782

Train loss dropped from ~1.4 to 0.38 over 3 epochs. Eval loss stayed stable, with no sign of significant overfitting. The oscillations in the training curves are expected with group_by_length batching: batches of short and long sequences alternate, causing variance in per-batch loss.


Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

# Load the base model in bf16 and attach the LoRA adapter on top.
base = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-9b-it",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "kunjcr2/gemma-2-9b-proof-lora")
tok = AutoTokenizer.from_pretrained("kunjcr2/gemma-2-9b-proof-lora")

# Gemma-2-IT chat format. <bos> is written explicitly here, so
# special-token insertion is disabled when tokenizing below to
# avoid a doubled <bos>.
prompt = (
    "<bos><start_of_turn>user\n"
    "You are a formal mathematics assistant. Prove the following:\n\n"
    "The square root of 2 is irrational."
    "<end_of_turn>\n<start_of_turn>model\n"
)

ids = tok(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)
out = model.generate(**ids, max_new_tokens=512, temperature=0.3, do_sample=True)
print(tok.decode(out[0], skip_special_tokens=True))
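
For deployment without the PEFT wrapper, the adapter weights can be merged into the base model (merge_and_unload is the standard PEFT call; the output directory name is just an example):

# Fold the LoRA deltas into the base weights for adapter-free inference.
merged = model.merge_and_unload()
merged.save_pretrained("gemma-2-9b-proof-merged")  # example path
tok.save_pretrained("gemma-2-9b-proof-merged")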

Limitations

  • This is an SFT-only run: the model learned proof structure and style, not verified logical correctness. A wrong proof written in a rigorous format will still look convincing.
  • Dataset is general mathematical text, not curated theorem-proof pairs, so quality varies.
  • For verified correctness, a GRPO stage with a structural + mathematical reward signal would be the next step; a toy sketch of a structural reward follows below.
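
As a toy illustration of the structural half of such a reward (nothing like this was used in training; the function and marker list are hypothetical):

def structural_reward(completion: str) -> float:
    # Hypothetical sketch: fraction of expected proof-skeleton markers
    # present in the completion. A real GRPO stage would combine this
    # with a mathematical-correctness signal (e.g. a proof checker).
    markers = ["Given", "Proof", "QED"]
    return sum(marker in completion for marker in markers) / len(markers)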

Citation

If you use this adapter, please cite it along with the base model and dataset:

@misc{gemma2_proof_lora,
  author = {Kunj Shah},
  title = {gemma-2-9b-proof-lora},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/kunjcr2/gemma-2-9b-proof-lora}
}