MateMistral-7B-Base 🧮

MateMistral-7B-Base is a mathematics-focused language model built on top of Mistral-7B-Instruct-v0.3, trained using a curriculum learning strategy that progressively moves from logical reasoning to complex olympiad-level mathematics.

This is the base checkpoint before GRPO reinforcement learning. The final reinforcement-tuned version will be released as omurberaisik/MateMistral-7B.


🎯 Training Strategy

Unlike standard fine-tuning approaches that mix all data randomly, MateMistral-7B-Base uses a strict curriculum order, inspired by how humans learn mathematics:

Stage 1: Logic & Reasoning   → 6,000 samples  (20%)
Stage 2: Code + Mathematics  → 4,500 samples  (15%)
Stage 3: Hard Olympiad Math  → 19,500 samples (65%)
─────────────────────────────────────────────────────
Total                        → 30,000 samples

This ordering is deliberate. The model first builds a reasoning foundation, then learns mathematical code patterns, and finally tackles olympiad-level problems. The training loss drops sharply at that final transition, which is read here as a sign of mathematical understanding rather than pattern memorization.
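The ordering described above can be sketched in plain Python. This is a minimal illustration, not the training script used for this checkpoint: the stage names, the `build_curriculum` helper, the placeholder sample pools, and the choice to take the first N items of each pool are all assumptions made for the example. Only the stage sizes come from this card.

```python
from typing import Dict, List, Sequence, Tuple

# Stage sizes as stated in this card; the list order is the curriculum order.
# The stage names are hypothetical labels chosen for this sketch.
STAGES: List[Tuple[str, int]] = [
    ("logic_reasoning", 6_000),
    ("code_math", 4_500),
    ("olympiad_math", 19_500),
]


def build_curriculum(pools: Dict[str, Sequence[str]]) -> List[str]:
    """Concatenate the stage pools in curriculum order, with no shuffling."""
    ordered: List[str] = []
    for name, size in STAGES:
        pool = pools[name]
        if len(pool) < size:
            raise ValueError(f"stage {name!r} needs {size} samples, got {len(pool)}")
        # ASSUMPTION: take the first `size` items; the real selection rule is not stated.
        ordered.extend(pool[:size])
    return ordered


def stage_shares() -> Dict[str, float]:
    """Fraction of the full curriculum contributed by each stage."""
    total = sum(size for _, size in STAGES)
    return {name: size / total for name, size in STAGES}


# Placeholder pools standing in for the real datasets.
pools = {name: [f"{name}-{i}" for i in range(size)] for name, size in STAGES}
curriculum = build_curriculum(pools)

print(len(curriculum))  # 30000
print(stage_shares())   # {'logic_reasoning': 0.2, 'code_math': 0.15, 'olympiad_math': 0.65}
```

Many training loops shuffle the training set by default, so an ordered list like this only acts as a curriculum if shuffling is turned off in the data loader.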

📉 The loss curve showed a sharp drop at around step 385, when the NuminaMath olympiad problems began, which is consistent with the curriculum strategy working as intended.
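To locate a stage boundary on a loss curve, sample counts can be converted into optimizer steps. The sketch below uses the stage sizes and the effective batch size of 16 listed under Training Details. It assumes a single pass in curriculum order with one sample per sequence (no packing); the helper name `stage_start_steps` and the stage labels are hypothetical.

```python
# Figures taken from this card.
STAGE_SIZES = {
    "stage1_logic": 6_000,
    "stage2_code_math": 4_500,
    "stage3_olympiad": 19_500,
}
EFFECTIVE_BATCH_SIZE = 16


def stage_start_steps(stage_sizes, batch_size):
    """Return (start step per stage, total steps for one full pass).

    ASSUMPTION: samples are consumed in order, once each, one per sequence.
    """
    starts = {}
    seen = 0
    for name, size in stage_sizes.items():
        starts[name] = seen // batch_size  # step whose batch holds the stage's first sample
        seen += size
    total_steps = -(-seen // batch_size)   # ceiling division
    return starts, total_steps


starts, total_steps = stage_start_steps(STAGE_SIZES, EFFECTIVE_BATCH_SIZE)
print(starts)       # {'stage1_logic': 0, 'stage2_code_math': 375, 'stage3_olympiad': 656}
print(total_steps)  # 1875
```

Under these assumptions Stage 3 would begin near step 656 and a full pass would take 1,875 steps, which does not match the 600 training steps and the drop at step ~385 reported in this card. The actual run therefore presumably mapped samples to steps differently (for example through sequence packing or subsampling), so the numbers here are illustrative only.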


📊 Training Details

| Parameter | Value |
|---|---|
| Base Model | Mistral-7B-Instruct-v0.3 (4-bit) |
| Method | LoRA + Curriculum SFT |
| LoRA Rank | 16 |
| LoRA Alpha | 16 |
| Max Sequence Length | 4096 tokens |
| Effective Batch Size | 16 |
| Training Steps | 600 |
| Learning Rate | 2e-4 |
| Optimizer | AdamW 8-bit |
| Hardware | Kaggle T4 GPU |
| Training Time | ~6 hours |
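As a rough worked example of what a LoRA rank of 16 and alpha of 16 imply, the sketch below computes the adapter scaling factor and the number of trainable parameters. The layer shapes are those of the public Mistral-7B configuration. The set of target modules is an assumption (all attention and MLP projections), since this card does not state which modules were adapted.

```python
# Mistral-7B layer shapes, from the public model configuration.
HIDDEN = 4096
KV_DIM = 1024          # 8 key/value heads x 128 head dimension
INTERMEDIATE = 14336
NUM_LAYERS = 32

# Values from the table above.
LORA_RANK = 16
LORA_ALPHA = 16

# ASSUMPTION: adapters on every attention and MLP projection.
# Each entry is (in_features, out_features) of the wrapped linear layer.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(rank, shapes, num_layers):
    """LoRA adds A (rank x in) and B (out x rank) per wrapped linear layer."""
    per_layer = sum(rank * (n_in + n_out) for n_in, n_out in shapes.values())
    return per_layer * num_layers


scaling = LORA_ALPHA / LORA_RANK
trainable = lora_param_count(LORA_RANK, TARGET_SHAPES, NUM_LAYERS)

print(scaling)    # 1.0
print(trainable)  # 41943040
```

With alpha equal to the rank, the low-rank update is applied with a scaling factor of 1. If fewer modules were targeted in the actual run, the trainable parameter count would be correspondingly smaller. In Unsloth these two values correspond to the `r` and `lora_alpha` arguments of `FastLanguageModel.get_peft_model`.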

📦 Datasets Used

| Dataset | Samples | Purpose |
|---|---|---|
| Open-Orca/SlimOrca-Dedup | 6,000 | Logic & reasoning foundation |
| MathLLMs/MathCodeInstruct | 4,500 | Mathematical code understanding |
| AI-MO/NuminaMath-CoT | 19,500 | Olympiad-level mathematics |

🚀 Usage

# Load the checkpoint in 4-bit with Unsloth (requires a CUDA GPU).
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name     = "omurberaisik/MateMistral-7B-base",
    max_seq_length = 4096,
    load_in_4bit   = True,
)
FastLanguageModel.for_inference(model)  # switch to Unsloth's inference mode

# Format the question with the model's chat template.
messages = [{"role": "user", "content": "Find all integer solutions to x² + y² = z²."}]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
).to("cuda")

# Sample a response of up to 512 new tokens.
outputs = model.generate(
    input_ids      = inputs,
    max_new_tokens = 512,
    temperature    = 0.7,
    do_sample      = True,
)
# The decoded text contains the prompt followed by the model's answer.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

🗺️ Roadmap

  • Curriculum fine-tuning (this checkpoint)
  • GRPO Reinforcement Learning → omurberaisik/MateMistral-7B
  • Benchmark evaluation (MATH500, AIME 2024)
  • GGUF quantized versions

⚠️ Limitations

  • This is the base checkpoint, not the final model
  • Best results are expected after GRPO fine-tuning (omurberaisik/MateMistral-7B, not yet released)
  • Optimized for English mathematical reasoning

👤 Author

Trained by @omurberaisik using curriculum learning on a single Kaggle T4 GPU.

"The right order of learning matters more than the amount of data."
