MateMistral-7B-Base 🧮

MateMistral-7B-Base is a mathematics-focused language model built on top of Mistral-7B-Instruct-v0.3, trained using a curriculum learning strategy that progressively moves from logical reasoning to complex olympiad-level mathematics.

This is the base checkpoint before GRPO reinforcement learning. The final reinforcement-tuned version will be released as omurberaisik/MateMistral-7B.

🎯 Training Strategy

Unlike standard fine-tuning approaches that mix all data randomly, MateMistral-7B-Base uses a strict curriculum order, inspired by how humans learn mathematics:

Stage 1: Logic & Reasoning → 6,000 samples (20%)
Stage 2: Code + Mathematics → 4,500 samples (15%)
Stage 3: Hard Olympiad Math → 19,500 samples (65%)
─────────────────────────────────────────────────────
Total → 30,000 samples

This ordering is deliberate. The model first builds a reasoning foundation, then learns mathematical code patterns, and finally tackles olympiad-level problems.

📉 The loss curve dropped sharply at around step 385, when the NuminaMath olympiad problems began. This is consistent with the curriculum working as intended, although a drop in training loss alone cannot separate mathematical understanding from pattern memorization; the benchmark evaluation listed in the Roadmap is needed to confirm it.
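
The curriculum order described in this section can be illustrated with a small sketch. It is not the training script used for this checkpoint: the stage names and counts come from this card, while the helper names `stage_shares` and `build_curriculum` and the toy sample strings are hypothetical. It only shows the two ideas involved, computing each stage's share and concatenating stages in order without shuffling across them.

```python
from itertools import chain

# Stage names and sample counts as listed in this card.
STAGES = [
    ("Logic & Reasoning", 6_000),
    ("Code + Mathematics", 4_500),
    ("Hard Olympiad Math", 19_500),
]

def stage_shares(stages):
    """Return each stage's share of the total as a whole-number percentage."""
    total = sum(count for _, count in stages)
    return {name: round(100 * count / total) for name, count in stages}

def build_curriculum(stage_samples):
    """Concatenate per-stage sample lists in order, with no shuffling across stages."""
    return list(chain.from_iterable(stage_samples))

shares = stage_shares(STAGES)
print(shares)

# Toy stand-ins for the three datasets (hypothetical placeholder strings).
toy = [["logic-1", "logic-2"], ["code-1"], ["olympiad-1", "olympiad-2", "olympiad-3"]]
ordered = build_curriculum(toy)
print(ordered)
```

In a real run the data loader would also need shuffling disabled, otherwise the stage order is lost before the model sees the data.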

📊 Training Details

Parameter	Value
Base Model	Mistral-7B-Instruct-v0.3 (4-bit)
Method	LoRA + Curriculum SFT
LoRA Rank	16
LoRA Alpha	16
Max Sequence Length	4096
Effective Batch Size	16
Training Steps	600
Learning Rate	2e-4
Optimizer	AdamW 8-bit
Hardware	Kaggle T4 GPU
Training Time	~6 hours
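
To give a feel for what rank 16 and alpha 16 mean in practice, here is a small worked calculation. It assumes the standard LoRA formulation (update scaled by alpha / rank) and, purely for illustration, a single square 4096 x 4096 projection matching the hidden size of Mistral-7B. The card does not state which modules were adapted, so the per-layer numbers are an example, not a description of this checkpoint.

```python
# Values taken from the Training Details table.
LORA_RANK = 16
LORA_ALPHA = 16

# Assumption for illustration: one square projection with Mistral-7B's hidden size.
# The actual target modules are not stated in this card.
D_IN, D_OUT = 4096, 4096

def lora_scaling(alpha, rank):
    """Standard LoRA scaling applied to the low-rank update: alpha / rank."""
    return alpha / rank

def lora_param_count(d_in, d_out, rank):
    """Trainable parameters of one adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)

scaling = lora_scaling(LORA_ALPHA, LORA_RANK)
adapter_params = lora_param_count(D_IN, D_OUT, LORA_RANK)
full_params = D_IN * D_OUT

print(f"scaling factor: {scaling}")
print(f"adapter parameters: {adapter_params:,}")
print(f"share of the full matrix: {100 * adapter_params / full_params:.2f}%")
```

Under these assumptions the adapter trains well under 1% of the weights of the matrix it wraps, which is what makes a 7B model trainable on a single T4.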

📦 Datasets Used

Dataset	Samples	Purpose
Open-Orca/SlimOrca-Dedup	6,000	Logic & reasoning foundation
MathLLMs/MathCodeInstruct	4,500	Mathematical code understanding
AI-MO/NuminaMath-CoT	19,500	Olympiad-level mathematics

🚀 Usage

from unsloth import FastLanguageModel

# Load the checkpoint in 4-bit with the same context length used in training.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name     = "omurberaisik/MateMistral-7B-base",
    max_seq_length = 4096,
    load_in_4bit   = True,
)
FastLanguageModel.for_inference(model)  # switch Unsloth to inference mode

# Format the question with the model's chat template and move it to the GPU.
messages = [{"role": "user", "content": "Find all integer solutions to x² + y² = z²."}]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
).to("cuda")

# Sample up to 512 new tokens.
outputs = model.generate(
    input_ids      = inputs,
    max_new_tokens = 512,
    temperature    = 0.7,
    do_sample      = True,
)
# The decoded text contains the prompt followed by the completion.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
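
Because a decoder-only model returns the prompt tokens followed by the new ones, the printed text repeats the question. A minimal sketch of trimming the prompt is shown below, using plain lists so it runs without a GPU; the helper name `completion_ids` and the token ids are hypothetical.

```python
def completion_ids(output_ids, prompt_length):
    """Drop the first prompt_length token ids, leaving only the newly generated ones."""
    return output_ids[prompt_length:]

# Hypothetical token ids standing in for a prompt and a model output.
prompt = [1, 733, 16289]
output = prompt + [415, 2899, 2]

new_tokens = completion_ids(output, len(prompt))
print(new_tokens)
```

With the usage snippet above, the same idea would likely amount to decoding `outputs[0][inputs.shape[-1]:]` instead of `outputs[0]`.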

🗺️ Roadmap

Curriculum SFT (this checkpoint)
GRPO Reinforcement Learning → omurberaisik/MateMistral-7B
Benchmark evaluation (MATH500, AIME 2024)
GGUF quantized versions

⚠️ Limitations

This is the base checkpoint, not the final model
Best results are expected from the GRPO-tuned version (omurberaisik/MateMistral-7B), which is not yet released
Optimized for English mathematical reasoning

👤 Author

Trained by @omurberaisik using curriculum learning on a single Kaggle T4 GPU.

"The right order of learning matters more than the amount of data."

