CogniTune-Qwen2.5-3B

A domain-specialized AI/ML tutor model fine-tuned from Qwen2.5-3B-Instruct using LoRA on Apple Silicon (M5 Pro, 24GB unified memory) via MLX.

What It Does

Standard LLMs respond to AI/ML questions like encyclopedias: dense, exhaustive, impersonal. CogniTune responds like a tutor: leading with a concrete analogy, explicitly correcting common misconceptions, and compressing the concept into a memorable one-liner.

Base model response to "What is overfitting?"

Overfitting is a phenomenon in machine learning where a model learns the detail and noise in the training data to an extent that it negatively impacts performance on new, unseen data. Here are some key points... (numbered Wikipedia-style bullets follow)

CogniTune response to "What is overfitting?"

Overfitting occurs when a model learns the training data too well, including its noise and random fluctuations, and fails to generalize to new, unseen data. Think of it like a student who memorizes every practice exam answer verbatim instead of understanding the underlying concepts...

Usage

pip install mlx-lm

mlx_lm.generate \
  --model Pickamon/CogniTune-Qwen2.5-3B \
  --prompt "What is gradient boosting?" \
  --max-tokens 400
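
The CLI above applies the tokenizer's chat template to the prompt automatically. If you drive the model from Python instead (mlx_lm also exposes `load` and `generate`), the prompt must be rendered in Qwen's ChatML format before generation. A minimal sketch of that formatting; the system message is an assumption for illustration:

```python
def to_chatml(messages):
    """Render a list of {role, content} dicts in Qwen's ChatML format."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>" for m in messages]
    # The trailing generation prompt cues the model to answer as the assistant.
    return "\n".join(parts) + "\n<|im_start|>assistant\n"

prompt = to_chatml([
    {"role": "system", "content": "You are a helpful AI/ML tutor."},  # assumed system prompt
    {"role": "user", "content": "What is gradient boosting?"},
])
```

In practice, prefer `tokenizer.apply_chat_template(messages, add_generation_prompt=True)` on the loaded tokenizer, which produces this format for you.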

Training Details

| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen2.5-3B-Instruct |
| Method | LoRA |
| Framework | MLX (Apple Silicon native) |
| Hardware | Apple M5 Pro, 24GB unified memory |
| LoRA layers | 8 |
| LoRA rank | 8 |
| Learning rate | 5e-5 |
| Batch size | 4 |
| Optimal checkpoint | 100 iterations |
| Dataset size | ~460 examples |
| Training time | ~4 minutes per 100 iterations |
| Peak memory | ~10GB |
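
At rank 8 on 8 layers, the adapter trains only a sliver of the 3B base weights, which is why peak memory stays near 10GB. A back-of-the-envelope sketch, assuming LoRA is applied to the attention query and value projections with a 2048-dim hidden size and a 256-dim grouped-query value projection; these shapes are illustrative assumptions, not taken from the model card:

```python
def lora_param_count(d_in, d_out, rank):
    # LoRA adds two low-rank factors per adapted weight: A (d_in x r) and B (r x d_out).
    return rank * (d_in + d_out)

RANK, LAYERS = 8, 8
# Assumed projection shapes for q_proj (2048 -> 2048) and v_proj (2048 -> 256).
per_layer = lora_param_count(2048, 2048, RANK) + lora_param_count(2048, 256, RANK)
total = LAYERS * per_layer
print(total)        # 409,600 trainable parameters
print(total / 3e9)  # about 0.014% of the 3B base weights
```

Under these assumptions, the adapter is four orders of magnitude smaller than the base model, so gradients and optimizer state add very little to the footprint of the frozen weights.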

Dataset

~460 hand-crafted AI/ML Q&A pairs covering topics including:

  • Neural network fundamentals (backprop, activations, normalization)
  • Training dynamics (optimizers, learning rate scheduling, regularization)
  • Architectures (transformers, CNNs, RNNs, LSTMs)
  • Modern LLM concepts (attention, LoRA, RLHF, RAG)
  • Classical ML (SVMs, decision trees, ensemble methods)
  • Evaluation metrics and experimental methodology

Two format styles were used deliberately:

  • Structured 4-part format: explanation → analogy → misconception → one-liner
  • Varied formats: numbered steps, comparison tables, debug-style walkthroughs, decision guides, code-grounded explanations
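
As an illustration, one structured 4-part record could look like the JSONL line below. This is a hypothetical example, not a record from the actual dataset; it assumes the `{"prompt": ..., "completion": ...}` record shape that mlx_lm's LoRA trainer accepts:

```python
import json

# Hypothetical training record in the structured 4-part style.
record = {
    "prompt": "What is dropout?",
    "completion": (
        "Dropout randomly zeroes a fraction of activations during training, "
        "forcing the network not to rely on any single unit.\n"
        "Analogy: a sports team that practices with random players benched "
        "learns plays that do not depend on one star.\n"
        "Misconception: dropout is not applied at inference time; activations "
        "are scaled instead.\n"
        "One-liner: dropout trades a little training fit for a lot of generalization."
    ),
}
line = json.dumps(record)  # one line of the training JSONL file
```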

Key Experimental Findings

  • Optimal early stopping at ~100 iterations, regardless of dataset size
  • Varied-format data reduced best validation loss by 11% relative to uniform templates
  • Style transfer confirmed: the fine-tuned model leads with analogies, while the base model defaults to encyclopedic bullets
  • Factual accuracy is orthogonal to style fine-tuning: the adapter shapes presentation without correcting base-model knowledge
  • Out-of-distribution topics produce shorter responses with higher hallucination risk
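
The early-stopping finding can be operationalized as a simple patience rule over periodic validation checkpoints. A minimal sketch with made-up loss values (the real validation curves are not part of this card) that echoes the ~100-iteration optimum:

```python
def best_checkpoint(val_losses, patience=2):
    """Return the iteration with the lowest validation loss, stopping
    once `patience` consecutive evaluations fail to improve on it."""
    best_iter, best_loss, misses = None, float("inf"), 0
    for iteration, loss in val_losses:
        if loss < best_loss:
            best_iter, best_loss, misses = iteration, loss, 0
        else:
            misses += 1
            if misses >= patience:
                break
    return best_iter

# Hypothetical eval-every-25-iterations history.
history = [(25, 2.01), (50, 1.72), (75, 1.58), (100, 1.51), (125, 1.53), (150, 1.57)]
print(best_checkpoint(history))  # 100
```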

Limitations

  • Hallucinations persist on topics requiring precise factual recall; this model teaches style, not facts
  • Out-of-distribution topics (outside the AI/ML domain) revert toward base-model behavior
  • Responses on unseen topics are shorter and less structured than responses on training-adjacent topics
  • Not suitable for high-stakes factual lookup; use retrieval-augmented generation (RAG) for that

Evaluation

Qualitative comparison against base Qwen2.5-3B-Instruct on identical prompts:

| Prompt | Base model | CogniTune |
|---|---|---|
| "What is overfitting?" | Numbered bullets, encyclopedic, cut off at the token limit | Analogy-led, complete, self-contained |
| "What is the vanishing gradient problem?" | Textbook definition | Mechanism explanation + one-liner |
| "What is federated learning?" | Dense paragraph | Analogy + concise explanation |

Environmental Impact

  • Hardware: Apple M5 Pro (no discrete GPU, Neural Engine + GPU cores)
  • Training time: ~45 minutes total across all experiments
  • Cloud provider: None; training ran entirely locally
  • Carbon footprint: Minimal (Apple Silicon is significantly more energy efficient than GPU cluster training)

Author

Irtiza Saleem, MSc Artificial Intelligence and Computer Science, University of Birmingham Dubai

GitHub: Pickamon
LinkedIn: irtiza-saleem
