# CogniTune-Qwen2.5-3B
A domain-specialized AI/ML tutor model fine-tuned from Qwen2.5-3B-Instruct using LoRA on Apple Silicon (M5 Pro, 24GB unified memory) via MLX.
## What It Does
Standard LLMs respond to AI/ML questions like encyclopedias: dense, exhaustive, impersonal. CogniTune responds like a tutor: leading with a concrete analogy, explicitly correcting common misconceptions, and compressing the concept into a memorable one-liner.
**Base model response to "What is overfitting?"**

> Overfitting is a phenomenon in machine learning where a model learns the detail and noise in the training data to an extent that it negatively impacts performance on new, unseen data. Here are some key points... *(numbered Wikipedia-style bullets follow)*
**CogniTune response to "What is overfitting?"**

> Overfitting occurs when a model learns the training data too well, including its noise and random fluctuations, and fails to generalize to new, unseen data. Think of it like a student who memorizes every practice exam answer verbatim instead of understanding the underlying concepts...
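As an aside, the memorization behaviour both responses describe is easy to reproduce numerically. The sketch below is illustrative only (it is not part of the model or its training code): it fits low- and high-degree polynomials to the same noisy sample with NumPy, and the high-degree fit drives training error down while test error against the clean underlying function typically goes up.

```python
import numpy as np

# Toy data: noisy samples of a smooth function (an assumed example dataset)
rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 1.0, 15)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0.0, 0.3, x_train.size)
x_test = np.linspace(0.0, 1.0, 50)
y_test = np.sin(2 * np.pi * x_test)  # clean targets: what generalization is measured against

def fit_mse(degree):
    """Fit a polynomial of the given degree; return (train MSE, test MSE)."""
    poly = np.poly1d(np.polyfit(x_train, y_train, degree))
    return (np.mean((poly(x_train) - y_train) ** 2),
            np.mean((poly(x_test) - y_test) ** 2))

train_lo, test_lo = fit_mse(3)    # reasonable capacity
train_hi, test_hi = fit_mse(12)   # enough capacity to memorize the noise

print(f"degree 3:  train={train_lo:.4f}  test={test_lo:.4f}")
print(f"degree 12: train={train_hi:.4f}  test={test_hi:.4f}")
# The degree-12 fit always achieves lower training error; its test error is typically worse.
```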
## Usage
```shell
pip install mlx-lm

mlx_lm.generate \
  --model Pickamon/CogniTune-Qwen2.5-3B \
  --prompt "What is gradient boosting?" \
  --max-tokens 400
```
## Training Details
| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen2.5-3B-Instruct |
| Method | LoRA |
| Framework | MLX (Apple Silicon native) |
| Hardware | Apple M5 Pro, 24GB unified memory |
| LoRA layers | 8 |
| LoRA rank | 8 |
| Learning rate | 5e-5 |
| Batch size | 4 |
| Optimal checkpoint | 100 iterations |
| Dataset size | ~460 examples |
| Training time | ~4 minutes per 100 iterations |
| Peak memory | ~10GB |
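A command along these lines reproduces the setup in the table. This is a hedged sketch: the flag names follow the `mlx_lm.lora` CLI conventions, `./data` and `./adapters` are placeholder paths, and a LoRA rank of 8 matches the MLX default (a non-default rank would need a YAML config instead).

```shell
# Sketch of the fine-tuning run (flag names assumed from the mlx_lm.lora CLI;
# ./data is a placeholder directory containing train.jsonl and valid.jsonl)
mlx_lm.lora \
  --model Qwen/Qwen2.5-3B-Instruct \
  --train \
  --data ./data \
  --num-layers 8 \
  --batch-size 4 \
  --learning-rate 5e-5 \
  --iters 100 \
  --adapter-path ./adapters
```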
## Dataset
~460 hand-crafted AI/ML Q&A pairs covering topics including:
- Neural network fundamentals (backprop, activations, normalization)
- Training dynamics (optimizers, learning rate scheduling, regularization)
- Architectures (transformers, CNNs, RNNs, LSTMs)
- Modern LLM concepts (attention, LoRA, RLHF, RAG)
- Classical ML (SVMs, decision trees, ensemble methods)
- Evaluation metrics and experimental methodology
Two format styles were used deliberately:
- Structured 4-part format: explanation → analogy → misconception → one-liner
- Varied formats: numbered steps, comparison tables, debug-style walkthroughs, decision guides, code-grounded explanations
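A single example in the structured 4-part style might look like the JSONL record below. This is a hypothetical record (the dataset itself is not published), and the `prompt`/`completion` schema is one of the JSONL formats `mlx_lm.lora` accepts; chat-style `messages` records also work.

```python
import json

# Hypothetical training record in the structured 4-part style:
# explanation -> analogy -> misconception -> one-liner.
record = {
    "prompt": "What is dropout?",
    "completion": (
        "Dropout randomly zeroes a fraction of activations during training, "
        "forcing the network not to rely on any single unit. "
        "Think of it like a sports team practicing with random players benched, "
        "so no play depends on one star. "
        "Common misconception: dropout is applied at inference time; it is not, "
        "activations are rescaled instead. "
        "One-liner: dropout trains an implicit ensemble by randomly silencing units."
    ),
}

line = json.dumps(record)  # one JSON object per line in train.jsonl
print(line[:60] + "...")
```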
## Key Experimental Findings
- Optimal early stopping at ~100 iterations regardless of dataset size
- Varied format data reduced best validation loss by 11% over uniform templates
- Style transfer confirmed: the fine-tuned model leads with analogies where the base model produces encyclopedic bullets
- Factual accuracy is orthogonal to style fine-tuning: the adapter shapes presentation without correcting base-model knowledge
- Out-of-distribution topics produce shorter responses with higher hallucination risk
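The first finding, picking the checkpoint with the lowest validation loss rather than training to a fixed budget, reduces to a one-line selection over the logged (iteration, validation loss) pairs. A minimal sketch with made-up loss values (not the real run's logs):

```python
# Checkpoint selection by validation loss (illustrative numbers only)
val_log = [(50, 1.42), (100, 1.18), (150, 1.21), (200, 1.30), (250, 1.37)]

best_iter, best_loss = min(val_log, key=lambda pair: pair[1])
print(f"best checkpoint: iteration {best_iter} (val loss {best_loss})")
# -> best checkpoint: iteration 100 (val loss 1.18)
```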
## Limitations

- Hallucinations persist on topics requiring precise factual recall: this model teaches style, not facts
- Out-of-distribution topics (outside the AI/ML domain) revert toward base model behavior
- Responses on unseen topics are shorter and less structured than responses on training-adjacent topics
- Not suitable for high-stakes factual lookup; use RAG for that
## Evaluation
Qualitative comparison against base Qwen2.5-3B-Instruct on identical prompts:
| Prompt | Base Model | CogniTune |
|---|---|---|
| "What is overfitting?" | Numbered bullets, encyclopedic, cut off at token limit | Analogy-led, complete, self-contained |
| "What is the vanishing gradient problem?" | Textbook definition | Mechanism explanation + one-liner |
| "What is federated learning?" | Dense paragraph | Analogy + concise explanation |
## Environmental Impact

- Hardware: Apple M5 Pro (no discrete GPU; Neural Engine + GPU cores)
- Training time: ~45 minutes total across all experiments
- Cloud provider: None; fully local training
- Carbon footprint: Minimal (Apple Silicon is significantly more energy-efficient than GPU-cluster training)
## Author

Irtiza Saleem, MSc Artificial Intelligence and Computer Science, University of Birmingham Dubai
GitHub: Pickamon
LinkedIn: irtiza-saleem