File size: 2,872 Bytes

51c8774
af77d5d
 
 
51c8774
af77d5d
 
 
 
 
 
 
51c8774
 
af77d5d
51c8774
781eade
51c8774
af77d5d
51c8774
af77d5d
 
 
 
 
 
 
51c8774
af77d5d
51c8774
af77d5d
 
 
 
 
51c8774
af77d5d
 
 
51c8774
af77d5d
51c8774
af77d5d
51c8774
af77d5d
 
 
 
51c8774
af77d5d
 
51c8774
af77d5d
 
 
 
 
 
51c8774
af77d5d
 
 
 
51c8774
af77d5d
 
 
 
 
 
51c8774
af77d5d

---
language:
- en
license: apache-2.0
base_model: google/gemma-2b-it
tags:
- peft
- qlora
- tutor
- python
- sql
- dsa
---

# AI Programming Tutor (Gemma 2B - Fine-Tuned)

This model is a fine-tuned version of `google/gemma-2b-it` designed to act as an expert Agentic programming tutor. It was developed as part of an AI/ML Engineer assessment for Purple Merit Technologies. 

Rather than simply giving users the answer, this model is trained to teach concepts using a strict pedagogical structure.

## 🧠 Pedagogical Structure
The model enforces the following flow for every response:
1. **Goal**: States what the student will learn.
2. **Concept**: Explains the intuition behind the topic.
3. **Worked Example**: Provides step-by-step code with comments.
4. **Common Mistakes**: Highlights typical errors students make.
5. **Checkpoint**: Asks a guiding question to verify understanding.

It is also trained to gracefully redirect out-of-scope requests (e.g., calculus or diet advice) back to programming topics.

## 🛠️ Training Details
* **Base Model:** `google/gemma-2b-it`
* **Technique:** QLoRA (4-bit NF4 quantization)
* **Dataset:** A curated mix of synthetic tutoring conversations and a filtered Python/SQL subset of CodeAlpaca-20k.
* **Infrastructure:** Trained on a single NVIDIA T4 GPU.

## 📊 Evaluation
The model was evaluated on a held-out set of 30 prompts (25 in-domain, 5 out-of-domain). 
* **Perplexity Improvement:** The fine-tuned model achieved a perplexity of **2.09**, a **95.1% improvement** over the base model's 43.05 on the same tutoring dataset.

## 💻 How to Use (Inference)

You can load this model using `transformers` and `peft`. *Note: Ensure you load the model in `float16` if running on a T4 GPU to prevent memory access errors.*

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch

BASE = "google/gemma-2b-it"
ADAPTER = "Imrozkhan007/programming-tutor-gemma-2b"

# Use float16 for T4 compatibility
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True, 
    bnb_4bit_quant_type='nf4',
    bnb_4bit_compute_dtype=torch.float16
)

tokenizer = AutoTokenizer.from_pretrained(BASE)
base_model = AutoModelForCausalLM.from_pretrained(BASE, quantization_config=bnb_config, device_map={"": 0})
model = PeftModel.from_pretrained(base_model, ADAPTER)
model.eval()

# Example Prompt
prompt = "Explain binary search step by step"
messages = [
    {"role": "user", "content": f"You are an expert programming tutor...\n\nStudent Question: {prompt}"}
]
formatted_prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tokenizer(formatted_prompt, return_tensors='pt').to(model.device)
out = model.generate(**inputs, max_new_tokens=512, temperature=0.3)
print(tokenizer.decode(out[0], skip_special_tokens=True))