Upload 7 files
- src/DEPLOYMENT.md +168 -0
- src/README.md +185 -0
- src/app.py +377 -0
- src/config.py +126 -0
- src/example_usage.py +186 -0
- src/fine.py +945 -0
- src/requirements.txt +37 -0
src/DEPLOYMENT.md
ADDED
@@ -0,0 +1,168 @@
# 🚀 Deployment Guide: Hugging Face Spaces

## Quick Start (5 minutes)

### Step 1: Prepare Your Repository
1. **Create a GitHub repository** with your project files
2. **Upload all files** from this directory to your GitHub repo
3. **Make sure you have**:
   - `app.py` (main Streamlit app)
   - `fine.py` (AI tutor implementation)
   - `requirements.txt` (dependencies)
   - `README.md` (documentation)

### Step 2: Create Hugging Face Space
1. **Go to** [huggingface.co/spaces](https://huggingface.co/spaces)
2. **Click** "Create new Space"
3. **Fill in the details**:
   - **Owner**: Your HF username
   - **Space name**: `ai-programming-tutor`
   - **License**: Choose an appropriate license
   - **SDK**: Select **Streamlit**
   - **Python version**: 3.10
4. **Click** "Create Space"

### Step 3: Connect Your Repository
1. **In your Space settings**, go to the "Repository" tab
2. **Select** "GitHub repository"
3. **Choose** your GitHub repository
4. **Set the main file** to `app.py`
5. **Click** "Save"

### Step 4: Upload Your Fine-tuned Model
1. **In your Space**, go to the "Files" tab
2. **Create a folder** called `model`
3. **Upload your fine-tuned model files**:
   - `model-00001-of-00006.safetensors`
   - `model-00002-of-00006.safetensors`
   - `model-00003-of-00006.safetensors`
   - `model-00004-of-00006.safetensors`
   - `model-00005-of-00006.safetensors`
   - `model-00006-of-00006.safetensors`
   - `config.json`
   - `tokenizer.json`
   - `tokenizer.model`
   - `tokenizer_config.json`
   - `special_tokens_map.json`
   - `generation_config.json`
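A missing shard is the most common cause of a failed first deploy, so it can help to sanity-check the upload before building. A minimal sketch (a hypothetical helper script, not part of the repo) that verifies every file from the list above is present under `model/`:

```python
from pathlib import Path

# Files the Space expects to find under ./model (from the list above).
REQUIRED = [
    *[f"model-0000{i}-of-00006.safetensors" for i in range(1, 7)],
    "config.json", "tokenizer.json", "tokenizer.model",
    "tokenizer_config.json", "special_tokens_map.json",
    "generation_config.json",
]


def missing_model_files(model_dir="model"):
    """Return the required files that are not yet present in model_dir."""
    root = Path(model_dir)
    return [name for name in REQUIRED if not (root / name).exists()]


if __name__ == "__main__":
    gaps = missing_model_files()
    print("All model files present." if not gaps else f"Missing: {gaps}")
```

Run it from the repo root after uploading; an empty "Missing" list means Step 4 is complete.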
### Step 5: Update Model Path
1. **Edit** `app.py` in your Space
2. **Change the model path** to:
```python
model_path = "./model"  # Path to uploaded model
```
3. **Save** the changes

### Step 6: Deploy
1. **Your Space will automatically build** and deploy
2. **Wait for the build to complete** (5-10 minutes)
3. **Your app will be live** at: `https://huggingface.co/spaces/YOUR_USERNAME/ai-programming-tutor`

## 🎯 Advanced Configuration

### Hardware Settings
- **CPU**: Default (sufficient for inference)
- **GPU**: T4 (recommended for faster inference)
- **Memory**: 16GB+ (required for the 7B model)

### Environment Variables
Add these in your Space settings:
```
TOKENIZERS_PARALLELISM=false
DATASETS_DISABLE_MULTIPROCESSING=1
```

### Custom Domain (Optional)
1. **In Space settings**, go to the "Settings" tab
2. **Enable** "Custom domain"
3. **Add your domain** (e.g., `tutor.yourdomain.com`)

## 🔧 Troubleshooting

### Common Issues

**Issue**: Model not loading
- **Solution**: Check the model path and file structure
- **Debug**: Look at the Space logs in "Settings" → "Logs"

**Issue**: Out of memory
- **Solution**: Upgrade to GPU hardware
- **Alternative**: Use demo mode

**Issue**: Build fails
- **Solution**: Check `requirements.txt` for missing dependencies
- **Debug**: Review the build logs

### Performance Optimization

1. **Enable GPU** in Space settings
2. **Use model quantization** for faster inference
3. **Implement caching** for repeated requests
4. **Add rate limiting** to prevent abuse
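Item 4 can be sketched in plain Python with a per-user sliding window (a minimal in-process example under assumed limits; a production Space would more likely use Streamlit session state or an external store):

```python
import time
from collections import deque


class RateLimiter:
    """Allow at most `limit` requests per `window` seconds, per key."""

    def __init__(self, limit=5, window=60.0):
        self.limit = limit
        self.window = window
        self._hits = {}  # key -> deque of request timestamps

    def allow(self, key, now=None):
        now = time.monotonic() if now is None else now
        hits = self._hits.setdefault(key, deque())
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] > self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

Caching repeated requests (item 3) can start just as simply: key a dict on a hash of the submitted code and return the stored feedback on a hit.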
## 📊 Monitoring

### Usage Analytics
- **View usage** in Space settings
- **Monitor performance** with built-in metrics
- **Track user engagement** through logs

### Cost Management
- **Free tier**: 16 hours/month of GPU time
- **Pro tier**: $9/month for unlimited GPU
- **Enterprise**: Custom pricing

## 🌐 Sharing Your App

### Public Access
1. **Set the Space to public** in settings
2. **Share the URL** with users
3. **Add to the HF Spaces showcase**

### Embedding
```html
<iframe
    src="https://huggingface.co/spaces/YOUR_USERNAME/ai-programming-tutor"
    width="100%"
    height="800px"
    frameborder="0"
></iframe>
```

## 🔒 Security Considerations

1. **Input validation** for code submissions
2. **Rate limiting** to prevent abuse
3. **Content filtering** for inappropriate code
4. **User authentication** (optional)
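Input validation (item 1) can start as simply as checking size and syntax before a submission ever reaches the model. A standard-library-only sketch (the character limit is an assumed value, not a project setting):

```python
import ast

MAX_CODE_CHARS = 5000  # assumed limit; tune for your Space


def validate_submission(code: str):
    """Return (ok, message) for a student code submission."""
    if not code.strip():
        return False, "Submission is empty."
    if len(code) > MAX_CODE_CHARS:
        return False, f"Submission exceeds {MAX_CODE_CHARS} characters."
    try:
        ast.parse(code)  # syntax check only; the code is never executed
    except SyntaxError as e:
        return False, f"Not valid Python: {e.msg} (line {e.lineno})"
    return True, "OK"
```

Note that `ast.parse` only parses; it gives a cheap syntax gate without running untrusted code.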
## 📈 Scaling

### For High Traffic
1. **Upgrade to the Pro tier** for unlimited GPU
2. **Implement caching** with Redis
3. **Use load balancing** across multiple instances
4. **Monitor performance** and optimize

### For Production Use
1. **Add user authentication**
2. **Implement logging** and analytics
3. **Set up monitoring** and alerts
4. **Create backup** and recovery procedures

## 🎉 Success!

Your AI Programming Tutor is now live and accessible to students worldwide!

**Next steps**:
1. **Test thoroughly** with different code examples
2. **Gather user feedback** and iterate
3. **Share with your target audience**
4. **Monitor usage** and improve based on data

## 📞 Support

- **Hugging Face Docs**: [huggingface.co/docs](https://huggingface.co/docs)
- **Spaces Documentation**: [huggingface.co/docs/hub/spaces](https://huggingface.co/docs/hub/spaces)
- **Community Forum**: [discuss.huggingface.co](https://discuss.huggingface.co)
src/README.md
ADDED
@@ -0,0 +1,185 @@
# 🎓 Generative AI for Programming Education

## 🚀 Live Demo
**Hugging Face Spaces**: [Coming Soon - Deploy using the DEPLOYMENT.md guide]

## 📋 Problem Statement
Current programming education struggles with high dropout rates, inefficient feedback loops, and a lack of personalized learning—problems exacerbated by limited instructor bandwidth. While Generative AI (e.g., Copilot, ChatGPT) can help, most tools prioritize productivity over learning, offering code solutions without explanations or tailored guidance. This risks student over-reliance without deeper comprehension.

## 🎯 Solution
To address this gap, we fine-tuned **CodeLlama-7B** to provide structured, educational code feedback—not just correct answers. Our model analyzes student code and delivers:

- **Instant, actionable reviews** (e.g., "This loop can be optimized from O(n²) to O(n) using a hashmap")
- **Beginner-friendly explanations** (e.g., "In Python, list.append() modifies the list in-place but returns None—that's why your print() shows None")
- **Personalized adaptation** (e.g., adjusting feedback depth based on inferred skill level)

Unlike generic AI tools, our system is explicitly designed for education, balancing correctness, pedagogy, and ethical safeguards against over-reliance.

## ✨ Features

### 🧠 **Fine-tuned CodeLlama-7B Model**
- Trained on **code review** and **code feedback** datasets
- **7B parameters** for comprehensive understanding
- **Educational focus** rather than productivity optimization

### 📊 **Progressive Learning Interface**
- **5-stage educational process**:
  1. **Code Analysis** - Strengths, weaknesses, issues
  2. **Improvement Guide** - Step-by-step instructions
  3. **Learning Points** - Key concepts and objectives
  4. **Comprehension Quiz** - Test understanding
  5. **Code Fix** - Improved solution (only after learning)

### 🎓 **Educational Features**
- **Student Level Adaptation** (Beginner/Intermediate/Advanced)
- **Comprehension Questions** generated by the model
- **Learning Objectives** for each piece of feedback
- **Step-by-step improvement guides**
- **Algorithm complexity explanations**

### 🛡️ **Ethical Safeguards**
- **Progressive learning flow** prevents jumping straight to solutions
- **Comprehension testing** before showing fixes
- **Educational explanations** rather than quick answers
- **Best practices promotion**

## 🚀 **Hugging Face Spaces Deployment**

### **Hardware Specifications**
- **CPU**: 2 vCPU (virtual CPU cores)
- **RAM**: 16 GB
- **Plan**: FREE tier
- **Storage**: Sufficient for the model and application

### **Optimization Features**
- ✅ **16GB RAM optimization** for the fine-tuned model
- ✅ **CPU-only inference** (no GPU required)
- ✅ **Memory management** with gradient checkpointing
- ✅ **Demo mode** for immediate testing
- ✅ **Progressive loading** with fallback options

### **Performance Expectations**
- **Demo Mode**: Instant response
- **Fine-tuned Model**: 5-10 minutes initial loading
- **Memory Usage**: Optimized for the 16GB constraint
- **Concurrent Users**: Limited by CPU cores

## 🛠️ Installation & Setup

### **Local Development**
```bash
# Clone the repository
git clone https://github.com/TomoriFarouk/GenAI-For-Programming-Language.git
cd GenAI-For-Programming-Language

# Install dependencies
pip install -r requirements.txt

# Run the application
streamlit run app.py
```

### **Hugging Face Spaces Deployment**
Follow the detailed guide in `DEPLOYMENT.md` for step-by-step instructions.

## 📁 Project Structure

```
GenAI-For-Programming-Language/
├── app.py               # Main Streamlit interface (HF Spaces optimized)
├── fine.py              # Fine-tuned model integration
├── config.py            # Configuration settings
├── requirements.txt     # Dependencies
├── README.md            # This file
├── DEPLOYMENT.md        # HF Spaces deployment guide
├── .gitignore           # Excludes model files
├── .gitattributes       # File type configuration
└── example_usage.py     # Usage examples
```

## 🧠 Model Architecture

### **Base Model**
- **CodeLlama-7B-Instruct-hf**
- **7 billion parameters**
- **Code-specific training**

### **Fine-tuning Datasets**
1. **Code Review Dataset**: Structured feedback on code quality
2. **Code Feedback Dataset**: Educational explanations and improvements

### **Training Process**
- **LoRA fine-tuning** for efficiency
- **Educational prompt engineering**
- **Multi-stage feedback generation**
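LoRA keeps fine-tuning a 7B model tractable because only small low-rank adapter matrices are trained while the base weights stay frozen. A back-of-the-envelope sketch of why (illustrative dimensions and rank, not the project's actual LoRA configuration):

```python
def lora_params(d_out, d_in, rank):
    """Trainable params LoRA adds to one d_out x d_in weight:
    B (d_out x rank) plus A (rank x d_in)."""
    return rank * (d_out + d_in)


# Example: one 4096 x 4096 projection at rank 8 (hypothetical values)
full = 4096 * 4096                     # 16,777,216 frozen weights
adapter = lora_params(4096, 4096, 8)   # 65,536 trainable weights
print(f"trainable fraction for this layer: {adapter / full:.4%}")
```

With numbers like these, well under 1% of each adapted weight matrix is trained, which is what makes fine-tuning feasible without large-scale GPU resources.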
## 🎯 Usage Examples

### **Input Code**
```python
def find_duplicates(numbers):
    x = []
    for i in range(len(numbers)):
        for j in range(i+1, len(numbers)):
            if numbers[i] == numbers[j]:
                x.append(numbers[i])
    return x
```

### **Generated Feedback**
1. **Analysis**: Identifies O(n²) complexity and poor variable naming
2. **Improvement Guide**: Step-by-step optimization instructions
3. **Learning Points**: Algorithm complexity, naming conventions
4. **Quiz**: "What is the time complexity, and how could you improve it?"
5. **Code Fix**: Optimized O(n) solution with better naming
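The fix revealed in the final stage looks like this (the same O(n) version used in the app's demo feedback):

```python
def find_duplicates(numbers):
    # Use a set for O(n) time complexity
    duplicates = []
    seen = set()

    for num in numbers:
        if num in seen:
            duplicates.append(num)
        else:
            seen.add(num)

    return duplicates
```

Each duplicate is appended the moment it is seen a second time, so a single pass replaces the nested loops.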
## 🔧 Configuration

### **Model Settings**
- **Path**: `./model` (for HF Spaces)
- **Device**: CPU-optimized for 16GB RAM
- **Memory**: Gradient checkpointing enabled

### **Educational Settings**
- **Student Levels**: Beginner, Intermediate, Advanced
- **Feedback Types**: Syntax, Logic, Optimization, Style
- **Learning Objectives**: Comprehensive programming concepts

## 🚀 Performance

### **Local Environment**
- **GPU**: Recommended for faster inference
- **RAM**: 16GB+ recommended
- **Storage**: 30GB+ for model files

### **Hugging Face Spaces**
- **CPU**: 2 vCPU (sufficient for inference)
- **RAM**: 16GB (optimized for this constraint)
- **Loading Time**: 5-10 minutes for the fine-tuned model
- **Demo Mode**: Instant response

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Test thoroughly
5. Submit a pull request

## 📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

## 🙏 Acknowledgments

- **CodeLlama team** for the base model
- **Hugging Face** for the Spaces platform
- **Streamlit** for the web interface framework

## 📞 Contact

For questions or support, please open an issue on GitHub.

---

**🎓 Empowering programming education through AI-driven, structured learning experiences.**
|
src/app.py
ADDED
@@ -0,0 +1,377 @@
"""
AI Programming Tutor - Hugging Face Spaces Deployment
Comprehensive Educational Feedback System
"""

import json
from fine import ProgrammingEducationAI, ComprehensiveFeedback
import streamlit as st
import torch
import os
import gc
import warnings
warnings.filterwarnings("ignore", category=UserWarning)

# Environment setup for HF Spaces
os.environ["TOKENIZERS_PARALLELISM"] = "false"
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
os.environ["DATASETS_DISABLE_MULTIPROCESSING"] = "1"

# Clear CUDA cache if available
if torch.cuda.is_available():
    torch.cuda.empty_cache()
    gc.collect()


def main():
    st.set_page_config(
        page_title="AI Programming Tutor",
        page_icon="🎓",
        layout="wide",
        initial_sidebar_state="expanded"
    )

    st.title("🎓 AI Programming Tutor")
    st.subheader("Comprehensive Educational Feedback System")
    st.markdown("---")

    # Sidebar configuration
    with st.sidebar:
        st.header("⚙️ Configuration")

        # Model selection
        model_option = st.selectbox(
            "Choose Model:",
            ["Use Demo Mode", "Use Fine-tuned Model"],
            help="Demo mode works immediately. Fine-tuned model requires loading (5-10 minutes on HF Spaces)."
        )

        # Student level selection
        student_level = st.selectbox(
            "Student Level:",
            ["beginner", "intermediate", "advanced"],
            help="Adjusts feedback complexity and learning objectives"
        )

        # Memory info for HF Spaces
        if st.checkbox("Show System Info"):
            import psutil
            memory = psutil.virtual_memory()
            st.metric("Available RAM",
                      f"{memory.available / (1024**3):.1f} GB")
            st.metric("RAM Usage", f"{memory.percent}%")
            st.metric("CPU Cores", psutil.cpu_count())

        # HF Spaces specific instructions
        st.markdown("---")
        st.markdown("### 🚀 Hugging Face Spaces")
        st.info("""
        **Hardware**: 2 vCPU, 16GB RAM (FREE)

        **Recommendations**:
        - Use Demo Mode for quick testing
        - Fine-tuned model takes 5-10 minutes to load
        - 16GB RAM is sufficient for your model
        """)

    # Main content area
    col1, col2 = st.columns([1, 1])

    with col1:
        st.header("📝 Student Code Input")

        # Code input
        student_code = st.text_area(
            "Paste your Python code here:",
            height=300,
            placeholder="""# Example code to test:
def find_duplicates(numbers):
    x = []
    for i in range(len(numbers)):
        for j in range(i+1, len(numbers)):
            if numbers[i] == numbers[j]:
                x.append(numbers[i])
    return x

# Test the function
result = find_duplicates([1, 2, 3, 2, 4, 5, 3])
print(result)""",
            help="Paste your Python code here for analysis"
        )

        # Generate feedback button
        if st.button("🎯 Generate Comprehensive Feedback", type="primary"):
            if not student_code.strip():
                st.warning("⚠️ Please enter some code first!")
            else:
                generate_feedback(student_code, student_level, model_option)

    with col2:
        st.header("📊 Feedback Results")

        if 'feedback' in st.session_state:
            display_feedback(st.session_state['feedback'])


def generate_feedback(code: str, student_level: str, model_option: str):
    """Generate comprehensive feedback using the AI tutor or demo mode"""
    with st.spinner("🤖 Analyzing your code..."):
        try:
            if model_option == "Use Fine-tuned Model":
                # Check if the model is already loaded
                if 'ai_tutor' not in st.session_state:
                    with st.spinner("🚀 Loading fine-tuned model (this may take 5-10 minutes on HF Spaces)..."):
                        try:
                            # Use a relative path for HF Spaces
                            model_path = "./model"  # Will be updated when the model is uploaded
                            ai_tutor = ProgrammingEducationAI(model_path)
                            ai_tutor.load_model()
                            st.session_state['ai_tutor'] = ai_tutor
                            st.success(
                                "✅ Fine-tuned model loaded successfully!")
                        except Exception as e:
                            st.error(f"❌ Error loading model: {e}")
                            st.info("💡 Switching to demo mode...")
                            model_option = "Use Demo Mode"

                if 'ai_tutor' in st.session_state:
                    # Use the fine-tuned model
                    feedback = st.session_state['ai_tutor'].generate_comprehensive_feedback(
                        code, student_level)
                    st.session_state['feedback'] = feedback
                    st.success("✅ Feedback generated using fine-tuned model!")
                else:
                    # Fall back to demo mode
                    feedback = create_demo_feedback(code, student_level)
                    st.session_state['feedback'] = feedback
                    st.success("✅ Demo feedback generated as fallback!")
            else:
                # Demo mode
                feedback = create_demo_feedback(code, student_level)
                st.session_state['feedback'] = feedback
                st.success("✅ Demo feedback generated!")
        except Exception as e:
            st.error(f"❌ Error generating feedback: {e}")
            # Fall back to demo mode
            feedback = create_demo_feedback(code, student_level)
            st.session_state['feedback'] = feedback
            st.success("✅ Demo feedback generated as fallback!")


def create_demo_feedback(code: str, student_level: str) -> ComprehensiveFeedback:
    """Create demo feedback for testing without the model"""
    return ComprehensiveFeedback(
        code_snippet=code,
        student_level=student_level,
        strengths=[
            "Your code has a clear structure and logic",
            "You're using appropriate Python syntax",
            "The function name is descriptive"
        ],
        weaknesses=[
            "Variable names could be more descriptive",
            "Missing comments explaining the logic",
            "Could benefit from error handling"
        ],
        issues=[
            "Using generic variable names (x, i, j)",
            "No input validation",
            "Nested loops could be optimized"
        ],
        step_by_step_improvement=[
            "Step 1: Replace 'x' with 'duplicates' for better readability",
            "Step 2: Add comments explaining the nested loop logic",
            "Step 3: Consider using a set for O(n) time complexity",
            "Step 4: Add input validation for edge cases"
        ],
        learning_points=[
            "Good variable naming improves code readability and maintainability",
            "Comments help others (and yourself) understand complex logic",
            "Algorithm complexity matters - O(n²) vs O(n) can make a huge difference",
            "Always consider edge cases and input validation"
        ],
        review_summary="Your code works correctly but could be improved with better naming, comments, and optimization. The logic is sound for a beginner level.",
        comprehension_question="What is the time complexity of your current algorithm and how could you improve it?",
        comprehension_answer="The current algorithm has O(n²) time complexity due to nested loops. It could be improved to O(n) using a hash set.",
        explanation="Nested loops multiply their complexities. Using a set allows us to check for duplicates in O(1) time per element.",
        improved_code="""def find_duplicates(numbers):
    # Use a set for O(n) time complexity
    duplicates = []
    seen = set()

    for num in numbers:
        if num in seen:
            duplicates.append(num)
        else:
            seen.add(num)

    return duplicates

# Test the function
result = find_duplicates([1, 2, 3, 2, 4, 5, 3])
print(result)""",
        fix_explanation="The improved version uses a set to track seen numbers, reducing time complexity from O(n²) to O(n) and making the code more readable with better variable names.",
        difficulty_level=student_level,
        learning_objectives=["algorithm_complexity",
                             "code_readability", "best_practices"],
        estimated_time_to_improve="10-15 minutes"
    )


def display_feedback(feedback: ComprehensiveFeedback):
    """Display comprehensive feedback in a progressive learning flow"""

    # Initialize session state for tracking progress
    if 'quiz_completed' not in st.session_state:
        st.session_state['quiz_completed'] = False
    if 'current_step' not in st.session_state:
        st.session_state['current_step'] = 1

    # Progress indicator
    st.markdown("### 🎯 Learning Progress")
    progress_bar = st.progress(0)

    # Steps 1-5 map linearly to 20%-100%
    progress_bar.progress(min(st.session_state['current_step'], 5) * 20)

    # Step 1: Analysis (Always available)
    if st.session_state['current_step'] >= 1:
        st.markdown("### 📊 Step 1: Code Analysis")

        col1, col2, col3 = st.columns(3)

        with col1:
            st.markdown("#### ✅ Strengths")
            for i, strength in enumerate(feedback.strengths, 1):
                st.markdown(f"**{i}.** {strength}")

        with col2:
            st.markdown("#### ❌ Weaknesses")
            for i, weakness in enumerate(feedback.weaknesses, 1):
                st.markdown(f"**{i}.** {weakness}")

        with col3:
            st.markdown("#### ⚠️ Issues")
            for i, issue in enumerate(feedback.issues, 1):
                st.markdown(f"**{i}.** {issue}")

        st.markdown("#### 📋 Review Summary")
        st.info(feedback.review_summary)

        if st.session_state['current_step'] == 1:
            if st.button("✅ I understand the analysis - Continue to Step 2", type="primary"):
                st.session_state['current_step'] = 2
                st.rerun()

    # Step 2: Improvement Guide (Available after Step 1)
    if st.session_state['current_step'] >= 2:
        st.markdown("---")
        st.markdown("### 📝 Step 2: Improvement Guide")

        st.markdown("#### Step-by-Step Instructions")
        for i, step in enumerate(feedback.step_by_step_improvement, 1):
            st.markdown(f"**Step {i}:** {step}")

        st.markdown("---")
        st.markdown(
            f"**⏱️ Estimated time to improve:** {feedback.estimated_time_to_improve}")

        if st.session_state['current_step'] == 2:
            if st.button("✅ I understand the improvement steps - Continue to Step 3", type="primary"):
                st.session_state['current_step'] = 3
                st.rerun()

    # Step 3: Learning Points (Available after Step 2)
    if st.session_state['current_step'] >= 3:
        st.markdown("---")
        st.markdown("### 🎓 Step 3: Learning Points")

        st.markdown("#### Key Concepts to Understand")
        for i, point in enumerate(feedback.learning_points, 1):
            st.markdown(f"**{i}.** {point}")

        st.markdown("---")
        st.markdown("#### 🎯 Learning Objectives")
        for objective in feedback.learning_objectives:
            st.markdown(f"• {objective}")

        if st.session_state['current_step'] == 3:
            if st.button("✅ I understand the learning points - Continue to Step 4", type="primary"):
                st.session_state['current_step'] = 4
                st.rerun()

    # Step 4: Comprehension Quiz (Available after Step 3)
    if st.session_state['current_step'] >= 4:
        st.markdown("---")
        st.markdown("### ❓ Step 4: Comprehension Check")

        st.markdown(
            "**Before you see the solution, let's test your understanding:**")
        st.markdown(f"**Question:** {feedback.comprehension_question}")

        # Quiz interface
        user_answer = st.text_area(
            "Your answer:",
            placeholder="Type your answer here...",
            height=100,
            key="quiz_answer"
        )

        if st.button("Check My Answer", type="primary"):
            if user_answer.strip():
                st.markdown("**Correct Answer:**")
                st.success(feedback.comprehension_answer)
                st.markdown("**Explanation:**")
                st.info(feedback.explanation)

                if not st.session_state['quiz_completed']:
                    st.session_state['quiz_completed'] = True
                    st.session_state['current_step'] = 5
                    st.rerun()
            else:
                st.warning("Please provide an answer first!")

    # Step 5: Code Fix (Only available after completing quiz)
|
| 344 |
+
if st.session_state['current_step'] >= 5 and st.session_state['quiz_completed']:
|
| 345 |
+
st.markdown("---")
|
| 346 |
+
st.markdown("### 🔧 Step 5: Improved Code Solution")
|
| 347 |
+
|
| 348 |
+
st.markdown(
|
| 349 |
+
"🎉 **Congratulations! You've completed the learning process. Here's the improved version:**")
|
| 350 |
+
|
| 351 |
+
st.markdown("#### 🔧 Enhanced Version")
|
| 352 |
+
st.code(feedback.improved_code, language="python")
|
| 353 |
+
|
| 354 |
+
st.markdown("#### 💡 What Changed")
|
| 355 |
+
st.info(feedback.fix_explanation)
|
| 356 |
+
|
| 357 |
+
# Reset button for new analysis
|
| 358 |
+
if st.button("🔄 Analyze New Code", type="secondary"):
|
| 359 |
+
st.session_state['current_step'] = 1
|
| 360 |
+
st.session_state['quiz_completed'] = False
|
| 361 |
+
if 'feedback' in st.session_state:
|
| 362 |
+
del st.session_state['feedback']
|
| 363 |
+
st.rerun()
|
| 364 |
+
|
| 365 |
+
# Display metadata
|
| 366 |
+
st.markdown("---")
|
| 367 |
+
col1, col2, col3 = st.columns(3)
|
| 368 |
+
with col1:
|
| 369 |
+
st.metric("Student Level", feedback.student_level.title())
|
| 370 |
+
with col2:
|
| 371 |
+
st.metric("Learning Objectives", len(feedback.learning_objectives))
|
| 372 |
+
with col3:
|
| 373 |
+
st.metric("Issues Found", len(feedback.issues))
|
| 374 |
+
|
| 375 |
+
|
| 376 |
+
if __name__ == "__main__":
|
| 377 |
+
main()
|
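The step gating above follows one rule: a step's section renders once `current_step` has reached it, and step 5 additionally requires the quiz to be completed. A minimal sketch of that rule as plain Python (the `visible_steps` helper and the dict stand-in for `st.session_state` are illustrative names, not part of the app):

```python
# Minimal sketch of the app's progressive-disclosure logic:
# step N is visible once current_step >= N; step 5 also needs the quiz.
session_state = {"current_step": 1, "quiz_completed": False}

def visible_steps(state):
    steps = [n for n in range(1, 5) if state["current_step"] >= n]
    if state["current_step"] >= 5 and state["quiz_completed"]:
        steps.append(5)
    return steps

print(visible_steps(session_state))  # only step 1 at the start
session_state.update(current_step=5, quiz_completed=True)
print(visible_steps(session_state))  # all five steps unlocked
```

Because each button advances `current_step` by exactly one and then calls `st.rerun()`, a student cannot skip ahead to the improved code without passing through the analysis, guide, learning points, and quiz.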
src/config.py
ADDED
@@ -0,0 +1,126 @@
"""
Configuration file for the Generative AI Programming Education project
"""

import os
from pathlib import Path

# Model Configuration
MODEL_CONFIG = {
    # Path to your fine-tuned CodeLlama-7B model
    "model_path": "./model",  # For Hugging Face Spaces deployment

    # Model generation parameters
    "max_new_tokens": 512,
    "temperature": 0.7,
    "do_sample": True,
    "top_p": 0.9,
    "top_k": 50,

    # Input processing
    "max_input_length": 2048,
    "truncation": True,

    # Device configuration
    "device_map": "auto",
    "torch_dtype": "float16",
    "trust_remote_code": True
}

# Dataset Configuration (for reference)
DATASET_CONFIG = {
    "code_review_dataset": "path/to/your/code_review_dataset",
    "code_feedback_dataset": "path/to/your/code_feedback_dataset",
    "training_data_format": "json",  # or "csv", "txt"
}

# Educational Levels
STUDENT_LEVELS = {
    "beginner": {
        "description": "Students new to programming",
        "feedback_style": "explanatory",
        "include_basics": True,
        "complexity_threshold": "low"
    },
    "intermediate": {
        "description": "Students with basic programming knowledge",
        "feedback_style": "balanced",
        "include_basics": False,
        "complexity_threshold": "medium"
    },
    "advanced": {
        "description": "Students with strong programming background",
        "feedback_style": "technical",
        "include_basics": False,
        "complexity_threshold": "high"
    }
}

# Feedback Types
FEEDBACK_TYPES = [
    "syntax",
    "logic",
    "optimization",
    "style",
    "explanation",
    "comprehensive_review",
    "educational_guidance"
]

# Learning Objectives
LEARNING_OBJECTIVES = [
    "syntax",
    "basic_python",
    "control_flow",
    "loops",
    "variables",
    "code_cleanliness",
    "algorithms",
    "complexity",
    "optimization",
    "naming_conventions",
    "readability",
    "code_analysis",
    "best_practices",
    "learning",
    "improvement"
]

# Logging Configuration
LOGGING_CONFIG = {
    "level": "INFO",
    "format": "%(asctime)s - %(name)s - %(levelname)s - %(message)s",
    "file": "programming_education_ai.log"
}

# Ethical Safeguards
ETHICAL_CONFIG = {
    "prevent_over_reliance": True,
    "encourage_learning": True,
    "provide_explanations": True,
    "suggest_alternatives": True,
    "promote_best_practices": True
}


def get_model_path():
    """Get the model path from environment variable or config"""
    return os.getenv("FINETUNED_MODEL_PATH", MODEL_CONFIG["model_path"])


def validate_config():
    """Validate the configuration settings"""
    model_path = get_model_path()
    if not os.path.exists(model_path):
        print(f"Warning: Model path does not exist: {model_path}")
        print("Please update the model_path in config.py or set the FINETUNED_MODEL_PATH environment variable")
        return False
    return True


if __name__ == "__main__":
    print("Configuration loaded successfully!")
    print(f"Model path: {get_model_path()}")
    print(f"Student levels: {list(STUDENT_LEVELS.keys())}")
    print(f"Feedback types: {FEEDBACK_TYPES}")
    validate_config()
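The part of this config worth noting is `get_model_path`: the `FINETUNED_MODEL_PATH` environment variable, when set, overrides the `MODEL_CONFIG` default, which is how a local path can be swapped in without editing the file. A minimal sketch of that resolution order (the `/models/codellama-edu` path is a made-up example, not a real location):

```python
import os

# Minimal sketch of config.py's path resolution: the environment
# variable FINETUNED_MODEL_PATH wins over the MODEL_CONFIG default.
MODEL_CONFIG = {"model_path": "./model"}

def get_model_path():
    return os.getenv("FINETUNED_MODEL_PATH", MODEL_CONFIG["model_path"])

os.environ.pop("FINETUNED_MODEL_PATH", None)
default_path = get_model_path()                    # falls back to MODEL_CONFIG
os.environ["FINETUNED_MODEL_PATH"] = "/models/codellama-edu"
override_path = get_model_path()                   # environment override wins
print(default_path, override_path)
```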
src/example_usage.py
ADDED
@@ -0,0 +1,186 @@
"""
Example Usage of the Comprehensive Educational Feedback System
"""

from fine import ProgrammingEducationAI
import json


def main():
    print("🎓 Comprehensive Educational Feedback System")
    print("=" * 60)

    # Initialize the system
    # Update this path to your actual fine-tuned model
    model_path = r"C:\Users\farou\OneDrive - Aston University\finetunning"
    ai_tutor = ProgrammingEducationAI(model_path)

    try:
        # Load the model
        print("Loading fine-tuned model...")
        ai_tutor.load_model()
        print("✅ Model loaded successfully!")

        # Example 1: Beginner student code
        print("\n" + "=" * 60)
        print("EXAMPLE 1: BEGINNER STUDENT")
        print("=" * 60)

        beginner_code = """
def find_duplicates(numbers):
    x = []
    for i in range(len(numbers)):
        for j in range(i + 1, len(numbers)):
            if numbers[i] == numbers[j]:
                x.append(numbers[i])
    return x

result = find_duplicates([1, 2, 3, 2, 4, 5, 3])
print(result)
"""

        print("Student Code:")
        print(beginner_code)

        feedback = ai_tutor.generate_comprehensive_feedback(
            beginner_code, "beginner")
        display_comprehensive_feedback(feedback)

        # Example 2: Intermediate student code
        print("\n" + "=" * 60)
        print("EXAMPLE 2: INTERMEDIATE STUDENT")
        print("=" * 60)

        intermediate_code = """
def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

# Calculate first 10 Fibonacci numbers
for i in range(10):
    print(fibonacci(i))
"""

        print("Student Code:")
        print(intermediate_code)

        feedback = ai_tutor.generate_comprehensive_feedback(
            intermediate_code, "intermediate")
        display_comprehensive_feedback(feedback)

        # Example 3: Advanced student code
        print("\n" + "=" * 60)
        print("EXAMPLE 3: ADVANCED STUDENT")
        print("=" * 60)

        advanced_code = """
class DataProcessor:
    def __init__(self, data):
        self.data = data

    def process(self):
        result = []
        for item in self.data:
            if item > 0:
                result.append(item * 2)
        return result

processor = DataProcessor([1, -2, 3, -4, 5])
output = processor.process()
print(output)
"""

        print("Student Code:")
        print(advanced_code)

        feedback = ai_tutor.generate_comprehensive_feedback(
            advanced_code, "advanced")
        display_comprehensive_feedback(feedback)

    except Exception as e:
        print(f"❌ Error: {e}")
        print("💡 Make sure to update the model_path to point to your actual fine-tuned model.")


def display_comprehensive_feedback(feedback):
    """Display comprehensive feedback in a formatted way"""

    print("\n📊 COMPREHENSIVE FEEDBACK")
    print("-" * 40)

    # Analysis
    print("\n✅ STRENGTHS:")
    for i, strength in enumerate(feedback.strengths, 1):
        print(f"  {i}. {strength}")

    print("\n❌ WEAKNESSES:")
    for i, weakness in enumerate(feedback.weaknesses, 1):
        print(f"  {i}. {weakness}")

    print("\n⚠️ ISSUES:")
    for i, issue in enumerate(feedback.issues, 1):
        print(f"  {i}. {issue}")

    # Educational content
    print("\n📝 STEP-BY-STEP IMPROVEMENT:")
    for i, step in enumerate(feedback.step_by_step_improvement, 1):
        print(f"  Step {i}: {step}")

    print("\n🎓 LEARNING POINTS:")
    for i, point in enumerate(feedback.learning_points, 1):
        print(f"  {i}. {point}")

    print("\n📋 REVIEW SUMMARY:")
    print(f"  {feedback.review_summary}")

    # Interactive elements
    print("\n❓ COMPREHENSION QUESTION:")
    print(f"  Q: {feedback.comprehension_question}")
    print(f"  A: {feedback.comprehension_answer}")
    print(f"  Explanation: {feedback.explanation}")

    # Code fixes
    print("\n🔧 IMPROVED CODE:")
    print(feedback.improved_code)

    print("\n💡 FIX EXPLANATION:")
    print(f"  {feedback.fix_explanation}")

    # Metadata
    print("\n📊 METADATA:")
    print(f"  Student Level: {feedback.student_level}")
    print(f"  Learning Objectives: {', '.join(feedback.learning_objectives)}")
    print(f"  Estimated Time to Improve: {feedback.estimated_time_to_improve}")


def save_feedback_to_json(feedback, filename):
    """Save feedback to JSON file for later analysis"""
    feedback_dict = {
        "code_snippet": feedback.code_snippet,
        "student_level": feedback.student_level,
        "strengths": feedback.strengths,
        "weaknesses": feedback.weaknesses,
        "issues": feedback.issues,
        "step_by_step_improvement": feedback.step_by_step_improvement,
        "learning_points": feedback.learning_points,
        "review_summary": feedback.review_summary,
        "comprehension_question": feedback.comprehension_question,
        "comprehension_answer": feedback.comprehension_answer,
        "explanation": feedback.explanation,
        "improved_code": feedback.improved_code,
        "fix_explanation": feedback.fix_explanation,
        "learning_objectives": feedback.learning_objectives,
        "estimated_time_to_improve": feedback.estimated_time_to_improve
    }

    with open(filename, 'w') as f:
        json.dump(feedback_dict, f, indent=2)

    print(f"💾 Feedback saved to {filename}")


if __name__ == "__main__":
    main()
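`save_feedback_to_json` builds its dictionary field by field; since the feedback object is a dataclass, `dataclasses.asdict` could produce the same mapping in one call. A minimal sketch with a toy dataclass (`Feedback` here is an illustrative stand-in, not the real `ComprehensiveFeedback`):

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class Feedback:          # toy stand-in for ComprehensiveFeedback
    code_snippet: str
    student_level: str
    strengths: list

fb = Feedback("print('hi')", "beginner", ["runs correctly"])
payload = json.dumps(asdict(fb), indent=2)   # same shape as the manual dict
restored = json.loads(payload)
print(restored["student_level"])
```

This keeps the serializer in sync automatically when fields are added to or removed from the dataclass.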
src/fine.py
ADDED
@@ -0,0 +1,945 @@
| 1 |
+
"""
|
| 2 |
+
Generative AI for Enhancing Programming Education
|
| 3 |
+
================================================
|
| 4 |
+
|
| 5 |
+
This project implements a fine-tuned CodeLlama-7B model to provide structured,
|
| 6 |
+
educational code feedback for programming students.
|
| 7 |
+
|
| 8 |
+
Problem Statement:
|
| 9 |
+
- High dropout rates in programming education
|
| 10 |
+
- Inefficient feedback loops
|
| 11 |
+
- Lack of personalized learning
|
| 12 |
+
- Limited instructor bandwidth
|
| 13 |
+
- Current AI tools prioritize productivity over learning
|
| 14 |
+
|
| 15 |
+
Solution:
|
| 16 |
+
- Fine-tuned CodeLlama-7B for educational feedback
|
| 17 |
+
- Structured, actionable code reviews
|
| 18 |
+
- Beginner-friendly explanations
|
| 19 |
+
- Personalized adaptation based on skill level
|
| 20 |
+
- Educational focus with ethical safeguards
|
| 21 |
+
|
| 22 |
+
Author: [Your Name]
|
| 23 |
+
Date: [Current Date]
|
| 24 |
+
"""
|
| 25 |
+
|
| 26 |
+
import re
|
| 27 |
+
from dataclasses import dataclass
|
| 28 |
+
from typing import Dict, List, Optional, Tuple
|
| 29 |
+
import logging
|
| 30 |
+
import json
|
| 31 |
+
from transformers import AutoTokenizer, AutoModelForCausalLM
|
| 32 |
+
import os
|
| 33 |
+
import gc
|
| 34 |
+
import torch
|
| 35 |
+
import warnings
|
| 36 |
+
warnings.filterwarnings("ignore", category=UserWarning)
|
| 37 |
+
|
| 38 |
+
# --- Critical Environment Setup (Must be before imports) ---
|
| 39 |
+
os.environ["TOKENIZERS_PARALLELISM"] = "false"
|
| 40 |
+
os.environ["DATASETS_DISABLE_MULTIPROCESSING"] = "1"
|
| 41 |
+
|
| 42 |
+
# Clear any existing CUDA cache (only if CUDA is available)
|
| 43 |
+
if torch.cuda.is_available():
|
| 44 |
+
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128,garbage_collection_threshold:0.6"
|
| 45 |
+
torch.cuda.empty_cache()
|
| 46 |
+
gc.collect()
|
| 47 |
+
|
| 48 |
+
|
| 49 |
+
# Configure logging
|
| 50 |
+
logging.basicConfig(level=logging.INFO)
|
| 51 |
+
logger = logging.getLogger(__name__)
|
| 52 |
+
|
| 53 |
+
|
| 54 |
+
def clear_cuda_cache():
|
| 55 |
+
"""Clear CUDA cache and run garbage collection"""
|
| 56 |
+
if torch.cuda.is_available():
|
| 57 |
+
torch.cuda.empty_cache()
|
| 58 |
+
torch.cuda.synchronize()
|
| 59 |
+
gc.collect()
|
| 60 |
+
|
| 61 |
+
|
| 62 |
+
def get_system_memory():
|
| 63 |
+
"""Get system memory information"""
|
| 64 |
+
try:
|
| 65 |
+
import psutil
|
| 66 |
+
memory = psutil.virtual_memory()
|
| 67 |
+
print(
|
| 68 |
+
f"System RAM: {memory.used / (1024**3):.1f}GB / {memory.total / (1024**3):.1f}GB used ({memory.percent:.1f}%)")
|
| 69 |
+
except Exception as e:
|
| 70 |
+
print(f"Could not get system memory info: {e}")
|
| 71 |
+
|
| 72 |
+
|
| 73 |
+
def get_gpu_memory():
|
| 74 |
+
"""Get GPU memory information (if available)"""
|
| 75 |
+
if torch.cuda.is_available():
|
| 76 |
+
try:
|
| 77 |
+
import subprocess
|
| 78 |
+
result = subprocess.run(['nvidia-smi', '--query-gpu=memory.used,memory.total', '--format=csv,nounits,noheader'],
|
| 79 |
+
capture_output=True, text=True)
|
| 80 |
+
lines = result.stdout.strip().split('\n')
|
| 81 |
+
for i, line in enumerate(lines):
|
| 82 |
+
used, total = map(int, line.split(', '))
|
| 83 |
+
print(
|
| 84 |
+
f"GPU {i}: {used}MB / {total}MB used ({used/total*100:.1f}%)")
|
| 85 |
+
except Exception as e:
|
| 86 |
+
print(f"Could not get GPU memory info: {e}")
|
| 87 |
+
else:
|
| 88 |
+
print("No GPU available - using CPU only")
|
| 89 |
+
|
| 90 |
+
|
| 91 |
+
@dataclass
|
| 92 |
+
class CodeFeedback:
|
| 93 |
+
"""Data structure for storing code feedback"""
|
| 94 |
+
code_snippet: str
|
| 95 |
+
feedback_type: str # 'syntax', 'logic', 'optimization', 'style', 'explanation'
|
| 96 |
+
feedback_message: str
|
| 97 |
+
suggested_improvement: Optional[str] = None
|
| 98 |
+
difficulty_level: str = 'beginner' # 'beginner', 'intermediate', 'advanced'
|
| 99 |
+
learning_objectives: List[str] = None
|
| 100 |
+
|
| 101 |
+
|
| 102 |
+
@dataclass
|
| 103 |
+
class ComprehensiveFeedback:
|
| 104 |
+
"""Comprehensive feedback structure with all educational components"""
|
| 105 |
+
code_snippet: str
|
| 106 |
+
student_level: str
|
| 107 |
+
|
| 108 |
+
# Analysis
|
| 109 |
+
strengths: List[str]
|
| 110 |
+
weaknesses: List[str]
|
| 111 |
+
issues: List[str]
|
| 112 |
+
|
| 113 |
+
# Educational content
|
| 114 |
+
step_by_step_improvement: List[str]
|
| 115 |
+
learning_points: List[str]
|
| 116 |
+
review_summary: str
|
| 117 |
+
|
| 118 |
+
# Interactive elements
|
| 119 |
+
comprehension_question: str
|
| 120 |
+
comprehension_answer: str
|
| 121 |
+
explanation: str
|
| 122 |
+
|
| 123 |
+
# Code fixes
|
| 124 |
+
improved_code: str
|
| 125 |
+
fix_explanation: str
|
| 126 |
+
|
| 127 |
+
# Metadata
|
| 128 |
+
difficulty_level: str
|
| 129 |
+
learning_objectives: List[str]
|
| 130 |
+
estimated_time_to_improve: str
|
| 131 |
+
|
| 132 |
+
|
| 133 |
+
class ProgrammingEducationAI:
|
| 134 |
+
"""
|
| 135 |
+
Main class for the fine-tuned CodeLlama model for programming education
|
| 136 |
+
"""
|
| 137 |
+
|
| 138 |
+
def __init__(self, model_path: str = "./model"):
|
| 139 |
+
"""
|
| 140 |
+
Initialize the fine-tuned model and tokenizer
|
| 141 |
+
|
| 142 |
+
Args:
|
| 143 |
+
model_path: Path to your fine-tuned CodeLlama-7B model
|
| 144 |
+
"""
|
| 145 |
+
self.model_path = model_path
|
| 146 |
+
self.tokenizer = None
|
| 147 |
+
self.model = None
|
| 148 |
+
self.feedback_templates = self._load_feedback_templates()
|
| 149 |
+
self.code_review_prompt_template = self._load_code_review_prompt()
|
| 150 |
+
self.code_feedback_prompt_template = self._load_code_feedback_prompt()
|
| 151 |
+
self.comprehensive_feedback_prompt = self._load_comprehensive_feedback_prompt()
|
| 152 |
+
self.comprehension_question_prompt = self._load_comprehension_question_prompt()
|
| 153 |
+
self.code_fix_prompt = self._load_code_fix_prompt()
|
| 154 |
+
|
| 155 |
+
def _load_code_review_prompt(self) -> str:
|
| 156 |
+
"""Load the code review prompt template used during fine-tuning"""
|
| 157 |
+
return """You are an expert programming tutor. Review the following student code and provide educational feedback.
|
| 158 |
+
|
| 159 |
+
Student Code:
|
| 160 |
+
{code}
|
| 161 |
+
|
| 162 |
+
Student Level: {level}
|
| 163 |
+
|
| 164 |
+
Please provide:
|
| 165 |
+
1. Syntax errors (if any)
|
| 166 |
+
2. Logic errors (if any)
|
| 167 |
+
3. Style improvements
|
| 168 |
+
4. Optimization suggestions
|
| 169 |
+
5. Educational explanations
|
| 170 |
+
|
| 171 |
+
Feedback:"""
|
| 172 |
+
|
| 173 |
+
def _load_code_feedback_prompt(self) -> str:
|
| 174 |
+
"""Load the code feedback prompt template used during fine-tuning"""
|
| 175 |
+
return """You are a helpful programming tutor. The student has written this code:
|
| 176 |
+
|
| 177 |
+
{code}
|
| 178 |
+
|
| 179 |
+
Student Level: {level}
|
| 180 |
+
|
| 181 |
+
Provide constructive, educational feedback that helps the student learn. Focus on:
|
| 182 |
+
- What they did well
|
| 183 |
+
- What can be improved
|
| 184 |
+
- Why the improvement matters
|
| 185 |
+
- How to implement the improvement
|
| 186 |
+
|
| 187 |
+
Feedback:"""
|
| 188 |
+
|
| 189 |
+
def _load_feedback_templates(self) -> Dict[str, str]:
|
| 190 |
+
"""Load predefined feedback templates for different scenarios"""
|
| 191 |
+
return {
|
| 192 |
+
"syntax_error": "I notice there's a syntax issue in your code. {error_description}. "
|
| 193 |
+
"Here's what's happening: {explanation}. "
|
| 194 |
+
"Try this correction: {suggestion}",
|
| 195 |
+
|
| 196 |
+
"logic_error": "Your code has a logical issue. {problem_description}. "
|
| 197 |
+
"The problem is: {explanation}. "
|
| 198 |
+
"Consider this approach: {suggestion}",
|
| 199 |
+
|
| 200 |
+
"optimization": "Your code works, but we can make it more efficient! "
|
| 201 |
+
"Current complexity: {current_complexity}. "
|
| 202 |
+
"Optimized version: {optimized_complexity}. "
|
| 203 |
+
"Here's how: {explanation}",
|
| 204 |
+
|
| 205 |
+
"style_improvement": "Great work! Here's a style tip: {tip}. "
|
| 206 |
+
"This makes your code more readable and maintainable.",
|
| 207 |
+
|
| 208 |
+
"concept_explanation": "Let me explain this concept: {concept}. "
|
| 209 |
+
"In simple terms: {simple_explanation}. "
|
| 210 |
+
"Example: {example}"
|
| 211 |
+
}
|
| 212 |
+
|
| 213 |
+
def load_model(self):
|
| 214 |
+
"""Load the fine-tuned model and tokenizer using optimized settings"""
|
| 215 |
+
try:
|
| 216 |
+
logger.info(f"Loading fine-tuned model from {self.model_path}")
|
| 217 |
+
|
| 218 |
+
# Load tokenizer with proper settings
|
| 219 |
+
self.tokenizer = AutoTokenizer.from_pretrained(
|
| 220 |
+
self.model_path,
|
| 221 |
+
use_fast=True,
|
| 222 |
+
padding_side="right"
|
| 223 |
+
)
|
| 224 |
+
|
| 225 |
+
# Set padding token
|
| 226 |
+
if self.tokenizer.pad_token is None:
|
| 227 |
+
self.tokenizer.pad_token = self.tokenizer.eos_token
|
| 228 |
+
self.tokenizer.pad_token_id = self.tokenizer.eos_token_id
|
| 229 |
+
|
| 230 |
+
logger.info(
|
| 231 |
+
f"Tokenizer loaded - Vocab size: {len(self.tokenizer)}")
|
| 232 |
+
|
| 233 |
+
# Load model optimized for HF Spaces (16GB RAM, 2 vCPU)
|
| 234 |
+
print("Loading model optimized for HF Spaces (16GB RAM, 2 vCPU)...")
|
| 235 |
+
self.model = AutoModelForCausalLM.from_pretrained(
|
| 236 |
+
self.model_path,
|
| 237 |
+
torch_dtype=torch.float32,
|
| 238 |
+
device_map=None, # Force CPU for HF Spaces
|
| 239 |
+
low_cpu_mem_usage=True,
|
| 240 |
+
trust_remote_code=True,
|
| 241 |
+
offload_folder="offload" # Offload to disk if needed
|
| 242 |
+
)
|
| 243 |
+
# Enable gradient checkpointing for memory savings
|
| 244 |
+
self.model.gradient_checkpointing_enable()
|
| 245 |
+
|
| 246 |
+
logger.info("Fine-tuned model loaded successfully")
|
| 247 |
+
logger.info(f"Model loaded on devices: {self.model.hf_device_map}")
|
| 248 |
+
|
| 249 |
+
except Exception as e:
|
| 250 |
+
logger.error(f"Error loading fine-tuned model: {e}")
|
| 251 |
+
raise
|
| 252 |
+
|
| 253 |
+
    def generate_code_review(self, code: str, student_level: str = "beginner") -> str:
        """
        Generate a code review using the fine-tuned model.

        Args:
            code: Student's code to review
            student_level: Student's skill level

        Returns:
            Generated code review feedback
        """
        if not self.model or not self.tokenizer:
            raise ValueError("Model not loaded. Call load_model() first.")

        # Format the prompt using the template from fine-tuning
        prompt = self.code_review_prompt_template.format(
            code=code,
            level=student_level
        )

        # Tokenize input
        inputs = self.tokenizer(
            prompt, return_tensors="pt", truncation=True, max_length=2048)

        # Generate response
        with torch.no_grad():
            outputs = self.model.generate(
                inputs.input_ids,
                max_new_tokens=512,
                temperature=0.7,
                do_sample=True,
                pad_token_id=self.tokenizer.eos_token_id
            )

        # Decode response
        response = self.tokenizer.decode(outputs[0], skip_special_tokens=True)

        # Extract only the generated part (after the prompt)
        generated_text = response[len(prompt):].strip()

        return generated_text

    def generate_educational_feedback(self, code: str, student_level: str = "beginner") -> str:
        """
        Generate educational feedback using the fine-tuned model.

        Args:
            code: Student's code to provide feedback on
            student_level: Student's skill level

        Returns:
            Generated educational feedback
        """
        if not self.model or not self.tokenizer:
            raise ValueError("Model not loaded. Call load_model() first.")

        # Format the prompt using the template from fine-tuning
        prompt = self.code_feedback_prompt_template.format(
            code=code,
            level=student_level
        )

        # Tokenize input
        inputs = self.tokenizer(
            prompt, return_tensors="pt", truncation=True, max_length=2048)

        # Generate response
        with torch.no_grad():
            outputs = self.model.generate(
                inputs.input_ids,
                max_new_tokens=512,
                temperature=0.7,
                do_sample=True,
                pad_token_id=self.tokenizer.eos_token_id
            )

        # Decode response
        response = self.tokenizer.decode(outputs[0], skip_special_tokens=True)

        # Extract only the generated part (after the prompt)
        generated_text = response[len(prompt):].strip()

        return generated_text

    def analyze_student_code(self, code: str, student_level: str = "beginner") -> List[CodeFeedback]:
        """
        Analyze student code and provide educational feedback using the fine-tuned model.

        Args:
            code: The student's code to analyze
            student_level: Student's skill level ('beginner', 'intermediate', 'advanced')

        Returns:
            List of CodeFeedback objects
        """
        feedback_list = []

        # Use the fine-tuned model for a comprehensive code review
        try:
            code_review = self.generate_code_review(code, student_level)
            educational_feedback = self.generate_educational_feedback(
                code, student_level)

            # Create structured feedback from model output
            feedback_list.append(CodeFeedback(
                code_snippet=code,
                feedback_type="comprehensive_review",
                feedback_message=code_review,
                difficulty_level=student_level,
                learning_objectives=["code_analysis", "best_practices"]
            ))

            feedback_list.append(CodeFeedback(
                code_snippet=code,
                feedback_type="educational_guidance",
                feedback_message=educational_feedback,
                difficulty_level=student_level,
                learning_objectives=["learning", "improvement"]
            ))

        except Exception as e:
            logger.warning(
                f"Fine-tuned model failed, falling back to rule-based analysis: {e}")
            # Fall back to rule-based analysis if the model fails
            feedback_list = self._fallback_analysis(code, student_level)

        return feedback_list

    def _fallback_analysis(self, code: str, student_level: str) -> List[CodeFeedback]:
        """Fallback analysis using rule-based methods if the fine-tuned model fails"""
        feedback_list = []

        # Analyze syntax
        syntax_feedback = self._check_syntax(code, student_level)
        if syntax_feedback:
            feedback_list.append(syntax_feedback)

        # Analyze logic and structure
        logic_feedback = self._check_logic(code, student_level)
        if logic_feedback:
            feedback_list.extend(logic_feedback)

        # Check for optimization opportunities
        optimization_feedback = self._check_optimization(code, student_level)
        if optimization_feedback:
            feedback_list.append(optimization_feedback)

        # Provide style suggestions
        style_feedback = self._check_style(code, student_level)
        if style_feedback:
            feedback_list.append(style_feedback)

        return feedback_list

    def _check_syntax(self, code: str, student_level: str) -> Optional[CodeFeedback]:
        """Check for syntax errors and provide educational feedback"""
        # This would integrate with the fine-tuned model.
        # For now, basic pattern matching serves as a placeholder.

        common_syntax_errors = {
            r"^\s*if\s+[^:\n]+$": "Don't forget the colon after your if condition",
            r"^\s*for\s+[^:\n]+$": "Don't forget the colon after your for loop",
        }

        for pattern, message in common_syntax_errors.items():
            # MULTILINE so ^ and $ anchor to individual lines of the snippet
            if re.search(pattern, code, re.MULTILINE):
                return CodeFeedback(
                    code_snippet=code,
                    feedback_type="syntax",
                    feedback_message=message,
                    difficulty_level=student_level,
                    learning_objectives=["syntax", "basic_python"]
                )

        return None

    def _check_logic(self, code: str, student_level: str) -> List[CodeFeedback]:
        """Check for logical errors and provide educational feedback"""
        feedback_list = []

        # Check for infinite loops
        if "while True:" in code and "break" not in code:
            feedback_list.append(CodeFeedback(
                code_snippet=code,
                feedback_type="logic",
                feedback_message="This while loop will run forever! Make sure to include a break statement or condition to exit the loop.",
                difficulty_level=student_level,
                learning_objectives=["control_flow", "loops"]
            ))

        # Check for unused variables
        # This is a simplified check - the actual model would be more sophisticated
        if "x = " in code and "x" not in code.replace("x = ", ""):
            feedback_list.append(CodeFeedback(
                code_snippet=code,
                feedback_type="logic",
                feedback_message="You created variable 'x' but didn't use it. Consider removing unused variables to keep your code clean.",
                difficulty_level=student_level,
                learning_objectives=["variables", "code_cleanliness"]
            ))

        return feedback_list

    def _check_optimization(self, code: str, student_level: str) -> Optional[CodeFeedback]:
        """Check for optimization opportunities"""
        # Check for nested loops that could be optimized
        if code.count("for") > 1 and code.count("in range") > 1:
            return CodeFeedback(
                code_snippet=code,
                feedback_type="optimization",
                feedback_message="You have nested loops here. Consider if you can optimize this to O(n) instead of O(n²).",
                suggested_improvement="Use a hashmap or set to reduce complexity",
                difficulty_level=student_level,
                learning_objectives=["algorithms", "complexity", "optimization"]
            )

        return None

    def _check_style(self, code: str, student_level: str) -> Optional[CodeFeedback]:
        """Check for style improvements"""
        # Check for meaningful variable names; use word boundaries so names
        # like "max" or "index" are not flagged by accident
        if re.search(r"\b[xyz]\b", code):
            return CodeFeedback(
                code_snippet=code,
                feedback_type="style",
                feedback_message="Consider using more descriptive variable names instead of x, y, z. This makes your code easier to understand.",
                difficulty_level=student_level,
                learning_objectives=["naming_conventions", "readability"]
            )

        return None

    def generate_explanation(self, concept: str, student_level: str) -> str:
        """
        Generate explanations for programming concepts based on student level.

        Args:
            concept: The concept to explain
            student_level: Student's skill level

        Returns:
            Explanation tailored to the student's level
        """
        explanations = {
            "variables": {
                "beginner": "Variables are like labeled boxes where you store information. Think of 'name = \"John\"' as putting \"John\" in a box labeled 'name'.",
                "intermediate": "Variables are memory locations that store data. They have a name, type, and value. Python is dynamically typed, so the type is inferred.",
                "advanced": "Variables in Python are references to objects in memory. They're dynamically typed and use reference counting for memory management."
            },
            "loops": {
                "beginner": "Loops repeat code multiple times. 'for' loops are great when you know how many times to repeat, 'while' loops when you don't.",
                "intermediate": "Loops control program flow. 'for' iterates over sequences, 'while' continues until a condition is False. Consider time complexity.",
                "advanced": "Loops are fundamental control structures. Python's 'for' is actually a foreach loop. Consider iterator patterns and generator expressions."
            }
        }

        return explanations.get(concept, {}).get(student_level, f"Explanation for {concept} at {student_level} level")

    def _load_comprehensive_feedback_prompt(self) -> str:
        """Load the comprehensive feedback prompt template"""
        return """You are an expert programming tutor. Provide comprehensive educational feedback for the following student code.

Student Code:
{code}

Student Level: {level}

Please provide a detailed analysis in the following JSON format:

{{
    "strengths": ["strength1", "strength2", "strength3"],
    "weaknesses": ["weakness1", "weakness2", "weakness3"],
    "issues": ["issue1", "issue2", "issue3"],
    "step_by_step_improvement": [
        "Step 1: Description of first improvement",
        "Step 2: Description of second improvement",
        "Step 3: Description of third improvement"
    ],
    "learning_points": [
        "Learning point 1: What the student should understand",
        "Learning point 2: Key concept to grasp",
        "Learning point 3: Best practice to follow"
    ],
    "review_summary": "A comprehensive review of the code highlighting key areas for improvement",
    "learning_objectives": ["objective1", "objective2", "objective3"],
    "estimated_time_to_improve": "5-10 minutes"
}}

Focus on educational value and constructive feedback that helps the student learn and improve."""

    def _load_comprehension_question_prompt(self) -> str:
        """Load the comprehension question generation prompt"""
        return """Based on the learning points and improvements discussed, generate a comprehension question to test the student's understanding.

Learning Points: {learning_points}
Code Issues: {issues}
Student Level: {level}

Generate a question that tests understanding of the key concepts discussed. The question should be appropriate for the student's level.

Format your response as JSON:
{{
    "question": "Your comprehension question here",
    "answer": "The correct answer",
    "explanation": "Detailed explanation of why this answer is correct"
}}

Make the question challenging but fair for the student's level."""

    def _load_code_fix_prompt(self) -> str:
        """Load the code fix generation prompt"""
        return """You are an expert programming tutor. Based on the analysis and learning points, provide an improved version of the student's code.

Original Code:
{code}

Issues Identified: {issues}
Learning Points: {learning_points}
Student Level: {level}

Provide an improved version of the code that addresses the issues while maintaining educational value. Include comments to explain the improvements.

Format your response as JSON:
{{
    "improved_code": "The improved code with comments",
    "fix_explanation": "Detailed explanation of what was changed and why"
}}

Focus on educational improvements that help the student understand better practices."""

    def adapt_feedback_complexity(self, feedback: CodeFeedback, student_level: str) -> CodeFeedback:
        """
        Adapt feedback complexity based on student level.

        Args:
            feedback: Original feedback
            student_level: Student's skill level

        Returns:
            Adapted feedback
        """
        if student_level == "beginner":
            # Simplify language and add more examples
            feedback.feedback_message = feedback.feedback_message.replace(
                "O(n²)", "quadratic time (slower)"
            ).replace(
                "O(n)", "linear time (faster)"
            )
        elif student_level == "advanced":
            # Add more technical details
            if "optimization" in feedback.feedback_type:
                feedback.feedback_message += " Consider the space-time tradeoff and cache locality."

        return feedback

    def generate_comprehensive_feedback(self, code: str, student_level: str = "beginner") -> ComprehensiveFeedback:
        """
        Generate comprehensive educational feedback with all components.

        Args:
            code: Student's code to analyze
            student_level: Student's skill level

        Returns:
            ComprehensiveFeedback object with all educational components
        """
        if not self.model or not self.tokenizer:
            raise ValueError("Model not loaded. Call load_model() first.")

        try:
            # Step 1: Generate comprehensive analysis
            comprehensive_analysis = self._generate_comprehensive_analysis(
                code, student_level)

            # Step 2: Generate comprehension question
            comprehension_data = self._generate_comprehension_question(
                comprehensive_analysis["learning_points"],
                comprehensive_analysis["issues"],
                student_level
            )

            # Step 3: Generate improved code
            code_fix_data = self._generate_code_fix(
                code,
                comprehensive_analysis["issues"],
                comprehensive_analysis["learning_points"],
                student_level
            )

            # Create the comprehensive feedback object
            return ComprehensiveFeedback(
                code_snippet=code,
                student_level=student_level,
                strengths=comprehensive_analysis["strengths"],
                weaknesses=comprehensive_analysis["weaknesses"],
                issues=comprehensive_analysis["issues"],
                step_by_step_improvement=comprehensive_analysis["step_by_step_improvement"],
                learning_points=comprehensive_analysis["learning_points"],
                review_summary=comprehensive_analysis["review_summary"],
                comprehension_question=comprehension_data["question"],
                comprehension_answer=comprehension_data["answer"],
                explanation=comprehension_data["explanation"],
                improved_code=code_fix_data["improved_code"],
                fix_explanation=code_fix_data["fix_explanation"],
                difficulty_level=student_level,
                learning_objectives=comprehensive_analysis["learning_objectives"],
                estimated_time_to_improve=comprehensive_analysis["estimated_time_to_improve"]
            )

        except Exception as e:
            logger.error(f"Error generating comprehensive feedback: {e}")
            # Return basic comprehensive feedback if the model fails
            return self._create_fallback_comprehensive_feedback(code, student_level)

    def _generate_comprehensive_analysis(self, code: str, student_level: str) -> Dict:
        """Generate comprehensive analysis using the fine-tuned model"""
        prompt = self.comprehensive_feedback_prompt.format(
            code=code,
            level=student_level
        )

        response = self._generate_model_response(prompt)

        try:
            # Try to parse the JSON response
            import json
            return json.loads(response)
        except json.JSONDecodeError:
            logger.warning("Failed to parse JSON response, using fallback")
            return self._create_fallback_analysis(code, student_level)

    def _generate_comprehension_question(self, learning_points: List[str], issues: List[str], student_level: str) -> Dict:
        """Generate a comprehension question using the fine-tuned model"""
        prompt = self.comprehension_question_prompt.format(
            learning_points=", ".join(learning_points),
            issues=", ".join(issues),
            level=student_level
        )

        response = self._generate_model_response(prompt)

        try:
            import json
            return json.loads(response)
        except json.JSONDecodeError:
            logger.warning(
                "Failed to parse comprehension question JSON, using fallback")
            return {
                "question": "What is the main concept you learned from this code review?",
                "answer": "The main concept is understanding code structure and best practices.",
                "explanation": "This question tests your understanding of the key learning points discussed."
            }

    def _generate_code_fix(self, code: str, issues: List[str], learning_points: List[str], student_level: str) -> Dict:
        """Generate improved code using the fine-tuned model"""
        prompt = self.code_fix_prompt.format(
            code=code,
            issues=", ".join(issues),
            learning_points=", ".join(learning_points),
            level=student_level
        )

        response = self._generate_model_response(prompt)

        try:
            import json
            return json.loads(response)
        except json.JSONDecodeError:
            logger.warning("Failed to parse code fix JSON, using fallback")
            return {
                "improved_code": "# Improved version of your code\n# Add comments and improvements here",
                "fix_explanation": "This is a fallback improved version. The model should provide specific improvements."
            }

    def _generate_model_response(self, prompt: str) -> str:
        """Generate a response from the fine-tuned model"""
        inputs = self.tokenizer(
            prompt, return_tensors="pt", truncation=True, max_length=2048)

        # Move tensors to CPU if no GPU is available
        if not torch.cuda.is_available():
            inputs = {k: v.cpu() for k, v in inputs.items()}

        with torch.no_grad():
            outputs = self.model.generate(
                # Subscript access works for both BatchEncoding and the plain
                # dict produced by the CPU branch above (attribute access would
                # fail on a dict).
                inputs["input_ids"],
                max_new_tokens=512,
                temperature=0.7,
                do_sample=True,
                pad_token_id=self.tokenizer.eos_token_id
            )

        response = self.tokenizer.decode(outputs[0], skip_special_tokens=True)
        return response[len(prompt):].strip()

    def _create_fallback_analysis(self, code: str, student_level: str) -> Dict:
        """Create a fallback analysis when the model fails"""
        return {
            "strengths": ["Your code has a clear structure", "You're using appropriate data types"],
            "weaknesses": ["Could improve variable naming", "Consider adding comments"],
            "issues": ["Basic syntax and style issues"],
            "step_by_step_improvement": [
                "Step 1: Add descriptive variable names",
                "Step 2: Include comments explaining your logic",
                "Step 3: Consider code optimization"
            ],
            "learning_points": [
                "Good variable naming improves code readability",
                "Comments help others understand your code",
                "Always consider efficiency in your solutions"
            ],
            "review_summary": "Your code works but could be improved with better practices.",
            "learning_objectives": ["code_quality", "best_practices", "readability"],
            "estimated_time_to_improve": "10-15 minutes"
        }

    def _create_fallback_comprehensive_feedback(self, code: str, student_level: str) -> ComprehensiveFeedback:
        """Create fallback comprehensive feedback when the model fails"""
        fallback_analysis = self._create_fallback_analysis(code, student_level)

        return ComprehensiveFeedback(
            code_snippet=code,
            student_level=student_level,
            strengths=fallback_analysis["strengths"],
            weaknesses=fallback_analysis["weaknesses"],
            issues=fallback_analysis["issues"],
            step_by_step_improvement=fallback_analysis["step_by_step_improvement"],
            learning_points=fallback_analysis["learning_points"],
            review_summary=fallback_analysis["review_summary"],
            comprehension_question="What is the importance of good variable naming in programming?",
            comprehension_answer="Good variable naming makes code more readable and maintainable.",
            explanation="Descriptive variable names help other developers (and yourself) understand what the code does.",
            improved_code="# Improved version\n# Add your improvements here",
            fix_explanation="This is a fallback version. The model should provide specific improvements.",
            difficulty_level=student_level,
            learning_objectives=fallback_analysis["learning_objectives"],
            estimated_time_to_improve=fallback_analysis["estimated_time_to_improve"]
        )


def main():
    """Main function to demonstrate the system with the fine-tuned model"""
    print("Generative AI for Programming Education")
    print("Using Fine-tuned CodeLlama-7B Model")
    print("=" * 50)

    # System information
    print(f"Available GPUs: {torch.cuda.device_count()}")
    if torch.cuda.is_available():
        print("GPU Memory before loading:")
        get_gpu_memory()
    else:
        print("System Memory before loading:")
        get_system_memory()

    # Initialize the system with your fine-tuned model path
    # Update this path to point to your actual fine-tuned model
    model_path = r"C:\Users\farou\OneDrive - Aston University\finetunning"
    ai_tutor = ProgrammingEducationAI(model_path)

    try:
        # Load the fine-tuned model
        print("Loading fine-tuned model...")
        ai_tutor.load_model()
        print("✓ Model loaded successfully!")

        # Clear cache after loading
        clear_cuda_cache()
        if torch.cuda.is_available():
            print("GPU Memory after loading:")
            get_gpu_memory()
        else:
            print("System Memory after loading:")
            get_system_memory()

        # Example student code for testing
        student_code = """
def find_duplicates(numbers):
    x = []
    for i in range(len(numbers)):
        for j in range(i+1, len(numbers)):
            if numbers[i] == numbers[j]:
                x.append(numbers[i])
    return x

# Test the function
result = find_duplicates([1, 2, 3, 2, 4, 5, 3])
print(result)
"""

print(f"\nAnalyzing student code:\n{student_code}")
|
| 846 |
+
|
| 847 |
+
# Get feedback using fine-tuned model
|
| 848 |
+
feedback_list = ai_tutor.analyze_student_code(student_code, "beginner")
|
| 849 |
+
|
| 850 |
+
print("\n" + "="*50)
|
| 851 |
+
print("FINE-TUNED MODEL FEEDBACK:")
|
| 852 |
+
print("="*50)
|
| 853 |
+
|
| 854 |
+
for i, feedback in enumerate(feedback_list, 1):
|
| 855 |
+
print(f"\n{i}. {feedback.feedback_type.upper()}:")
|
| 856 |
+
print(f" {feedback.feedback_message}")
|
| 857 |
+
if feedback.suggested_improvement:
|
| 858 |
+
print(f" Suggestion: {feedback.suggested_improvement}")
|
| 859 |
+
print(
|
| 860 |
+
f" Learning objectives: {', '.join(feedback.learning_objectives)}")
|
| 861 |
+
|
| 862 |
+
# Demonstrate direct model calls
|
| 863 |
+
print("\n" + "="*50)
|
| 864 |
+
print("DIRECT MODEL GENERATION:")
|
| 865 |
+
print("="*50)
|
| 866 |
+
|
| 867 |
+
# Code review
|
| 868 |
+
print("\n1. CODE REVIEW:")
|
| 869 |
+
code_review = ai_tutor.generate_code_review(student_code, "beginner")
|
| 870 |
+
print(code_review)
|
| 871 |
+
|
| 872 |
+
# Educational feedback
|
| 873 |
+
print("\n2. EDUCATIONAL FEEDBACK:")
|
| 874 |
+
educational_feedback = ai_tutor.generate_educational_feedback(
|
| 875 |
+
student_code, "beginner")
|
| 876 |
+
print(educational_feedback)
|
| 877 |
+
|
| 878 |
+
# Demonstrate comprehensive feedback system
|
| 879 |
+
print("\n" + "="*50)
|
| 880 |
+
print("COMPREHENSIVE EDUCATIONAL FEEDBACK SYSTEM:")
|
| 881 |
+
print("="*50)
|
| 882 |
+
|
| 883 |
+
comprehensive_feedback = ai_tutor.generate_comprehensive_feedback(
|
| 884 |
+
student_code, "beginner")
|
| 885 |
+
|
| 886 |
+
# Display comprehensive feedback
|
| 887 |
+
print("\n📊 CODE ANALYSIS:")
|
| 888 |
+
print("="*30)
|
| 889 |
+
|
| 890 |
+
print("\n✅ STRENGTHS:")
|
| 891 |
+
for i, strength in enumerate(comprehensive_feedback.strengths, 1):
|
| 892 |
+
print(f" {i}. {strength}")
|
| 893 |
+
|
| 894 |
+
print("\n❌ WEAKNESSES:")
|
| 895 |
+
for i, weakness in enumerate(comprehensive_feedback.weaknesses, 1):
|
| 896 |
+
print(f" {i}. {weakness}")
|
| 897 |
+
|
| 898 |
+
print("\n⚠️ ISSUES:")
|
| 899 |
+
for i, issue in enumerate(comprehensive_feedback.issues, 1):
|
| 900 |
+
print(f" {i}. {issue}")
|
| 901 |
+
|
| 902 |
+
print("\n📝 STEP-BY-STEP IMPROVEMENT GUIDE:")
|
| 903 |
+
print("="*40)
|
| 904 |
+
for i, step in enumerate(comprehensive_feedback.step_by_step_improvement, 1):
|
| 905 |
+
print(f" Step {i}: {step}")
|
| 906 |
+
|
| 907 |
+
print("\n🎓 LEARNING POINTS:")
|
| 908 |
+
print("="*25)
|
| 909 |
+
for i, point in enumerate(comprehensive_feedback.learning_points, 1):
|
| 910 |
+
print(f" {i}. {point}")
|
| 911 |
+
|
| 912 |
+
print("\n📋 REVIEW SUMMARY:")
|
| 913 |
+
print("="*20)
|
| 914 |
+
print(f" {comprehensive_feedback.review_summary}")
|
| 915 |
+
|
| 916 |
+
print("\n❓ COMPREHENSION QUESTION:")
|
| 917 |
+
print("="*30)
|
| 918 |
+
print(f" Question: {comprehensive_feedback.comprehension_question}")
|
| 919 |
+
print(f" Answer: {comprehensive_feedback.comprehension_answer}")
|
| 920 |
+
print(f" Explanation: {comprehensive_feedback.explanation}")
|
| 921 |
+
|
| 922 |
+
print("\n🔧 IMPROVED CODE:")
|
| 923 |
+
print("="*20)
|
| 924 |
+
print(comprehensive_feedback.improved_code)
|
| 925 |
+
|
| 926 |
+
print("\n💡 FIX EXPLANATION:")
|
| 927 |
+
print("="*20)
|
| 928 |
+
print(f" {comprehensive_feedback.fix_explanation}")
|
| 929 |
+
|
| 930 |
+
print("\n📊 METADATA:")
|
| 931 |
+
print("="*15)
|
| 932 |
+
print(f" Student Level: {comprehensive_feedback.student_level}")
|
| 933 |
+
print(
|
| 934 |
+
f" Learning Objectives: {', '.join(comprehensive_feedback.learning_objectives)}")
|
| 935 |
+
print(
|
| 936 |
+
f" Estimated Time to Improve: {comprehensive_feedback.estimated_time_to_improve}")
|
| 937 |
+
|
| 938 |
+
except Exception as e:
|
| 939 |
+
print(f"Error: {e}")
|
| 940 |
+
print(
|
| 941 |
+
"Make sure to update the model_path variable to point to your fine-tuned model.")
|
| 942 |
+
|
| 943 |
+
|
| 944 |
+
if __name__ == "__main__":
|
| 945 |
+
main()
|
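The rule-based fallback path in `fine.py` reduces to simple string and regex heuristics. Below is a minimal standalone sketch of those checks as pure functions, so the logic can be tested without loading the model; the function names here are illustrative and not part of the module's actual API.

```python
import re


def check_infinite_loop(code: str) -> bool:
    """Flag a `while True:` loop that contains no `break` to exit it."""
    return "while True:" in code and "break" not in code


def check_nested_loops(code: str) -> bool:
    """Flag doubly nested range-based loops as an O(n^2) candidate."""
    return code.count("for") > 1 and code.count("in range") > 1


def check_missing_colon(code: str) -> bool:
    """Flag an `if`/`for` header line that is missing its trailing colon.

    MULTILINE makes ^ and $ anchor to each line of the snippet, and the
    character class excludes ':' so correctly terminated headers don't match.
    """
    return bool(re.search(r"^\s*(?:if|for)\s+[^:\n]+$", code, re.MULTILINE))


if __name__ == "__main__":
    buggy = "while True:\n    print('hi')\n"
    nested = "for i in range(5):\n    for j in range(5):\n        pass\n"
    print(check_infinite_loop(buggy))                  # True
    print(check_nested_loops(nested))                  # True
    print(check_missing_colon("if x > 1\n    pass"))   # True
```

Keeping these heuristics as free functions makes them trivially unit-testable, whereas in `fine.py` they are wrapped in `CodeFeedback`-producing methods.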
src/requirements.txt
ADDED
# Core ML/AI dependencies
torch>=2.0.0
transformers>=4.30.0
accelerate>=0.20.0
bitsandbytes>=0.41.0

# Data processing
numpy>=1.24.0
pandas>=2.0.0
datasets>=2.12.0

# Utilities
tqdm>=4.65.0
requests>=2.31.0
python-dotenv>=1.0.0
psutil>=5.9.0  # For system memory monitoring

# Logging and monitoring
wandb>=0.15.0  # Optional: for experiment tracking
tensorboard>=2.13.0  # Optional: for training monitoring

# Code analysis (optional enhancements)
# Note: `ast` is part of the Python standard library and must not be pip-installed
black>=23.0.0  # For code formatting analysis
pylint>=2.17.0  # For code quality analysis

# Web interface (optional)
flask>=2.3.0
streamlit>=1.25.0  # For creating a web interface

# Testing
pytest>=7.4.0
pytest-cov>=4.1.0

# Development
jupyter>=1.0.0
ipython>=8.14.0