# πŸŽ“ Generative AI for Programming Education

## πŸš€ Live Demo
**Hugging Face Spaces**: [Coming Soon - Deploy using DEPLOYMENT.md guide]

## πŸ“‹ Problem Statement
Current programming education struggles with high dropout rates, inefficient feedback loops, and a lack of personalized learningβ€”problems exacerbated by limited instructor bandwidth. While Generative AI (e.g., Copilot, ChatGPT) can help, most tools prioritize productivity over learning, offering code solutions without explanations or tailored guidance. This risks student over-reliance without deeper comprehension.

## 🎯 Solution
To address this gap, we fine-tuned **CodeLlama-7B** to provide structured, educational code feedbackβ€”not just correct answers. Our model analyzes student code and delivers:

- **Instant, actionable reviews** (e.g., "This loop can be optimized from O(nΒ²) to O(n) using a hashmap")
- **Beginner-friendly explanations** (e.g., "In Python, list.append() modifies the list in-place but returns Noneβ€”that's why your print() shows None")
- **Personalized adaptation** (e.g., adjusting feedback depth based on inferred skill level)

Unlike generic AI tools, our system is explicitly designed for education, balancing correctness, pedagogy, and ethical safeguards against over-reliance.

## ✨ Features

### 🧠 **Fine-tuned CodeLlama-7B Model**
- Trained on **code review** and **code feedback** datasets
- **7B parameters** for comprehensive understanding
- **Educational focus** rather than productivity optimization

### πŸ“Š **Progressive Learning Interface**
- **5-stage educational process**:
  1. **Code Analysis** - Strengths, weaknesses, issues
  2. **Improvement Guide** - Step-by-step instructions
  3. **Learning Points** - Key concepts and objectives
  4. **Comprehension Quiz** - Test understanding
  5. **Code Fix** - Improved solution (only after learning)
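To make the staged flow concrete, here is a minimal sketch (illustrative only, not the app's actual implementation; all names are hypothetical) of how the five stages could be unlocked strictly in order, keeping the code fix behind the quiz:

```python
# Illustrative only: the five stages in their unlock order.
STAGES = ["analysis", "improvement_guide", "learning_points", "quiz", "code_fix"]

def next_stage(completed):
    """Return the earliest stage not yet completed, or None when all are done.

    Because stages unlock strictly in order, "code_fix" only becomes
    reachable after "quiz" has been completed.
    """
    for stage in STAGES:
        if stage not in completed:
            return stage
    return None
```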

### πŸŽ“ **Educational Features**
- **Student Level Adaptation** (Beginner/Intermediate/Advanced)
- **Comprehension Questions** generated by the model
- **Learning Objectives** for each round of feedback
- **Step-by-step improvement guides**
- **Algorithm complexity explanations**
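One simple way level adaptation like this can be wired into prompting is a lookup from student level to instruction style. The mapping below is a hypothetical sketch, not the project's actual prompts:

```python
# Hypothetical prompt fragments per student level (illustrative values).
LEVEL_STYLES = {
    "beginner": "explain each concept with analogies and avoid jargon",
    "intermediate": "focus on idioms, complexity, and common pitfalls",
    "advanced": "discuss trade-offs, edge cases, and design alternatives",
}

def build_review_instruction(level):
    """Compose a review instruction tuned to the student's level."""
    style = LEVEL_STYLES.get(level.lower(), LEVEL_STYLES["beginner"])
    return f"Review the student's code and {style}."
```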

### πŸ›‘οΈ **Ethical Safeguards**
- **Progressive learning flow** prevents jumping straight to the solution
- **Comprehension testing** before showing fixes
- **Educational explanations** rather than quick answers
- **Best practices promotion**

## πŸš€ **Hugging Face Spaces Deployment**

### **Hardware Specifications**
- **CPU**: 2 vCPU (virtual CPU cores)
- **RAM**: 16 GB
- **Plan**: FREE tier
- **Storage**: Sufficient for model and application

### **Optimization Features**
- βœ… **16GB RAM optimization** for fine-tuned model
- βœ… **CPU-only inference** (no GPU required)
- βœ… **Memory management** with gradient checkpointing
- βœ… **Demo mode** for immediate testing
- βœ… **Progressive loading** with fallback options

### **Performance Expectations**
- **Demo Mode**: Instant response
- **Fine-tuned Model**: 5-10 minutes initial loading
- **Memory Usage**: Optimized for 16GB constraint
- **Concurrent Users**: Limited by CPU cores

## πŸ› οΈ Installation & Setup

### **Local Development**
```bash
# Clone the repository
git clone https://github.com/TomoriFarouk/GenAI-For-Programming-Language.git
cd GenAI-For-Programming-Language

# Install dependencies
pip install -r requirements.txt

# Run the application
streamlit run app.py
```

### **Hugging Face Spaces Deployment**
Follow the detailed guide in `DEPLOYMENT.md` for step-by-step instructions.

## πŸ“ Project Structure

```
GenAI-For-Programming-Language/
β”œβ”€β”€ app.py                    # Main Streamlit interface (HF Spaces optimized)
β”œβ”€β”€ fine.py                   # Fine-tuned model integration
β”œβ”€β”€ config.py                 # Configuration settings
β”œβ”€β”€ requirements.txt          # Dependencies
β”œβ”€β”€ README.md                 # This file
β”œβ”€β”€ DEPLOYMENT.md             # HF Spaces deployment guide
β”œβ”€β”€ .gitignore                # Excludes model files
β”œβ”€β”€ .gitattributes            # File type configuration
└── example_usage.py          # Usage examples
```

## 🧠 Model Architecture

### **Base Model**
- **CodeLlama-7B-Instruct-hf**
- **7 billion parameters**
- **Code-specific training**

### **Fine-tuning Datasets**
1. **Code Review Dataset**: Structured feedback on code quality
2. **Code Feedback Dataset**: Educational explanations and improvements

### **Training Process**
- **LoRA fine-tuning** for efficiency
- **Educational prompt engineering**
- **Multi-stage feedback generation**

## 🎯 Usage Examples

### **Input Code**
```python
def find_duplicates(numbers):
    x = []
    for i in range(len(numbers)):
        for j in range(i+1, len(numbers)):
            if numbers[i] == numbers[j]:
                x.append(numbers[i])
    return x
```

### **Generated Feedback**
1. **Analysis**: Identifies O(nΒ²) complexity, poor variable naming
2. **Improvement Guide**: Step-by-step optimization instructions
3. **Learning Points**: Algorithm complexity, naming conventions
4. **Quiz**: "What is the time complexity, and how can it be improved?"
5. **Code Fix**: Optimized O(n) solution with better naming
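For reference, the O(n) fix described in step 5 might look something like this (one possible rewrite using sets; note it reports each duplicate value once, whereas the original nested-loop version could append the same value repeatedly):

```python
def find_duplicates(numbers):
    """Return each value that occurs more than once, in first-repeat order."""
    seen = set()       # values encountered so far
    reported = set()   # duplicate values already added to the result
    duplicates = []
    for value in numbers:
        if value in seen and value not in reported:
            duplicates.append(value)
            reported.add(value)
        seen.add(value)
    return duplicates
```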

## πŸ”§ Configuration

### **Model Settings**
- **Path**: `./model` (for HF Spaces)
- **Device**: CPU-optimized for 16GB RAM
- **Memory**: Gradient checkpointing enabled

### **Educational Settings**
- **Student Levels**: Beginner, Intermediate, Advanced
- **Feedback Types**: Syntax, Logic, Optimization, Style
- **Learning Objectives**: Comprehensive programming concepts

## πŸš€ Performance

### **Local Environment**
- **GPU**: Recommended for faster inference
- **RAM**: 16GB+ recommended
- **Storage**: 30GB+ for model files

### **Hugging Face Spaces**
- **CPU**: 2 vCPU (sufficient for inference)
- **RAM**: 16GB (optimized for this constraint)
- **Loading Time**: 5-10 minutes for fine-tuned model
- **Demo Mode**: Instant response

## 🀝 Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Test thoroughly
5. Submit a pull request

## πŸ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.

## πŸ™ Acknowledgments

- **CodeLlama team** for the base model
- **Hugging Face** for the Spaces platform
- **Streamlit** for the web interface framework

## πŸ“ž Contact

For questions or support, please open an issue on GitHub.

---

**πŸŽ“ Empowering programming education through AI-driven, structured learning experiences.**