# Generative AI for Programming Education
## Live Demo
**Hugging Face Spaces**: [Coming Soon - Deploy using DEPLOYMENT.md guide]
## Problem Statement
Programming education today struggles with high dropout rates, slow feedback loops, and a lack of personalized learning, all exacerbated by limited instructor bandwidth. While generative AI tools such as Copilot and ChatGPT can help, most prioritize productivity over learning, offering code solutions without explanations or tailored guidance. This risks students becoming over-reliant on AI without building deeper comprehension.
## Solution
To address this gap, we fine-tuned **CodeLlama-7B** to provide structured, educational code feedback, not just correct answers. Our model analyzes student code and delivers:
- **Instant, actionable reviews** (e.g., "This loop can be optimized from O(n²) to O(n) using a hashmap")
- **Beginner-friendly explanations** (e.g., "In Python, list.append() modifies the list in place but returns None, which is why your print() shows None")
- **Personalized adaptation** (e.g., adjusting feedback depth based on inferred skill level)
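The `list.append()` behavior called out above is easy to reproduce:

```python
numbers = [1, 2, 3]
result = numbers.append(4)  # append mutates the list in place and returns None

print(numbers)  # [1, 2, 3, 4]
print(result)   # None, the value that surprises beginners
```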
Unlike generic AI tools, our system is explicitly designed for education, balancing correctness, pedagogy, and ethical safeguards against over-reliance.
## Features
### **Fine-tuned CodeLlama-7B Model**
- Trained on **code review** and **code feedback** datasets
- **7B parameters** for comprehensive understanding
- **Educational focus** rather than productivity optimization
### **Progressive Learning Interface**
- **5-stage educational process**:
1. **Code Analysis** - Strengths, weaknesses, issues
2. **Improvement Guide** - Step-by-step instructions
3. **Learning Points** - Key concepts and objectives
4. **Comprehension Quiz** - Test understanding
5. **Code Fix** - Improved solution (only after learning)
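One way to enforce this ordering is a small state machine that refuses to reveal the fix until the quiz stage is complete. The names below (`Stage`, `ProgressiveSession`) are illustrative, not the project's actual implementation:

```python
from enum import Enum, auto

class Stage(Enum):
    ANALYSIS = auto()
    IMPROVEMENT_GUIDE = auto()
    LEARNING_POINTS = auto()
    QUIZ = auto()
    CODE_FIX = auto()

class ProgressiveSession:
    """Gates the code fix behind the earlier learning stages."""
    ORDER = list(Stage)  # Enum members iterate in definition order

    def __init__(self):
        self.completed = set()

    def complete(self, stage):
        # A stage can only be completed after every earlier stage.
        idx = self.ORDER.index(stage)
        if any(s not in self.completed for s in self.ORDER[:idx]):
            raise ValueError(f"finish earlier stages before {stage.name}")
        self.completed.add(stage)

    def can_show_fix(self):
        return Stage.QUIZ in self.completed

session = ProgressiveSession()
session.complete(Stage.ANALYSIS)
print(session.can_show_fix())  # False: the quiz has not been passed yet
```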
### **Educational Features**
- **Student Level Adaptation** (Beginner/Intermediate/Advanced)
- **Comprehension Questions** generated by the model
- **Learning Objectives** for each feedback
- **Step-by-step improvement guides**
- **Algorithm complexity explanations**
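Level adaptation can be as simple as a lookup that controls feedback depth. This is a hypothetical sketch (the real logic lives in `fine.py` and may differ):

```python
# Illustrative profiles, one per supported student level.
FEEDBACK_DEPTH = {
    "Beginner": {"max_concepts": 2, "explain_jargon": True, "show_complexity": False},
    "Intermediate": {"max_concepts": 4, "explain_jargon": False, "show_complexity": True},
    "Advanced": {"max_concepts": 8, "explain_jargon": False, "show_complexity": True},
}

def feedback_settings(level: str) -> dict:
    """Fall back to the most verbose (Beginner) profile for unknown levels."""
    return FEEDBACK_DEPTH.get(level, FEEDBACK_DEPTH["Beginner"])
```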
### **Ethical Safeguards**
- **Progressive learning flow** prevents solution jumping
- **Comprehension testing** before showing fixes
- **Educational explanations** rather than quick answers
- **Best practices promotion**
## **Hugging Face Spaces Deployment**
### **Hardware Specifications**
- **CPU**: 2 vCPU (virtual CPU cores)
- **RAM**: 16 GB
- **Plan**: FREE tier
- **Storage**: Sufficient for model and application
### **Optimization Features**
- ✅ **16GB RAM optimization** for fine-tuned model
- ✅ **CPU-only inference** (no GPU required)
- ✅ **Memory management** with gradient checkpointing
- ✅ **Demo mode** for immediate testing
- ✅ **Progressive loading** with fallback options
### **Performance Expectations**
- **Demo Mode**: Instant response
- **Fine-tuned Model**: 5-10 minutes initial loading
- **Memory Usage**: Optimized for 16GB constraint
- **Concurrent Users**: Limited by CPU cores
## Installation & Setup
### **Local Development**
```bash
# Clone the repository
git clone https://github.com/TomoriFarouk/GenAI-For-Programming-Language.git
cd GenAI-For-Programming-Language
# Install dependencies
pip install -r requirements.txt
# Run the application
streamlit run app.py
```
### **Hugging Face Spaces Deployment**
Follow the detailed guide in `DEPLOYMENT.md` for step-by-step instructions.
## Project Structure
```
GenAI-For-Programming-Language/
├── app.py               # Main Streamlit interface (HF Spaces optimized)
├── fine.py              # Fine-tuned model integration
├── config.py            # Configuration settings
├── requirements.txt     # Dependencies
├── README.md            # This file
├── DEPLOYMENT.md        # HF Spaces deployment guide
├── .gitignore           # Excludes model files
├── .gitattributes       # File type configuration
└── example_usage.py     # Usage examples
```
## Model Architecture
### **Base Model**
- **CodeLlama-7B-Instruct-hf**
- **7 billion parameters**
- **Code-specific training**
### **Fine-tuning Datasets**
1. **Code Review Dataset**: Structured feedback on code quality
2. **Code Feedback Dataset**: Educational explanations and improvements
### **Training Process**
- **LoRA fine-tuning** for efficiency
- **Educational prompt engineering**
- **Multi-stage feedback generation**
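LoRA freezes the base weights W and learns only a low-rank update ΔW = (α/r)·B·A. The NumPy sketch below illustrates why this slashes the trainable parameter count; the dimensions are illustrative, not the actual training configuration:

```python
import numpy as np

d, k, r, alpha = 2048, 2048, 8, 16   # hidden dims, LoRA rank, scaling factor

W = np.random.randn(d, k)            # frozen base weight (never updated)
A = np.random.randn(r, k) * 0.01     # trainable down-projection (r x k)
B = np.zeros((d, r))                 # trainable up-projection, zero-init so
                                     # the adapter starts as a no-op

# Effective weight after fine-tuning: only A and B were ever trained.
W_adapted = W + (alpha / r) * (B @ A)

trainable, full = A.size + B.size, W.size
print(f"trainable fraction: {trainable / full:.4%}")  # well under 1% of full fine-tuning
```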
## Usage Examples
### **Input Code**
```python
def find_duplicates(numbers):
    x = []
    for i in range(len(numbers)):
        for j in range(i+1, len(numbers)):
            if numbers[i] == numbers[j]:
                x.append(numbers[i])
    return x
```
### **Generated Feedback**
1. **Analysis**: Identifies O(n²) complexity, poor variable naming
2. **Improvement Guide**: Step-by-step optimization instructions
3. **Learning Points**: Algorithm complexity, naming conventions
4. **Quiz**: "What is the time complexity, and how can it be improved?"
5. **Code Fix**: Optimized O(n) solution with better naming
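For the input above, the stage-5 fix might look like the following. This is a plausible model output, not a verbatim one; note it reports each duplicated value once rather than once per colliding pair:

```python
def find_duplicates(numbers):
    """Return each duplicated value once, in O(n) time using sets."""
    seen = set()
    duplicates = set()
    for number in numbers:
        if number in seen:
            duplicates.add(number)
        else:
            seen.add(number)
    return list(duplicates)

print(sorted(find_duplicates([1, 2, 3, 2, 1])))  # [1, 2]
```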
## Configuration
### **Model Settings**
- **Path**: `./model` (for HF Spaces)
- **Device**: CPU-optimized for 16GB RAM
- **Memory**: Gradient checkpointing enabled
### **Educational Settings**
- **Student Levels**: Beginner, Intermediate, Advanced
- **Feedback Types**: Syntax, Logic, Optimization, Style
- **Learning Objectives**: Comprehensive programming concepts
## Performance
### **Local Environment**
- **GPU**: Recommended for faster inference
- **RAM**: 16GB+ recommended
- **Storage**: 30GB+ for model files
### **Hugging Face Spaces**
- **CPU**: 2 vCPU (sufficient for inference)
- **RAM**: 16GB (optimized for this constraint)
- **Loading Time**: 5-10 minutes for fine-tuned model
- **Demo Mode**: Instant response
## Contributing
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Test thoroughly
5. Submit a pull request
## License
This project is licensed under the MIT License - see the LICENSE file for details.
## Acknowledgments
- **CodeLlama team** for the base model
- **Hugging Face** for the Spaces platform
- **Streamlit** for the web interface framework
## Contact
For questions or support, please open an issue on GitHub.
---
**Empowering programming education through AI-driven, structured learning experiences.**