# Generative AI for Programming Education

## Live Demo

Hugging Face Spaces: [Coming Soon - Deploy using the DEPLOYMENT.md guide]
## Problem Statement

Current programming education struggles with high dropout rates, inefficient feedback loops, and a lack of personalized learning, all exacerbated by limited instructor bandwidth. While Generative AI tools (e.g., Copilot, ChatGPT) can help, most prioritize productivity over learning, offering code solutions without explanations or tailored guidance. This risks student over-reliance without deeper comprehension.
## Solution

To address this gap, we fine-tuned CodeLlama-7B to provide structured, educational code feedback, not just correct answers. Our model analyzes student code and delivers:

- **Instant, actionable reviews** (e.g., "This loop can be optimized from O(n²) to O(n) using a hashmap")
- **Beginner-friendly explanations** (e.g., "In Python, `list.append()` modifies the list in place but returns `None`; that's why your `print()` shows `None`")
- **Personalized adaptation** (e.g., adjusting feedback depth based on the student's inferred skill level)

Unlike generic AI tools, our system is explicitly designed for education, balancing correctness, pedagogy, and ethical safeguards against over-reliance.
## Features

### Fine-tuned CodeLlama-7B Model

- Trained on code review and code feedback datasets
- 7B parameters for comprehensive code understanding
- Educational focus rather than productivity optimization
### Progressive Learning Interface

A 5-stage educational process:

1. **Code Analysis**: strengths, weaknesses, and issues
2. **Improvement Guide**: step-by-step instructions
3. **Learning Points**: key concepts and objectives
4. **Comprehension Quiz**: test understanding
5. **Code Fix**: improved solution (only after learning)
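The staged flow above can be sketched as a small state machine in which the final stage stays locked until the comprehension quiz is passed. This is an illustrative sketch, not the repo's actual interface code; the class and stage names are assumptions.

```python
# Hypothetical sketch of the 5-stage progressive flow: the learner must
# pass the comprehension quiz before the code fix is revealed.
STAGES = ["analysis", "improvement_guide", "learning_points", "quiz", "code_fix"]

class ProgressiveSession:
    def __init__(self):
        self.stage_index = 0
        self.quiz_passed = False

    @property
    def current_stage(self):
        return STAGES[self.stage_index]

    def advance(self):
        # Ethical safeguard: gate the final stage on quiz success.
        if self.current_stage == "quiz" and not self.quiz_passed:
            raise PermissionError("Pass the comprehension quiz before seeing the fix.")
        if self.stage_index < len(STAGES) - 1:
            self.stage_index += 1
```

Keeping the gate inside `advance()` means no UI path can skip straight to the solution.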
### Educational Features

- Student level adaptation (Beginner / Intermediate / Advanced)
- Comprehension questions generated by the model
- Learning objectives attached to each piece of feedback
- Step-by-step improvement guides
- Algorithm complexity explanations
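To illustrate how model-generated comprehension questions might be requested, here is a hypothetical prompt template; the repository's actual prompts (in `fine.py` or `config.py`) may differ.

```python
# Hypothetical prompt template for generating a comprehension question.
QUIZ_PROMPT = """You are a programming tutor reviewing a {level} student's code.

Code:
{code}

Ask ONE comprehension question that checks whether the student understands
the main issue, and state the learning objective it tests."""

def build_quiz_prompt(code, level="Beginner"):
    # Fill in the student's code and level before sending to the model.
    return QUIZ_PROMPT.format(code=code, level=level)
```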
### Ethical Safeguards

- A progressive learning flow prevents jumping straight to solutions
- Comprehension testing before fixes are shown
- Educational explanations rather than quick answers
- Promotion of best practices
## Hugging Face Spaces Deployment

### Hardware Specifications

- CPU: 2 vCPUs (virtual CPU cores)
- RAM: 16 GB
- Plan: free tier
- Storage: sufficient for the model and application
### Optimization Features

- ✅ 16 GB RAM optimization for the fine-tuned model
- ✅ CPU-only inference (no GPU required)
- ✅ Memory management with gradient checkpointing
- ✅ Demo mode for immediate testing
- ✅ Progressive loading with fallback options
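A hedged sketch of how progressive loading with a demo-mode fallback might look; the function name and structure are illustrative assumptions, not the repo's actual `fine.py` code.

```python
# Try to load the fine-tuned model on CPU; fall back to a canned demo
# mode if loading fails (missing weights, out of memory, etc.).
def load_feedback_backend(model_path="./model"):
    try:
        import torch
        from transformers import AutoModelForCausalLM, AutoTokenizer

        tokenizer = AutoTokenizer.from_pretrained(model_path)
        model = AutoModelForCausalLM.from_pretrained(
            model_path,
            torch_dtype=torch.float32,  # CPU-only inference
            low_cpu_mem_usage=True,     # stream weights to fit 16 GB RAM
        )
        model.gradient_checkpointing_enable()  # trade compute for memory
        return ("model", model, tokenizer)
    except Exception:
        # Demo mode: instant responses without the 7B model.
        return ("demo", None, None)
```

The broad `except` is deliberate here: on the free tier any loading failure should degrade to demo mode rather than crash the Space.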
### Performance Expectations

- Demo mode: instant responses
- Fine-tuned model: 5-10 minutes of initial loading
- Memory usage: optimized for the 16 GB constraint
- Concurrent users: limited by the available CPU cores
## Installation & Setup

### Local Development

```bash
# Clone the repository
git clone https://github.com/TomoriFarouk/GenAI-For-Programming-Language.git
cd GenAI-For-Programming-Language

# Install dependencies
pip install -r requirements.txt

# Run the application
streamlit run app.py
```
### Hugging Face Spaces Deployment

Follow the detailed guide in `DEPLOYMENT.md` for step-by-step instructions.
## Project Structure

```
GenAI-For-Programming-Language/
├── app.py             # Main Streamlit interface (HF Spaces optimized)
├── fine.py            # Fine-tuned model integration
├── config.py          # Configuration settings
├── requirements.txt   # Dependencies
├── README.md          # This file
├── DEPLOYMENT.md      # HF Spaces deployment guide
├── .gitignore         # Excludes model files
├── .gitattributes     # File type configuration
└── example_usage.py   # Usage examples
```
## Model Architecture

### Base Model

- CodeLlama-7B-Instruct-hf
- 7 billion parameters
- Code-specific pretraining

### Fine-tuning Datasets

- Code review dataset: structured feedback on code quality
- Code feedback dataset: educational explanations and improvements

### Training Process

- LoRA fine-tuning for efficiency
- Educational prompt engineering
- Multi-stage feedback generation
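For readers unfamiliar with LoRA: instead of updating all 7B weights, small low-rank adapter matrices are trained on top of selected layers. The hyperparameters below are illustrative assumptions, not the values used for the released model.

```python
# Illustrative LoRA hyperparameters (assumptions, not the released
# adapter's actual values). With Hugging Face `peft` this would become:
#   from peft import LoraConfig, get_peft_model
#   model = get_peft_model(base_model, LoraConfig(**LORA_HYPERPARAMS))
LORA_HYPERPARAMS = {
    "r": 16,                                  # low-rank update dimension
    "lora_alpha": 32,                         # scaling factor for the update
    "target_modules": ["q_proj", "v_proj"],   # attention projections only
    "lora_dropout": 0.05,
    "task_type": "CAUSAL_LM",
}
```

Training only these adapters keeps the trainable parameter count to a small fraction of the 7B base weights, which is what makes fine-tuning feasible on modest hardware.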
## Usage Examples

### Input Code

```python
def find_duplicates(numbers):
    x = []
    for i in range(len(numbers)):
        for j in range(i + 1, len(numbers)):
            if numbers[i] == numbers[j]:
                x.append(numbers[i])
    return x
```
### Generated Feedback

- **Analysis**: identifies O(n²) complexity and poor variable naming
- **Improvement Guide**: step-by-step optimization instructions
- **Learning Points**: algorithm complexity, naming conventions
- **Quiz**: "What is the time complexity, and how can it be improved?"
- **Code Fix**: an optimized O(n) solution with better naming
## Configuration

### Model Settings

- Path: `./model` (for HF Spaces)
- Device: CPU, optimized for 16 GB RAM
- Memory: gradient checkpointing enabled
### Educational Settings

- Student levels: Beginner, Intermediate, Advanced
- Feedback types: syntax, logic, optimization, style
- Learning objectives: comprehensive programming concepts
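A minimal sketch of how these settings could drive level adaptation; the keys and values are hypothetical, and the repo's actual `config.py` may be organized differently.

```python
# Hypothetical configuration: feedback depth adapts to the student level.
STUDENT_LEVELS = {
    "Beginner":     {"max_concepts": 2, "show_big_o": False, "tone": "step-by-step"},
    "Intermediate": {"max_concepts": 4, "show_big_o": True,  "tone": "guided"},
    "Advanced":     {"max_concepts": 6, "show_big_o": True,  "tone": "concise"},
}
FEEDBACK_TYPES = ["Syntax", "Logic", "Optimization", "Style"]

def feedback_settings(level):
    # Unknown levels fall back to the gentlest (Beginner) settings.
    return STUDENT_LEVELS.get(level, STUDENT_LEVELS["Beginner"])
```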
## Performance

### Local Environment

- GPU: recommended for faster inference
- RAM: 16 GB+ recommended
- Storage: 30 GB+ for model files

### Hugging Face Spaces

- CPU: 2 vCPUs (sufficient for inference)
- RAM: 16 GB (the app is optimized for this constraint)
- Loading time: 5-10 minutes for the fine-tuned model
- Demo mode: instant responses
## Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Test thoroughly
5. Submit a pull request
## License

This project is licensed under the MIT License; see the LICENSE file for details.
## Acknowledgments

- The CodeLlama team for the base model
- Hugging Face for the Spaces platform
- Streamlit for the web interface framework
## Contact

For questions or support, please open an issue on GitHub.

*Empowering programming education through AI-driven, structured learning experiences.*