title: Student Performance Predictor
emoji: π
colorFrom: blue
colorTo: purple
sdk: docker
sdk_version: latest
app_file: app.py
pinned: false
license: mit
Student Performance Prediction System
A comprehensive machine learning solution that predicts student math performance based on demographic and academic factors using ensemble learning techniques.
Table of Contents
- Overview
- Problem Statement
- Dataset
- Architecture
- Features
- Installation
- Usage
- Model Performance
- API Endpoints
- Project Structure
- Technologies Used
- Contributing
- License
- Author
Overview
This project implements an end-to-end machine learning pipeline to predict student mathematics performance based on various socio-economic and educational factors. The system uses advanced ensemble learning algorithms and provides a user-friendly web interface for real-time predictions.
Problem Statement
This project examines how student performance in mathematics is influenced by factors such as:
- Demographic factors: Gender, Race/Ethnicity
- Socio-economic factors: Lunch type (indicator of economic status)
- Educational background: Parental education level, Test preparation course completion
- Academic performance: Reading and Writing scores
The goal is to build a robust prediction model that can help educators and institutions identify students who might need additional support.
Dataset
Source: Kaggle - Students Performance in Exams
Dataset Characteristics:
- Size: 1,000 student records
- Features: 8 columns (5 categorical, 3 numerical)
- Target Variable: `math_score` (0-100)
Feature Description:
| Feature | Type | Description |
|---|---|---|
| `gender` | Categorical | Student's gender (male/female) |
| `race_ethnicity` | Categorical | Student's ethnic group (A, B, C, D, E) |
| `parental_level_of_education` | Categorical | Highest education level of parents |
| `lunch` | Categorical | Lunch type (standard/free or reduced) |
| `test_preparation_course` | Categorical | Test prep course completion status |
| `reading_score` | Numerical | Reading test score (0-100) |
| `writing_score` | Numerical | Writing test score (0-100) |
| `math_score` | Numerical | Target - Mathematics test score (0-100) |
Architecture
The project follows a modular, production-ready architecture:

    +-------------------+     +--------------------+     +-------------------+
    |   Data Source     |---->|   Data Ingestion   |---->|  Data Transform   |
    |   (CSV File)      |     |   Component        |     |  Component        |
    +-------------------+     +--------------------+     +-------------------+
                                                                   |
    +-------------------+     +--------------------+               |
    |   Web Interface   |<----|   Flask App        |               |
    |   (HTML/CSS)      |     |   (Prediction)     |               |
    +-------------------+     +--------------------+               |
                                        ^                          |
                                        |                          v
    +-------------------+     +--------------------+     +-------------------+
    |   Artifacts       |<----|   Model Trainer    |<----|  Preprocessed     |
    |   (model.pkl,     |     |   Component        |     |  Data             |
    | preprocessor.pkl) |     |                    |     |                   |
    +-------------------+     +--------------------+     +-------------------+

Features
Machine Learning Pipeline
- Data Ingestion: Automated data loading and train-test splitting
- Data Transformation:
  - Numerical features: Median imputation + Standard scaling
  - Categorical features: Mode imputation + One-hot encoding + Scaling
- Model Training: Multi-algorithm comparison with hyperparameter tuning
- Model Selection: Automated best model selection based on R² score
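
The transformation steps listed above can be sketched with scikit-learn's `ColumnTransformer`. This is a minimal illustration rather than the project's actual code: the helper name `build_preprocessor` and the sample rows are hypothetical, and `with_mean=False` is an assumption made so that scaling also works on sparse one-hot output.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

NUMERICAL = ["reading_score", "writing_score"]
CATEGORICAL = [
    "gender",
    "race_ethnicity",
    "parental_level_of_education",
    "lunch",
    "test_preparation_course",
]


def build_preprocessor() -> ColumnTransformer:
    """Hypothetical helper mirroring the transformation steps listed above."""
    numerical = Pipeline([
        ("imputer", SimpleImputer(strategy="median")),
        ("scaler", StandardScaler()),
    ])
    categorical = Pipeline([
        ("imputer", SimpleImputer(strategy="most_frequent")),
        ("one_hot", OneHotEncoder(handle_unknown="ignore")),
        # Assumption: skip centering so sparse one-hot output can be scaled.
        ("scaler", StandardScaler(with_mean=False)),
    ])
    return ColumnTransformer([
        ("numerical", numerical, NUMERICAL),
        ("categorical", categorical, CATEGORICAL),
    ])


# Made-up rows, only to show the call pattern (one reading score is missing).
sample = pd.DataFrame({
    "gender": ["female", "male", "female"],
    "race_ethnicity": ["group B", "group C", "group B"],
    "parental_level_of_education": ["bachelor's degree", "some college", "high school"],
    "lunch": ["standard", "free/reduced", "standard"],
    "test_preparation_course": ["none", "completed", "none"],
    "reading_score": [72.0, np.nan, 95.0],
    "writing_score": [74.0, 88.0, 93.0],
})
features = build_preprocessor().fit_transform(sample)
```

Each categorical column expands into one column per observed category, so the width of `features` depends on the data the preprocessor was fitted on.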
Advanced Algorithms
- Random Forest Regressor
- Gradient Boosting Regressor
- XGBoost Regressor
- CatBoost Regressor
- AdaBoost Regressor
- Decision Tree Regressor
- Linear Regression
Web Application
- Modern UI/UX: Responsive design with gradient styling
- Real-time Predictions: Instant math score predictions
- Form Validation: Client-side and server-side validation
- Error Handling: Comprehensive exception handling with custom logging
Production Features
- Custom Exception Handling: Detailed error tracking and logging
- Logging System: Timestamped logs for debugging and monitoring
- Modular Design: Reusable components for easy maintenance
- Configuration Management: Centralized configuration using dataclasses
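
As a rough sketch of the configuration and exception-handling ideas above, the snippet below pairs a dataclass-based config with an exception that records where an error occurred. The names `DataIngestionConfig`, `PipelineError`, and `safe_divide` are hypothetical and only illustrate the pattern; the default paths are assumed from the `artifacts/` file names used in this project.

```python
import os
import sys
from dataclasses import dataclass


@dataclass
class DataIngestionConfig:
    """Hypothetical config object; default paths are assumptions."""
    raw_data_path: str = os.path.join("artifacts", "data.csv")
    train_data_path: str = os.path.join("artifacts", "train.csv")
    test_data_path: str = os.path.join("artifacts", "test.csv")


class PipelineError(Exception):
    """Hypothetical custom exception that reports file name and line number."""

    def __init__(self, error: Exception):
        _, _, tb = sys.exc_info()
        if tb is not None:
            # Walk to the innermost frame, where the error was raised.
            while tb.tb_next is not None:
                tb = tb.tb_next
            file_name = os.path.basename(tb.tb_frame.f_code.co_filename)
            location = f"{file_name}:{tb.tb_lineno}"
        else:
            location = "unknown location"
        super().__init__(f"{type(error).__name__} at {location}: {error}")


def safe_divide(a: float, b: float) -> float:
    """Toy function showing how a low-level error gets wrapped."""
    try:
        return a / b
    except ZeroDivisionError as exc:
        raise PipelineError(exc) from exc
```

Wrapping errors this way keeps the original exception chained (`from exc`) while giving log readers a single line that says what failed and where.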
Installation
Prerequisites
- Python 3.11+
- pip package manager
Setup Instructions
Clone the repository

    git clone https://github.com/yashpinjarkar10/mlproject.git
    cd mlproject

Create virtual environment

    python -m venv venv
    # Windows
    venv\Scripts\activate
    # Linux/Mac
    source venv/bin/activate

Install dependencies

    pip install -r requirements.txt

Install the project in development mode

    pip install -e .

Usage
Training the Model
Run the complete ML pipeline (data ingestion → transformation → model training):

    python src/components/data_ingestion.py

This will:
- Load and split the dataset (80% train, 20% test)
- Apply data transformations
- Train multiple models with hyperparameter tuning
- Save the best model and preprocessor
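
The 80/20 split in the first step could look like the following, assuming scikit-learn's `train_test_split` is used. The integer list stands in for the 1,000 student records, and `random_state=42` is an assumption added here for reproducibility.

```python
from sklearn.model_selection import train_test_split

# Stand-in for the 1,000 student records; the real pipeline would split
# the loaded dataset instead of a list of integers.
records = list(range(1000))

# test_size=0.2 gives the 80% / 20% split described above.
train_records, test_records = train_test_split(
    records, test_size=0.2, random_state=42
)
```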
Running the Web Application

    python app.py

Access the application at: http://localhost:5000
Making Predictions
- Navigate to the prediction page
- Fill in the student information:
  - Personal details (Gender, Ethnicity)
  - Educational background (Parent education, Test prep)
  - Academic scores (Reading & Writing)
- Click "Predict Math Score"
- View the predicted mathematics score
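
Behind the form, a prediction amounts to turning the submitted values into a one-row table, applying the saved preprocessor, and calling the saved model. The helper below is a hypothetical sketch of that flow; it assumes the two objects (for example, those stored as `preprocessor.pkl` and `model.pkl`) expose the usual scikit-learn `transform` and `predict` methods.

```python
import pandas as pd


def predict_math_score(preprocessor, model, student: dict) -> float:
    """Hypothetical helper: transform one student's inputs and predict a score.

    `preprocessor` must provide `transform` and `model` must provide
    `predict`, as fitted scikit-learn objects do.
    """
    frame = pd.DataFrame([student])            # one-row DataFrame
    features = preprocessor.transform(frame)   # same steps as in training
    return float(model.predict(features)[0])   # single predicted score
```

Passing the preprocessor and model in as arguments keeps the function easy to test with stand-in objects, independent of how the artifacts are loaded.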
Model Performance
The system automatically selects the best-performing model based on R² score evaluation:
- Minimum Acceptable Performance: R² ≥ 0.6
- Cross-validation: 3-fold CV during hyperparameter tuning
- Evaluation Metrics: R² Score on test set
- Model Comparison: Comprehensive evaluation of 7 different algorithms
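
To make the selection rule concrete, here is a small dependency-free sketch of the R² formula and of picking the best model subject to the 0.6 threshold. The function names are hypothetical, and any scores passed in are supplied by the caller; nothing here reflects the project's measured results.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumes y_true is not constant (otherwise SS_tot is zero).
    """
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def select_best_model(scores, threshold=0.6):
    """Return (name, score) for the highest-scoring model.

    Raises ValueError when even the best model is below the threshold.
    """
    name = max(scores, key=scores.get)
    if scores[name] < threshold:
        raise ValueError(f"No model reached R2 >= {threshold}")
    return name, scores[name]
```

A model that always predicts the mean of the targets scores exactly 0, which is why a floor such as 0.6 is a meaningful bar rather than an arbitrary one.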
Hyperparameter Optimization
Each algorithm undergoes GridSearchCV with algorithm-specific parameter grids:
| Algorithm | Key Parameters Tuned |
|---|---|
| Random Forest | n_estimators, max_features |
| Gradient Boosting | learning_rate, n_estimators, subsample |
| XGBoost | learning_rate, n_estimators |
| CatBoost | depth, learning_rate, iterations |
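
A minimal sketch of one such search, using the Random Forest row with 3-fold cross-validation and R² scoring. The synthetic data and the specific grid values are assumptions chosen to keep the example fast; they are not the project's actual grids.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic data as a stand-in for the preprocessed student features.
X, y = make_regression(n_samples=120, n_features=5, noise=10.0, random_state=0)

# Illustrative grid only; the values are assumptions.
param_grid = {"n_estimators": [16, 32], "max_features": ["sqrt", None]}

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=3,           # 3-fold cross-validation
    scoring="r2",   # same metric used for model selection
)
search.fit(X, y)
best_params = search.best_params_
```

After fitting, `search.best_estimator_` holds a model refit on all the data with the winning parameters, which is the object one would evaluate on the test set.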
API Endpoints
| Endpoint | Method | Description |
|---|---|---|
| `/` | GET | Landing page with project overview |
| `/predictdata` | GET | Display prediction form |
| `/predictdata` | POST | Process prediction request and return result |
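
The routes in the table could be wired up in Flask roughly as follows. This is a sketch under assumptions: the view function names are hypothetical, the string responses stand in for the rendered HTML templates, and the POST branch only echoes the received field names instead of running a prediction.

```python
from flask import Flask, request

app = Flask(__name__)


@app.route("/")
def index():
    # Placeholder: the real app would render the landing page template.
    return "Landing page"


@app.route("/predictdata", methods=["GET", "POST"])
def predict_datapoint():
    if request.method == "GET":
        # Placeholder: the real app would render the prediction form.
        return "Prediction form"
    # Accept form fields, falling back to a JSON body if the form is empty.
    payload = request.form.to_dict() or (request.get_json(silent=True) or {})
    # Placeholder: the real handler would run the prediction pipeline here.
    return {"received_fields": sorted(payload)}
```

Handling GET and POST in one view keeps the form and its result on the same URL, which matches the two `/predictdata` rows in the table.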
Request Format (POST /predictdata)

    {
      "gender": "male",
      "ethnicity": "group B",
      "parental_level_of_education": "bachelor's degree",
      "lunch": "standard",
      "test_preparation_course": "completed",
      "reading_score": 85,
      "writing_score": 78
    }

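
Server-side validation of a request like the one above might be sketched as follows. The function name `validate_payload` is hypothetical, and the allowed values for `gender` and `lunch` are assumptions based on the dataset description (in particular, `"free/reduced"` is assumed to be the literal value for the free or reduced lunch type).

```python
# Assumed category values, taken from the dataset description.
ALLOWED = {
    "gender": {"male", "female"},
    "lunch": {"standard", "free/reduced"},
}
SCORE_FIELDS = ("reading_score", "writing_score")
REQUIRED = (
    "gender",
    "ethnicity",
    "parental_level_of_education",
    "lunch",
    "test_preparation_course",
    *SCORE_FIELDS,
)


def validate_payload(payload: dict) -> list:
    """Return a list of error messages; an empty list means the payload is valid."""
    errors = []
    for field in REQUIRED:
        if field not in payload:
            errors.append(f"missing field: {field}")
    for field, allowed in ALLOWED.items():
        if field in payload and payload[field] not in allowed:
            errors.append(f"{field} must be one of {sorted(allowed)}")
    for field in SCORE_FIELDS:
        if field in payload:
            value = payload[field]
            if not (isinstance(value, (int, float)) and 0 <= value <= 100):
                errors.append(f"{field} must be a number between 0 and 100")
    return errors
```

Returning all errors at once, rather than failing on the first, lets the form show every problem in a single round trip.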
Project Structure

    mlproject/
    ├── app.py                                 # Flask web application
    ├── requirements.txt                       # Project dependencies
    ├── setup.py                               # Package configuration
    ├── README.md                              # Project documentation
    │
    ├── artifacts/                             # Generated model artifacts
    │   ├── data.csv                           # Raw dataset
    │   ├── preprocessor.pkl                   # Data transformation pipeline
    │   ├── model.pkl                          # Trained best model
    │   ├── train.csv                          # Training dataset
    │   └── test.csv                           # Testing dataset
    │
    ├── notebook/                              # Jupyter notebooks
    │   ├── 1. EDA STUDENT PERFORMANCE.ipynb   # Exploratory Data Analysis
    │   ├── 2. MODEL TRAINING.ipynb            # Model development
    │   └── data/
    │       └── stud.csv                       # Original dataset
    │
    ├── templates/                             # HTML templates
    │   ├── index.html                         # Landing page
    │   └── home.html                          # Prediction form
    │
    ├── src/                                   # Source code package
    │   ├── components/                        # ML pipeline components
    │   │   ├── data_ingestion.py              # Data loading and splitting
    │   │   ├── data_transformation.py         # Feature engineering
    │   │   └── model_trainer.py               # Model training and selection
    │   │
    │   ├── pipeline/                          # Prediction pipelines
    │   │   ├── predict_pipeline.py            # Inference pipeline
    │   │   └── train_pipeline.py              # Training pipeline
    │   │
    │   ├── utils.py                           # Utility functions
    │   ├── exception.py                       # Custom exception handling
    │   └── logger.py                          # Logging configuration
    │
    └── logs/                                  # Application logs
        └── [timestamp].log                    # Timestamped log files

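
The timestamped files under `logs/` could be produced by a logging setup along these lines. This is a sketch, not the contents of `logger.py`: the function names, the timestamp pattern, and the log line format are all assumptions.

```python
import logging
import os
from datetime import datetime
from typing import Optional


def build_log_path(log_dir: str = "logs", now: Optional[datetime] = None) -> str:
    """Return a log file path named after the given (or current) time."""
    now = now or datetime.now()
    # Assumed timestamp pattern: month_day_year_hour_minute_second.
    return os.path.join(log_dir, now.strftime("%m_%d_%Y_%H_%M_%S") + ".log")


def configure_logging(log_dir: str = "logs") -> str:
    """Create the log directory and route INFO-level logs to a new file."""
    os.makedirs(log_dir, exist_ok=True)
    log_path = build_log_path(log_dir)
    logging.basicConfig(
        filename=log_path,
        format="[%(asctime)s] %(lineno)d %(name)s - %(levelname)s - %(message)s",
        level=logging.INFO,
    )
    return log_path
```

Calling `configure_logging()` once at startup gives each run its own file, which makes it easy to match a log to a particular training or serving session.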
Technologies Used
Core Framework
- Python 3.11+: Main programming language
- Flask: Web framework for the user interface
- scikit-learn 1.2.1: Machine learning algorithms and preprocessing
Data Science Stack
- pandas: Data manipulation and analysis
- numpy: Numerical computing
- matplotlib & seaborn: Data visualization
Machine Learning Libraries
- XGBoost: Gradient boosting framework
- CatBoost: Categorical feature boosting
- dill: Advanced object serialization
Development Tools
- setuptools: Package management
- Custom logging: Application monitoring
- Exception handling: Error management
Frontend
- HTML5 & CSS3: Modern responsive web interface
- Jinja2: Template engine for dynamic content
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Development Setup
- Fork the repository
- Create a feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
Areas for Contribution
- Additional ML algorithms
- Enhanced data visualization
- API improvements
- Mobile responsiveness
- Unit testing
- Documentation improvements
License
This project is licensed under the MIT License - see the LICENSE file for details.
Author
Yash Pinjarkar
- Email: yashpinjarkar2003@gmail.com
- GitHub: @yashpinjarkar10
- LinkedIn: Connect with me
If you found this project helpful, please consider giving it a star!
Built with ❤️ for the ML community