---
title: NetworkSecurity
emoji: π»
colorFrom: blue
colorTo: blue
sdk: docker
pinned: false
license: mit
---
# Network Security System: Phishing URL Detection
## Table of Contents
- [About The Project](#about-the-project)
- [Architecture](#architecture)
- [Features](#features)
- [Tech Stack](#tech-stack)
- [Dataset](#dataset)
- [Project Structure](#project-structure)
- [Pipeline Workflow](#pipeline-workflow)
- [Screenshots](#screenshots)
- [Installation](#installation)
- [Usage](#usage)
- [Model Performance](#model-performance)
- [Experiment Tracking](#experiment-tracking)
- [Future Enhancements](#future-enhancements)
- [Contributing](#contributing)
- [License](#license)
- [Contact](#contact)
## Live Demo
- **Live Application**: [inderjeet-networksecurity.hf.space](https://inderjeet-networksecurity.hf.space/)
- **Experiment Tracking**: [DagsHub Experiments](https://dagshub.com/Inder-26/NetworkSecurity/experiments#/)
## About The Project
Phishing attacks are becoming increasingly sophisticated. This project implements a **Network Security Machine Learning Pipeline** that detects phishing URLs.
It uses a modular MLOps architecture for scalability, maintainability, and reproducibility. The system automates the flow from data ingestion to model deployment, including data drift detection and automated model evaluation.
## Architecture
The system follows a strict modular pipeline architecture, orchestrated by a central training pipeline.
## Features
- **End-to-End Pipeline**: Fully automated workflow from data ingestion to model deployment.
- **Data Validation**: Schema checks and data drift detection using Kolmogorov-Smirnov (KS) tests.
- **Robust Preprocessing**: Automated handling of missing values (KNN Imputer) and feature scaling (Robust Scaler).
- **Multi-Model Training**: Experiments with RandomForest, DecisionTree, GradientBoosting, and AdaBoost using GridSearchCV.
- **Experiment Tracking**: Integrated with **MLflow** and **DagsHub** for tracking parameters, metrics, and models.
- **Fast API**: High-performance REST API built with **FastAPI** for real-time predictions.
- **Containerized**: Docker support for consistent deployment across environments.
- **Cloud Ready**: Designed to be deployed on platforms like AWS or Hugging Face Spaces.
## Tech Stack
- **Languages**: Python 3.8+
- **Frameworks**: FastAPI, Uvicorn
- **ML Libraries**: Scikit-learn, Pandas, NumPy
- **MLOps**: MLflow, DagsHub
- **Database**: MongoDB
- **Containerization**: Docker
- **Frontend**: HTML, CSS (Custom Design System), JavaScript
## Dataset
The project uses a dataset of URL features to distinguish between legitimate and phishing URLs.
- **Source**: [Phishing Dataset for Machine Learning](https://archive.ics.uci.edu/ml/datasets/Phishing+Websites) (or similar Phishing URL dataset)
- **Features**: IP Address, URL Length, TinyURL, forwarding, etc.
- **Target**: `Result` (LEGITIMATE / PHISHING)
## Project Structure
```
NetworkSecurity/
├── images/                 # Project diagrams and screenshots
├── networksecurity/        # Main package
│   ├── components/         # Pipeline components (Ingestion, Validation, Transformation, Training)
│   ├── pipeline/           # Training and Prediction pipelines
│   ├── entity/             # Artifact and Config entities
│   ├── constants/          # Project constants
│   ├── utils/              # Utility functions
│   └── exception/          # Custom exception handling
├── data_schema/            # Schema definitions
├── Dockerfile              # Docker configuration
├── app.py                  # FastAPI application entry point
├── requirements.txt        # Project dependencies
└── README.md               # Project documentation
```
## Pipeline Workflow
### 1. Data Ingestion
Fetches records from MongoDB, falls back to a local CSV file when the database cannot be used, and splits the data into training and test sets.
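
The ingestion step can be sketched with the standard library alone. This is an illustration under stated assumptions, not the project's actual component: `fetch_remote` is a hypothetical stand-in for the MongoDB query, and the 80/20 ratio and seed are arbitrary defaults.

```python
import csv
import random
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]


def load_rows(fetch_remote: Callable[[], List[Row]], csv_path: str) -> List[Row]:
    """Try the remote source first; use the local CSV if it fails or is empty."""
    try:
        rows = fetch_remote()  # hypothetical callable wrapping the database query
        if rows:
            return rows
    except Exception as exc:  # broad on purpose: any remote failure triggers the fallback
        print(f"Remote fetch failed ({exc}); falling back to {csv_path}")
    with open(csv_path, newline="") as handle:
        return list(csv.DictReader(handle))


def train_test_split(
    rows: List[Row], test_ratio: float = 0.2, seed: int = 42
) -> Tuple[List[Row], List[Row]]:
    """Shuffle a copy of the rows and cut it into (train, test) partitions."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]
```

Passing the fetch logic in as a callable keeps the fallback path testable without a running database.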
### 2. Data Validation
Validates data against the schema and checks for data drift.
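
As a rough sketch of what these two checks involve, the code below compares column names against an expected list and computes a two-sample Kolmogorov-Smirnov statistic by hand. It is dependency-free for illustration only; a library routine such as SciPy's `ks_2samp` would normally do the statistical work. The function names and the 0.05 significance level are assumptions, and the drift decision uses the large-sample critical value rather than an exact p-value.

```python
import math
from bisect import bisect_right
from typing import List, Sequence


def missing_columns(actual: Sequence[str], expected: Sequence[str]) -> List[str]:
    """Return the expected column names that are absent from the data."""
    present = set(actual)
    return [name for name in expected if name not in present]


def ks_statistic(reference: Sequence[float], current: Sequence[float]) -> float:
    """Largest gap between the two empirical CDFs (two-sample KS statistic)."""
    ref_sorted, cur_sorted = sorted(reference), sorted(current)
    largest_gap = 0.0
    for x in ref_sorted + cur_sorted:
        cdf_ref = bisect_right(ref_sorted, x) / len(ref_sorted)
        cdf_cur = bisect_right(cur_sorted, x) / len(cur_sorted)
        largest_gap = max(largest_gap, abs(cdf_ref - cdf_cur))
    return largest_gap


def drift_detected(
    reference: Sequence[float], current: Sequence[float], alpha: float = 0.05
) -> bool:
    """Flag drift when the statistic exceeds the asymptotic critical value."""
    n, m = len(reference), len(current)
    critical = math.sqrt(-0.5 * math.log(alpha / 2)) * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, current) > critical
```

Running `drift_detected` once per numeric feature, with the training split as `reference`, gives a per-column drift report.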
### 3. Data Transformation
Imputes missing values and scales features for optimal model performance.
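
To make the two operations concrete, here is a simplified, pure-Python sketch of nearest-neighbour imputation followed by median/IQR scaling. It is not the scikit-learn `KNNImputer` or `RobustScaler` implementation the feature list refers to; the distance rule, the default `k=3`, and the column-mean fallback are assumptions chosen to keep the example short.

```python
import math
from statistics import median, quantiles
from typing import List, Optional, Sequence


def _distance(a: Sequence[Optional[float]], b: Sequence[Optional[float]]) -> float:
    """Euclidean distance over shared coordinates, rescaled for the missing ones."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    squared = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(squared * len(a) / len(shared))


def knn_impute(rows: List[List[Optional[float]]], k: int = 3) -> List[List[float]]:
    """Fill each None with the mean of that column over the k nearest rows."""
    filled = [row[:] for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = sorted(
                (_distance(row, other), other[j])
                for idx, other in enumerate(rows)
                if idx != i and other[j] is not None
            )
            nearest = [v for dist, v in donors[:k] if dist != math.inf]
            # Fall back to every observed value in the column if no neighbour is comparable.
            pool = nearest or [v for _, v in donors]
            filled[i][j] = sum(pool) / len(pool)  # raises if the whole column is missing
    return filled


def robust_scale(rows: List[List[float]]) -> List[List[float]]:
    """Centre each column on its median and divide by its interquartile range."""
    scaled_columns = []
    for column in zip(*rows):
        q1, _, q3 = quantiles(column, n=4, method="inclusive")
        iqr = (q3 - q1) or 1.0  # constant columns would otherwise divide by zero
        center = median(column)
        scaled_columns.append([(x - center) / iqr for x in column])
    return [list(row) for row in zip(*scaled_columns)]
```

Median and IQR are less sensitive to outliers than mean and standard deviation, which is the usual reason to prefer robust scaling for skewed features.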
### 4. Model Training
Trains and tunes multiple models, selecting the best one based on F1-score/Accuracy.
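
The selection step can be illustrated as follows, assuming each candidate has already produced predictions on a held-out split. The label encoding (`1` for phishing) and the helper names are hypothetical; the real pipeline's tuning and scoring code may differ.

```python
from typing import Dict, Sequence, Tuple


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def select_best(
    predictions: Dict[str, Sequence[int]], y_true: Sequence[int]
) -> Tuple[str, float]:
    """Score every candidate's predictions and return the (name, F1) of the winner."""
    scores = {name: f1_score(y_true, preds) for name, preds in predictions.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]
```

For example, `select_best({"RandomForest": rf_preds, "AdaBoost": ada_preds}, y_test)` returns the name of the higher-scoring model together with its F1 score.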
## Screenshots
### Prediction Results & Threat Assessment
### Experiment Tracking (DagsHub/MLflow)
## Installation
### Prerequisites
- Python 3.8+
- MongoDB Account
- DagsHub Account (for experiment tracking)
### Step-by-Step
1. **Clone the Repository**
```bash
git clone https://github.com/Inder-26/NetworkSecurity.git
cd NetworkSecurity
```
2. **Create Virtual Environment**
```bash
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
```
3. **Install Dependencies**
```bash
pip install -r requirements.txt
```
4. **Set Environment Variables**
Create a `.env` file with your credentials:
```env
MONGO_DB_URL=your_mongodb_url_here
MLFLOW_TRACKING_URI=https://dagshub.com/your_username/project.mlflow
MLFLOW_TRACKING_USERNAME=your_username
MLFLOW_TRACKING_PASSWORD=your_password
```
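
A quick way to catch a misconfigured `.env` before starting the app is to check that the four variables above are present and non-empty. The sketch below is a standalone helper, not part of the project; the function names are hypothetical, and the parser handles only plain `KEY=value` lines (no quoting or variable expansion).

```python
import os
from typing import Dict, List, Mapping

REQUIRED_VARS = (
    "MONGO_DB_URL",
    "MLFLOW_TRACKING_URI",
    "MLFLOW_TRACKING_USERNAME",
    "MLFLOW_TRACKING_PASSWORD",
)


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_settings(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

Calling `missing_settings(parse_env(open(".env").read()))` lists any credentials still to be filled in.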
## Usage
### Run the Web Application
```bash
python app.py
```
Visit `http://localhost:8000` to access the UI.
### Train a New Model
To trigger the training pipeline, request the `/train` endpoint, for example with `curl` or by opening the URL in a browser:
```bash
curl http://localhost:8000/train
```
Or use the "Train New Model" button in the UI.
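
Training can also be triggered from a script. The sketch below uses only the standard library and assumes the `/train` endpoint accepts a plain GET request, as the URL above suggests; the function names and the generous timeout are illustrative choices.

```python
from urllib.parse import urljoin
from urllib.request import urlopen


def train_url(base_url: str) -> str:
    """Build the training endpoint URL from the application's base URL."""
    return urljoin(base_url.rstrip("/") + "/", "train")


def trigger_training(base_url: str = "http://localhost:8000", timeout: float = 600.0) -> int:
    """Send a GET request to the training endpoint and return the HTTP status code."""
    # Assumption: training runs synchronously, so the request may take a while.
    with urlopen(train_url(base_url), timeout=timeout) as response:
        return response.status
```

With the application running locally, `trigger_training()` returns the HTTP status of the training request.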
## Model Performance
The system evaluates models using accuracy and F1 score.
- **Best Model**: Selected automatically; typically RandomForest or GradientBoosting.
- **Recall**: Prioritized to minimize false negatives, since missing a phishing URL is dangerous.
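
The source does not say how recall is prioritized. One common lever, shown here purely as an illustration, is the decision threshold applied to a model's phishing score: lowering it catches more phishing URLs at the cost of more false alarms. The label encoding (`1` for phishing) and the function name are assumptions.

```python
from typing import Sequence, Tuple


def recall_precision_at(
    y_true: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Recall and precision when scores >= threshold are flagged as phishing (label 1)."""
    flagged = [score >= threshold for score in scores]
    tp = sum(1 for t, f in zip(y_true, flagged) if t == 1 and f)
    fn = sum(1 for t, f in zip(y_true, flagged) if t == 1 and not f)
    fp = sum(1 for t, f in zip(y_true, flagged) if t != 1 and f)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return recall, precision
```

Evaluating this function over a range of thresholds traces out the same trade-off that a precision-recall curve visualizes.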
### Model Evaluation Metrics
Below are the performance visualizations for the best trained model:
#### Confusion Matrix
#### ROC Curve
#### Precision-Recall Curve
## Experiment Tracking
All runs are logged to DagsHub. You can view parameters, metrics, and models in the MLflow UI.
## Future Enhancements
- [ ] Implement Deep Learning models (LSTM/CNN) for URL text analysis.
- [ ] Add a real-time browser extension.
- [ ] Deploy on a serverless architecture.
- [ ] Add more comprehensive unit and integration tests.
## Contributing
Contributions are welcome! Please fork the repository and create a pull request.
1. Fork the Project
2. Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
3. Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the Branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request
## License
Distributed under the MIT License. See `LICENSE` for more information.
## Contact
Inder - [GitHub Profile](https://github.com/Inder-26)