---
title: NetworkSecurity
emoji: 😻
colorFrom: blue
colorTo: blue
sdk: docker
pinned: false
license: mit
---

# 🛡️ Network Security System: Phishing URL Detection

![UI Homepage 1](images/home_page_1.png)
![UI Homepage 2](images/home_page_2.png)
![UI Homepage 3](images/home_page_3.png)
![UI Homepage 4](images/home_page_4.png)
![UI Homepage 5](images/home_page_5.png)

## 📋 Table of Contents

- [About The Project](#-about-the-project)
- [Architecture](#-architecture)
- [Features](#-features)
- [Tech Stack](#-tech-stack)
- [Dataset](#-dataset)
- [Project Structure](#-project-structure)
- [Pipeline Workflow](#-pipeline-workflow)
- [Screenshots](#-screenshots)
- [Installation](#-installation)
- [Usage](#-usage)
- [Model Performance](#-model-performance)
- [Experiment Tracking](#-experiment-tracking)
- [Future Enhancements](#-future-enhancements)
- [Contributing](#-contributing)
- [License](#-license)
- [Contact](#-contact)

## 🚀 Live Demo

- **Live Application**: [inderjeet-networksecurity.hf.space](https://inderjeet-networksecurity.hf.space/)
- **Experiment Tracking**: [DagsHub Experiments](https://dagshub.com/Inder-26/NetworkSecurity/experiments#/)

## 🎯 About The Project

In the digital age, cybersecurity threats such as phishing attacks are becoming increasingly sophisticated. This project implements a robust **Network Security Machine Learning Pipeline** designed to detect phishing URLs with high accuracy. It leverages a modular MLOps architecture, ensuring scalability, maintainability, and reproducibility. The system automates the entire flow from data ingestion to model deployment, utilizing advanced techniques like drift detection and automated model evaluation.

## 🏗️ Architecture

The system follows a strict modular pipeline architecture, orchestrated by a central training pipeline.
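
As a rough illustration of what "orchestrated by a central training pipeline" can mean, the sketch below chains four stages so that each one receives the artifacts of the stages before it. The `TrainingPipeline` class, the stage functions, and the toy data are all hypothetical names invented for this example; the project's actual components live under `networksecurity/components/` and may be organized quite differently.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical stage type: receives the artifacts produced so far and
# returns the artifact this stage contributes.
Stage = Callable[[Dict[str, Any]], Any]


class TrainingPipeline:
    """Runs named stages in order, passing earlier artifacts to later stages."""

    def __init__(self) -> None:
        self._stages: List[Tuple[str, Stage]] = []

    def add_stage(self, name: str, stage: Stage) -> "TrainingPipeline":
        self._stages.append((name, stage))
        return self  # returning self allows chained calls

    def run(self) -> Dict[str, Any]:
        artifacts: Dict[str, Any] = {}
        for name, stage in self._stages:
            # An exception in any stage stops the run, so later stages
            # never see missing or invalid artifacts.
            artifacts[name] = stage(artifacts)
        return artifacts


# Placeholder stages with toy data (illustrative only).
def ingest(_: Dict[str, Any]) -> Dict[str, list]:
    rows = [[1, 0], [0, 1], [1, 1], [0, 0]]
    return {"train": rows[:3], "test": rows[3:]}


def validate(artifacts: Dict[str, Any]) -> Dict[str, bool]:
    data = artifacts["ingestion"]
    column_counts = {len(row) for row in data["train"] + data["test"]}
    if len(column_counts) != 1:
        raise ValueError("rows have inconsistent column counts")
    return {"valid": True}


def transform(artifacts: Dict[str, Any]) -> Dict[str, list]:
    data = artifacts["ingestion"]
    # Identity transform standing in for imputation and scaling.
    return {"train": data["train"], "test": data["test"]}


def train(artifacts: Dict[str, Any]) -> Dict[str, Any]:
    n_train = len(artifacts["transformation"]["train"])
    return {"model": "placeholder", "n_train": n_train}


pipeline = (
    TrainingPipeline()
    .add_stage("ingestion", ingest)
    .add_stage("validation", validate)
    .add_stage("transformation", transform)
    .add_stage("training", train)
)
result = pipeline.run()
print(list(result))
```

Keeping each stage behind the same small interface is one way to get the modularity described above: a stage can be swapped or tested in isolation without touching the orchestrator.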

![Architecture Diagram](images/architecture_diagram.png)

## ✨ Features

- **🚀 End-to-End Pipeline**: Fully automated workflow from data ingestion to model deployment.
- **🛡️ Data Validation**: Comprehensive schema checks and data drift detection using KS tests.
- **🔄 Robust Preprocessing**: Automated handling of missing values (KNN Imputer) and feature scaling (Robust Scaler).
- **🤖 Multi-Model Training**: Experiments with RandomForest, DecisionTree, GradientBoosting, and AdaBoost using GridSearchCV.
- **📊 Experiment Tracking**: Integrated with **MLflow** and **DagsHub** for tracking parameters, metrics, and models.
- **⚡ Fast API**: High-performance REST API built with **FastAPI** for real-time predictions.
- **🐳 Containerized**: Docker support for consistent deployment across environments.
- **☁️ Cloud Ready**: Designed to be deployed on platforms like AWS or Hugging Face Spaces.

## 🛠️ Tech Stack

- **Languages**: Python 3.8+
- **Frameworks**: FastAPI, Uvicorn
- **ML Libraries**: Scikit-learn, Pandas, NumPy
- **MLOps**: MLflow, DagsHub
- **Database**: MongoDB
- **Containerization**: Docker
- **Frontend**: HTML, CSS (Custom Design System), JavaScript

## 📊 Dataset

The project utilizes a dataset containing various URL features to distinguish between legitimate and phishing URLs.

- **Source**: [Phishing Dataset for Machine Learning](https://archive.ics.uci.edu/ml/datasets/Phishing+Websites) (or similar Phishing URL dataset)
- **Features**: IP Address, URL Length, TinyURL, forwarding, etc.
- **Target**: `Result` (LEGITIMATE / PHISHING)

## 📁 Project Structure

```
NetworkSecurity/
├── images/                 # Project diagrams and screenshots
├── networksecurity/        # Main package
│   ├── components/         # Pipeline components (Ingestion, Validation, Transformation, Training)
│   ├── pipeline/           # Training and Prediction pipelines
│   ├── entity/             # Artifact and Config entities
│   ├── constants/          # Project constants
│   ├── utils/              # Utility functions
│   └── exception/          # Custom exception handling
├── data_schema/            # Schema definitions
├── Dockerfile              # Docker configuration
├── app.py                  # FastAPI application entry point
├── requirements.txt        # Project dependencies
└── README.md               # Project documentation
```

## ⚙️ Pipeline Workflow

### 1. Data Ingestion 📥

Fetches data from MongoDB, handles fallback to local CSV, and performs train-test split.

![Data Ingestion](images/data_ingestion_diagram.png)

### 2. Data Validation ✅

Validates data against schema and checks for data drift.

![Data Validation](images/data_validation_diagram.png)

### 3. Data Transformation 🔄

Imputes missing values and scales features for optimal model performance.

![Data Transformation](images/data_transformation_diagram.png)

### 4. Model Training 🤖

Trains and tunes multiple models, selecting the best one based on F1-score/Accuracy.

![Model Training](images/model_training_diagram.png)

## 📸 Screenshots

### Prediction Results & Threat Assessment

![Prediction Results](images/prediction_results.png)

### Experiment Tracking (DagsHub/MLflow)

![Experiment Tracking](images/dagshub_experiments.png)

## 💻 Installation

### Prerequisites

- Python 3.8+
- MongoDB Account
- DagsHub Account (for experiment tracking)

### Step-by-Step

1. **Clone the Repository**

   ```bash
   git clone https://github.com/Inder-26/NetworkSecurity.git
   cd NetworkSecurity
   ```

2. **Create Virtual Environment**

   ```bash
   python -m venv .venv
   source .venv/bin/activate  # On Windows: .venv\Scripts\activate
   ```

3. **Install Dependencies**

   ```bash
   pip install -r requirements.txt
   ```

4. **Set Environment Variables**

   Create a `.env` file with your credentials:

   ```env
   MONGO_DB_URL=your_mongodb_url_here
   MLFLOW_TRACKING_URI=https://dagshub.com/your_username/project.mlflow
   MLFLOW_TRACKING_USERNAME=your_username
   MLFLOW_TRACKING_PASSWORD=your_password
   ```

## 🚀 Usage

### Run the Web Application

```bash
python app.py
```

Visit `http://localhost:8000` to access the UI.

### Train a New Model

To trigger the training pipeline, open the following URL while the application is running:

```bash
http://localhost:8000/train
```

Or use the "Train New Model" button in the UI.

## 📈 Model Performance

The system evaluates models using accuracy and F1 score.

- **Best Model**: [Automatically selected, typically RandomForest or GradientBoosting]
- **Recall**: Optimized to minimize false negatives (missing a phishing URL is dangerous).

### Model Evaluation Metrics

Below are the performance visualizations for the best trained model:

#### Confusion Matrix

![Confusion Matrix](images/confusion_matrix.png)

#### ROC Curve

![ROC Curve](images/roc_curve.png)

#### Precision-Recall Curve

![Precision-Recall Curve](images/precision_recall_curve.png)

## 🧪 Experiment Tracking

All runs are logged to DagsHub. You can view parameters, metrics, and models in the MLflow UI.

## 🚀 Future Enhancements

- [ ] Implement Deep Learning models (LSTM/CNN) for URL text analysis.
- [ ] Add real-time browser extension.
- [ ] Deploy serverless architecture.
- [ ] Add more comprehensive unit and integration tests.

## 🤝 Contributing

Contributions are welcome! Please fork the repository and create a pull request.

1. Fork the Project
2. Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
3. Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the Branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request

## 📄 License

Distributed under the MIT License. See `LICENSE` for more information.

## 📞 Contact

Inder - [GitHub Profile](https://github.com/Inder-26)