---
title: NetworkSecurity
emoji: 💻
colorFrom: blue
colorTo: blue
sdk: docker
pinned: false
license: mit
---
# Network Security System: Phishing URL Detection





## Table of Contents
- [About The Project](#about-the-project)
- [Architecture](#architecture)
- [Features](#features)
- [Tech Stack](#tech-stack)
- [Dataset](#dataset)
- [Project Structure](#project-structure)
- [Pipeline Workflow](#pipeline-workflow)
- [Screenshots](#screenshots)
- [Installation](#installation)
- [Usage](#usage)
- [Model Performance](#model-performance)
- [Experiment Tracking](#experiment-tracking)
- [Future Enhancements](#future-enhancements)
- [Contributing](#contributing)
- [License](#license)
- [Contact](#contact)
## Live Demo
- **Live Application**: [inderjeet-networksecurity.hf.space](https://inderjeet-networksecurity.hf.space/)
- **Experiment Tracking**: [DagsHub Experiments](https://dagshub.com/Inder-26/NetworkSecurity/experiments#/)
## About The Project
Phishing attacks are becoming increasingly sophisticated. This project implements a **Network Security Machine Learning Pipeline** that classifies URLs as legitimate or phishing.
It uses a modular MLOps architecture for scalability, maintainability, and reproducibility. The system automates the flow from data ingestion to model deployment, including data drift detection and automated model evaluation.
## Architecture
The system follows a strict modular pipeline architecture, orchestrated by a central training pipeline.

## Features
- **End-to-End Pipeline**: Fully automated workflow from data ingestion to model deployment.
- **Data Validation**: Schema checks and data drift detection using Kolmogorov-Smirnov (KS) tests.
- **Robust Preprocessing**: Automated handling of missing values (KNN Imputer) and feature scaling (Robust Scaler).
- **Multi-Model Training**: Experiments with RandomForest, DecisionTree, GradientBoosting, and AdaBoost using GridSearchCV.
- **Experiment Tracking**: Integrated with **MLflow** and **DagsHub** for tracking parameters, metrics, and models.
- **Fast API**: High-performance REST API built with **FastAPI** for real-time predictions.
- **Containerized**: Docker support for consistent deployment across environments.
- **Cloud Ready**: Designed to be deployed on platforms like AWS or Hugging Face Spaces.
## Tech Stack
- **Languages**: Python 3.8+
- **Frameworks**: FastAPI, Uvicorn
- **ML Libraries**: Scikit-learn, Pandas, NumPy
- **MLOps**: MLflow, DagsHub
- **Database**: MongoDB
- **Containerization**: Docker
- **Frontend**: HTML, CSS (Custom Design System), JavaScript
## Dataset
The project utilizes a dataset containing various URL features to distinguish between legitimate and phishing URLs.
- **Source**: [Phishing Websites](https://archive.ics.uci.edu/ml/datasets/Phishing+Websites) from the UCI Machine Learning Repository (or a similar phishing URL dataset)
- **Features**: IP Address, URL Length, TinyURL, forwarding, etc.
- **Target**: `Result` (LEGITIMATE / PHISHING)
## Project Structure
```
NetworkSecurity/
├── images/                 # Project diagrams and screenshots
├── networksecurity/        # Main package
│   ├── components/         # Pipeline components (Ingestion, Validation, Transformation, Training)
│   ├── pipeline/           # Training and Prediction pipelines
│   ├── entity/             # Artifact and Config entities
│   ├── constants/          # Project constants
│   ├── utils/              # Utility functions
│   └── exception/          # Custom exception handling
├── data_schema/            # Schema definitions
├── Dockerfile              # Docker configuration
├── app.py                  # FastAPI application entry point
├── requirements.txt        # Project dependencies
└── README.md               # Project documentation
```
## Pipeline Workflow
### 1. Data Ingestion
Fetches data from MongoDB, handles fallback to local CSV, and performs train-test split.
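
The fallback-then-split flow can be sketched roughly as follows. This is a minimal illustration, not the project's actual ingestion component: `ingest`, `fetch_remote`, and `fallback_csv` are hypothetical names, and the MongoDB read is abstracted behind a callable so the sketch stays self-contained.

```python
from typing import Callable

import pandas as pd
from sklearn.model_selection import train_test_split


def ingest(fetch_remote: Callable[[], pd.DataFrame], fallback_csv: str,
           test_size: float = 0.2, seed: int = 42):
    """Load records from the remote source, else from a local CSV, then split."""
    try:
        # Hypothetical hook: in the real pipeline this would read a MongoDB collection.
        df = fetch_remote()
        if df.empty:
            raise ValueError("remote source returned no rows")
    except Exception:
        # Fallback path: read the local CSV export instead.
        df = pd.read_csv(fallback_csv)
    # Hold out a fraction of the rows for testing.
    train_df, test_df = train_test_split(df, test_size=test_size, random_state=seed)
    return train_df, test_df
```

Passing the remote fetch as a callable keeps the fallback logic easy to test without a live database; the split ratio and seed shown here are assumed defaults.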

### 2. Data Validation
Validates data against schema and checks for data drift.
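
As a rough sketch of a KS-based drift check, the snippet below compares each shared column of a baseline frame and a current frame with `scipy.stats.ks_2samp`. The function name `drift_report` and the 0.05 threshold are assumptions for illustration; the project's own validation component may differ.

```python
import numpy as np
import pandas as pd
from scipy.stats import ks_2samp


def drift_report(base: pd.DataFrame, current: pd.DataFrame, alpha: float = 0.05) -> dict:
    """Run a two-sample KS test on every shared column (assumed numeric)."""
    report = {}
    for col in base.columns.intersection(current.columns):
        result = ks_2samp(base[col], current[col])
        report[col] = {
            "p_value": float(result.pvalue),
            # A small p-value suggests the two samples come from different distributions.
            "drift": bool(result.pvalue < alpha),
        }
    return report
```

A column is flagged when its p-value falls below `alpha`; with many features, some false alarms are expected at a fixed threshold.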

### 3. Data Transformation
Imputes missing values and scales features for optimal model performance.
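
One way to chain the two steps is a scikit-learn `Pipeline`. The sketch below uses toy data and an assumed `n_neighbors=3`; the project's actual hyperparameters are not stated here.

```python
import numpy as np
from sklearn.impute import KNNImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler

# Impute first, then scale; n_neighbors=3 is an illustrative choice.
preprocessor = Pipeline([
    ("imputer", KNNImputer(n_neighbors=3)),
    ("scaler", RobustScaler()),
])

# Toy feature matrix with one missing value.
X_train = np.array([[1.0, 10.0], [2.0, np.nan], [3.0, 30.0], [4.0, 40.0]])
X_train_t = preprocessor.fit_transform(X_train)

# Reuse the fitted preprocessor on new data; never refit on test rows.
X_test_t = preprocessor.transform(np.array([[2.5, np.nan]]))
```

`RobustScaler` centers on the median and scales by the interquartile range, so outliers affect it less than they would a mean/variance scaler.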

### 4. Model Training
Trains and tunes multiple models, selecting the best one based on F1-score/Accuracy.
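
A minimal sketch of this selection loop, assuming synthetic data and small illustrative grids (the project's real parameter grids and data are not shown in this README):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; the real pipeline uses the transformed URL features.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Candidate models with small, illustrative grids (assumed, not the project's grids).
candidates = {
    "RandomForest": (RandomForestClassifier(random_state=0), {"n_estimators": [20, 50]}),
    "DecisionTree": (DecisionTreeClassifier(random_state=0), {"max_depth": [3, 5]}),
    "GradientBoosting": (GradientBoostingClassifier(random_state=0), {"learning_rate": [0.05, 0.1]}),
    "AdaBoost": (AdaBoostClassifier(random_state=0), {"n_estimators": [20, 50]}),
}

scores = {}
for name, (estimator, grid) in candidates.items():
    # Tune each model with cross-validated grid search, scored by F1.
    search = GridSearchCV(estimator, grid, scoring="f1", cv=3)
    search.fit(X_train, y_train)
    # Compare the tuned models on the held-out test split.
    scores[name] = f1_score(y_test, search.best_estimator_.predict(X_test))

best_name = max(scores, key=scores.get)
print(best_name, round(scores[best_name], 3))
```

Scoring the grid search and the final comparison with the same metric keeps tuning and selection aligned.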

## Screenshots
### Prediction Results & Threat Assessment

### Experiment Tracking (DagsHub/MLflow)

## Installation
### Prerequisites
- Python 3.8+
- MongoDB Account
- DagsHub Account (for experiment tracking)
### Step-by-Step
1. **Clone the Repository**
```bash
git clone https://github.com/Inder-26/NetworkSecurity.git
cd NetworkSecurity
```
2. **Create Virtual Environment**
```bash
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
```
3. **Install Dependencies**
```bash
pip install -r requirements.txt
```
4. **Set Environment Variables**
Create a `.env` file with your credentials:
```env
MONGO_DB_URL=your_mongodb_url_here
MLFLOW_TRACKING_URI=https://dagshub.com/your_username/project.mlflow
MLFLOW_TRACKING_USERNAME=your_username
MLFLOW_TRACKING_PASSWORD=your_password
```
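
Before starting the app, it can help to confirm that these variables are actually set. The check below is a small sketch using only the standard library; `REQUIRED_VARS` and `missing_settings` are hypothetical names, and it assumes the `.env` file has already been loaded into the process environment (for example by a loader such as python-dotenv).

```python
import os

# Variable names taken from the .env example above.
REQUIRED_VARS = (
    "MONGO_DB_URL",
    "MLFLOW_TRACKING_URI",
    "MLFLOW_TRACKING_USERNAME",
    "MLFLOW_TRACKING_PASSWORD",
)


def missing_settings(env=os.environ):
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        print("Missing settings:", ", ".join(missing))
```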
## Usage
### Run the Web Application
```bash
python app.py
```
Visit `http://localhost:8000` to access the UI.
### Train a New Model
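To trigger the training pipeline, request the `/train` endpoint while the app is running:
```bash
# Equivalent to opening http://localhost:8000/train in a browser
curl http://localhost:8000/train
```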
Or use the "Train New Model" button in the UI.
## Model Performance
The system evaluates models using accuracy and F1 score.
- **Best Model**: Selected automatically; typically RandomForest or GradientBoosting.
- **Recall**: Optimized to minimize false negatives (missing a phishing URL is dangerous).
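
To make the relationship between these metrics concrete, here is a small sketch that derives them from confusion-matrix counts, treating phishing as the positive class. The function name and the counts are made up for illustration and are not results from this project.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute accuracy, precision, recall, and F1 with phishing as the positive class."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall drops as false negatives (missed phishing URLs) increase.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up counts for illustration only.
print(classification_metrics(tp=80, fp=5, fn=20, tn=95))
```

With these counts, accuracy looks reasonable while one in five phishing URLs is still missed, which is why recall is tracked separately.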
### Model Evaluation Metrics
Below are the performance visualizations for the best trained model:
#### Confusion Matrix

#### ROC Curve

#### Precision-Recall Curve

## Experiment Tracking
All runs are logged to DagsHub. You can view parameters, metrics, and models in the MLflow UI.
## Future Enhancements
- [ ] Implement Deep Learning models (LSTM/CNN) for URL text analysis.
- [ ] Add real-time browser extension.
- [ ] Deploy serverless architecture.
- [ ] Add more comprehensive unit and integration tests.
## Contributing
Contributions are welcome! Please fork the repository and create a pull request.
1. Fork the Project
2. Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
3. Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the Branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request
## License
Distributed under the MIT License. See `LICENSE` for more information.
## Contact
Inder - [GitHub Profile](https://github.com/Inder-26)