

metadata

title: NetworkSecurity
emoji: 😻
colorFrom: blue
colorTo: blue
sdk: docker
pinned: false
license: mit

🛡️ Network Security System: Phishing URL Detection

📋 Table of Contents

About The Project
Architecture
Features
Tech Stack
Dataset
Project Structure
Pipeline Workflow
Screenshots
Installation
Usage
Model Performance
Experiment Tracking
Future Enhancements
Contributing
License
Contact

🚀 Live Demo

Live Application: inderjeet-networksecurity.hf.space
Experiment Tracking: DagsHub Experiments

🎯 About The Project

Phishing attacks are becoming increasingly sophisticated. This project implements a Network Security machine learning pipeline designed to detect phishing URLs.

It uses a modular MLOps architecture aimed at scalability, maintainability, and reproducibility. The system automates the flow from data ingestion to model deployment, including data drift detection and automated model evaluation.

🏗️ Architecture

The system follows a strict modular pipeline architecture, orchestrated by a central training pipeline.

✨ Features

🚀 End-to-End Pipeline: Fully automated workflow from data ingestion to model deployment.
🛡️ Data Validation: Comprehensive schema checks and data drift detection using KS tests.
🔄 Robust Preprocessing: Automated handling of missing values (KNN Imputer) and feature scaling (Robust Scaler).
🤖 Multi-Model Training: Experiments with RandomForest, DecisionTree, GradientBoosting, and AdaBoost using GridSearchCV.
📊 Experiment Tracking: Integrated with MLflow and DagsHub for tracking parameters, metrics, and models.
⚡ Fast API: High-performance REST API built with FastAPI for real-time predictions.
🐳 Containerized: Docker support for consistent deployment across environments.
☁️ Cloud Ready: Designed to be deployed on platforms like AWS or Hugging Face Spaces.

🛠️ Tech Stack

Languages: Python 3.8+
Frameworks: FastAPI, Uvicorn
ML Libraries: Scikit-learn, Pandas, NumPy
MLOps: MLflow, DagsHub
Database: MongoDB
Containerization: Docker
Frontend: HTML, CSS (Custom Design System), JavaScript

📊 Dataset

The project utilizes a dataset containing various URL features to distinguish between legitimate and phishing URLs.

Source: Phishing Dataset for Machine Learning (or similar Phishing URL dataset)
Features: IP Address, URL Length, TinyURL, forwarding, etc.
Target: Result (LEGITIMATE / PHISHING)

📁 Project Structure

NetworkSecurity/
├── images/                  # Project diagrams and screenshots
├── networksecurity/         # Main package
│   ├── components/          # Pipeline components (Ingestion, Validation, Transformation, Training)
│   ├── pipeline/            # Training and Prediction pipelines
│   ├── entity/              # Artifact and Config entities
│   ├── constants/           # Project constants
│   ├── utils/               # Utility functions
│   └── exception/           # Custom exception handling
├── data_schema/             # Schema definitions
├── Dockerfile               # Docker configuration
├── app.py                   # FastAPI application entry point
├── requirements.txt         # Project dependencies
└── README.md                # Project documentation

⚙️ Pipeline Workflow

1. Data Ingestion 📥

Fetches data from MongoDB, falls back to a local CSV file, and performs a train-test split.
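
The ingestion step can be sketched roughly as follows, assuming pandas and scikit-learn are installed. The names `ingest`, `fetch_records`, and `fallback_csv` are hypothetical and not taken from the project; `fetch_records` stands in for whatever MongoDB query the pipeline actually runs.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def ingest(fetch_records, fallback_csv, test_size=0.2, seed=42):
    """Load records, fall back to a local CSV on failure, then split.

    fetch_records: hypothetical zero-argument callable returning a list of
    dicts (a stand-in for the MongoDB query).
    fallback_csv: path to a local CSV file used when the fetch fails.
    """
    try:
        df = pd.DataFrame(fetch_records())
        if df.empty:
            raise ValueError("primary source returned no records")
    except Exception:
        # Broad catch for brevity; a real pipeline would log the error
        # and catch narrower exception types.
        df = pd.read_csv(fallback_csv)
    # MongoDB documents carry an "_id" field that is not a feature.
    df = df.drop(columns=["_id"], errors="ignore")
    train_df, test_df = train_test_split(
        df, test_size=test_size, random_state=seed
    )
    return train_df, test_df
```

Passing the data source in as a callable keeps the fallback logic testable without a live database.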

2. Data Validation ✅

Validates data against schema and checks for data drift.
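
A minimal sketch of both checks, assuming pandas and SciPy are available. The helper names, the `url_length` column, and the 0.05 p-value threshold are assumptions made for illustration, not values taken from the project.

```python
import numpy as np
import pandas as pd
from scipy.stats import ks_2samp


def missing_columns(df, expected_columns):
    """Return the expected columns that are absent from df."""
    return [c for c in expected_columns if c not in df.columns]


def drift_report(base_df, current_df, threshold=0.05):
    """Run a two-sample KS test on every column shared by both frames.

    A p-value below `threshold` is flagged as drift. The 0.05 default is
    an assumption for this sketch.
    """
    report = {}
    for column in base_df.columns.intersection(current_df.columns):
        p_value = float(ks_2samp(base_df[column], current_df[column]).pvalue)
        report[column] = {"p_value": p_value, "drift": p_value < threshold}
    return report


# Synthetic data for illustration only.
rng = np.random.default_rng(0)
base = pd.DataFrame({"url_length": rng.normal(50, 10, 500)})
shifted = pd.DataFrame({"url_length": base["url_length"] + 30})

print(drift_report(base, base))     # identical samples: no drift flagged
print(drift_report(base, shifted))  # shifted samples: drift flagged
```

The KS test compares the empirical distributions of one feature at a time, so it will not catch drift that only shows up in the relationship between features.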

3. Data Transformation 🔄

Imputes missing values and scales features for optimal model performance.
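
As a rough illustration, the two preprocessing steps named in the feature list (KNN Imputer and Robust Scaler) can be chained in a scikit-learn `Pipeline`. The function name `build_preprocessor`, the `n_neighbors` value, and the toy array are assumptions for this sketch.

```python
import numpy as np
from sklearn.impute import KNNImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler


def build_preprocessor(n_neighbors=3):
    """Impute missing values from nearest rows, then scale by median and IQR."""
    return Pipeline([
        ("imputer", KNNImputer(n_neighbors=n_neighbors)),
        ("scaler", RobustScaler()),
    ])


# Toy feature matrix with one missing value (illustration only).
X_train = np.array([
    [1.0, 10.0],
    [2.0, np.nan],
    [3.0, 30.0],
    [4.0, 40.0],
])

preprocessor = build_preprocessor(n_neighbors=2)
X_scaled = preprocessor.fit_transform(X_train)
print(X_scaled)
```

Fitting the pipeline on the training split only, and reusing the fitted object on test and prediction data, avoids leaking test statistics into the transformation.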

4. Model Training 🤖

Trains and tunes multiple models, then selects the best one based on F1 score and accuracy.
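
The tune-then-select loop might look like the sketch below. It uses the four model families named in the feature list with GridSearchCV, but the parameter grids, the synthetic data, the 0/1 label encoding, and the `select_best_model` helper are all assumptions chosen to keep the example small.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Deliberately tiny grids so the sketch runs quickly (assumed values).
CANDIDATES = {
    "RandomForest": (RandomForestClassifier(random_state=0),
                     {"n_estimators": [10, 20]}),
    "DecisionTree": (DecisionTreeClassifier(random_state=0),
                     {"max_depth": [3, 5]}),
    "GradientBoosting": (GradientBoostingClassifier(n_estimators=20, random_state=0),
                         {"learning_rate": [0.1, 0.3]}),
    "AdaBoost": (AdaBoostClassifier(random_state=0),
                 {"n_estimators": [10, 20]}),
}


def select_best_model(X_train, y_train, X_test, y_test, candidates=CANDIDATES, cv=3):
    """Grid-search each candidate, score it on the test split, keep the best F1."""
    scores, fitted = {}, {}
    for name, (estimator, grid) in candidates.items():
        search = GridSearchCV(estimator, grid, scoring="f1", cv=cv)
        search.fit(X_train, y_train)
        fitted[name] = search.best_estimator_
        scores[name] = f1_score(y_test, search.predict(X_test))
    best_name = max(scores, key=scores.get)
    return best_name, fitted[best_name], scores


# Synthetic binary data standing in for the phishing dataset.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
best_name, best_model, scores = select_best_model(X_train, y_train, X_test, y_test)
print(best_name, scores)
```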

📸 Screenshots

Prediction Results & Threat Assessment

Experiment Tracking (DagsHub/MLflow)

💻 Installation

Prerequisites

Python 3.8+
MongoDB Account
DagsHub Account (for experiment tracking)

Step-by-Step

Clone the Repository

```
git clone https://github.com/Inder-26/NetworkSecurity.git
cd NetworkSecurity
```

Create Virtual Environment

```
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
```

Install Dependencies

```
pip install -r requirements.txt
```

Set Environment Variables

Create a .env file with your credentials:

```
MONGO_DB_URL=your_mongodb_url_here
MLFLOW_TRACKING_URI=https://dagshub.com/your_username/project.mlflow
MLFLOW_TRACKING_USERNAME=your_username
MLFLOW_TRACKING_PASSWORD=your_password
```
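
Before starting the app it can help to confirm that all four variables are actually set. The check below is a standard-library sketch; `load_settings` and `REQUIRED_VARS` are hypothetical names, and it assumes the .env file has already been loaded into the process environment (for example with the python-dotenv package's `load_dotenv()`, if the project uses it).

```python
import os

# Variable names taken from the .env example above.
REQUIRED_VARS = (
    "MONGO_DB_URL",
    "MLFLOW_TRACKING_URI",
    "MLFLOW_TRACKING_USERNAME",
    "MLFLOW_TRACKING_PASSWORD",
)


def load_settings(env=None):
    """Return the required settings as a dict, or raise if any are unset."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Failing fast with the list of missing names is usually easier to debug than a connection error raised later in the pipeline.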

🚀 Usage

Run the Web Application

python app.py

Visit http://localhost:8000 to access the UI.

Train a New Model

To trigger the training pipeline, open the following URL while the application is running:

http://localhost:8000/train

Or use the "Train New Model" button in the UI.
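
Training can also be triggered from a script. This sketch assumes the /train route accepts a plain GET request, as the URL above suggests; `trigger_training` is a hypothetical helper, and the long default timeout is an assumption to allow for the time training may take.

```python
import urllib.request


def trigger_training(base_url="http://localhost:8000", timeout=600):
    """Send a GET request to the /train route and return (status, body)."""
    url = base_url.rstrip("/") + "/train"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        body = response.read().decode("utf-8", errors="replace")
        return response.status, body


if __name__ == "__main__":
    # Requires the application to be running locally.
    status, body = trigger_training()
    print(status, body[:200])
```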

📈 Model Performance

The system evaluates models using accuracy and F1 score.

Best Model: Selected automatically; typically RandomForest or GradientBoosting.
Recall: Optimized to minimize false negatives (missing a phishing URL is dangerous).
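
For reference, the three metrics mentioned here can be computed with scikit-learn as shown below. The labels are toy values for illustration only, and the encoding (1 = phishing, 0 = legitimate) is an assumption.

```python
from sklearn.metrics import accuracy_score, f1_score, recall_score

# Toy labels; 1 = phishing, 0 = legitimate (assumed encoding).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

accuracy = accuracy_score(y_true, y_pred)  # 6 of 8 predictions correct
f1 = f1_score(y_true, y_pred)              # harmonic mean of precision and recall
recall = recall_score(y_true, y_pred)      # 3 of 4 phishing URLs caught

print(f"accuracy={accuracy:.2f} f1={f1:.2f} recall={recall:.2f}")
# prints: accuracy=0.75 f1=0.75 recall=0.75
```

The one phishing URL predicted as legitimate is the false negative that recall penalizes.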