metadata
title: NetworkSecurity
emoji: 😻
colorFrom: blue
colorTo: blue
sdk: docker
pinned: false
license: mit

🛡️ Network Security System: Phishing URL Detection

Screenshots: UI Homepage 1 to 5

📋 Table of Contents

🚀 Live Demo

🎯 About The Project

Phishing attacks are becoming increasingly sophisticated. This project implements a Network Security machine learning pipeline that classifies URLs as legitimate or phishing.

It uses a modular MLOps architecture aimed at scalability, maintainability, and reproducibility. The system automates the flow from data ingestion to model deployment and includes data drift detection and automated model evaluation.

🏗️ Architecture

The system follows a strict modular pipeline architecture, orchestrated by a central training pipeline.

Architecture Diagram

✨ Features

  • 🚀 End-to-End Pipeline: Fully automated workflow from data ingestion to model deployment.
  • 🛡️ Data Validation: Comprehensive schema checks and data drift detection using KS tests.
  • 🔄 Robust Preprocessing: Automated handling of missing values (KNN Imputer) and feature scaling (Robust Scaler).
  • 🤖 Multi-Model Training: Experiments with RandomForest, DecisionTree, GradientBoosting, and AdaBoost using GridSearchCV.
  • 📊 Experiment Tracking: Integrated with MLflow and DagsHub for tracking parameters, metrics, and models.
  • ⚡ Fast API: High-performance REST API built with FastAPI for real-time predictions.
  • 🐳 Containerized: Docker support for consistent deployment across environments.
  • ☁️ Cloud Ready: Designed to be deployed on platforms like AWS or Hugging Face Spaces.

🛠️ Tech Stack

  • Languages: Python 3.8+
  • Frameworks: FastAPI, Uvicorn
  • ML Libraries: Scikit-learn, Pandas, NumPy
  • MLOps: MLflow, DagsHub
  • Database: MongoDB
  • Containerization: Docker
  • Frontend: HTML, CSS (Custom Design System), JavaScript

📊 Dataset

The project utilizes a dataset containing various URL features to distinguish between legitimate and phishing URLs.

📁 Project Structure

NetworkSecurity/
├── images/                  # Project diagrams and screenshots
├── networksecurity/         # Main package
│   ├── components/          # Pipeline components (Ingestion, Validation, Transformation, Training)
│   ├── pipeline/            # Training and Prediction pipelines
│   ├── entity/              # Artifact and Config entities
│   ├── constants/           # Project constants
│   ├── utils/               # Utility functions
│   └── exception/           # Custom exception handling
├── data_schema/             # Schema definitions
├── Dockerfile               # Docker configuration
├── app.py                   # FastAPI application entry point
├── requirements.txt         # Project dependencies
└── README.md                # Project documentation

⚙️ Pipeline Workflow

1. Data Ingestion 📥

Fetches data from MongoDB, handles fallback to local CSV, and performs train-test split.

Data Ingestion
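The ingestion step described above can be sketched with the standard library alone. This is a minimal illustration, not the project's actual component: `load_rows`, `split_rows`, the `fetch_primary` callable (standing in for a MongoDB query), and the 80/20 split ratio are all assumptions made for the example.

```python
import csv
import random
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]


def load_rows(fetch_primary: Callable[[], List[Row]], fallback_csv: str) -> List[Row]:
    """Try the primary source first; fall back to a local CSV on error or empty result."""
    try:
        rows = fetch_primary()
        if rows:
            return rows
    except Exception as exc:  # e.g. a connection error raised by a database client
        print(f"Primary source failed ({exc}); falling back to {fallback_csv}")
    with open(fallback_csv, newline="") as handle:
        return list(csv.DictReader(handle))


def split_rows(rows: List[Row], test_ratio: float = 0.2, seed: int = 42) -> Tuple[List[Row], List[Row]]:
    """Shuffle a copy of the rows deterministically and split into train and test."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_ratio))
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]
```

Passing the primary source as a callable keeps the fallback logic testable without a live database.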

2. Data Validation ✅

Validates data against schema and checks for data drift.

Data Validation
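As a rough sketch of how a KS-test drift check might look, the snippet below compares each feature in a reference sample against a current sample using `scipy.stats.ks_2samp`. The function name `detect_drift`, the dict-of-arrays input shape, and the 0.05 significance level are assumptions for illustration; the project's own thresholds and report format may differ.

```python
import numpy as np
from scipy.stats import ks_2samp


def detect_drift(reference, current, alpha=0.05):
    """Compare each feature with a two-sample Kolmogorov-Smirnov test.

    reference / current: dicts mapping a feature name to a 1-D array of values.
    alpha: significance level; 0.05 is an assumed default, not the project's setting.
    """
    report = {}
    for name in reference:
        result = ks_2samp(reference[name], current[name])
        p_value = float(result.pvalue)
        # A small p-value suggests the two samples come from different distributions.
        report[name] = {"p_value": p_value, "drift": p_value < alpha}
    return report
```

A feature is flagged only when its p-value falls below `alpha`, so a stricter (smaller) `alpha` reports fewer drifted columns.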

3. Data Transformation 🔄

Imputes missing values and scales features for optimal model performance.

Data Transformation
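One plausible way to combine the KNN Imputer and Robust Scaler named in the feature list is a scikit-learn `Pipeline`. The helper name `build_preprocessor` and the `n_neighbors=3` default are assumptions for this sketch, not values taken from the project.

```python
import numpy as np
from sklearn.impute import KNNImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler


def build_preprocessor(n_neighbors=3):
    """Impute missing values from nearest neighbours, then scale by median and IQR."""
    return Pipeline(
        steps=[
            ("imputer", KNNImputer(n_neighbors=n_neighbors)),
            ("scaler", RobustScaler()),
        ]
    )


if __name__ == "__main__":
    # Tiny made-up matrix with two missing entries.
    X = np.array([[1.0, 2.0], [np.nan, 4.0], [3.0, 6.0], [5.0, np.nan]])
    print(build_preprocessor(n_neighbors=2).fit_transform(X))
```

Fitting the pipeline on the training split only, then calling `transform` on the test split, avoids leaking test statistics into the imputer and scaler.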

4. Model Training 🤖

Trains and tunes multiple models, selecting the best one based on F1-score/Accuracy.

Model Training
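The tuning-and-selection idea can be sketched as follows, assuming binary labels and F1 as the ranking metric. The parameter grids are deliberately tiny and illustrative, `select_best_model` is a hypothetical helper, and the synthetic data stands in for the phishing dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Illustrative grids only; the project's real grids are not shown in this README.
CANDIDATES = {
    "RandomForest": (RandomForestClassifier(random_state=0), {"n_estimators": [16, 32]}),
    "DecisionTree": (DecisionTreeClassifier(random_state=0), {"max_depth": [3, 5]}),
    "GradientBoosting": (GradientBoostingClassifier(random_state=0), {"learning_rate": [0.05, 0.1]}),
    "AdaBoost": (AdaBoostClassifier(random_state=0), {"n_estimators": [16, 32]}),
}


def select_best_model(X_train, y_train, X_test, y_test, cv=3):
    """Tune every candidate with GridSearchCV and rank them by held-out F1."""
    results = {}
    for name, (estimator, grid) in CANDIDATES.items():
        search = GridSearchCV(estimator, grid, scoring="f1", cv=cv)
        search.fit(X_train, y_train)
        test_f1 = f1_score(y_test, search.predict(X_test))
        results[name] = (test_f1, search.best_estimator_)
    best_name = max(results, key=lambda key: results[key][0])
    return best_name, results


if __name__ == "__main__":
    # Synthetic stand-in data for demonstration.
    X, y = make_classification(n_samples=300, n_features=10, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0, stratify=y)
    name, scores = select_best_model(X_tr, y_tr, X_te, y_te)
    print(name, round(scores[name][0], 3))
```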

📸 Screenshots

Prediction Results & Threat Assessment

Prediction Results

Experiment Tracking (DagsHub/MLflow)

Experiment Tracking

💻 Installation

Prerequisites

  • Python 3.8+
  • MongoDB Account
  • DagsHub Account (for experiment tracking)

Step-by-Step

  1. Clone the Repository

    git clone https://github.com/Inder-26/NetworkSecurity.git
    cd NetworkSecurity
    
  2. Create Virtual Environment

    python -m venv .venv
    source .venv/bin/activate  # On Windows: .venv\Scripts\activate
    
  3. Install Dependencies

    pip install -r requirements.txt
    
  4. Set Environment Variables

    Create a .env file with your credentials:

    MONGO_DB_URL=your_mongodb_url_here
    MLFLOW_TRACKING_URI=https://dagshub.com/your_username/project.mlflow
    MLFLOW_TRACKING_USERNAME=your_username
    MLFLOW_TRACKING_PASSWORD=your_password
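Before starting the app, it can help to confirm that the .env file defines all four variables listed above. The sketch below is a standard-library check, not the project's loader (projects like this often use a package such as python-dotenv instead); `parse_env_file` and `missing_keys` are hypothetical helpers.

```python
from typing import Dict, List

REQUIRED_KEYS = [
    "MONGO_DB_URL",
    "MLFLOW_TRACKING_URI",
    "MLFLOW_TRACKING_USERNAME",
    "MLFLOW_TRACKING_PASSWORD",
]


def parse_env_file(path: str) -> Dict[str, str]:
    """Read simple KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    with open(path, encoding="utf-8") as handle:
        for raw in handle:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            # Split on the first "=" only, so URLs with query strings stay intact.
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: Dict[str, str]) -> List[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]
```

Calling `missing_keys(parse_env_file(".env"))` returns an empty list when the file is complete.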
    

🚀 Usage

Run the Web Application

python app.py

Visit http://localhost:8000 to access the UI.

Train a New Model

To trigger the training pipeline:

http://localhost:8000/train

Or use the "Train New Model" button in the UI.
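The training route can also be called from a script. The sketch below assumes the route answers a plain GET request, as visiting the URL in a browser would imply; the helper names and the generous timeout are assumptions, since training may take a while.

```python
from urllib.parse import urljoin
from urllib.request import urlopen


def train_url(base_url: str = "http://localhost:8000") -> str:
    """Build the training endpoint URL from the app's base URL."""
    return urljoin(base_url.rstrip("/") + "/", "train")


def trigger_training(base_url: str = "http://localhost:8000", timeout: float = 600.0) -> int:
    """Send a GET request to the training route and return the HTTP status code."""
    with urlopen(train_url(base_url), timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    # Requires the app to be running locally (python app.py).
    print(trigger_training())
```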

📈 Model Performance

The system evaluates models using accuracy and F1 score.

  • Best Model: [Automatically selected, typically RandomForest or GradientBoosting]
  • Recall: Optimized to minimize false negatives (missing a phishing URL is dangerous).
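To make the relationship between these metrics concrete, the sketch below derives accuracy, precision, recall, and F1 from confusion-matrix counts, treating phishing as the positive class. The counts in the usage example are invented for illustration and are not results from this project.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute standard binary metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall drops with every false negative, i.e. every missed phishing URL.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Made-up counts: 120 phishing URLs, 30 of them missed.
    print(classification_metrics(tp=90, fp=10, fn=30, tn=870))
```

With these made-up counts, accuracy is 0.96 while recall is only 0.75, which shows why accuracy alone can hide missed phishing URLs on imbalanced data.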

Model Evaluation Metrics

Below are the performance visualizations for the best trained model:

Confusion Matrix

Confusion Matrix

ROC Curve

ROC Curve

Precision-Recall Curve

Precision-Recall Curve

🧪 Experiment Tracking

All runs are logged to DagsHub. You can view parameters, metrics, and models in the MLflow UI.

🚀 Future Enhancements

  • Implement Deep Learning models (LSTM/CNN) for URL text analysis.
  • Add real-time browser extension.
  • Deploy serverless architecture.
  • Add more comprehensive unit and integration tests.

🀝 Contributing

Contributions are welcome! Please fork the repository and create a pull request.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

📄 License

Distributed under the MIT License. See LICENSE for more information.

📞 Contact

Inder - GitHub Profile