# 🛡️ SentinelNet: AI-Powered Network Intrusion Detection System

**Production ML system detecting 5 categories of network threats in real time**

[![Live Demo](https://img.shields.io/badge/Live%20Demo-HuggingFace%20Spaces-blue?style=for-the-badge&logo=huggingface)](https://huggingface.co/spaces/Hitan2004/sentinelnet)
[![GitHub](https://img.shields.io/badge/GitHub-Repository-black?style=for-the-badge&logo=github)](https://github.com/Hitan547/sentinelnet)
[![Python](https://img.shields.io/badge/Python-3.10-blue?style=for-the-badge&logo=python)](#tech-stack)
[![scikit-learn](https://img.shields.io/badge/ML-scikit--learn-orange?style=for-the-badge)](#tech-stack)

*A full-stack real-time intrusion detection dashboard with hybrid frontend, REST API, and automated CI/CD deployment.*

---

## 🎯 Overview

SentinelNet is a production-grade network intrusion detection system that analyzes live traffic and batch CSV datasets to classify connections into 5 threat categories. Built with a Random Forest classifier trained on the NSL-KDD dataset, it combines real-time inference with a sophisticated web dashboard and self-correcting batch processing.

### ⚡ Key Capabilities

| Feature | Capability |
|---------|-----------|
| **Real-Time Detection** | 1000s of live packets/sec through trained ML model |
| **Threat Classification** | 5-class detection: normal, DoS, Probe, R2L, U2R |
| **Batch Analysis** | Process CSVs with live progress, streaming predictions, auto-generated threat reports |
| **Visual Intelligence** | Live timeline, activity heatmaps, confidence distributions, attack patterns |
| **Export Formats** | CSV, PDF reports, JSON for integration |
| **Deployment** | Docker containerized, live on HuggingFace Spaces |

---

## 🏗️ Architecture

### System Diagram

```
┌─────────────────────────────────────────────────────────┐
│                   SentinelNet System                    │
└─────────────────────────────────────────────────────────┘

                   ┌──────────────────┐
                   │  Flask Backend   │
                   │     (app.py)     │
                   └────────┬─────────┘
                            │
        ┌───────────────────┼───────────────────┐
        │                   │                   │
   ┌────▼─────┐        ┌────▼─────┐       ┌─────▼──────┐
   │ /health  │        │ /predict │       │  /static   │
   │ Endpoint │        │  Batch   │       │  Frontend  │
   └──────────┘        │ Inference│       └────────────┘
                       └────┬─────┘
                            │
            ┌───────────────┼───────────────┐
            │               │               │
       ┌────▼──────┐   ┌────▼─────┐    ┌────▼─────┐
       │ML Pipeline│   │ One-Hot  │    │  Label   │
       │Processing │   │ Encoder  │    │ Encoder  │
       └────┬──────┘   └──────────┘    └──────────┘
            │
       ┌────▼──────────────────────┐
       │  Random Forest Classifier │
       │  (sentinel_brain.joblib)  │
       │  41 NSL-KDD Features      │
       └───────────────────────────┘
```

### Data Flow

```
User Input (Live or CSV)
        ↓
Feature Extraction & Validation
        ↓
One-Hot Encoding (protocol_type, flag)
        ↓
Frequency Encoding (service)
        ↓
Log Transforms (src_bytes, dst_bytes, duration)
        ↓
Feature Engineering (total_bytes, ratios, error flags)
        ↓
Standard Scaling (all features)
        ↓
Random Forest Inference
        ↓
Prediction + Confidence Score
        ↓
Severity Mapping
        ↓
JSON Response / Dashboard Update
```

---

## 📊 Model Performance

### Training Details

- **Algorithm**: Random Forest Classifier (100 trees)
- **Dataset**: NSL-KDD (improved KDD Cup 1999)
- **Features**: 41 network connection attributes
- **Classes**: 5 (normal, DoS, Probe, R2L, U2R)
- **Preprocessing**: OHE, frequency encoding, log transforms, standard scaling

### Threat Categories

| Class | Type | Severity | Examples |
|-------|------|----------|----------|
| `normal` | Clean traffic | ✅ None | HTTP requests, DNS queries |
| `DoS` | Denial of Service | 🔴 **Critical** | SYN floods, UDP storms |
| `Probe` | Reconnaissance | 🟠 Medium | Port scanning, OS fingerprinting |
| `R2L` | Remote to Local | 🔴 High | SSH brute force, FTP attacks |
| `U2R` | User to Root | 🔴 **Critical** | Buffer overflow, privilege escalation |

---

## ✨ Features

### 📡 Live Monitor Tab

Real-time threat detection with auto-generated NSL-KDD formatted packets

- **Auto-Generation**: Simulates realistic network traffic packets
- **Real-Time Inference**: Each packet sent to trained model instantly
- **Live Detection Feed**: Class, confidence, severity per packet
- **Attack Distribution Chart**: Bar chart updating in real-time
- **Threat Timeline**: Last 60 seconds of activity
- **Activity Heatmap**: 60×8 grid of recent packets
- **Confidence Distribution**: Histogram of model certainty
- **System Log**: Terminal-style event log
- **Session Summary**: Total packets, attacks detected, accuracy metrics

### 📂 CSV Analysis Tab

Upload and analyze NSL-KDD formatted datasets with streaming predictions

- **Smart Header Detection**: Auto-detects with or without column names
- **Batch Processing**: Optimized row-by-row inference through model
- **Live Progress**: Real-time bar with ETA and processing speed (rows/sec)
- **Streaming Results**: Predictions appear as they're computed
- **Threat Report Generation** (on completion):
  - Risk score gauge (0–100)
  - Class distribution bar chart
  - Confidence waveform over entire dataset
  - Threat intensity rolling average
  - Protocol breakdown pie chart
  - Top targeted services
  - Attack pattern clustering visualization
  - Paginated full results table with sorting/filtering
- **Multi-Format Export**: CSV, PDF report, JSON

---

## 🧠 ML Pipeline Deep Dive

### Feature Engineering

```python
# Input: 41 raw NSL-KDD features
features_raw = {
    'duration', 'protocol_type', 'service', 'flag', 'src_bytes', 'dst_bytes',
    'land', 'wrong_fragment', 'urgent', 'hot', 'num_failed_logins', 'logged_in',
    'num_compromised', 'root_shell', 'su_attempted', 'num_root',
    'num_file_creations', 'num_shells', 'num_access_files', 'num_outbound_cmds',
    'is_host_login', 'is_guest_login', 'count', 'srv_count', 'serror_rate',
    'srv_serror_rate', 'rerror_rate', 'srv_rerror_rate', 'same_srv_rate',
    'diff_srv_rate',
    'srv_diff_host_rate', 'dst_host_count', 'dst_host_srv_count',
    'dst_host_same_srv_rate', 'dst_host_diff_srv_rate',
    'dst_host_same_src_port_rate', 'dst_host_srv_diff_host_rate'
}
# NOTE: 37 names are listed here. The standard NSL-KDD schema also includes
# dst_host_serror_rate, dst_host_srv_serror_rate, dst_host_rerror_rate and
# dst_host_srv_rerror_rate, which brings the total to 41.

# Preprocessing pipeline
# 1. One-hot encoding: protocol_type (3 categories) → 3 columns
# 2. One-hot encoding: flag (11 categories) → 11 columns
# 3. Frequency encoding: service → maps to frequency rank
# 4. Log transforms: log(1 + src_bytes), log(1 + dst_bytes), log(1 + duration)
# 5. Feature engineering:
#    - total_bytes = src_bytes + dst_bytes
#    - src_bytes_ratio = src_bytes / (total_bytes + 1)
#    - is_error_flag = 1 if error flag present
# 6. Standard scaling: (x - mean) / std for all numeric features

# Output: 41 standardized features → Random Forest inference
```

### Serialization

All pipeline artifacts are serialized with `joblib` for production reliability:

```
models/
├── sentinel_brain.joblib     # Trained Random Forest (100 trees)
├── label_encoder.joblib      # Encodes target class labels
├── ohe_encoder.joblib        # One-hot encoder for protocol/flag
├── freq_map.joblib           # Service frequency mapping dictionary
├── scaler.joblib             # StandardScaler fitted on training data
└── selected_features.joblib  # List of 41 selected features in order
```

---

## 🚀 Quick Start

### Prerequisites

- Python 3.10+
- pip or conda
- 500 MB disk space for models

### Local Setup (5 minutes)

```bash
# 1. Clone repository
git clone https://github.com/Hitan547/sentinelnet.git
cd sentinelnet

# 2. Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# 3. Install dependencies
pip install -r requirements.txt

# 4. Run Flask server
python app.py

# 5. Open browser
# → http://localhost:7860
```

### Docker Setup (for Spaces or cloud deployment)

```bash
# Build image
docker build -t sentinelnet:latest .

# Run container
docker run -p 7860:7860 sentinelnet:latest

# Access at http://localhost:7860
```

### Deployment on HuggingFace Spaces

1. Create a new Space on HuggingFace
2. Select the "Docker" runtime
3. Clone this repo
4. Push to the Space repo
5. The Space auto-deploys and serves the app live

---

## 🔌 REST API Reference

### POST `/predict`

Batch inference endpoint for NSL-KDD formatted network packets

**Request:**

```json
{
  "rows": [
    {
      "duration": 0, "protocol_type": "tcp", "service": "http", "flag": "SF",
      "src_bytes": 181, "dst_bytes": 5450, "land": 0, "wrong_fragment": 0,
      "urgent": 0, "hot": 0, "num_failed_logins": 0, "logged_in": 1,
      "num_compromised": 0, "root_shell": 0, "su_attempted": 0, "num_root": 0,
      "num_file_creations": 0, "num_shells": 0, "num_access_files": 0,
      "num_outbound_cmds": 0, "is_host_login": 0, "is_guest_login": 0,
      "count": 1, "srv_count": 1, "serror_rate": 0.0, "srv_serror_rate": 0.0,
      "rerror_rate": 0.0, "srv_rerror_rate": 0.0, "same_srv_rate": 1.0,
      "diff_srv_rate": 0.0, "srv_diff_host_rate": 0.0, "dst_host_count": 1,
      "dst_host_srv_count": 1, "dst_host_same_srv_rate": 1.0,
      "dst_host_diff_srv_rate": 0.0, "dst_host_same_src_port_rate": 0.0,
      "dst_host_srv_diff_host_rate": 0.0
    }
  ]
}
```

**Response:**

```json
{
  "status": "ok",
  "results": [
    {
      "predicted_class": "normal",
      "severity": "None",
      "confidence": 0.9821,
      "is_intrusion": false
    }
  ]
}
```

### GET `/health`

System health check

**Response:**

```json
{
  "status": "online",
  "model": "sentinel_brain",
  "version": "1.0.0",
  "uptime_seconds": 3600
}
```

---

## 📁 Project Structure

```
sentinelnet/
├── frontend/
│   ├── index.html                # Main HTML with tabs, charts, tables
│   ├── style.css                 # CSS variables, grid layout, animations
│   └── app.js                    # Canvas charts, API calls, event handlers
├── models/
│   ├── sentinel_brain.joblib     # Random Forest classifier
│   ├── label_encoder.joblib      # Target label encoding
│   ├── ohe_encoder.joblib        # Protocol/flag one-hot encoder
│   ├── freq_map.joblib           # Service frequency dictionary
│   ├── scaler.joblib             # Standard scaler
│   └── selected_features.joblib  # 41 feature names + order
├── app.py                        # Flask server + /predict + /health endpoints
├── requirements.txt              # Python dependencies (Flask, scikit-learn, etc.)
├── Dockerfile                    # Multi-stage build for HuggingFace Spaces
├── .dockerignore                 # Excludes unnecessary files from build
├── .github/
│   └── workflows/
│       └── ci.yml                # GitHub Actions CI pipeline
└── README.md                     # This file
```

---

## 🔄 CI/CD Pipeline

### Continuous Integration (GitHub Actions)

```yaml
on: [push, pull_request]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: '3.10'
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Syntax check
        run: python -m py_compile app.py
      - name: Health check (skip models)
        env:
          SKIP_MODEL: true
        run: |
          python app.py &
          sleep 2
          curl http://localhost:7860/health
      - name: Docker build test
        run: docker build -t sentinelnet:test .
```

**CI Features:**

- ✅ Python 3.10 environment setup
- ✅ Dependency installation verification
- ✅ Code syntax validation
- ✅ Flask app health check (with `SKIP_MODEL=true` to avoid model loading timeout)
- ✅ Docker image build validation

### Continuous Deployment (HuggingFace Spaces)

- **Trigger**: Push to `main` branch
- **Action**: Auto-deploys Docker container to HuggingFace Spaces
- **Endpoint**: https://huggingface.co/spaces/Hitan2004/sentinelnet
- **Uptime**: Always available (free tier with occasional cold starts)

---

## 🎓 What I Learned

✅ **Production ML Systems**
- Training and deploying multi-class classification models end-to-end
- Feature engineering and preprocessing pipeline serialization
- Model serving via REST API with batch inference

✅ **Real-Time Dashboards**
- Building interactive dashboards with vanilla JavaScript
- Canvas API for high-performance charting (thousands of data points)
- Responsive design for desktop and tablet

✅ **Backend Engineering**
- Flask REST API design and CORS handling
- Batch processing
  with streaming progress feedback
- Error handling and validation

✅ **DevOps & Deployment**
- Docker containerization for reproducible environments
- HuggingFace Spaces deployment workflow
- GitHub Actions CI/CD pipeline with smart skipping

✅ **Advanced Concepts**
- NSL-KDD dataset characteristics and threat modeling
- One-hot vs. frequency encoding trade-offs
- Log transforms for skewed feature distributions
- Cross-entropy loss and feature importance in Random Forest

---

## 📊 Dataset Reference

**NSL-KDD Dataset**: improved version of KDD Cup 1999

- **Size**: 125,973 training records, 22,544 test records
- **Features**: 41 network connection attributes
- **Classes**: 5 (normal, DoS, Probe, R2L, U2R)
- **Advantages**: Removes duplicate records, more balanced class distribution
- **Standard**: Widely used benchmark for IDS research

**Attribute Categories:**

- Basic features (9): duration, protocol, service, flag, bytes
- Content features (13): hot, num_failed_logins, logged_in, compromised, etc.
- Time-based traffic features (9): count, srv_count, serror_rate, etc.
- Host-based traffic features (10): dst_host_count, dst_host_srv_count, etc.

---

## 🤝 Contributing

This is a portfolio project, but you're welcome to fork and extend!

**Ideas for enhancement:**

- [ ] Add LSTM-based temporal anomaly detection
- [ ] Implement feature importance visualization
- [ ] Add real PCAP file ingestion
- [ ] Multi-model ensemble (XGBoost + Neural Network)
- [ ] Real-time alerting webhook integration

---

## 📜 License

MIT License: use freely for learning, portfolio, or production purposes.

---

## 📞 Contact

**Hitan K**, AI Systems Engineer

- 🔗 [LinkedIn](https://linkedin.com/in/hitan-k)
- 🐙 [GitHub](https://github.com/Hitan547)
- 🤗 [HuggingFace](https://huggingface.co/Hitan2004)
- 📧 [Email](mailto:hitan.k@outlook.com)

---
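
## 🧪 Appendix: Feature Engineering Sketch

The frequency-encoding, log-transform, and feature-engineering steps listed under ML Pipeline Deep Dive, together with the severity column of the Threat Categories table, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code in `app.py`: the names `engineer_features`, `severity_for`, `ERROR_FLAGS`, and `SEVERITY`, the output key names, and the sample frequency map are all hypothetical. In particular, this README does not say which `flag` values count as an "error flag", so the set used here is a guess.

```python
import math

# Assumption: flag values treated as error states (not specified in this README).
ERROR_FLAGS = {"REJ", "RSTO", "RSTR", "S0"}

# Severity per predicted class, copied from the Threat Categories table.
SEVERITY = {
    "normal": "None",
    "DoS": "Critical",
    "Probe": "Medium",
    "R2L": "High",
    "U2R": "Critical",
}


def engineer_features(row, freq_map):
    """Apply preprocessing steps 3-5 to one raw packet dictionary."""
    src = float(row["src_bytes"])
    dst = float(row["dst_bytes"])
    total = src + dst
    return {
        # Step 3: frequency encoding; unseen services fall back to 0 (assumption).
        "service_freq": freq_map.get(row["service"], 0),
        # Step 4: log(1 + x) compresses heavy-tailed byte counts and durations.
        "log_src_bytes": math.log1p(src),
        "log_dst_bytes": math.log1p(dst),
        "log_duration": math.log1p(float(row["duration"])),
        # Step 5: engineered features, using the formulas from the pipeline list.
        "total_bytes": total,
        "src_bytes_ratio": src / (total + 1),
        "is_error_flag": 1 if row["flag"] in ERROR_FLAGS else 0,
    }


def severity_for(predicted_class):
    """Map a predicted class label to its severity string."""
    return SEVERITY.get(predicted_class, "Unknown")


if __name__ == "__main__":
    packet = {"duration": 0, "service": "http", "flag": "SF",
              "src_bytes": 181, "dst_bytes": 5450}
    sample_freq_map = {"http": 1, "private": 2}  # hypothetical frequency ranks
    print(engineer_features(packet, sample_freq_map))
    print(severity_for("DoS"))
```

The `+ 1` in the ratio denominator keeps the division defined for packets with zero bytes in both directions. One-hot encoding and standard scaling are left out because they depend on the fitted `ohe_encoder.joblib` and `scaler.joblib` artifacts.

---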

**⭐ If this helped you, please star the repo! ⭐**

*Built with ❤️ for production and learning.*