Spaces:
Configuration error
Configuration error
| # π‘οΈ SentinelNet β AI-Powered Network Intrusion Detection System | |
| <div align="center"> | |
| **Production ML system detecting 5 categories of network threats in real-time** | |
| [](https://huggingface.co/spaces/Hitan2004/sentinelnet) | |
| [](https://github.com/Hitan547/sentinelnet) | |
| [](#tech-stack) | |
| [](#tech-stack) | |
| *A full-stack real-time intrusion detection dashboard with hybrid frontend, REST API, and automated CI/CD deployment.* | |
| </div> | |
| --- | |
| ## π― Overview | |
| SentinelNet is a production-grade network intrusion detection system that analyzes live traffic and batch CSV datasets to classify connections into 5 threat categories. Built with a Random Forest classifier trained on the NSL-KDD dataset, it combines real-time inference with a sophisticated web dashboard and self-correcting batch processing. | |
| ### β‘ Key Capabilities | |
| | Feature | Capability | | |
| |---------|-----------| | |
| | **Real-Time Detection** | 1000s of live packets/sec through trained ML model | | |
| | **Threat Classification** | 5-class detection: normal, DoS, Probe, R2L, U2R | | |
| | **Batch Analysis** | Process CSVs with live progress, streaming predictions, auto-generated threat reports | | |
| | **Visual Intelligence** | Live timeline, activity heatmaps, confidence distributions, attack patterns | | |
| | **Export Formats** | CSV, PDF reports, JSON for integration | | |
| | **Deployment** | Docker containerized, live on HuggingFace Spaces | | |
| --- | |
| ## ποΈ Architecture | |
| ### System Diagram | |
| ``` | |
| βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | |
| β SentinelNet System β | |
| βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | |
| ββββββββββββββββββββ | |
| β Flask Backend β | |
| β (app.py) β | |
| ββββββββββ¬ββββββββββ | |
| β | |
| βββββββββββββββββββββΌββββββββββββββββββββ | |
| β β β | |
| ββββββΌβββββ ββββββΌβββββ βββββββΌβββββββ | |
| β /health β β/predict β β /static β | |
| β Endpoint β β Batch β β Frontend β | |
| ββββββββββββ β Inferenceβ ββββββββββββββ | |
| ββββββ¬βββββ | |
| β | |
| βββββββββββββββββΌββββββββββββββββ | |
| β β β | |
| ββββββΌβββββββ ββββββΌββββββ βββββΌβββββββββββ | |
| βML Pipelineβ βOne-Hot β βLabel β | |
| βProcessing β βEncoder β βEncoder β | |
| βββββββββββββ βββββββββββββ ββββββββββββββββ | |
| β | |
| ββββββΌβββββββββββββββββββββββ | |
| β Random Forest Classifier β | |
| β (sentinel_brain.joblib) β | |
| β 41 NSL-KDD Features β | |
| βββββββββββββββββββββββββββββ | |
| ``` | |
| ### Data Flow | |
| ``` | |
| User Input (Live or CSV) | |
| β | |
| Feature Extraction & Validation | |
| β | |
| One-Hot Encoding (protocol_type, flag) | |
| β | |
| Frequency Encoding (service) | |
| β | |
| Log Transforms (src_bytes, dst_bytes, duration) | |
| β | |
| Feature Engineering (total_bytes, ratios, error flags) | |
| β | |
| Standard Scaling (all features) | |
| β | |
| Random Forest Inference | |
| β | |
| Prediction + Confidence Score | |
| β | |
| Severity Mapping | |
| β | |
| JSON Response / Dashboard Update | |
| ``` | |
| --- | |
| ## π Model Performance | |
| ### Training Details | |
| - **Algorithm**: Random Forest Classifier (100 trees) | |
| - **Dataset**: NSL-KDD (improved KDD Cup 1999) | |
| - **Features**: 41 network connection attributes | |
| - **Classes**: 5 (normal, DoS, Probe, R2L, U2R) | |
| - **Preprocessing**: OHE, frequency encoding, log transforms, standard scaling | |
| ### Threat Categories | |
| | Class | Type | Severity | Examples | | |
| |-------|------|----------|----------| | |
| | `normal` | Clean traffic | β None | HTTP requests, DNS queries | | |
| | `DoS` | Denial of Service | π΄ **Critical** | SYN floods, UDP storms | | |
| | `Probe` | Reconnaissance | π Medium | Port scanning, OS fingerprinting | | |
| | `R2L` | Remote to Local | π΄ High | SSH brute force, FTP attacks | | |
| | `U2R` | User to Root | π΄ **Critical** | Buffer overflow, privilege escalation | | |
| --- | |
| ## β¨ Features | |
| ### π‘ Live Monitor Tab | |
| Real-time threat detection with auto-generated NSL-KDD formatted packets | |
| - **Auto-Generation**: Simulates realistic network traffic packets | |
| - **Real-Time Inference**: Each packet sent to trained model instantly | |
| - **Live Detection Feed**: Class, confidence, severity per packet | |
| - **Attack Distribution Chart**: Bar chart updating in real-time | |
| - **Threat Timeline**: Last 60 seconds of activity | |
| - **Activity Heatmap**: 60Γ8 grid of recent packets | |
| - **Confidence Distribution**: Histogram of model certainty | |
| - **System Log**: Terminal-style event log | |
| - **Session Summary**: Total packets, attacks detected, accuracy metrics | |
| ### π CSV Analysis Tab | |
| Upload and analyze NSL-KDD formatted datasets with streaming predictions | |
| - **Smart Header Detection**: Auto-detects with or without column names | |
| - **Batch Processing**: Optimized row-by-row inference through model | |
| - **Live Progress**: Real-time bar with ETA and processing speed (rows/sec) | |
| - **Streaming Results**: Predictions appear as they're computed | |
| - **Threat Report Generation** (on completion): | |
| - Risk score gauge (0β100) | |
| - Class distribution bar chart | |
| - Confidence waveform over entire dataset | |
| - Threat intensity rolling average | |
| - Protocol breakdown pie chart | |
| - Top targeted services | |
| - Attack pattern clustering visualization | |
| - Paginated full results table with sorting/filtering | |
| - **Multi-Format Export**: CSV, PDF report, JSON | |
| --- | |
| ## π§ ML Pipeline Deep Dive | |
| ### Feature Engineering | |
| ```python | |
| # Input: 41 raw NSL-KDD features | |
| features_raw = { | |
| 'duration', 'protocol_type', 'service', 'flag', | |
| 'src_bytes', 'dst_bytes', 'land', 'wrong_fragment', | |
| 'urgent', 'hot', 'num_failed_logins', 'logged_in', | |
| 'num_compromised', 'root_shell', 'su_attempted', | |
| 'num_root', 'num_file_creations', 'num_shells', | |
| 'num_access_files', 'num_outbound_cmds', 'is_host_login', | |
| 'is_guest_login', 'count', 'srv_count', 'serror_rate', | |
| 'srv_serror_rate', 'rerror_rate', 'srv_rerror_rate', | |
| 'same_srv_rate', 'diff_srv_rate', 'srv_diff_host_rate', | |
| 'dst_host_count', 'dst_host_srv_count', 'dst_host_same_srv_rate', | |
| 'dst_host_diff_srv_rate', 'dst_host_same_src_port_rate', | |
| 'dst_host_srv_diff_host_rate' | |
| } | |
| # Preprocessing Pipeline | |
| 1. One-hot encoding: protocol_type (3 categories) β 3 columns | |
| 2. One-hot encoding: flag (11 categories) β 11 columns | |
| 3. Frequency encoding: service β maps to frequency rank | |
| 4. Log transforms: log(1 + src_bytes), log(1 + dst_bytes), log(1 + duration) | |
| 5. Feature engineering: | |
| - total_bytes = src_bytes + dst_bytes | |
| - src_bytes_ratio = src_bytes / (total_bytes + 1) | |
| - is_error_flag = 1 if error flag present | |
| 6. Standard scaling: (x - mean) / std for all numeric features | |
| # Output: 41 standardized features β Random Forest inference | |
| ``` | |
| ### Serialization | |
| All pipeline artifacts are serialized with `joblib` for production reliability: | |
| ``` | |
| models/ | |
| βββ sentinel_brain.joblib # Trained Random Forest (100 trees) | |
| βββ label_encoder.joblib # Encodes target class labels | |
| βββ ohe_encoder.joblib # One-hot encoder for protocol/flag | |
| βββ freq_map.joblib # Service frequency mapping dictionary | |
| βββ scaler.joblib # StandardScaler fitted on training data | |
| βββ selected_features.joblib # List of 41 selected features in order | |
| ``` | |
| --- | |
| ## π Quick Start | |
| ### Prerequisites | |
| - Python 3.10+ | |
| - pip or conda | |
| - 500MB disk space for models | |
| ### Local Setup (5 minutes) | |
| ```bash | |
| # 1. Clone repository | |
| git clone https://github.com/Hitan547/sentinelnet.git | |
| cd sentinelnet | |
| # 2. Create virtual environment (recommended) | |
| python -m venv venv | |
| source venv/bin/activate # On Windows: venv\Scripts\activate | |
| # 3. Install dependencies | |
| pip install -r requirements.txt | |
| # 4. Run Flask server | |
| python app.py | |
| # 5. Open browser | |
| # β http://localhost:7860 | |
| ``` | |
| ### Docker Setup (for Spaces or cloud deployment) | |
| ```bash | |
| # Build image | |
| docker build -t sentinelnet:latest . | |
| # Run container | |
| docker run -p 7860:7860 sentinelnet:latest | |
| # Access at http://localhost:7860 | |
| ``` | |
| ### Deployment on HuggingFace Spaces | |
| 1. Create new Space on HuggingFace | |
| 2. Select "Docker" runtime | |
| 3. Clone this repo | |
| 4. Push to Space repo | |
| 5. Auto-deploys and serves live | |
| --- | |
| ## π REST API Reference | |
| ### POST `/predict` | |
| Batch inference endpoint for NSL-KDD formatted network packets | |
| **Request:** | |
| ```json | |
| { | |
| "rows": [ | |
| { | |
| "duration": 0, | |
| "protocol_type": "tcp", | |
| "service": "http", | |
| "flag": "SF", | |
| "src_bytes": 181, | |
| "dst_bytes": 5450, | |
| "land": 0, | |
| "wrong_fragment": 0, | |
| "urgent": 0, | |
| "hot": 0, | |
| "num_failed_logins": 0, | |
| "logged_in": 1, | |
| "num_compromised": 0, | |
| "root_shell": 0, | |
| "su_attempted": 0, | |
| "num_root": 0, | |
| "num_file_creations": 0, | |
| "num_shells": 0, | |
| "num_access_files": 0, | |
| "num_outbound_cmds": 0, | |
| "is_host_login": 0, | |
| "is_guest_login": 0, | |
| "count": 1, | |
| "srv_count": 1, | |
| "serror_rate": 0.0, | |
| "srv_serror_rate": 0.0, | |
| "rerror_rate": 0.0, | |
| "srv_rerror_rate": 0.0, | |
| "same_srv_rate": 1.0, | |
| "diff_srv_rate": 0.0, | |
| "srv_diff_host_rate": 0.0, | |
| "dst_host_count": 1, | |
| "dst_host_srv_count": 1, | |
| "dst_host_same_srv_rate": 1.0, | |
| "dst_host_diff_srv_rate": 0.0, | |
| "dst_host_same_src_port_rate": 0.0, | |
| "dst_host_srv_diff_host_rate": 0.0 | |
| } | |
| ] | |
| } | |
| ``` | |
| **Response:** | |
| ```json | |
| { | |
| "status": "ok", | |
| "results": [ | |
| { | |
| "predicted_class": "normal", | |
| "severity": "None", | |
| "confidence": 0.9821, | |
| "is_intrusion": false | |
| } | |
| ] | |
| } | |
| ``` | |
| ### GET `/health` | |
| System health check | |
| **Response:** | |
| ```json | |
| { | |
| "status": "online", | |
| "model": "sentinel_brain", | |
| "version": "1.0.0", | |
| "uptime_seconds": 3600 | |
| } | |
| ``` | |
| --- | |
| ## π Project Structure | |
| ``` | |
| sentinelnet/ | |
| βββ frontend/ | |
| β βββ index.html # Main HTML with tabs, charts, tables | |
| β βββ style.css # CSS variables, grid layout, animations | |
| β βββ app.js # Canvas charts, API calls, event handlers | |
| βββ models/ | |
| β βββ sentinel_brain.joblib # Random Forest classifier | |
| β βββ label_encoder.joblib # Target label encoding | |
| β βββ ohe_encoder.joblib # Protocol/flag one-hot encoder | |
| β βββ freq_map.joblib # Service frequency dictionary | |
| β βββ scaler.joblib # Standard scaler | |
| β βββ selected_features.joblib # 41 feature names + order | |
| βββ app.py # Flask server + /predict + /health endpoints | |
| βββ requirements.txt # Python dependencies (Flask, scikit-learn, etc.) | |
| βββ Dockerfile # Multi-stage build for HuggingFace Spaces | |
| βββ .dockerignore # Excludes unnecessary files from build | |
| βββ .github/ | |
| β βββ workflows/ | |
| β βββ ci.yml # GitHub Actions CI pipeline | |
| βββ README.md # This file | |
| ``` | |
| --- | |
| ## π CI/CD Pipeline | |
| ### Continuous Integration (GitHub Actions) | |
| ```yaml | |
| on: [push, pull_request] | |
| jobs: | |
| build: | |
| runs-on: ubuntu-latest | |
| steps: | |
| - uses: actions/checkout@v3 | |
| - uses: actions/setup-python@v4 | |
| with: | |
| python-version: '3.10' | |
| - name: Install dependencies | |
| run: pip install -r requirements.txt | |
| - name: Syntax check | |
| run: python -m py_compile app.py | |
| - name: Health check (skip models) | |
| env: | |
| SKIP_MODEL: true | |
| run: python app.py & | |
| sleep 2 | |
| curl http://localhost:7860/health | |
| - name: Docker build test | |
| run: docker build -t sentinelnet:test . | |
| ``` | |
| **CI Features:** | |
| - β Python 3.10 environment setup | |
| - β Dependency installation verification | |
| - β Code syntax validation | |
| - β Flask app health check (with `SKIP_MODEL=true` to avoid model loading timeout) | |
| - β Docker image build validation | |
| ### Continuous Deployment (HuggingFace Spaces) | |
| - **Trigger**: Push to `main` branch | |
| - **Action**: Auto-deploys Docker container to HuggingFace Spaces | |
| - **Endpoint**: https://huggingface.co/spaces/Hitan2004/sentinelnet | |
| - **Uptime**: Always available (free tier with occasional cold starts) | |
| --- | |
| ## π What I Learned | |
| β **Production ML Systems** | |
| - Training and deploying multi-class classification models end-to-end | |
| - Feature engineering and preprocessing pipeline serialization | |
| - Model serving via REST API with batch inference | |
| β **Real-Time Dashboards** | |
| - Building interactive dashboards with vanilla JavaScript | |
| - Canvas API for high-performance charting (thousands of data points) | |
| - Responsive design for desktop and tablet | |
| β **Backend Engineering** | |
| - Flask REST API design and CORS handling | |
| - Batch processing with streaming progress feedback | |
| - Error handling and validation | |
| β **DevOps & Deployment** | |
| - Docker containerization for reproducible environments | |
| - HuggingFace Spaces deployment workflow | |
| - GitHub Actions CI/CD pipeline with smart skipping | |
| β **Advanced Concepts** | |
| - NSL-KDD dataset characteristics and threat modeling | |
| - One-hot vs. frequency encoding trade-offs | |
| - Log transforms for skewed feature distributions | |
| - Cross-entropy loss and feature importance in Random Forest | |
| --- | |
| ## π Dataset Reference | |
| **NSL-KDD Dataset** | |
| - Improved version of KDD Cup 1999 | |
| - **Size**: 125,973 training records, 22,544 test records | |
| - **Features**: 41 network connection attributes | |
| - **Classes**: 5 (normal, DoS, Probe, R2L, U2R) | |
| - **Advantages**: Removes duplicate records, more balanced class distribution | |
| - **Standard**: Widely used benchmark for IDS research | |
| **Attribute Categories:** | |
| - Basic features (10): duration, protocol, service, flag, bytes | |
| - Content features (13): hot, num_failed_logins, logged_in, compromised, etc. | |
| - Time-based traffic features (9): count, srv_count, serror_rate, etc. | |
| - Host-based traffic features (9): dst_host_count, dst_host_srv_count, etc. | |
| --- | |
| ## π€ Contributing | |
| This is a portfolio project, but you're welcome to fork and extend! | |
| **Ideas for enhancement:** | |
| - [ ] Add LSTM-based temporal anomaly detection | |
| - [ ] Implement feature importance visualization | |
| - [ ] Add real PCAP file ingestion | |
| - [ ] Multi-model ensemble (XGBoost + Neural Network) | |
| - [ ] Real-time alerting webhook integration | |
| --- | |
| ## π License | |
| MIT License β Use freely for learning, portfolio, or production purposes. | |
| --- | |
| ## π Contact | |
| **Hitan K** β AI Systems Engineer | |
| - π [LinkedIn](https://linkedin.com/in/hitan-k) | |
| - π [GitHub](https://github.com/Hitan547) | |
| - π€ [HuggingFace](https://huggingface.co/Hitan2004) | |
| - π§ [Email](mailto:hitan.k@outlook.com) | |
| --- | |
| <div align="center"> | |
| **β If this helped you, please star the repo! β** | |
| *Built with β€οΈ for production and learning.* | |
| </div> | |