🛡️ SentinelNet — AI-Powered Network Intrusion Detection System

Production ML system that classifies network traffic into 5 categories (normal plus four attack types) in real time

A full-stack real-time intrusion detection dashboard with hybrid frontend, REST API, and automated CI/CD deployment.

🎯 Overview

SentinelNet is a production-grade network intrusion detection system that analyzes live traffic and batch CSV datasets and classifies each connection into one of 5 classes: normal traffic plus four attack categories (DoS, Probe, R2L, U2R). It is built on a Random Forest classifier trained on the NSL-KDD dataset and combines real-time inference with a web dashboard and self-correcting batch processing.

⚡ Key Capabilities

| Feature | Capability |
|---|---|
| Real-Time Detection | 1000s of live packets/sec through trained ML model |
| Threat Classification | 5-class detection: normal, DoS, Probe, R2L, U2R |
| Batch Analysis | Process CSVs with live progress, streaming predictions, auto-generated threat reports |
| Visual Intelligence | Live timeline, activity heatmaps, confidence distributions, attack patterns |
| Export Formats | CSV, PDF reports, JSON for integration |
| Deployment | Docker containerized, live on HuggingFace Spaces |

🏗️ Architecture

System Diagram

┌─────────────────────────────────────────────────────────┐
│                   SentinelNet System                     │
└─────────────────────────────────────────────────────────┘

                    ┌──────────────────┐
                    │   Flask Backend  │
                    │   (app.py)       │
                    └────────┬─────────┘
                             │
         ┌───────────────────┼───────────────────┐
         │                   │                   │
    ┌────▼────┐         ┌────▼────┐       ┌─────▼──────┐
    │ /health  │         │/predict │       │ /static    │
    │ Endpoint │         │ Batch   │       │ Frontend   │
    └──────────┘         │ Inference│      └────────────┘
                         └────┬────┘
                              │
              ┌───────────────┼───────────────┐
              │               │               │
         ┌────▼──────┐   ┌────▼─────┐   ┌───▼──────────┐
         │ML Pipeline│   │One-Hot    │   │Label         │
         │Processing │   │Encoder    │   │Encoder       │
         └───────────┘   └───────────┘   └──────────────┘
              │
         ┌────▼──────────────────────┐
         │ Random Forest Classifier  │
         │ (sentinel_brain.joblib)   │
         │ 41 NSL-KDD Features       │
         └───────────────────────────┘

Data Flow

User Input (Live or CSV)
    ↓
Feature Extraction & Validation
    ↓
One-Hot Encoding (protocol_type, flag)
    ↓
Frequency Encoding (service)
    ↓
Log Transforms (src_bytes, dst_bytes, duration)
    ↓
Feature Engineering (total_bytes, ratios, error flags)
    ↓
Standard Scaling (all features)
    ↓
Random Forest Inference
    ↓
Prediction + Confidence Score
    ↓
Severity Mapping
    ↓
JSON Response / Dashboard Update

📊 Model Performance

Training Details

Algorithm: Random Forest Classifier (100 trees)
Dataset: NSL-KDD (improved KDD Cup 1999)
Features: 41 network connection attributes
Classes: 5 (normal, DoS, Probe, R2L, U2R)
Preprocessing: OHE, frequency encoding, log transforms, standard scaling

Threat Categories

| Class | Type | Severity | Examples |
|---|---|---|---|
| `normal` | Clean traffic | ✅ None | HTTP requests, DNS queries |
| `DoS` | Denial of Service | 🔴 Critical | SYN floods, UDP storms |
| `Probe` | Reconnaissance | 🟠 Medium | Port scanning, OS fingerprinting |
| `R2L` | Remote to Local | 🔴 High | SSH brute force, FTP attacks |
| `U2R` | User to Root | 🔴 Critical | Buffer overflow, privilege escalation |
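
The severity column above amounts to a lookup from predicted class to label. A minimal sketch of that mapping follows, assuming the result fields shown in the `/predict` response; the names `SEVERITY_BY_CLASS` and `to_result` are hypothetical and not taken from `app.py`.

```python
# Severity labels copied from the threat-category table.
SEVERITY_BY_CLASS = {
    "normal": "None",
    "DoS": "Critical",
    "Probe": "Medium",
    "R2L": "High",
    "U2R": "Critical",
}


def to_result(predicted_class: str, confidence: float) -> dict:
    """Build one result entry shaped like the /predict response (assumed shape)."""
    if predicted_class not in SEVERITY_BY_CLASS:
        raise ValueError(f"unknown class: {predicted_class!r}")
    return {
        "predicted_class": predicted_class,
        "severity": SEVERITY_BY_CLASS[predicted_class],
        "confidence": round(confidence, 4),
        # Every class other than "normal" is treated as an intrusion.
        "is_intrusion": predicted_class != "normal",
    }
```

Rejecting unknown classes early keeps a label-encoder mismatch from silently producing a result with no severity.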

✨ Features

📡 Live Monitor Tab

Real-time threat detection with auto-generated NSL-KDD formatted packets

Auto-Generation: Simulates realistic network traffic packets
Real-Time Inference: Each packet sent to trained model instantly
Live Detection Feed: Class, confidence, severity per packet
Attack Distribution Chart: Bar chart updating in real-time
Threat Timeline: Last 60 seconds of activity
Activity Heatmap: 60×8 grid of recent packets
Confidence Distribution: Histogram of model certainty
System Log: Terminal-style event log
Session Summary: Total packets, attacks detected, accuracy metrics
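
The auto-generation step can be pictured as drawing random values for NSL-KDD fields. The sketch below is an illustration only: it is written in Python rather than the dashboard's JavaScript, it fills just a subset of the 41 fields, and the value ranges and the name `random_packet` are assumptions.

```python
import random

# Small samples of valid NSL-KDD categorical values (not the full sets).
PROTOCOLS = ["tcp", "udp", "icmp"]
SERVICES = ["http", "smtp", "ftp", "domain_u", "private"]
FLAGS = ["SF", "S0", "REJ", "RSTO"]


def random_packet(rng: random.Random) -> dict:
    """Return a partial NSL-KDD-style record with randomly drawn values."""
    return {
        "duration": rng.randint(0, 30),
        "protocol_type": rng.choice(PROTOCOLS),
        "service": rng.choice(SERVICES),
        "flag": rng.choice(FLAGS),
        "src_bytes": rng.randint(0, 10_000),
        "dst_bytes": rng.randint(0, 10_000),
        "count": rng.randint(1, 511),
        "serror_rate": round(rng.random(), 2),
    }
```

Passing a seeded `random.Random` makes a simulated session reproducible, which helps when comparing dashboard behaviour across runs.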

📂 CSV Analysis Tab

Upload and analyze NSL-KDD formatted datasets with streaming predictions

Smart Header Detection: Auto-detects with or without column names
Batch Processing: Optimized row-by-row inference through model
Live Progress: Real-time bar with ETA and processing speed (rows/sec)
Streaming Results: Predictions appear as they're computed
Threat Report Generation (on completion):
- Risk score gauge (0–100)
- Class distribution bar chart
- Confidence waveform over entire dataset
- Threat intensity rolling average
- Protocol breakdown pie chart
- Top targeted services
- Attack pattern clustering visualization
- Paginated full results table with sorting/filtering
Multi-Format Export: CSV, PDF report, JSON
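
One plausible way to implement the header detection described above: the first NSL-KDD column is `duration`, which is numeric in every data row, so a non-numeric first cell suggests a header row. This is a sketch under that assumption, not the project's actual logic, and the helper names `has_header` and `read_rows` are hypothetical.

```python
import csv
import io


def has_header(first_row: list[str]) -> bool:
    """Guess whether the first CSV row holds column names."""
    if not first_row:
        return False
    try:
        float(first_row[0])
    except ValueError:
        return True  # non-numeric first cell: likely "duration" or another name
    return False


def read_rows(text: str, columns: list[str]) -> list[dict]:
    """Parse CSV text into dicts, falling back to `columns` when no header exists."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return []
    if has_header(rows[0]):
        names, data = rows[0], rows[1:]
    else:
        names, data = columns, rows
    return [dict(zip(names, row)) for row in data]
```

Values stay as strings here; converting numeric fields would happen in a later validation step.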

🧠 ML Pipeline Deep Dive

Feature Engineering

# Input: the 41 raw NSL-KDD features, in dataset column order
features_raw = [
    'duration', 'protocol_type', 'service', 'flag',
    'src_bytes', 'dst_bytes', 'land', 'wrong_fragment',
    'urgent', 'hot', 'num_failed_logins', 'logged_in',
    'num_compromised', 'root_shell', 'su_attempted',
    'num_root', 'num_file_creations', 'num_shells',
    'num_access_files', 'num_outbound_cmds', 'is_host_login',
    'is_guest_login', 'count', 'srv_count', 'serror_rate',
    'srv_serror_rate', 'rerror_rate', 'srv_rerror_rate',
    'same_srv_rate', 'diff_srv_rate', 'srv_diff_host_rate',
    'dst_host_count', 'dst_host_srv_count', 'dst_host_same_srv_rate',
    'dst_host_diff_srv_rate', 'dst_host_same_src_port_rate',
    'dst_host_srv_diff_host_rate', 'dst_host_serror_rate',
    'dst_host_srv_serror_rate', 'dst_host_rerror_rate',
    'dst_host_srv_rerror_rate',
]

# Preprocessing pipeline
# 1. One-hot encoding: protocol_type (3 categories) → 3 columns
# 2. One-hot encoding: flag (11 categories) → 11 columns
# 3. Frequency encoding: service → maps to frequency rank
# 4. Log transforms: log(1 + src_bytes), log(1 + dst_bytes), log(1 + duration)
# 5. Feature engineering:
#    - total_bytes = src_bytes + dst_bytes
#    - src_bytes_ratio = src_bytes / (total_bytes + 1)
#    - is_error_flag = 1 if error flag present
# 6. Standard scaling: (x - mean) / std for all numeric features

# Output: the 41 features listed in selected_features.joblib, standardized → Random Forest inference
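
Steps 3 to 5 of the pipeline can be sketched in plain Python. Several details are assumptions: "frequency rank" is read here as relative frequency in the training data (an ordinal rank would also fit the description), the set of flags treated as errors is a guess, and the output key names are hypothetical.

```python
import math

# Assumption: which connection flags count as "error" flags.
ERROR_FLAGS = {"S0", "REJ", "RSTO", "RSTR"}


def build_freq_map(services: list[str]) -> dict[str, float]:
    """Map each service name to its relative frequency in the training data."""
    counts: dict[str, int] = {}
    for service in services:
        counts[service] = counts.get(service, 0) + 1
    total = len(services)
    return {service: n / total for service, n in counts.items()}


def engineer(row: dict, freq_map: dict[str, float]) -> dict:
    """Apply steps 3-5 to one record; unseen services get frequency 0.0."""
    src, dst = row["src_bytes"], row["dst_bytes"]
    total = src + dst
    return {
        "service_freq": freq_map.get(row["service"], 0.0),
        "log_src_bytes": math.log1p(src),
        "log_dst_bytes": math.log1p(dst),
        "log_duration": math.log1p(row["duration"]),
        "total_bytes": total,
        "src_bytes_ratio": src / (total + 1),  # +1 avoids division by zero
        "is_error_flag": int(row["flag"] in ERROR_FLAGS),
    }
```

`math.log1p(x)` computes log(1 + x) accurately for small `x`, which matches the transform in step 4 and handles zero-byte connections without special cases.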

Serialization

All pipeline artifacts are serialized with joblib for production reliability:

models/
├── sentinel_brain.joblib       # Trained Random Forest (100 trees)
├── label_encoder.joblib        # Encodes target class labels
├── ohe_encoder.joblib          # One-hot encoder for protocol/flag
├── freq_map.joblib             # Service frequency mapping dictionary
├── scaler.joblib               # StandardScaler fitted on training data
└── selected_features.joblib    # List of 41 selected features in order
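
Loading these artifacts at startup might look like the sketch below. The file names come from the listing above; the function names and the fail-fast check are assumptions rather than the code in `app.py`.

```python
from pathlib import Path

# Artifact base names, taken from the models/ listing.
ARTIFACT_NAMES = [
    "sentinel_brain", "label_encoder", "ohe_encoder",
    "freq_map", "scaler", "selected_features",
]


def artifact_paths(models_dir: str = "models") -> dict[str, Path]:
    """Return the expected path of every pipeline artifact."""
    return {name: Path(models_dir) / f"{name}.joblib" for name in ARTIFACT_NAMES}


def load_artifacts(models_dir: str = "models") -> dict:
    """Load all artifacts, failing early if any file is missing."""
    paths = artifact_paths(models_dir)
    missing = [str(p) for p in paths.values() if not p.exists()]
    if missing:
        raise FileNotFoundError(f"missing artifacts: {missing}")
    import joblib  # third-party dependency, imported only when files exist
    return {name: joblib.load(path) for name, path in paths.items()}
```

Checking for all files before loading any of them gives one clear error message instead of a failure partway through startup.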

🚀 Quick Start

Prerequisites

Python 3.10+
pip or conda
500MB disk space for models

Local Setup (5 minutes)

# 1. Clone repository
git clone https://github.com/Hitan547/sentinelnet.git
cd sentinelnet

# 2. Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# 3. Install dependencies
pip install -r requirements.txt

# 4. Run Flask server
python app.py

# 5. Open browser
# → http://localhost:7860

Docker Setup (for Spaces or cloud deployment)

# Build image
docker build -t sentinelnet:latest .

# Run container
docker run -p 7860:7860 sentinelnet:latest

# Access at http://localhost:7860

Deployment on HuggingFace Spaces

1. Create a new Space on HuggingFace
2. Select the "Docker" runtime
3. Clone this repo
4. Push to the Space repo
5. The Space auto-deploys and serves the app live

🔌 REST API Reference

POST `/predict`

Batch inference endpoint for NSL-KDD formatted network packets

Request:

{
  "rows": [
    {
      "duration": 0,
      "protocol_type": "tcp",
      "service": "http",
      "flag": "SF",
      "src_bytes": 181,
      "dst_bytes": 5450,
      "land": 0,
      "wrong_fragment": 0,
      "urgent": 0,
      "hot": 0,
      "num_failed_logins": 0,
      "logged_in": 1,
      "num_compromised": 0,
      "root_shell": 0,
      "su_attempted": 0,
      "num_root": 0,
      "num_file_creations": 0,
      "num_shells": 0,
      "num_access_files": 0,
      "num_outbound_cmds": 0,
      "is_host_login": 0,
      "is_guest_login": 0,
      "count": 1,
      "srv_count": 1,
      "serror_rate": 0.0,
      "srv_serror_rate": 0.0,
      "rerror_rate": 0.0,
      "srv_rerror_rate": 0.0,
      "same_srv_rate": 1.0,
      "diff_srv_rate": 0.0,
      "srv_diff_host_rate": 0.0,
      "dst_host_count": 1,
      "dst_host_srv_count": 1,
      "dst_host_same_srv_rate": 1.0,
      "dst_host_diff_srv_rate": 0.0,
      "dst_host_same_src_port_rate": 0.0,
      "dst_host_srv_diff_host_rate": 0.0
    }
  ]
}

Response:

{
  "status": "ok",
  "results": [
    {
      "predicted_class": "normal",
      "severity": "None",
      "confidence": 0.9821,
      "is_intrusion": false
    }
  ]
}
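
A client call to this endpoint can be sketched with the standard library alone. The base URL assumes the local setup on port 7860; `build_predict_request` and `predict` are hypothetical helper names, and the error handling assumes a failed request reports a `status` other than `"ok"`.

```python
import json
import urllib.request


def build_predict_request(
    rows: list[dict], base_url: str = "http://localhost:7860"
) -> urllib.request.Request:
    """Wrap NSL-KDD rows in the {"rows": [...]} body and target POST /predict."""
    body = json.dumps({"rows": rows}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(
    rows: list[dict], base_url: str = "http://localhost:7860", timeout: float = 10.0
) -> list[dict]:
    """Send rows to the server and return the list under "results"."""
    request = build_predict_request(rows, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    if payload.get("status") != "ok":
        raise RuntimeError(f"prediction failed: {payload}")
    return payload["results"]
```

Splitting request construction from the network call makes the payload easy to inspect or test without a running server.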

GET `/health`

System health check

Response:

{
  "status": "online",
  "model": "sentinel_brain",
  "version": "1.0.0",
  "uptime_seconds": 3600
}
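
Because the server needs a moment to start, a caller may want to poll `/health` until it reports `"online"`. The following is a sketch under that assumption; the retry counts, the function names, and the injectable `fetch` parameter are illustrative choices.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_health(base_url: str) -> dict:
    """GET /health and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=5) as response:
        return json.load(response)


def wait_until_online(
    base_url: str = "http://localhost:7860",
    attempts: int = 10,
    delay: float = 1.0,
    fetch: Callable[[str], dict] = fetch_health,
) -> bool:
    """Poll the health endpoint until status is "online" or attempts run out."""
    for attempt in range(attempts):
        try:
            if fetch(base_url).get("status") == "online":
                return True
        except OSError:
            pass  # connection refused or timed out: server not up yet
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Polling with a bounded number of attempts is more robust than a single fixed sleep, since startup time varies with model loading and host load.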

📁 Project Structure

sentinelnet/
├── frontend/
│   ├── index.html          # Main HTML with tabs, charts, tables
│   ├── style.css           # CSS variables, grid layout, animations
│   └── app.js              # Canvas charts, API calls, event handlers
├── models/
│   ├── sentinel_brain.joblib          # Random Forest classifier
│   ├── label_encoder.joblib           # Target label encoding
│   ├── ohe_encoder.joblib             # Protocol/flag one-hot encoder
│   ├── freq_map.joblib                # Service frequency dictionary
│   ├── scaler.joblib                  # Standard scaler
│   └── selected_features.joblib       # 41 feature names + order
├── app.py                 # Flask server + /predict + /health endpoints
├── requirements.txt       # Python dependencies (Flask, scikit-learn, etc.)
├── Dockerfile            # Multi-stage build for HuggingFace Spaces
├── .dockerignore         # Excludes unnecessary files from build
├── .github/
│   └── workflows/
│       └── ci.yml        # GitHub Actions CI pipeline
└── README.md             # This file

🔄 CI/CD Pipeline

Continuous Integration (GitHub Actions)

on: [push, pull_request]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: '3.10'
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Syntax check
        run: python -m py_compile app.py
      - name: Health check (skip models)
        env:
          SKIP_MODEL: true
        run: |
          python app.py &
          sleep 2
          curl --fail http://localhost:7860/health
      - name: Docker build test
        run: docker build -t sentinelnet:test .

CI Features:

✅ Python 3.10 environment setup
✅ Dependency installation verification
✅ Code syntax validation
✅ Flask app health check (with SKIP_MODEL=true to avoid model loading timeout)
✅ Docker image build validation

Continuous Deployment (HuggingFace Spaces)

Trigger: Push to main branch
Action: Auto-deploys Docker container to HuggingFace Spaces
Endpoint: https://huggingface.co/spaces/Hitan2004/sentinelnet
Uptime: Hosted on the free tier, so expect occasional cold starts

🎓 What I Learned

✅ Production ML Systems

Training and deploying multi-class classification models end-to-end
Feature engineering and preprocessing pipeline serialization
Model serving via REST API with batch inference

✅ Real-Time Dashboards

Building interactive dashboards with vanilla JavaScript
Canvas API for high-performance charting (thousands of data points)
Responsive design for desktop and tablet

✅ Backend Engineering

Flask REST API design and CORS handling
Batch processing with streaming progress feedback
Error handling and validation

✅ DevOps & Deployment

Docker containerization for reproducible environments
HuggingFace Spaces deployment workflow
GitHub Actions CI/CD pipeline with smart skipping

✅ Advanced Concepts

NSL-KDD dataset characteristics and threat modeling
One-hot vs. frequency encoding trade-offs
Log transforms for skewed feature distributions
Split criteria (Gini impurity, entropy) and feature importance in Random Forest

📊 Dataset Reference

NSL-KDD Dataset

Improved version of KDD Cup 1999
Size: 125,973 training records, 22,544 test records
Features: 41 network connection attributes
Classes: 5 (normal, DoS, Probe, R2L, U2R)
Advantages: Removes duplicate records, more balanced class distribution
Standard: Widely used benchmark for IDS research

Attribute Categories:

Basic features (9): duration, protocol_type, service, flag, src_bytes, dst_bytes, etc.
Content features (13): hot, num_failed_logins, logged_in, num_compromised, etc.
Time-based traffic features (9): count, srv_count, serror_rate, etc.
Host-based traffic features (10): dst_host_count, dst_host_srv_count, etc.

🤝 Contributing

This is a portfolio project, but you're welcome to fork and extend!

Ideas for enhancement:

Add LSTM-based temporal anomaly detection
Implement feature importance visualization
Add real PCAP file ingestion
Multi-model ensemble (XGBoost + Neural Network)
Real-time alerting webhook integration

📜 License

MIT License — Use freely for learning, portfolio, or production purposes.

📞 Contact

Hitan K — AI Systems Engineer

🔗 LinkedIn
🐙 GitHub
🤗 HuggingFace
📧 Email

⭐ If this helped you, please star the repo! ⭐

Built with ❤️ for production and learning.