
🛡️ SentinelNet: AI-Powered Network Intrusion Detection System

Production ML system that classifies network traffic into 5 classes (normal plus 4 attack categories) in real time


A full-stack real-time intrusion detection dashboard with hybrid frontend, REST API, and automated CI/CD deployment.


🎯 Overview

SentinelNet is a production-grade network intrusion detection system that analyzes live traffic and batch CSV datasets, classifying each connection into one of 5 classes: normal traffic plus four attack categories (DoS, Probe, R2L, U2R). Built on a Random Forest classifier trained on the NSL-KDD dataset, it combines real-time inference with a web dashboard and self-correcting batch processing.

⚡ Key Capabilities

| Feature | Capability |
|---|---|
| Real-Time Detection | Thousands of live packets per second through the trained ML model |
| Threat Classification | 5-class detection: normal, DoS, Probe, R2L, U2R |
| Batch Analysis | Process CSVs with live progress, streaming predictions, auto-generated threat reports |
| Visual Intelligence | Live timeline, activity heatmaps, confidence distributions, attack patterns |
| Export Formats | CSV, PDF reports, JSON for integration |
| Deployment | Docker containerized, live on HuggingFace Spaces |

🏗️ Architecture

System Diagram

┌──────────────────────────────────────────────────────────┐
│                   SentinelNet System                     │
└──────────────────────────────────────────────────────────┘

                    ┌──────────────────┐
                    │   Flask Backend  │
                    │   (app.py)       │
                    └────────┬─────────┘
                             │
         ┌───────────────────┼───────────────────┐
         │                   │                   │
    ┌────▼─────┐        ┌────▼─────┐       ┌─────▼──────┐
    │ /health  │        │ /predict │       │ /static    │
    │ Endpoint │        │ Batch    │       │ Frontend   │
    └──────────┘        │ Inference│       └────────────┘
                        └────┬─────┘
                             │
              ┌──────────────┼──────────────┐
              │              │              │
         ┌────▼──────┐  ┌────▼──────┐  ┌────▼─────────┐
         │ML Pipeline│  │One-Hot    │  │Label         │
         │Processing │  │Encoder    │  │Encoder       │
         └────┬──────┘  └───────────┘  └──────────────┘
              │
         ┌────▼──────────────────────┐
         │ Random Forest Classifier  │
         │ (sentinel_brain.joblib)   │
         │ 41 NSL-KDD Features       │
         └───────────────────────────┘

Data Flow

User Input (Live or CSV)
    ↓
Feature Extraction & Validation
    ↓
One-Hot Encoding (protocol_type, flag)
    ↓
Frequency Encoding (service)
    ↓
Log Transforms (src_bytes, dst_bytes, duration)
    ↓
Feature Engineering (total_bytes, ratios, error flags)
    ↓
Standard Scaling (all features)
    ↓
Random Forest Inference
    ↓
Prediction + Confidence Score
    ↓
Severity Mapping
    ↓
JSON Response / Dashboard Update

📊 Model Performance

Training Details

  • Algorithm: Random Forest Classifier (100 trees)
  • Dataset: NSL-KDD (improved KDD Cup 1999)
  • Features: 41 network connection attributes
  • Classes: 5 (normal, DoS, Probe, R2L, U2R)
  • Preprocessing: OHE, frequency encoding, log transforms, standard scaling

Threat Categories

| Class | Type | Severity | Examples |
|---|---|---|---|
| normal | Clean traffic | ✅ None | HTTP requests, DNS queries |
| DoS | Denial of Service | 🔴 Critical | SYN floods, UDP storms |
| Probe | Reconnaissance | 🟠 Medium | Port scanning, OS fingerprinting |
| R2L | Remote to Local | 🔴 High | SSH brute force, FTP attacks |
| U2R | User to Root | 🔴 Critical | Buffer overflow, privilege escalation |
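
The class-to-severity mapping in the table above can be expressed as a small lookup. This is a minimal sketch, not the project's actual code: the names `SEVERITY_BY_CLASS` and `to_result` are hypothetical, and the output shape is copied from the `/predict` response documented in the REST API reference.

```python
# Severity labels taken from the Threat Categories table.
SEVERITY_BY_CLASS = {
    "normal": "None",
    "DoS": "Critical",
    "Probe": "Medium",
    "R2L": "High",
    "U2R": "Critical",
}


def to_result(predicted_class: str, confidence: float) -> dict:
    """Build one result entry shaped like an item in the /predict response."""
    if predicted_class not in SEVERITY_BY_CLASS:
        raise ValueError(f"unknown class: {predicted_class!r}")
    return {
        "predicted_class": predicted_class,
        "severity": SEVERITY_BY_CLASS[predicted_class],
        "confidence": round(confidence, 4),
        # Every class other than "normal" is treated as an intrusion.
        "is_intrusion": predicted_class != "normal",
    }
```

Keeping the mapping in one dictionary means a new class or a changed severity only has to be edited in one place.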

✨ Features

📡 Live Monitor Tab

Real-time threat detection with auto-generated NSL-KDD formatted packets

  • Auto-Generation: Simulates realistic network traffic packets
  • Real-Time Inference: Each packet sent to trained model instantly
  • Live Detection Feed: Class, confidence, severity per packet
  • Attack Distribution Chart: Bar chart updating in real-time
  • Threat Timeline: Last 60 seconds of activity
  • Activity Heatmap: 60×8 grid of recent packets
  • Confidence Distribution: Histogram of model certainty
  • System Log: Terminal-style event log
  • Session Summary: Total packets, attacks detected, accuracy metrics
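
The "last 60 seconds" views above boil down to sliding-window bookkeeping. The dashboard itself is written in JavaScript (`app.js`), so the Python below is only an illustration of the idea; the class name `ThreatTimeline` and its methods are hypothetical.

```python
from collections import Counter, deque


class ThreatTimeline:
    """Keep per-class counts for detections seen in the last `window` seconds."""

    def __init__(self, window: float = 60.0):
        self.window = window
        self._events = deque()  # (timestamp, predicted_class), oldest first
        self._counts = Counter()

    def add(self, timestamp: float, predicted_class: str) -> None:
        """Record one detection and drop anything that aged out."""
        self._events.append((timestamp, predicted_class))
        self._counts[predicted_class] += 1
        self._evict(timestamp)

    def _evict(self, now: float) -> None:
        # Pop events from the left until the oldest one is inside the window.
        while self._events and now - self._events[0][0] > self.window:
            _, old_class = self._events.popleft()
            self._counts[old_class] -= 1
            if self._counts[old_class] == 0:
                del self._counts[old_class]

    def snapshot(self, now: float) -> dict:
        """Return {class: count} for the window ending at `now`."""
        self._evict(now)
        return dict(self._counts)
```

A deque keeps eviction cheap because expired events are always at the left end, so each packet is appended once and removed once.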

📂 CSV Analysis Tab

Upload and analyze NSL-KDD formatted datasets with streaming predictions

  • Smart Header Detection: Auto-detects with or without column names
  • Batch Processing: Optimized row-by-row inference through model
  • Live Progress: Real-time bar with ETA and processing speed (rows/sec)
  • Streaming Results: Predictions appear as they're computed
  • Threat Report Generation (on completion):
    • Risk score gauge (0–100)
    • Class distribution bar chart
    • Confidence waveform over entire dataset
    • Threat intensity rolling average
    • Protocol breakdown pie chart
    • Top targeted services
    • Attack pattern clustering visualization
    • Paginated full results table with sorting/filtering
  • Multi-Format Export: CSV, PDF report, JSON
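
Two of the items above, header detection and the rolling threat intensity, are easy to sketch with the standard library. This is an assumption-laden illustration rather than the project's implementation: the function names are hypothetical, the header heuristic assumes the first column (`duration`) is numeric in every data row, and the default window of 100 rows is arbitrary.

```python
import csv
import io


def has_header(first_row: list) -> bool:
    """Guess whether the first CSV row holds column names rather than data."""
    if not first_row:
        return False
    try:
        float(first_row[0])  # duration is numeric in data rows
    except ValueError:
        return True
    return False


def read_rows(text: str, column_names: list) -> list:
    """Parse CSV text into dicts, with or without a header row."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return []
    if has_header(rows[0]):
        names, data = rows[0], rows[1:]
    else:
        names, data = column_names, rows
    return [dict(zip(names, row)) for row in data]


def rolling_threat_intensity(is_intrusion: list, window: int = 100) -> list:
    """Trailing fraction of intrusions over the last `window` predictions."""
    out, running = [], 0
    for i, flag in enumerate(is_intrusion):
        running += int(flag)
        if i >= window:
            running -= int(is_intrusion[i - window])  # drop the row that left the window
        out.append(running / min(i + 1, window))
    return out
```

The running sum avoids re-scanning the window for every row, which matters when the report covers an entire uploaded dataset.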

🧠 ML Pipeline Deep Dive

Feature Engineering

# Input: raw NSL-KDD connection features.
# Note: 37 names are listed here. The standard 41-feature NSL-KDD schema also
# includes dst_host_serror_rate, dst_host_srv_serror_rate,
# dst_host_rerror_rate, and dst_host_srv_rerror_rate.
# A list (not a set) is used so that column order is preserved.
features_raw = [
    'duration', 'protocol_type', 'service', 'flag',
    'src_bytes', 'dst_bytes', 'land', 'wrong_fragment',
    'urgent', 'hot', 'num_failed_logins', 'logged_in',
    'num_compromised', 'root_shell', 'su_attempted',
    'num_root', 'num_file_creations', 'num_shells',
    'num_access_files', 'num_outbound_cmds', 'is_host_login',
    'is_guest_login', 'count', 'srv_count', 'serror_rate',
    'srv_serror_rate', 'rerror_rate', 'srv_rerror_rate',
    'same_srv_rate', 'diff_srv_rate', 'srv_diff_host_rate',
    'dst_host_count', 'dst_host_srv_count', 'dst_host_same_srv_rate',
    'dst_host_diff_srv_rate', 'dst_host_same_src_port_rate',
    'dst_host_srv_diff_host_rate',
]

# Preprocessing pipeline
# 1. One-hot encoding: protocol_type (3 categories) → 3 columns
# 2. One-hot encoding: flag (11 categories) → 11 columns
# 3. Frequency encoding: service → maps to frequency rank
# 4. Log transforms: log(1 + src_bytes), log(1 + dst_bytes), log(1 + duration)
# 5. Feature engineering:
#    - total_bytes = src_bytes + dst_bytes
#    - src_bytes_ratio = src_bytes / (total_bytes + 1)
#    - is_error_flag = 1 if error flag present
# 6. Standard scaling: (x - mean) / std for all numeric features

# Output: 41 standardized features → Random Forest inference
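
As a rough illustration of steps 1 and 3 to 6, here is a standard-library sketch that processes one record. It is not the project's pipeline: the output column names, the `service_freq` dictionary, the fallback of 0.0 for unseen services, and the rule "any flag other than SF is an error flag" are all assumptions made for the example.

```python
import math

PROTOCOLS = ["tcp", "udp", "icmp"]  # the 3 protocol_type categories


def engineer(row: dict, service_freq: dict) -> dict:
    """Apply encoding, log transforms, and engineered features to one record."""
    src, dst = row["src_bytes"], row["dst_bytes"]
    total = src + dst
    out = {
        # Step 4: log(1 + x) compresses heavy-tailed byte counts and durations.
        "log_src_bytes": math.log1p(src),
        "log_dst_bytes": math.log1p(dst),
        "log_duration": math.log1p(row["duration"]),
        # Step 5: engineered features; the +1 avoids division by zero.
        "total_bytes": total,
        "src_bytes_ratio": src / (total + 1),
        # Assumption: any flag other than SF counts as an error flag.
        "is_error_flag": int(row["flag"] != "SF"),
        # Step 3: unseen services fall back to 0.0 (assumption).
        "service_freq": service_freq.get(row["service"], 0.0),
    }
    # Step 1: one-hot encode protocol_type into one column per category.
    for proto in PROTOCOLS:
        out[f"protocol_type_{proto}"] = int(row["protocol_type"] == proto)
    return out


def standardize(value: float, mean: float, std: float) -> float:
    """Step 6: (x - mean) / std, with statistics fitted on the training data."""
    return (value - mean) / std
```

At inference time the mean and standard deviation must come from the training set (here, presumably `scaler.joblib`), never from the incoming batch, otherwise predictions drift with batch composition.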

Serialization

All pipeline artifacts are serialized with joblib for production reliability:

models/
├── sentinel_brain.joblib       # Trained Random Forest (100 trees)
├── label_encoder.joblib        # Encodes target class labels
├── ohe_encoder.joblib          # One-hot encoder for protocol/flag
├── freq_map.joblib             # Service frequency mapping dictionary
├── scaler.joblib               # StandardScaler fitted on training data
└── selected_features.joblib    # List of 41 selected features in order

🚀 Quick Start

Prerequisites

  • Python 3.10+
  • pip or conda
  • 500MB disk space for models

Local Setup (5 minutes)

# 1. Clone repository
git clone https://github.com/Hitan547/sentinelnet.git
cd sentinelnet

# 2. Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# 3. Install dependencies
pip install -r requirements.txt

# 4. Run Flask server
python app.py

# 5. Open browser
# → http://localhost:7860

Docker Setup (for Spaces or cloud deployment)

# Build image
docker build -t sentinelnet:latest .

# Run container
docker run -p 7860:7860 sentinelnet:latest

# Access at http://localhost:7860

Deployment on HuggingFace Spaces

  1. Create a new Space on HuggingFace
  2. Select the "Docker" runtime
  3. Clone this repo
  4. Push it to the Space repo
  5. The Space builds the image and serves the app automatically

🔌 REST API Reference

POST /predict

Batch inference endpoint for NSL-KDD formatted network packets

Request:

{
  "rows": [
    {
      "duration": 0,
      "protocol_type": "tcp",
      "service": "http",
      "flag": "SF",
      "src_bytes": 181,
      "dst_bytes": 5450,
      "land": 0,
      "wrong_fragment": 0,
      "urgent": 0,
      "hot": 0,
      "num_failed_logins": 0,
      "logged_in": 1,
      "num_compromised": 0,
      "root_shell": 0,
      "su_attempted": 0,
      "num_root": 0,
      "num_file_creations": 0,
      "num_shells": 0,
      "num_access_files": 0,
      "num_outbound_cmds": 0,
      "is_host_login": 0,
      "is_guest_login": 0,
      "count": 1,
      "srv_count": 1,
      "serror_rate": 0.0,
      "srv_serror_rate": 0.0,
      "rerror_rate": 0.0,
      "srv_rerror_rate": 0.0,
      "same_srv_rate": 1.0,
      "diff_srv_rate": 0.0,
      "srv_diff_host_rate": 0.0,
      "dst_host_count": 1,
      "dst_host_srv_count": 1,
      "dst_host_same_srv_rate": 1.0,
      "dst_host_diff_srv_rate": 0.0,
      "dst_host_same_src_port_rate": 0.0,
      "dst_host_srv_diff_host_rate": 0.0
    }
  ]
}

Response:

{
  "status": "ok",
  "results": [
    {
      "predicted_class": "normal",
      "severity": "None",
      "confidence": 0.9821,
      "is_intrusion": false
    }
  ]
}
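
A client for this endpoint might look like the sketch below, which uses only the standard library. The helper names `build_predict_request` and `intrusions` are hypothetical, the base URL is the local address from the Quick Start, and sending a `Content-Type: application/json` header is an assumption about what the server expects.

```python
import json
import urllib.request


def build_predict_request(rows: list, base_url: str = "http://localhost:7860") -> urllib.request.Request:
    """Build (but do not send) a POST /predict request for a batch of rows."""
    body = json.dumps({"rows": rows}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def intrusions(response_body: bytes) -> list:
    """Return the result entries flagged as intrusions in a /predict response."""
    payload = json.loads(response_body)
    if payload.get("status") != "ok":
        raise RuntimeError(f"prediction failed: {payload}")
    return [r for r in payload["results"] if r["is_intrusion"]]


# Usage against a running server:
#   with urllib.request.urlopen(build_predict_request(rows), timeout=10) as resp:
#       flagged = intrusions(resp.read())
```

Sending many rows in one request amortizes the HTTP overhead, which is the point of the `rows` array in the request body.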

GET /health

System health check

Response:

{
  "status": "online",
  "model": "sentinel_brain",
  "version": "1.0.0",
  "uptime_seconds": 3600
}

📁 Project Structure

sentinelnet/
├── frontend/
│   ├── index.html          # Main HTML with tabs, charts, tables
│   ├── style.css           # CSS variables, grid layout, animations
│   └── app.js              # Canvas charts, API calls, event handlers
├── models/
│   ├── sentinel_brain.joblib          # Random Forest classifier
│   ├── label_encoder.joblib           # Target label encoding
│   ├── ohe_encoder.joblib             # Protocol/flag one-hot encoder
│   ├── freq_map.joblib                # Service frequency dictionary
│   ├── scaler.joblib                  # Standard scaler
│   └── selected_features.joblib       # 41 feature names + order
├── app.py                  # Flask server + /predict + /health endpoints
├── requirements.txt        # Python dependencies (Flask, scikit-learn, etc.)
├── Dockerfile              # Multi-stage build for HuggingFace Spaces
├── .dockerignore           # Excludes unnecessary files from build
├── .github/
│   └── workflows/
│       └── ci.yml          # GitHub Actions CI pipeline
└── README.md               # This file

🔄 CI/CD Pipeline

Continuous Integration (GitHub Actions)

on: [push, pull_request]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: '3.10'
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Syntax check
        run: python -m py_compile app.py
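      - name: Health check (skip models)
        env:
          SKIP_MODEL: "true"
        # A block scalar keeps each command on its own line; a plain multi-line
        # scalar would be folded into one shell line and break `sleep`.
        run: |
          python app.py &
          sleep 2
          curl --fail http://localhost:7860/health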
      - name: Docker build test
        run: docker build -t sentinelnet:test .

CI Features:

  • ✅ Python 3.10 environment setup
  • ✅ Dependency installation verification
  • ✅ Code syntax validation
  • ✅ Flask app health check (with SKIP_MODEL=true to avoid model loading timeout)
  • ✅ Docker image build validation
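
A fixed `sleep 2` before the health check can fail on a slow runner if the app needs longer to start. One alternative, sketched here under assumptions, is to poll `/health` until it answers. The function names, the 30-second default timeout, and the injectable `fetch` parameter are all hypothetical and not part of the repository.

```python
import json
import time
import urllib.request


def fetch_json(url: str) -> dict:
    """GET `url` and decode its JSON body; raises OSError or ValueError on failure."""
    with urllib.request.urlopen(url, timeout=2) as resp:
        return json.loads(resp.read())


def wait_for_health(url: str, timeout: float = 30.0, interval: float = 0.5, fetch=fetch_json) -> dict:
    """Retry `fetch(url)` until it succeeds or `timeout` seconds have elapsed."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            return fetch(url)
        except (OSError, ValueError) as exc:
            # Connection refused, HTTP error, or invalid JSON: not ready yet.
            last_error = exc
        if time.monotonic() + interval > deadline:
            raise TimeoutError(f"{url} not healthy after {timeout}s: {last_error}")
        time.sleep(interval)


# Usage in CI, after starting the app in the background:
#   print(wait_for_health("http://localhost:7860/health"))
```

`urllib.error.URLError` and `HTTPError` are subclasses of `OSError`, so a refused connection and a non-2xx status are both treated as "retry".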

Continuous Deployment (HuggingFace Spaces)


🎓 What I Learned

✅ Production ML Systems

  • Training and deploying multi-class classification models end-to-end
  • Feature engineering and preprocessing pipeline serialization
  • Model serving via REST API with batch inference

✅ Real-Time Dashboards

  • Building interactive dashboards with vanilla JavaScript
  • Canvas API for high-performance charting (thousands of data points)
  • Responsive design for desktop and tablet

✅ Backend Engineering

  • Flask REST API design and CORS handling
  • Batch processing with streaming progress feedback
  • Error handling and validation

✅ DevOps & Deployment

  • Docker containerization for reproducible environments
  • HuggingFace Spaces deployment workflow
  • GitHub Actions CI/CD pipeline that skips model loading during health checks

✅ Advanced Concepts

  • NSL-KDD dataset characteristics and threat modeling
  • One-hot vs. frequency encoding trade-offs
  • Log transforms for skewed feature distributions
  • Split criteria (Gini impurity, entropy) and feature importance in Random Forest

📊 Dataset Reference

NSL-KDD Dataset

  • Improved version of KDD Cup 1999
  • Size: 125,973 training records, 22,544 test records
  • Features: 41 network connection attributes
  • Classes: 5 (normal, DoS, Probe, R2L, U2R)
  • Advantages: Removes duplicate records, more balanced class distribution
  • Standard: Widely used benchmark for IDS research

Attribute Categories:

  • Basic features (9): duration, protocol_type, service, flag, src_bytes, dst_bytes, etc.
  • Content features (13): hot, num_failed_logins, logged_in, num_compromised, etc.
  • Time-based traffic features (9): count, srv_count, serror_rate, etc.
  • Host-based traffic features (10): dst_host_count, dst_host_srv_count, etc.
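
The raw NSL-KDD files label each record with a specific attack name, which has to be collapsed into the 5 classes used here. The sketch below shows the commonly used grouping for the attack names in the training split. Whether SentinelNet used exactly this mapping is an assumption, the names `ATTACK_CLASS` and `to_five_class` are hypothetical, and the test split contains further attack names that are not listed.

```python
# Attack names from the NSL-KDD training split, grouped by attack class.
ATTACK_CLASS = {
    "DoS": ["back", "land", "neptune", "pod", "smurf", "teardrop"],
    "Probe": ["ipsweep", "nmap", "portsweep", "satan"],
    "R2L": ["ftp_write", "guess_passwd", "imap", "multihop",
            "phf", "spy", "warezclient", "warezmaster"],
    "U2R": ["buffer_overflow", "loadmodule", "perl", "rootkit"],
}

# Invert to {raw label: class} and add the benign label.
LABEL_TO_CLASS = {name: cls for cls, names in ATTACK_CLASS.items() for name in names}
LABEL_TO_CLASS["normal"] = "normal"


def to_five_class(raw_label: str) -> str:
    """Map a raw attack label to normal, DoS, Probe, R2L, or U2R."""
    # Some dataset variants append a trailing dot to the label.
    label = raw_label.strip().rstrip(".").lower()
    if label not in LABEL_TO_CLASS:
        raise ValueError(f"unmapped attack label: {raw_label!r}")
    return LABEL_TO_CLASS[label]
```

Raising on unknown labels, rather than defaulting to a class, makes it obvious when the test split introduces an attack name the mapping does not cover.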

🤝 Contributing

This is a portfolio project, but you're welcome to fork and extend!

Ideas for enhancement:

  • Add LSTM-based temporal anomaly detection
  • Implement feature importance visualization
  • Add real PCAP file ingestion
  • Multi-model ensemble (XGBoost + Neural Network)
  • Real-time alerting webhook integration

📜 License

MIT License. Use freely for learning, portfolio, or production purposes.


📞 Contact

Hitan K, AI Systems Engineer


⭐ If this helped you, please star the repo! ⭐

Built with ❤️ for production and learning.