SentinelNet: AI-Powered Network Intrusion Detection System
Production ML system detecting 5 categories of network threats in real-time
A full-stack real-time intrusion detection dashboard with hybrid frontend, REST API, and automated CI/CD deployment.
Overview
SentinelNet is a production-grade network intrusion detection system that analyzes live traffic and batch CSV datasets, classifying each connection as normal or as one of four attack categories (DoS, Probe, R2L, U2R). It uses a Random Forest classifier trained on the NSL-KDD dataset and pairs real-time inference with a web dashboard and self-correcting batch processing.
Key Capabilities
| Feature | Capability |
|---|---|
| Real-Time Detection | Scores thousands of live packets per second with the trained ML model |
| Threat Classification | 5-class detection: normal, DoS, Probe, R2L, U2R |
| Batch Analysis | Process CSVs with live progress, streaming predictions, auto-generated threat reports |
| Visual Intelligence | Live timeline, activity heatmaps, confidence distributions, attack patterns |
| Export Formats | CSV, PDF reports, JSON for integration |
| Deployment | Docker containerized, live on HuggingFace Spaces |
Architecture
System Diagram
+---------------------------------------------------------+
|                   SentinelNet System                    |
+---------------------------------------------------------+
                    +------------------+
                    |  Flask Backend   |
                    |    (app.py)      |
                    +--------+---------+
                             |
         +-------------------+-------------------+
         |                   |                   |
    +----v-----+        +----v-----+       +-----v------+
    | /health  |        | /predict |       |  /static   |
    | Endpoint |        |  Batch   |       |  Frontend  |
    +----------+        | Inference|       +------------+
                        +----+-----+
                             |
             +---------------+---------------+
             |               |               |
       +-----v-----+    +----v-----+   +-----v--------+
       |ML Pipeline|    | One-Hot  |   |    Label     |
       |Processing |    | Encoder  |   |   Encoder    |
       +-----------+    +----+-----+   +--------------+
                             |
               +-------------v-------------+
               | Random Forest Classifier  |
               | (sentinel_brain.joblib)   |
               | 41 NSL-KDD Features       |
               +---------------------------+
Data Flow
User Input (Live or CSV)
↓
Feature Extraction & Validation
↓
One-Hot Encoding (protocol_type, flag)
↓
Frequency Encoding (service)
↓
Log Transforms (src_bytes, dst_bytes, duration)
↓
Feature Engineering (total_bytes, ratios, error flags)
↓
Standard Scaling (all features)
↓
Random Forest Inference
↓
Prediction + Confidence Score
↓
Severity Mapping
↓
JSON Response / Dashboard Update
Model Performance
Training Details
- Algorithm: Random Forest Classifier (100 trees)
- Dataset: NSL-KDD (improved KDD Cup 1999)
- Features: 41 network connection attributes
- Classes: 5 (normal, DoS, Probe, R2L, U2R)
- Preprocessing: OHE, frequency encoding, log transforms, standard scaling
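The training setup listed above can be sketched roughly as follows, assuming scikit-learn's `RandomForestClassifier`. The toy rows, the two-class labels, and `random_state` are invented for illustration; this is not the project's training script and the data is not NSL-KDD.

```python
from sklearn.ensemble import RandomForestClassifier

# Toy stand-in for preprocessed feature rows (NOT real NSL-KDD data).
X_train = [
    [0.0, 0.1], [0.1, 0.0], [0.2, 0.1],   # "normal"-like rows
    [5.0, 5.1], [5.2, 4.9], [4.8, 5.0],   # "DoS"-like rows
]
y_train = ["normal", "normal", "normal", "DoS", "DoS", "DoS"]

# 100 trees, matching the training details above; random_state is an
# assumption added so the sketch is reproducible.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# predict_proba returns one probability per class, ordered as model.classes_.
probabilities = model.predict_proba([[5.1, 5.0]])[0]
predicted = model.classes_[probabilities.argmax()]
confidence = float(probabilities.max())
print(predicted, round(confidence, 4))
```

The highest class probability is one plausible source for the confidence score the dashboard reports, though the README does not state how confidence is computed.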
Threat Categories
| Class | Type | Severity | Examples |
|---|---|---|---|
| normal | Clean traffic | None | HTTP requests, DNS queries |
| DoS | Denial of Service | Critical | SYN floods, UDP storms |
| Probe | Reconnaissance | Medium | Port scanning, OS fingerprinting |
| R2L | Remote to Local | High | SSH brute force, FTP attacks |
| U2R | User to Root | Critical | Buffer overflow, privilege escalation |
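As an illustration of how the severity column could be applied at inference time, the sketch below turns a vector of class probabilities into a result record. The field names follow the `/predict` response documented in this README; the function name `build_result` and the class ordering are hypothetical.

```python
# Severity per class, copied from the threat categories table.
SEVERITY = {
    "normal": "None",
    "DoS": "Critical",
    "Probe": "Medium",
    "R2L": "High",
    "U2R": "Critical",
}

def build_result(class_names, probabilities):
    """Turn per-class probabilities into one /predict-style result dict."""
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    predicted = class_names[best]
    return {
        "predicted_class": predicted,
        "severity": SEVERITY[predicted],
        "confidence": round(probabilities[best], 4),
        "is_intrusion": predicted != "normal",
    }

# Hypothetical class order; the real order comes from the label encoder.
classes = ["DoS", "Probe", "R2L", "U2R", "normal"]
print(build_result(classes, [0.01, 0.0, 0.0079, 0.0, 0.9821]))
```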
Features
Live Monitor Tab
Real-time threat detection with auto-generated NSL-KDD formatted packets
- Auto-Generation: Simulates realistic network traffic packets
- Real-Time Inference: Each packet sent to trained model instantly
- Live Detection Feed: Class, confidence, severity per packet
- Attack Distribution Chart: Bar chart updating in real-time
- Threat Timeline: Last 60 seconds of activity
- Activity Heatmap: 60×8 grid of recent packets
- Confidence Distribution: Histogram of model certainty
- System Log: Terminal-style event log
- Session Summary: Total packets, attacks detected, accuracy metrics
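The dashboard itself is written in JavaScript, so the following Python sketch only illustrates the logic behind a 60-second threat timeline: keep recent events in a queue and evict anything older than the window. The class name `ThreatTimeline` and its methods are hypothetical.

```python
from collections import deque

class ThreatTimeline:
    """Keep only the events seen in the last `window_seconds` seconds."""

    def __init__(self, window_seconds=60):
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, predicted_class), oldest first

    def add(self, timestamp, predicted_class):
        self.events.append((timestamp, predicted_class))
        self._evict(timestamp)

    def _evict(self, now):
        # Drop events that have fallen out of the window.
        while self.events and now - self.events[0][0] >= self.window_seconds:
            self.events.popleft()

    def attack_count(self):
        return sum(1 for _, cls in self.events if cls != "normal")

timeline = ThreatTimeline()
timeline.add(0.0, "DoS")
timeline.add(30.0, "normal")
timeline.add(61.0, "Probe")  # the event at t=0 is now outside the window
print(len(timeline.events), timeline.attack_count())
```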
CSV Analysis Tab
Upload and analyze NSL-KDD formatted datasets with streaming predictions
- Smart Header Detection: Auto-detects with or without column names
- Batch Processing: Optimized row-by-row inference through model
- Live Progress: Real-time bar with ETA and processing speed (rows/sec)
- Streaming Results: Predictions appear as they're computed
- Threat Report Generation (on completion):
  - Risk score gauge (0-100)
  - Class distribution bar chart
  - Confidence waveform over entire dataset
  - Threat intensity rolling average
  - Protocol breakdown pie chart
  - Top targeted services
  - Attack pattern clustering visualization
  - Paginated full results table with sorting/filtering
- Multi-Format Export: CSV, PDF report, JSON
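One way the header detection could work is sketched below. It assumes that a headerless NSL-KDD row starts with the numeric `duration` field, so a non-numeric first field suggests a header row. The helper names `has_header` and `read_rows` are hypothetical, and the project's actual heuristic may differ.

```python
import csv
import io

def has_header(first_row):
    """Guess whether the first CSV row holds column names.

    Assumption: NSL-KDD data rows begin with the numeric `duration`,
    so a non-numeric first field is treated as a header.
    """
    if not first_row:
        return False
    try:
        float(first_row[0])
    except ValueError:
        return True
    return False

def read_rows(text, column_names):
    """Return each data row as a dict, with or without a header line."""
    rows = list(csv.reader(io.StringIO(text)))
    if rows and has_header(rows[0]):
        column_names, rows = rows[0], rows[1:]
    return [dict(zip(column_names, row)) for row in rows]

names = ["duration", "protocol_type", "service"]
with_header = "duration,protocol_type,service\n0,tcp,http\n"
without_header = "0,tcp,http\n"
print(read_rows(with_header, names) == read_rows(without_header, names))
```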
ML Pipeline Deep Dive
Feature Engineering
# Input: 41 raw NSL-KDD features
features_raw = {
    'duration', 'protocol_type', 'service', 'flag',
    'src_bytes', 'dst_bytes', 'land', 'wrong_fragment',
    'urgent', 'hot', 'num_failed_logins', 'logged_in',
    'num_compromised', 'root_shell', 'su_attempted',
    'num_root', 'num_file_creations', 'num_shells',
    'num_access_files', 'num_outbound_cmds', 'is_host_login',
    'is_guest_login', 'count', 'srv_count', 'serror_rate',
    'srv_serror_rate', 'rerror_rate', 'srv_rerror_rate',
    'same_srv_rate', 'diff_srv_rate', 'srv_diff_host_rate',
    'dst_host_count', 'dst_host_srv_count', 'dst_host_same_srv_rate',
    'dst_host_diff_srv_rate', 'dst_host_same_src_port_rate',
    'dst_host_srv_diff_host_rate', 'dst_host_serror_rate',
    'dst_host_srv_serror_rate', 'dst_host_rerror_rate',
    'dst_host_srv_rerror_rate',
}
# Preprocessing pipeline
# 1. One-hot encoding: protocol_type (3 categories) → 3 columns
# 2. One-hot encoding: flag (11 categories) → 11 columns
# 3. Frequency encoding: service → maps to frequency rank
# 4. Log transforms: log(1 + src_bytes), log(1 + dst_bytes), log(1 + duration)
# 5. Feature engineering:
#    - total_bytes = src_bytes + dst_bytes
#    - src_bytes_ratio = src_bytes / (total_bytes + 1)
#    - is_error_flag = 1 if error flag present
# 6. Standard scaling: (x - mean) / std for all numeric features
# Output: 41 standardized features → Random Forest inference
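Steps 3 to 5 of the pipeline can be sketched in plain Python as below. Two points are assumptions rather than facts from this README: the set of flags treated as "error" flags (`ERROR_FLAGS`) is a placeholder, and the service encoding uses relative frequency where the README describes a frequency rank. The output key names are also hypothetical.

```python
import math
from collections import Counter

# Assumption: the README does not say which flags count as error flags.
ERROR_FLAGS = {"S0", "REJ", "RSTO", "RSTR"}

def build_freq_map(services):
    """Map each service name to its relative frequency in the training data."""
    counts = Counter(services)
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}

def engineer(row, freq_map):
    """Apply frequency encoding, log transforms, and engineered features."""
    total_bytes = row["src_bytes"] + row["dst_bytes"]
    return {
        "service_freq": freq_map.get(row["service"], 0.0),  # unseen service -> 0
        "log_src_bytes": math.log1p(row["src_bytes"]),
        "log_dst_bytes": math.log1p(row["dst_bytes"]),
        "log_duration": math.log1p(row["duration"]),
        "total_bytes": total_bytes,
        "src_bytes_ratio": row["src_bytes"] / (total_bytes + 1),
        "is_error_flag": int(row["flag"] in ERROR_FLAGS),
    }

freq_map = build_freq_map(["http", "http", "ftp", "smtp"])
record = {"duration": 0, "service": "http", "flag": "SF",
          "src_bytes": 181, "dst_bytes": 5450}
features = engineer(record, freq_map)
print(features["total_bytes"], features["service_freq"])
```

The `+ 1` in the ratio's denominator avoids division by zero for connections that carry no bytes, and `log1p` handles zero-valued inputs for the same reason.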
Serialization
All pipeline artifacts are serialized with joblib for production reliability:
models/
├── sentinel_brain.joblib     # Trained Random Forest (100 trees)
├── label_encoder.joblib      # Encodes target class labels
├── ohe_encoder.joblib        # One-hot encoder for protocol/flag
├── freq_map.joblib           # Service frequency mapping dictionary
├── scaler.joblib             # StandardScaler fitted on training data
└── selected_features.joblib  # List of 41 selected features in order
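A minimal sketch of loading these artifacts once at startup is shown below. It relies on `joblib.load`, which is part of joblib's public API; the helper names `artifact_paths` and `load_artifacts` and the dictionary keys are hypothetical and not taken from `app.py`.

```python
from pathlib import Path

# File names taken from the models/ listing above.
ARTIFACT_FILES = {
    "model": "sentinel_brain.joblib",
    "label_encoder": "label_encoder.joblib",
    "ohe_encoder": "ohe_encoder.joblib",
    "freq_map": "freq_map.joblib",
    "scaler": "scaler.joblib",
    "selected_features": "selected_features.joblib",
}

def artifact_paths(models_dir="models"):
    """Return the expected path of every pipeline artifact."""
    base = Path(models_dir)
    return {name: base / filename for name, filename in ARTIFACT_FILES.items()}

def load_artifacts(models_dir="models"):
    """Load all artifacts, failing early if any file is missing."""
    paths = artifact_paths(models_dir)
    missing = [str(path) for path in paths.values() if not path.exists()]
    if missing:
        raise FileNotFoundError(f"Missing model artifacts: {missing}")
    import joblib  # imported late so the path check works without joblib
    return {name: joblib.load(path) for name, path in paths.items()}
```

Checking for missing files before loading gives a single clear error message instead of a failure on the first absent artifact.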
Quick Start
Prerequisites
- Python 3.10+
- pip or conda
- 500MB disk space for models
Local Setup (5 minutes)
# 1. Clone repository
git clone https://github.com/Hitan547/sentinelnet.git
cd sentinelnet
# 2. Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# 3. Install dependencies
pip install -r requirements.txt
# 4. Run Flask server
python app.py
# 5. Open browser
# → http://localhost:7860
Docker Setup (for Spaces or cloud deployment)
# Build image
docker build -t sentinelnet:latest .
# Run container
docker run -p 7860:7860 sentinelnet:latest
# Access at http://localhost:7860
Deployment on HuggingFace Spaces
- Create new Space on HuggingFace
- Select "Docker" runtime
- Clone this repo
- Push to Space repo
- Auto-deploys and serves live
REST API Reference
POST /predict
Batch inference endpoint for NSL-KDD formatted network packets
Request:
{
"rows": [
{
"duration": 0,
"protocol_type": "tcp",
"service": "http",
"flag": "SF",
"src_bytes": 181,
"dst_bytes": 5450,
"land": 0,
"wrong_fragment": 0,
"urgent": 0,
"hot": 0,
"num_failed_logins": 0,
"logged_in": 1,
"num_compromised": 0,
"root_shell": 0,
"su_attempted": 0,
"num_root": 0,
"num_file_creations": 0,
"num_shells": 0,
"num_access_files": 0,
"num_outbound_cmds": 0,
"is_host_login": 0,
"is_guest_login": 0,
"count": 1,
"srv_count": 1,
"serror_rate": 0.0,
"srv_serror_rate": 0.0,
"rerror_rate": 0.0,
"srv_rerror_rate": 0.0,
"same_srv_rate": 1.0,
"diff_srv_rate": 0.0,
"srv_diff_host_rate": 0.0,
"dst_host_count": 1,
"dst_host_srv_count": 1,
"dst_host_same_srv_rate": 1.0,
"dst_host_diff_srv_rate": 0.0,
"dst_host_same_src_port_rate": 0.0,
"dst_host_srv_diff_host_rate": 0.0
}
]
}
Response:
{
"status": "ok",
"results": [
{
"predicted_class": "normal",
"severity": "None",
"confidence": 0.9821,
"is_intrusion": false
}
]
}
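A small client for this endpoint might look like the sketch below, which uses only the Python standard library. The base URL matches the local setup described in Quick Start. Treating any status other than "ok" as a failure is an assumption, since the error format is not documented here, and the function names are hypothetical.

```python
import json
import urllib.request

def build_predict_request(rows, base_url="http://localhost:7860"):
    """Build a POST /predict request carrying the given rows as JSON."""
    body = json.dumps({"rows": rows}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def predict(rows, base_url="http://localhost:7860", timeout=10):
    """Send rows to a running server and return the list of results."""
    request = build_predict_request(rows, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    # Assumption: failures are signalled by a status other than "ok".
    if payload.get("status") != "ok":
        raise RuntimeError(f"Prediction failed: {payload}")
    return payload["results"]

# Usage against a running server, with one dict per connection:
#   results = predict([row])
#   print(results[0]["predicted_class"], results[0]["confidence"])
```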
GET /health
System health check
Response:
{
"status": "online",
"model": "sentinel_brain",
"version": "1.0.0",
"uptime_seconds": 3600
}
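The response body above could be assembled along these lines. This is a sketch only: `START_TIME` and `health_payload` are hypothetical names, and the README does not say how `app.py` tracks uptime.

```python
import time

# Recorded once when the server process starts (hypothetical module global).
START_TIME = time.monotonic()

def health_payload(now=None):
    """Build the /health response body from the elapsed process time."""
    now = time.monotonic() if now is None else now
    return {
        "status": "online",
        "model": "sentinel_brain",
        "version": "1.0.0",
        "uptime_seconds": round(now - START_TIME),
    }

print(health_payload()["status"])
```

A monotonic clock is used so the reported uptime cannot go backwards if the system clock is adjusted.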
Project Structure
sentinelnet/
├── frontend/
│   ├── index.html                # Main HTML with tabs, charts, tables
│   ├── style.css                 # CSS variables, grid layout, animations
│   └── app.js                    # Canvas charts, API calls, event handlers
├── models/
│   ├── sentinel_brain.joblib     # Random Forest classifier
│   ├── label_encoder.joblib      # Target label encoding
│   ├── ohe_encoder.joblib        # Protocol/flag one-hot encoder
│   ├── freq_map.joblib           # Service frequency dictionary
│   ├── scaler.joblib             # Standard scaler
│   └── selected_features.joblib  # 41 feature names + order
├── app.py                        # Flask server + /predict + /health endpoints
├── requirements.txt              # Python dependencies (Flask, scikit-learn, etc.)
├── Dockerfile                    # Multi-stage build for HuggingFace Spaces
├── .dockerignore                 # Excludes unnecessary files from build
├── .github/
│   └── workflows/
│       └── ci.yml                # GitHub Actions CI pipeline
└── README.md                     # This file
CI/CD Pipeline
Continuous Integration (GitHub Actions)
on: [push, pull_request]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: '3.10'
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Syntax check
        run: python -m py_compile app.py
      - name: Health check (skip models)
        env:
          SKIP_MODEL: 'true'
        run: |
          python app.py &
          sleep 2
          curl --fail http://localhost:7860/health
      - name: Docker build test
        run: docker build -t sentinelnet:test .
CI Features:
- Python 3.10 environment setup
- Dependency installation verification
- Code syntax validation
- Flask app health check (with SKIP_MODEL=true to avoid model loading timeout)
- Docker image build validation
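A fixed two-second sleep can be too short on a slow runner. As a hedged alternative, the health check step could poll the endpoint until it answers. The sketch below is not part of the project's workflow; `http_probe` and `wait_for_health` are hypothetical names, and the retry counts are arbitrary.

```python
import time
import urllib.request

def http_probe(url):
    """Return True if the URL answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=2) as response:
            return response.status == 200
    except OSError:  # covers URLError, HTTPError, and refused connections
        return False

def wait_for_health(url, probe=http_probe, attempts=10, delay=1.0):
    """Poll `url` until the probe succeeds or the attempts run out."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False

# Usage in a CI step, once the server has been started in the background:
#   ok = wait_for_health("http://localhost:7860/health")
#   raise SystemExit(0 if ok else 1)
```

Passing the probe as a parameter keeps the retry loop testable without a running server.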
Continuous Deployment (HuggingFace Spaces)
- Trigger: Push to main branch
- Action: Auto-deploys Docker container to HuggingFace Spaces
- Endpoint: https://huggingface.co/spaces/Hitan2004/sentinelnet
- Uptime: Continuously hosted on the free tier; expect occasional cold starts
What I Learned
Production ML Systems
- Training and deploying multi-class classification models end-to-end
- Feature engineering and preprocessing pipeline serialization
- Model serving via REST API with batch inference
Real-Time Dashboards
- Building interactive dashboards with vanilla JavaScript
- Canvas API for high-performance charting (thousands of data points)
- Responsive design for desktop and tablet
Backend Engineering
- Flask REST API design and CORS handling
- Batch processing with streaming progress feedback
- Error handling and validation
DevOps & Deployment
- Docker containerization for reproducible environments
- HuggingFace Spaces deployment workflow
- GitHub Actions CI/CD pipeline with smart skipping
Advanced Concepts
- NSL-KDD dataset characteristics and threat modeling
- One-hot vs. frequency encoding trade-offs
- Log transforms for skewed feature distributions
- Impurity-based split criteria (Gini, entropy) and feature importance in Random Forest
Dataset Reference
NSL-KDD Dataset
- Improved version of KDD Cup 1999
- Size: 125,973 training records, 22,544 test records
- Features: 41 network connection attributes
- Classes: 5 (normal, DoS, Probe, R2L, U2R)
- Advantages: Removes duplicate records, more balanced class distribution
- Standard: Widely used benchmark for IDS research
Attribute Categories:
- Basic features (9): duration, protocol_type, service, flag, src_bytes, dst_bytes, etc.
- Content features (13): hot, num_failed_logins, logged_in, num_compromised, etc.
- Time-based traffic features (9): count, srv_count, serror_rate, etc.
- Host-based traffic features (10): dst_host_count, dst_host_srv_count, etc.
Contributing
This is a portfolio project, but you're welcome to fork and extend!
Ideas for enhancement:
- Add LSTM-based temporal anomaly detection
- Implement feature importance visualization
- Add real PCAP file ingestion
- Multi-model ensemble (XGBoost + Neural Network)
- Real-time alerting webhook integration
License
MIT License. Use freely for learning, portfolio, or production purposes.
Contact
Hitan K, AI Systems Engineer
- LinkedIn
- GitHub
- HuggingFace
- Email
If this helped you, please star the repo!
Built with ❤️ for production and learning.