# 🛡️ SentinelNet: AI-Powered Network Intrusion Detection System
<div align="center">
**Production ML system classifying network traffic into 5 classes (normal plus 4 attack types) in real time**
[![Live Demo](https://img.shields.io/badge/Live%20Demo-HuggingFace%20Spaces-blue?style=for-the-badge&logo=huggingface)](https://huggingface.co/spaces/Hitan2004/sentinelnet)
[![GitHub](https://img.shields.io/badge/GitHub-Repository-black?style=for-the-badge&logo=github)](https://github.com/Hitan547/sentinelnet)
[![Python](https://img.shields.io/badge/Python-3.10-blue?style=for-the-badge&logo=python)](#tech-stack)
[![scikit-learn](https://img.shields.io/badge/ML-scikit--learn-orange?style=for-the-badge)](#tech-stack)
*A full-stack real-time intrusion detection dashboard with hybrid frontend, REST API, and automated CI/CD deployment.*
</div>
---
## 🎯 Overview
SentinelNet is a production-grade network intrusion detection system that classifies simulated live traffic and batch CSV datasets into 5 classes: normal traffic plus four attack categories (DoS, Probe, R2L, U2R). It is built on a Random Forest classifier trained on the NSL-KDD dataset and pairs real-time inference with a web dashboard and batch CSV processing.
### ⚡ Key Capabilities
| Feature | Capability |
|---------|-----------|
| **Real-Time Detection** | 1000s of live packets/sec through trained ML model |
| **Threat Classification** | 5-class detection: normal, DoS, Probe, R2L, U2R |
| **Batch Analysis** | Process CSVs with live progress, streaming predictions, auto-generated threat reports |
| **Visual Intelligence** | Live timeline, activity heatmaps, confidence distributions, attack patterns |
| **Export Formats** | CSV, PDF reports, JSON for integration |
| **Deployment** | Docker containerized, live on HuggingFace Spaces |
---
## 🏗️ Architecture
### System Diagram
```
┌─────────────────────────────────────────────────────────┐
│                   SentinelNet System                    │
└─────────────────────────────────────────────────────────┘
                    ┌──────────────────┐
                    │  Flask Backend   │
                    │     (app.py)     │
                    └────────┬─────────┘
                             │
         ┌───────────────────┼───────────────────┐
         │                   │                   │
    ┌────▼────┐         ┌────▼────┐        ┌─────▼──────┐
    │ /health │         │/predict │        │  /static   │
    │Endpoint │         │  Batch  │        │  Frontend  │
    └─────────┘         │Inference│        └────────────┘
                        └────┬────┘
                             │
             ┌───────────────┼───────────────┐
             │               │               │
        ┌────▼──────┐   ┌────▼─────┐     ┌───▼──────────┐
        │ML Pipeline│   │ One-Hot  │     │    Label     │
        │Processing │   │ Encoder  │     │   Encoder    │
        └───────────┘   └──────────┘     └──────────────┘
                             │
                        ┌────▼──────────────────────┐
                        │ Random Forest Classifier  │
                        │  (sentinel_brain.joblib)  │
                        │    41 NSL-KDD Features    │
                        └───────────────────────────┘
```
### Data Flow
```
User Input (Live or CSV)
  ↓
Feature Extraction & Validation
  ↓
One-Hot Encoding (protocol_type, flag)
  ↓
Frequency Encoding (service)
  ↓
Log Transforms (src_bytes, dst_bytes, duration)
  ↓
Feature Engineering (total_bytes, ratios, error flags)
  ↓
Standard Scaling (all features)
  ↓
Random Forest Inference
  ↓
Prediction + Confidence Score
  ↓
Severity Mapping
  ↓
JSON Response / Dashboard Update
```
---
## 📊 Model Performance
### Training Details
- **Algorithm**: Random Forest Classifier (100 trees)
- **Dataset**: NSL-KDD (improved KDD Cup 1999)
- **Features**: 41 network connection attributes
- **Classes**: 5 (normal, DoS, Probe, R2L, U2R)
- **Preprocessing**: OHE, frequency encoding, log transforms, standard scaling
### Threat Categories
| Class | Type | Severity | Examples |
|-------|------|----------|----------|
| `normal` | Clean traffic | ✅ None | HTTP requests, DNS queries |
| `DoS` | Denial of Service | 🔴 **Critical** | SYN floods, UDP storms |
| `Probe` | Reconnaissance | 🟠 Medium | Port scanning, OS fingerprinting |
| `R2L` | Remote to Local | 🔴 High | SSH brute force, FTP attacks |
| `U2R` | User to Root | 🔴 **Critical** | Buffer overflow, privilege escalation |
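
The class-to-severity mapping in this table is small enough to express as a lookup. The sketch below is an assumption about how a prediction might be turned into the result fields documented under the REST API reference; the `to_result` helper is hypothetical and not taken from `app.py`.

```python
# Severity per class, copied from the Threat Categories table.
SEVERITY = {
    "normal": "None",
    "DoS": "Critical",
    "Probe": "Medium",
    "R2L": "High",
    "U2R": "Critical",
}


def to_result(predicted_class: str, confidence: float) -> dict:
    """Build one result record (hypothetical helper, not from app.py)."""
    return {
        "predicted_class": predicted_class,
        "severity": SEVERITY[predicted_class],
        "confidence": round(confidence, 4),
        # Every class other than "normal" counts as an intrusion.
        "is_intrusion": predicted_class != "normal",
    }


print(to_result("normal", 0.9821))
```

Keeping the mapping in one dictionary means the dashboard and the API cannot disagree about which classes are critical.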
---
## ✨ Features
### 📡 Live Monitor Tab
Real-time threat detection with auto-generated NSL-KDD formatted packets
- **Auto-Generation**: Simulates realistic network traffic packets
- **Real-Time Inference**: Each packet sent to trained model instantly
- **Live Detection Feed**: Class, confidence, severity per packet
- **Attack Distribution Chart**: Bar chart updating in real-time
- **Threat Timeline**: Last 60 seconds of activity
- **Activity Heatmap**: 60×8 grid of recent packets
- **Confidence Distribution**: Histogram of model certainty
- **System Log**: Terminal-style event log
- **Session Summary**: Total packets, attacks detected, accuracy metrics
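
The 60-second threat timeline is a sliding-window problem. The dashboard itself runs in the browser, so the Python sketch below only illustrates the windowing logic; the `ThreatTimeline` class and its method names are hypothetical.

```python
from collections import deque


class ThreatTimeline:
    """Keep only events from the last `window_seconds` (hypothetical helper)."""

    def __init__(self, window_seconds: float = 60.0):
        self.window_seconds = window_seconds
        self._events = deque()  # (timestamp, predicted_class), oldest first

    def add(self, timestamp: float, predicted_class: str) -> None:
        """Record one detection and drop anything older than the window."""
        self._events.append((timestamp, predicted_class))
        while self._events and timestamp - self._events[0][0] > self.window_seconds:
            self._events.popleft()

    def attack_count(self) -> int:
        """Number of non-normal detections currently inside the window."""
        return sum(1 for _, cls in self._events if cls != "normal")

    def __len__(self) -> int:
        return len(self._events)


timeline = ThreatTimeline()
timeline.add(0.0, "DoS")
timeline.add(30.0, "normal")
timeline.add(61.0, "Probe")   # the event at t=0 is now older than 60 s
print(len(timeline))            # 2
print(timeline.attack_count())  # 1
```

A deque makes eviction from the old end O(1), which matters when packets arrive continuously.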
### 📂 CSV Analysis Tab
Upload and analyze NSL-KDD formatted datasets with streaming predictions
- **Smart Header Detection**: Auto-detects whether the file starts with a column-name header row
- **Batch Processing**: Optimized row-by-row inference through model
- **Live Progress**: Real-time bar with ETA and processing speed (rows/sec)
- **Streaming Results**: Predictions appear as they're computed
- **Threat Report Generation** (on completion):
  - Risk score gauge (0–100)
  - Class distribution bar chart
  - Confidence waveform over entire dataset
  - Threat intensity rolling average
  - Protocol breakdown pie chart
  - Top targeted services
  - Attack pattern clustering visualization
  - Paginated full results table with sorting/filtering
- **Multi-Format Export**: CSV, PDF report, JSON
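
Header detection can be done with a simple heuristic: a header row contains known column names, while a data row does not. The sketch below is one possible approach under that assumption, not the project's actual logic; `KNOWN_COLUMNS`, `has_header`, and `read_rows` are hypothetical names.

```python
import csv
import io

# A few NSL-KDD column names; a header row is expected to contain these.
KNOWN_COLUMNS = {"duration", "protocol_type", "service", "flag", "src_bytes"}


def has_header(first_row: list[str]) -> bool:
    """Guess whether the first CSV row holds column names (heuristic)."""
    cells = {cell.strip().lower() for cell in first_row}
    return len(cells & KNOWN_COLUMNS) >= 3


def read_rows(text: str, columns: list[str]) -> list[dict]:
    """Parse CSV text into dicts, using `columns` when no header is present."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return []
    if has_header(rows[0]):
        names, data = [c.strip() for c in rows[0]], rows[1:]
    else:
        names, data = columns, rows
    return [dict(zip(names, row)) for row in data]


columns = ["duration", "protocol_type", "service", "flag", "src_bytes"]
with_header = "duration,protocol_type,service,flag,src_bytes\n0,tcp,http,SF,181\n"
without_header = "0,tcp,http,SF,181\n"
print(read_rows(with_header, columns) == read_rows(without_header, columns))  # True
```

Values stay as strings here; numeric conversion and validation would happen in a later step.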
---
## 🧠 ML Pipeline Deep Dive
### Feature Engineering
```python
# Input: the 41 raw NSL-KDD features, in dataset column order.
# A list (not a set) is used so the order is preserved.
features_raw = [
    'duration', 'protocol_type', 'service', 'flag',
    'src_bytes', 'dst_bytes', 'land', 'wrong_fragment',
    'urgent', 'hot', 'num_failed_logins', 'logged_in',
    'num_compromised', 'root_shell', 'su_attempted',
    'num_root', 'num_file_creations', 'num_shells',
    'num_access_files', 'num_outbound_cmds', 'is_host_login',
    'is_guest_login', 'count', 'srv_count', 'serror_rate',
    'srv_serror_rate', 'rerror_rate', 'srv_rerror_rate',
    'same_srv_rate', 'diff_srv_rate', 'srv_diff_host_rate',
    'dst_host_count', 'dst_host_srv_count', 'dst_host_same_srv_rate',
    'dst_host_diff_srv_rate', 'dst_host_same_src_port_rate',
    'dst_host_srv_diff_host_rate', 'dst_host_serror_rate',
    'dst_host_srv_serror_rate', 'dst_host_rerror_rate',
    'dst_host_srv_rerror_rate',
]
assert len(features_raw) == 41

# Preprocessing pipeline (steps listed as comments; the fitted transformers
# live in the serialized artifacts):
# 1. One-hot encoding: protocol_type (3 categories) → 3 columns
# 2. One-hot encoding: flag (11 categories) → 11 columns
# 3. Frequency encoding: service → maps to frequency rank
# 4. Log transforms: log(1 + src_bytes), log(1 + dst_bytes), log(1 + duration)
# 5. Feature engineering:
#    - total_bytes = src_bytes + dst_bytes
#    - src_bytes_ratio = src_bytes / (total_bytes + 1)
#    - is_error_flag = 1 if error flag present
# 6. Standard scaling: (x - mean) / std for all numeric features

# Output: 41 standardized features → Random Forest inference
```
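
Steps 4 and 5 of the pipeline can be sketched for a single record as follows. This is an illustration under stated assumptions, not the project's code: the `engineer` helper is hypothetical, the derived features are assumed to use the raw (untransformed) byte counts, and the `ERROR_FLAGS` set is a guess at which connection flags count as errors, since the README does not list them.

```python
import math

# Assumption: flags that indicate a rejected, reset, or half-open connection
# count as "error" flags. The README does not list them, so this set is
# illustrative only.
ERROR_FLAGS = {"S0", "REJ", "RSTO", "RSTR", "RSTOS0", "SH"}


def engineer(row: dict) -> dict:
    """Apply steps 4 and 5 to one raw record (hypothetical helper)."""
    src, dst = row["src_bytes"], row["dst_bytes"]
    total = src + dst
    out = dict(row)
    # Step 4: log(1 + x) compresses the heavy right tail of these features.
    out["src_bytes"] = math.log1p(src)
    out["dst_bytes"] = math.log1p(dst)
    out["duration"] = math.log1p(row["duration"])
    # Step 5: derived features, computed here from the raw byte counts.
    out["total_bytes"] = total
    out["src_bytes_ratio"] = src / (total + 1)  # +1 avoids division by zero
    out["is_error_flag"] = 1 if row["flag"] in ERROR_FLAGS else 0
    return out


sample = {"duration": 0, "flag": "SF", "src_bytes": 181, "dst_bytes": 5450}
print(engineer(sample)["total_bytes"])  # 5631
```

The same function has to run at training and inference time; otherwise the scaler sees differently distributed inputs.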
### Serialization
All pipeline artifacts are serialized with `joblib` for production reliability:
```
models/
├── sentinel_brain.joblib     # Trained Random Forest (100 trees)
├── label_encoder.joblib      # Encodes target class labels
├── ohe_encoder.joblib        # One-hot encoder for protocol/flag
├── freq_map.joblib           # Service frequency mapping dictionary
├── scaler.joblib             # StandardScaler fitted on training data
└── selected_features.joblib  # List of 41 selected features in order
```
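
To show what a dictionary like `freq_map.joblib` might hold, here is a minimal frequency-encoding sketch. The README describes the encoding as a frequency rank; this example uses relative frequency instead, which is a common variant, so treat it as an illustration rather than the project's exact mapping. Both function names and the fallback value for unseen services are assumptions.

```python
from collections import Counter


def build_freq_map(services: list[str]) -> dict[str, float]:
    """Map each service name to its share of the training records."""
    counts = Counter(services)
    total = len(services)
    return {name: n / total for name, n in counts.items()}


def encode_service(service: str, freq_map: dict[str, float]) -> float:
    """Look up a service; unseen services fall back to 0.0 (an assumption)."""
    return freq_map.get(service, 0.0)


freq_map = build_freq_map(["http", "http", "ftp", "smtp"])
print(encode_service("http", freq_map))    # 0.5
print(encode_service("gopher", freq_map))  # 0.0
```

Because the map is fitted on training data only, it has to be persisted alongside the model, which is why it appears among the serialized artifacts.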
---
## 🚀 Quick Start
### Prerequisites
- Python 3.10+
- pip or conda
- 500MB disk space for models
### Local Setup (5 minutes)
```bash
# 1. Clone repository
git clone https://github.com/Hitan547/sentinelnet.git
cd sentinelnet
# 2. Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# 3. Install dependencies
pip install -r requirements.txt
# 4. Run Flask server
python app.py
# 5. Open browser
# → http://localhost:7860
```
### Docker Setup (for Spaces or cloud deployment)
```bash
# Build image
docker build -t sentinelnet:latest .
# Run container
docker run -p 7860:7860 sentinelnet:latest
# Access at http://localhost:7860
```
### Deployment on HuggingFace Spaces
1. Create a new Space on HuggingFace
2. Select the "Docker" runtime
3. Clone this repo
4. Push it to the Space's repo
5. The Space builds the image and serves the app automatically
---
## 🔌 REST API Reference
### POST `/predict`
Batch inference endpoint for NSL-KDD formatted network packets
**Request:**
```json
{
"rows": [
{
"duration": 0,
"protocol_type": "tcp",
"service": "http",
"flag": "SF",
"src_bytes": 181,
"dst_bytes": 5450,
"land": 0,
"wrong_fragment": 0,
"urgent": 0,
"hot": 0,
"num_failed_logins": 0,
"logged_in": 1,
"num_compromised": 0,
"root_shell": 0,
"su_attempted": 0,
"num_root": 0,
"num_file_creations": 0,
"num_shells": 0,
"num_access_files": 0,
"num_outbound_cmds": 0,
"is_host_login": 0,
"is_guest_login": 0,
"count": 1,
"srv_count": 1,
"serror_rate": 0.0,
"srv_serror_rate": 0.0,
"rerror_rate": 0.0,
"srv_rerror_rate": 0.0,
"same_srv_rate": 1.0,
"diff_srv_rate": 0.0,
"srv_diff_host_rate": 0.0,
"dst_host_count": 1,
"dst_host_srv_count": 1,
"dst_host_same_srv_rate": 1.0,
"dst_host_diff_srv_rate": 0.0,
"dst_host_same_src_port_rate": 0.0,
"dst_host_srv_diff_host_rate": 0.0
}
]
}
```
**Response:**
```json
{
"status": "ok",
"results": [
{
"predicted_class": "normal",
"severity": "None",
"confidence": 0.9821,
"is_intrusion": false
}
]
}
```
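
A minimal client for this endpoint, using only the standard library, might look like the sketch below. It assumes the local URL from Quick Start and the request/response envelope shown above; error responses are not documented here, so the sketch simply raises on any status other than `"ok"`. The helper names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:7860"  # local default from Quick Start


def parse_response(text: str) -> list[dict]:
    """Validate the response envelope and return the per-row results."""
    payload = json.loads(text)
    if payload.get("status") != "ok":
        raise ValueError(f"unexpected status: {payload.get('status')!r}")
    return payload["results"]


def predict(rows: list[dict], base_url: str = BASE_URL) -> list[dict]:
    """POST rows to /predict and return the `results` list."""
    body = json.dumps({"rows": rows}).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return parse_response(response.read().decode("utf-8"))


def intrusions(results: list[dict]) -> list[dict]:
    """Keep only rows the model flagged as attacks."""
    return [r for r in results if r["is_intrusion"]]
```

With the server running, `intrusions(predict(rows))` returns only the flagged records, where `rows` is a list of dicts shaped like the request example.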
### GET `/health`
System health check
**Response:**
```json
{
"status": "online",
"model": "sentinel_brain",
"version": "1.0.0",
"uptime_seconds": 3600
}
```
---
## 📁 Project Structure
```
sentinelnet/
├── frontend/
│   ├── index.html               # Main HTML with tabs, charts, tables
│   ├── style.css                # CSS variables, grid layout, animations
│   └── app.js                   # Canvas charts, API calls, event handlers
├── models/
│   ├── sentinel_brain.joblib    # Random Forest classifier
│   ├── label_encoder.joblib     # Target label encoding
│   ├── ohe_encoder.joblib       # Protocol/flag one-hot encoder
│   ├── freq_map.joblib          # Service frequency dictionary
│   ├── scaler.joblib            # Standard scaler
│   └── selected_features.joblib # 41 feature names + order
├── app.py                       # Flask server + /predict + /health endpoints
├── requirements.txt             # Python dependencies (Flask, scikit-learn, etc.)
├── Dockerfile                   # Multi-stage build for HuggingFace Spaces
├── .dockerignore                # Excludes unnecessary files from build
├── .github/
│   └── workflows/
│       └── ci.yml               # GitHub Actions CI pipeline
└── README.md                    # This file
```
---
## 🔄 CI/CD Pipeline
### Continuous Integration (GitHub Actions)
```yaml
on: [push, pull_request]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: '3.10'
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Syntax check
        run: python -m py_compile app.py
      - name: Health check (skip models)
        env:
          SKIP_MODEL: "true"
        # A block scalar keeps all three commands in one shell step.
        run: |
          python app.py &
          sleep 2
          curl --fail http://localhost:7860/health
      - name: Docker build test
        run: docker build -t sentinelnet:test .
```
**CI Features:**
- ✅ Python 3.10 environment setup
- ✅ Dependency installation verification
- ✅ Code syntax validation
- ✅ Flask app health check (with `SKIP_MODEL=true` to avoid model loading timeout)
- ✅ Docker image build validation
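
How the app reads `SKIP_MODEL` is not shown in this README. One way such a flag could be honored is sketched below; `should_skip_model` and `load_model` are hypothetical helpers, and the set of accepted truthy strings is an assumption.

```python
import os


def should_skip_model(environ=os.environ) -> bool:
    """Return True when SKIP_MODEL is set to a truthy string."""
    return environ.get("SKIP_MODEL", "").strip().lower() in {"1", "true", "yes"}


def load_model(loader, environ=os.environ):
    """Call `loader()` unless SKIP_MODEL asks to start without a model."""
    if should_skip_model(environ):
        return None  # /health can still respond; /predict would need a guard
    return loader()


print(should_skip_model({"SKIP_MODEL": "true"}))  # True
print(should_skip_model({}))                      # False
```

Passing the environment as a parameter keeps the check easy to test without mutating `os.environ`.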
### Continuous Deployment (HuggingFace Spaces)
- **Trigger**: Push to `main` branch
- **Action**: Auto-deploys Docker container to HuggingFace Spaces
- **Endpoint**: https://huggingface.co/spaces/Hitan2004/sentinelnet
- **Uptime**: Publicly available on the free tier; expect occasional cold starts
---
## 🎓 What I Learned
✅ **Production ML Systems**
- Training and deploying multi-class classification models end-to-end
- Feature engineering and preprocessing pipeline serialization
- Model serving via REST API with batch inference
✅ **Real-Time Dashboards**
- Building interactive dashboards with vanilla JavaScript
- Canvas API for high-performance charting (thousands of data points)
- Responsive design for desktop and tablet
✅ **Backend Engineering**
- Flask REST API design and CORS handling
- Batch processing with streaming progress feedback
- Error handling and validation
✅ **DevOps & Deployment**
- Docker containerization for reproducible environments
- HuggingFace Spaces deployment workflow
- GitHub Actions CI/CD pipeline with smart skipping
✅ **Advanced Concepts**
- NSL-KDD dataset characteristics and threat modeling
- One-hot vs. frequency encoding trade-offs
- Log transforms for skewed feature distributions
- Impurity-based split criteria (Gini, entropy) and feature importance in Random Forest
---
## 📊 Dataset Reference
**NSL-KDD Dataset**
- Improved version of KDD Cup 1999
- **Size**: 125,973 training records, 22,544 test records
- **Features**: 41 network connection attributes
- **Classes**: 5 (normal, DoS, Probe, R2L, U2R)
- **Advantages**: Removes duplicate records, more balanced class distribution
- **Standard**: Widely used benchmark for IDS research
**Attribute Categories:**
- Basic features (9): duration, protocol_type, service, flag, src_bytes, dst_bytes, etc.
- Content features (13): hot, num_failed_logins, logged_in, num_compromised, etc.
- Time-based traffic features (9): count, srv_count, serror_rate, etc.
- Host-based traffic features (10): dst_host_count, dst_host_srv_count, etc.
---
## 🤝 Contributing
This is a portfolio project, but you're welcome to fork and extend!
**Ideas for enhancement:**
- [ ] Add LSTM-based temporal anomaly detection
- [ ] Implement feature importance visualization
- [ ] Add real PCAP file ingestion
- [ ] Multi-model ensemble (XGBoost + Neural Network)
- [ ] Real-time alerting webhook integration
---
## 📜 License
MIT License. Use freely for learning, portfolio, or production purposes.
---
## 📞 Contact
**Hitan K**, AI Systems Engineer
- 🔗 [LinkedIn](https://linkedin.com/in/hitan-k)
- 🐙 [GitHub](https://github.com/Hitan547)
- 🤗 [HuggingFace](https://huggingface.co/Hitan2004)
- 📧 [Email](mailto:hitan.k@outlook.com)
---
<div align="center">
**⭐ If this helped you, please star the repo! ⭐**
*Built with ❤️ for production and learning.*
</div>