Hitan2004 committed on
Commit a97b4d1 · verified · 1 Parent(s): 9b9c599

Update README.md

Files changed (1)
  1. README.md +14 -489
README.md CHANGED
@@ -1,495 +1,20 @@
- # 🛡️ SentinelNet — AI-Powered Network Intrusion Detection System
-
- <div align="center">
-
- **Production ML system classifying network traffic into 5 classes (normal + 4 attack categories) in real time**
-
- [![Live Demo](https://img.shields.io/badge/Live%20Demo-HuggingFace%20Spaces-blue?style=for-the-badge&logo=huggingface)](https://huggingface.co/spaces/Hitan2004/sentinelnet)
- [![GitHub](https://img.shields.io/badge/GitHub-Repository-black?style=for-the-badge&logo=github)](https://github.com/Hitan547/sentinelnet)
- [![Python](https://img.shields.io/badge/Python-3.10-blue?style=for-the-badge&logo=python)](#tech-stack)
- [![scikit-learn](https://img.shields.io/badge/ML-scikit--learn-orange?style=for-the-badge)](#tech-stack)
-
- *A full-stack real-time intrusion detection dashboard with hybrid frontend, REST API, and automated CI/CD deployment.*
-
- </div>
-
- ---
-
- ## 🎯 Overview
-
- SentinelNet is a production-grade network intrusion detection system that analyzes live traffic and batch CSV datasets and classifies each connection as normal traffic or as one of 4 attack categories. Built with a Random Forest classifier trained on the NSL-KDD dataset, it combines real-time inference with a sophisticated web dashboard and self-correcting batch processing.
-
- ### ⚡ Key Capabilities
-
- | Feature | Capability |
- |---------|-----------|
- | **Real-Time Detection** | 1000s of live packets/sec through trained ML model |
- | **Threat Classification** | 5-class detection: normal, DoS, Probe, R2L, U2R |
- | **Batch Analysis** | Process CSVs with live progress, streaming predictions, auto-generated threat reports |
- | **Visual Intelligence** | Live timeline, activity heatmaps, confidence distributions, attack patterns |
- | **Export Formats** | CSV, PDF reports, JSON for integration |
- | **Deployment** | Docker containerized, live on HuggingFace Spaces |
-
- ---
-
- ## 🏗️ Architecture
-
- ### System Diagram
-
- ```
- ┌─────────────────────────────────────────────────────────┐
- │ SentinelNet System │
- └─────────────────────────────────────────────────────────┘
-
- ┌──────────────────┐
- │ Flask Backend │
- │ (app.py) │
- └────────┬─────────┘
-
- ┌───────────────────┼───────────────────┐
- │ │ │
- ┌────▼────┐ ┌────▼────┐ ┌─────▼──────┐
- │ /health │ │/predict │ │ /static │
- │ Endpoint │ │ Batch │ │ Frontend │
- └──────────┘ │ Inference│ └────────────┘
- └────┬────┘
-
- ┌───────────────┼───────────────┐
- │ │ │
- ┌────▼──────┐ ┌────▼─────┐ ┌───▼──────────┐
- │ML Pipeline│ │One-Hot │ │Label │
- │Processing │ │Encoder │ │Encoder │
- └───────────┘ └───────────┘ └──────────────┘
-
- ┌────▼──────────────────────┐
- │ Random Forest Classifier │
- │ (sentinel_brain.joblib) │
- │ 41 NSL-KDD Features │
- └───────────────────────────┘
- ```
-
- ### Data Flow
-
- ```
- User Input (Live or CSV)
-
- Feature Extraction & Validation
-
- One-Hot Encoding (protocol_type, flag)
-
- Frequency Encoding (service)
-
- Log Transforms (src_bytes, dst_bytes, duration)
-
- Feature Engineering (total_bytes, ratios, error flags)
-
- Standard Scaling (all features)
-
- Random Forest Inference
-
- Prediction + Confidence Score
-
- Severity Mapping
-
- JSON Response / Dashboard Update
- ```
-
- ---
-
- ## 📊 Model Performance
-
- ### Training Details
-
- - **Algorithm**: Random Forest Classifier (100 trees)
- - **Dataset**: NSL-KDD (improved KDD Cup 1999)
- - **Features**: 41 network connection attributes
- - **Classes**: 5 (normal, DoS, Probe, R2L, U2R)
- - **Preprocessing**: OHE, frequency encoding, log transforms, standard scaling
-
- ### Threat Categories
-
- | Class | Type | Severity | Examples |
- |-------|------|----------|----------|
- | `normal` | Clean traffic | ✅ None | HTTP requests, DNS queries |
- | `DoS` | Denial of Service | 🔴 **Critical** | SYN floods, UDP storms |
- | `Probe` | Reconnaissance | 🟠 Medium | Port scanning, OS fingerprinting |
- | `R2L` | Remote to Local | 🔴 High | SSH brute force, FTP attacks |
- | `U2R` | User to Root | 🔴 **Critical** | Buffer overflow, privilege escalation |
-
  ---
-
- ## ✨ Features
-
- ### 📡 Live Monitor Tab
- Real-time threat detection with auto-generated NSL-KDD formatted packets
-
- - **Auto-Generation**: Simulates realistic network traffic packets
- - **Real-Time Inference**: Each packet sent to trained model instantly
- - **Live Detection Feed**: Class, confidence, severity per packet
- - **Attack Distribution Chart**: Bar chart updating in real-time
- - **Threat Timeline**: Last 60 seconds of activity
- - **Activity Heatmap**: 60×8 grid of recent packets
- - **Confidence Distribution**: Histogram of model certainty
- - **System Log**: Terminal-style event log
- - **Session Summary**: Total packets, attacks detected, accuracy metrics
-
- ### 📂 CSV Analysis Tab
- Upload and analyze NSL-KDD formatted datasets with streaming predictions
-
- - **Smart Header Detection**: Auto-detects with or without column names
- - **Batch Processing**: Optimized row-by-row inference through model
- - **Live Progress**: Real-time bar with ETA and processing speed (rows/sec)
- - **Streaming Results**: Predictions appear as they're computed
- - **Threat Report Generation** (on completion):
-   - Risk score gauge (0–100)
-   - Class distribution bar chart
-   - Confidence waveform over entire dataset
-   - Threat intensity rolling average
-   - Protocol breakdown pie chart
-   - Top targeted services
-   - Attack pattern clustering visualization
-   - Paginated full results table with sorting/filtering
- - **Multi-Format Export**: CSV, PDF report, JSON
-
- ---
-
- ## 🧠 ML Pipeline Deep Dive
-
- ### Feature Engineering
-
- ```python
- # Input: 41 raw NSL-KDD features, in dataset column order
- features_raw = [
-     'duration', 'protocol_type', 'service', 'flag',
-     'src_bytes', 'dst_bytes', 'land', 'wrong_fragment',
-     'urgent', 'hot', 'num_failed_logins', 'logged_in',
-     'num_compromised', 'root_shell', 'su_attempted',
-     'num_root', 'num_file_creations', 'num_shells',
-     'num_access_files', 'num_outbound_cmds', 'is_host_login',
-     'is_guest_login', 'count', 'srv_count', 'serror_rate',
-     'srv_serror_rate', 'rerror_rate', 'srv_rerror_rate',
-     'same_srv_rate', 'diff_srv_rate', 'srv_diff_host_rate',
-     'dst_host_count', 'dst_host_srv_count', 'dst_host_same_srv_rate',
-     'dst_host_diff_srv_rate', 'dst_host_same_src_port_rate',
-     'dst_host_srv_diff_host_rate', 'dst_host_serror_rate',
-     'dst_host_srv_serror_rate', 'dst_host_rerror_rate',
-     'dst_host_srv_rerror_rate',
- ]
- assert len(features_raw) == 41
-
- # Preprocessing pipeline:
- # 1. One-hot encoding: protocol_type (3 categories) → 3 columns
- # 2. One-hot encoding: flag (11 categories) → 11 columns
- # 3. Frequency encoding: service → maps to frequency rank
- # 4. Log transforms: log(1 + src_bytes), log(1 + dst_bytes), log(1 + duration)
- # 5. Feature engineering:
- #    - total_bytes = src_bytes + dst_bytes
- #    - src_bytes_ratio = src_bytes / (total_bytes + 1)
- #    - is_error_flag = 1 if an error flag is present
- # 6. Standard scaling: (x - mean) / std for all numeric features
-
- # Output: 41 standardized features → Random Forest inference
- ```
-
- ### Serialization
-
- All pipeline artifacts are serialized with `joblib` for production reliability:
-
- ```
- models/
- ├── sentinel_brain.joblib # Trained Random Forest (100 trees)
- ├── label_encoder.joblib # Encodes target class labels
- ├── ohe_encoder.joblib # One-hot encoder for protocol/flag
- ├── freq_map.joblib # Service frequency mapping dictionary
- ├── scaler.joblib # StandardScaler fitted on training data
- └── selected_features.joblib # List of 41 selected features in order
- ```
-
- ---
-
- ## 🚀 Quick Start
-
- ### Prerequisites
- - Python 3.10+
- - pip or conda
- - 500MB disk space for models
-
- ### Local Setup (5 minutes)
-
- ```bash
- # 1. Clone repository
- git clone https://github.com/Hitan547/sentinelnet.git
- cd sentinelnet
-
- # 2. Create virtual environment (recommended)
- python -m venv venv
- source venv/bin/activate # On Windows: venv\Scripts\activate
-
- # 3. Install dependencies
- pip install -r requirements.txt
-
- # 4. Run Flask server
- python app.py
-
- # 5. Open browser
- # → http://localhost:7860
- ```
-
- ### Docker Setup (for Spaces or cloud deployment)
-
- ```bash
- # Build image
- docker build -t sentinelnet:latest .
-
- # Run container
- docker run -p 7860:7860 sentinelnet:latest
-
- # Access at http://localhost:7860
- ```
-
- ### Deployment on HuggingFace Spaces
-
- 1. Create new Space on HuggingFace
- 2. Select "Docker" runtime
- 3. Clone this repo
- 4. Push to Space repo
- 5. Auto-deploys and serves live
-
- ---
-
- ## 🔌 REST API Reference
-
- ### POST `/predict`
- Batch inference endpoint for NSL-KDD formatted network packets
-
- **Request:**
- ```json
- {
-   "rows": [
-     {
-       "duration": 0,
-       "protocol_type": "tcp",
-       "service": "http",
-       "flag": "SF",
-       "src_bytes": 181,
-       "dst_bytes": 5450,
-       "land": 0,
-       "wrong_fragment": 0,
-       "urgent": 0,
-       "hot": 0,
-       "num_failed_logins": 0,
-       "logged_in": 1,
-       "num_compromised": 0,
-       "root_shell": 0,
-       "su_attempted": 0,
-       "num_root": 0,
-       "num_file_creations": 0,
-       "num_shells": 0,
-       "num_access_files": 0,
-       "num_outbound_cmds": 0,
-       "is_host_login": 0,
-       "is_guest_login": 0,
-       "count": 1,
-       "srv_count": 1,
-       "serror_rate": 0.0,
-       "srv_serror_rate": 0.0,
-       "rerror_rate": 0.0,
-       "srv_rerror_rate": 0.0,
-       "same_srv_rate": 1.0,
-       "diff_srv_rate": 0.0,
-       "srv_diff_host_rate": 0.0,
-       "dst_host_count": 1,
-       "dst_host_srv_count": 1,
-       "dst_host_same_srv_rate": 1.0,
-       "dst_host_diff_srv_rate": 0.0,
-       "dst_host_same_src_port_rate": 0.0,
-       "dst_host_srv_diff_host_rate": 0.0
-     }
-   ]
- }
- ```
-
- **Response:**
- ```json
- {
-   "status": "ok",
-   "results": [
-     {
-       "predicted_class": "normal",
-       "severity": "None",
-       "confidence": 0.9821,
-       "is_intrusion": false
-     }
-   ]
- }
- ```
-
- ### GET `/health`
- System health check
-
- **Response:**
- ```json
- {
-   "status": "online",
-   "model": "sentinel_brain",
-   "version": "1.0.0",
-   "uptime_seconds": 3600
- }
- ```
-
- ---
-
- ## 📁 Project Structure
-
- ```
- sentinelnet/
- ├── frontend/
- │   ├── index.html # Main HTML with tabs, charts, tables
- │   ├── style.css # CSS variables, grid layout, animations
- │   └── app.js # Canvas charts, API calls, event handlers
- ├── models/
- │   ├── sentinel_brain.joblib # Random Forest classifier
- │   ├── label_encoder.joblib # Target label encoding
- │   ├── ohe_encoder.joblib # Protocol/flag one-hot encoder
- │   ├── freq_map.joblib # Service frequency dictionary
- │   ├── scaler.joblib # Standard scaler
- │   └── selected_features.joblib # 41 feature names + order
- ├── app.py # Flask server + /predict + /health endpoints
- ├── requirements.txt # Python dependencies (Flask, scikit-learn, etc.)
- ├── Dockerfile # Multi-stage build for HuggingFace Spaces
- ├── .dockerignore # Excludes unnecessary files from build
- ├── .github/
- │   └── workflows/
- │       └── ci.yml # GitHub Actions CI pipeline
- └── README.md # This file
- ```
-
- ---
-
- ## 🔄 CI/CD Pipeline
-
- ### Continuous Integration (GitHub Actions)
-
- ```yaml
- on: [push, pull_request]
-
- jobs:
-   build:
-     runs-on: ubuntu-latest
-     steps:
-       - uses: actions/checkout@v3
-       - uses: actions/setup-python@v4
-         with:
-           python-version: '3.10'
-       - name: Install dependencies
-         run: pip install -r requirements.txt
-       - name: Syntax check
-         run: python -m py_compile app.py
-       - name: Health check (skip models)
-         env:
-           SKIP_MODEL: true
-         # Multi-line script: start the server in the background, wait, then probe it
-         run: |
-           python app.py &
-           sleep 2
-           curl --fail http://localhost:7860/health
-       - name: Docker build test
-         run: docker build -t sentinelnet:test .
- ```
-
- **CI Features:**
- - ✅ Python 3.10 environment setup
- - ✅ Dependency installation verification
- - ✅ Code syntax validation
- - ✅ Flask app health check (with `SKIP_MODEL=true` to avoid model loading timeout)
- - ✅ Docker image build validation
-
- ### Continuous Deployment (HuggingFace Spaces)
-
- - **Trigger**: Push to `main` branch
- - **Action**: Auto-deploys Docker container to HuggingFace Spaces
- - **Endpoint**: https://huggingface.co/spaces/Hitan2004/sentinelnet
- - **Uptime**: Hosted on the free tier, so the Space may sleep when idle and the first request afterwards can hit a cold start
-
- ---
-
- ## 🎓 What I Learned
-
- ✅ **Production ML Systems**
- - Training and deploying multi-class classification models end-to-end
- - Feature engineering and preprocessing pipeline serialization
- - Model serving via REST API with batch inference
-
- ✅ **Real-Time Dashboards**
- - Building interactive dashboards with vanilla JavaScript
- - Canvas API for high-performance charting (thousands of data points)
- - Responsive design for desktop and tablet
-
- ✅ **Backend Engineering**
- - Flask REST API design and CORS handling
- - Batch processing with streaming progress feedback
- - Error handling and validation
-
- ✅ **DevOps & Deployment**
- - Docker containerization for reproducible environments
- - HuggingFace Spaces deployment workflow
- - GitHub Actions CI/CD pipeline with smart skipping
-
- ✅ **Advanced Concepts**
- - NSL-KDD dataset characteristics and threat modeling
- - One-hot vs. frequency encoding trade-offs
- - Log transforms for skewed feature distributions
- - Cross-entropy loss and feature importance in Random Forest
-
- ---
-
- ## 📊 Dataset Reference
-
- **NSL-KDD Dataset**
- - Improved version of KDD Cup 1999
- - **Size**: 125,973 training records, 22,544 test records
- - **Features**: 41 network connection attributes
- - **Classes**: 5 (normal, DoS, Probe, R2L, U2R)
- - **Advantages**: Removes duplicate records, more balanced class distribution
- - **Standard**: Widely used benchmark for IDS research
-
- **Attribute Categories:**
- - Basic features (9): duration, protocol_type, service, flag, src_bytes, dst_bytes, etc.
- - Content features (13): hot, num_failed_logins, logged_in, num_compromised, etc.
- - Time-based traffic features (9): count, srv_count, serror_rate, etc.
- - Host-based traffic features (10): dst_host_count, dst_host_srv_count, etc.
-
- ---
-
- ## 🤝 Contributing
-
- This is a portfolio project, but you're welcome to fork and extend!
-
- **Ideas for enhancement:**
- - [ ] Add LSTM-based temporal anomaly detection
- - [ ] Implement feature importance visualization
- - [ ] Add real PCAP file ingestion
- - [ ] Multi-model ensemble (XGBoost + Neural Network)
- - [ ] Real-time alerting webhook integration
-
- ---
-
- ## 📜 License
-
- MIT License — Use freely for learning, portfolio, or production purposes.
-
- ---
-
- ## 📞 Contact
-
- **Hitan K** — AI Systems Engineer
-
- - 🔗 [LinkedIn](https://linkedin.com/in/hitan-k)
- - 🐙 [GitHub](https://github.com/Hitan547)
- - 🤗 [HuggingFace](https://huggingface.co/Hitan2004)
- - 📧 [Email](mailto:hitan.k@outlook.com)
-
  ---
 
- <div align="center">
 
- **⭐ If this helped you, please star the repo! ⭐**
 
- *Built with ❤️ for production and learning.*
 
- </div>
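The log-transform and feature-engineering steps (4 and 5) of the removed README's preprocessing pipeline can be sketched in plain Python. This is a minimal illustration, not SentinelNet's code: the README does not say which TCP flags count as errors, so `ERROR_FLAGS` is a hypothetical choice, and the derived features are computed from raw byte counts on the assumption that step 5 runs on untransformed values. The helper name `engineer_features` is likewise hypothetical.

```python
import math

# Hypothetical: the README does not list which connection flags count as
# "error" flags, so this set is an illustrative assumption.
ERROR_FLAGS = {"S0", "REJ", "RSTO", "RSTR", "RSTOS0", "SH"}


def engineer_features(row: dict) -> dict:
    """Apply log transforms and derived features to one NSL-KDD row (sketch)."""
    src = float(row["src_bytes"])
    dst = float(row["dst_bytes"])
    out = dict(row)
    # Step 4: log(1 + x) compresses heavily skewed byte counts and durations.
    for name in ("src_bytes", "dst_bytes", "duration"):
        out[name] = math.log1p(float(row[name]))
    # Step 5: derived features, computed here from the raw (untransformed) values.
    total = src + dst
    out["total_bytes"] = total
    out["src_bytes_ratio"] = src / (total + 1)
    out["is_error_flag"] = 1 if row["flag"] in ERROR_FLAGS else 0
    return out


if __name__ == "__main__":
    sample = {"duration": 0, "protocol_type": "tcp", "service": "http",
              "flag": "SF", "src_bytes": 181, "dst_bytes": 5450}
    print(engineer_features(sample))
```

The `+ 1` in the `src_bytes_ratio` denominator keeps the ratio defined for connections that transferred zero bytes in both directions.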
  ---
+ title: Agentic RAG UI
+ emoji: 🎨
+ colorFrom: pink
+ colorTo: blue
+ sdk: static
+ pinned: false
  ---
 
+ # 🎨 Agentic RAG UI
 
+ Frontend interface for interacting with the Agentic RAG backend.
 
+ ## Features
+ - Clean UI for asking questions
+ - Displays answers with sources
+ - Connects to backend API
 
+ ## Usage
+ Enter your query and view AI-generated responses.
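The SentinelNet README removed in this commit paired a threat-category table with a sample `/predict` response, which together imply a class-to-severity lookup plus an `is_intrusion` flag. A minimal sketch, assuming the confidence score is the highest class probability (as scikit-learn's `predict_proba` would return) rounded to four decimals like the sample response; the class order and the helpers `to_result` and `build_response` are hypothetical, not SentinelNet's API.

```python
from dataclasses import asdict, dataclass

# Class -> severity, taken from the removed README's "Threat Categories" table.
SEVERITY = {
    "normal": "None",
    "DoS": "Critical",
    "Probe": "Medium",
    "R2L": "High",
    "U2R": "Critical",
}


@dataclass
class Prediction:
    predicted_class: str
    severity: str
    confidence: float
    is_intrusion: bool


def to_result(classes: list[str], proba: list[float]) -> dict:
    """Turn one row's class probabilities into a /predict-style result (hypothetical helper)."""
    # Pick the most probable class; its probability serves as the confidence score.
    best = max(range(len(proba)), key=lambda i: proba[i])
    label = classes[best]
    return asdict(Prediction(
        predicted_class=label,
        severity=SEVERITY[label],
        confidence=round(proba[best], 4),
        is_intrusion=label != "normal",
    ))


def build_response(results: list[dict]) -> dict:
    """Wrap per-row results in the envelope shown in the sample response."""
    return {"status": "ok", "results": results}


if __name__ == "__main__":
    classes = ["DoS", "Probe", "R2L", "U2R", "normal"]  # hypothetical label order
    print(build_response([to_result(classes, [0.01, 0.002, 0.0028, 0.0031, 0.9821])]))
```

Treating every class other than `normal` as an intrusion keeps `is_intrusion` consistent with the table, where only `normal` has severity "None".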