Spaces:

TanonymoSingh
/

AI-Based-Network-Intrusion-Detection-System

Sleeping

App Files Files Community

AI-Based-Network-Intrusion-Detection-System / README.md

Taranpreet Singh

Phase 1: Offline NIDS prototype with CV, threshold tuning

98d799c 10 days ago

preview code

raw

history blame contribute delete

2.83 kB

	---
	title: AI NIDS Student Project
	emoji: 🛡️
	colorFrom: blue
	colorTo: green
	sdk: streamlit
	sdk_version: 1.39.0
	app_file: app.py
	pinned: false


	---
	# 🛡️ AI-Based Network Intrusion Detection System (Student Project)

	Project Status: Phase 1 – Pre-Production / Offline Prototype

	This project demonstrates how to use Machine Learning (Random Forest) and Generative AI (Groq) to detect and explain network attacks (specifically DDoS).

	## 🚀 How to Use

	1. Enter API Key: Paste your Groq API key in the sidebar (optional, for AI explanations).
	2. Train Model: Click the "Train Model Now" button. The system loads the `Friday-WorkingHours...` dataset automatically.
	3. Simulate: Click "🎲 Capture Random Packet" to pick a real network packet from the test set.
	4. Analyze: See if the model flags it as BENIGN or DDoS, and ask Groq to explain why.

	## 📂 Files

	- `app.py`: The main Python application code.
	- `requirements.txt`: List of libraries used.
	- `Friday-WorkingHours-Afternoon-DDos.pcap_ISCX.csv`: The dataset (CIC-IDS2017 subset).

	## 🔧 PHASE 0 — Foundation Hardening (completed)

	This repository includes an incremental, production-aligned hardening of the original student project.

	- Deterministic reproducibility (global seed, logging).
	- Explicit data validation and feature checks.
	- Class-imbalance handling via `class_weight='balanced'`.
	- Stratified 5-fold cross-validation with per-fold metrics.
	- Evaluation metrics replaced accuracy with: precision, recall, F1, PR-AUC, ROC-AUC, and confusion matrices.
	- Artifacts saved to `models/` and `metrics/` (see below).

	These changes are intentionally small and reversible — see `training_utils.py` for the training implementation.

	## 📦 Artifacts (generated after training)

	- `models/rf_model.joblib` — serialized RandomForest model (best fold).
	- `metrics/training_metrics.json` — timestamped CV metrics including PR-curve, seed, feature list.

	## ⚠️ Dataset & Publishing

	- ⚠️ Dataset Note: The full CIC-IDS2017 CSV (~96 MB) is intentionally excluded from GitHub.
	This repository focuses on model architecture and training logic. A small sample or synthetic dataset (`sample_data/sample_small.csv`) is included for demos; the full dataset is not committed.

	## ▶️ Run locally

	1. Create a virtual environment and install dependencies:

	```powershell
	python -m venv .venv
	.\.venv\Scripts\Activate.ps1
	pip install -r requirements.txt
	```

	2. Run the Streamlit app:

	```powershell
	streamlit run app.py
	```

	## Contact / Next steps

	If you want, I can generate a small sample CSV (e.g., 1k rows) that allows publishing the repo to GitHub safely.

	## 🎓 About

	Created for a university cybersecurity project to demonstrate the integration of traditional ML and LLMs in security operations.