|
|
--- |
|
|
title: AI NIDS Student Project |
|
|
emoji: π‘οΈ |
|
|
colorFrom: blue |
|
|
colorTo: green |
|
|
sdk: streamlit |
|
|
sdk_version: 1.39.0 |
|
|
app_file: app.py |
|
|
pinned: false |
|
|
|
|
|
|
|
|
--- |
|
|
# π‘οΈ AI-Based Network Intrusion Detection System (Student Project) |
|
|
|
|
|
Project Status: Phase 1 β Pre-Production / Offline Prototype |
|
|
|
|
|
This project demonstrates how to use **Machine Learning (Random Forest)** and **Generative AI (Groq)** to detect and explain network attacks (specifically DDoS). |
|
|
|
|
|
## π How to Use |
|
|
|
|
|
1. **Enter API Key:** Paste your Groq API key in the sidebar (optional, for AI explanations). |
|
|
2. **Train Model:** Click the "Train Model Now" button. The system loads the `Friday-WorkingHours...` dataset automatically. |
|
|
3. **Simulate:** Click "π² Capture Random Packet" to pick a real network packet from the test set. |
|
|
4. **Analyze:** See if the model flags it as **BENIGN** or **DDoS**, and ask Groq to explain why. |
|
|
|
|
|
## π Files |
|
|
|
|
|
- `app.py`: The main Python application code. |
|
|
- `requirements.txt`: List of libraries used. |
|
|
- `Friday-WorkingHours-Afternoon-DDos.pcap_ISCX.csv`: The dataset (CIC-IDS2017 subset). |
|
|
|
|
|
## π§ PHASE 0 β Foundation Hardening (completed) |
|
|
|
|
|
This repository includes an incremental, production-aligned hardening of the original student project. |
|
|
|
|
|
- Deterministic reproducibility (global seed, logging). |
|
|
- Explicit data validation and feature checks. |
|
|
- Class-imbalance handling via `class_weight='balanced'`. |
|
|
- Stratified 5-fold cross-validation with per-fold metrics. |
|
|
- Evaluation metrics replaced accuracy with: precision, recall, F1, PR-AUC, ROC-AUC, and confusion matrices. |
|
|
- Artifacts saved to `models/` and `metrics/` (see below). |
|
|
|
|
|
These changes are intentionally small and reversible β see `training_utils.py` for the training implementation. |
|
|
|
|
|
## π¦ Artifacts (generated after training) |
|
|
|
|
|
- `models/rf_model.joblib` β serialized RandomForest model (best fold). |
|
|
- `metrics/training_metrics.json` β timestamped CV metrics including PR-curve, seed, feature list. |
|
|
|
|
|
## β οΈ Dataset & Publishing |
|
|
|
|
|
- β οΈ Dataset Note: The full CIC-IDS2017 CSV (~96 MB) is intentionally excluded from GitHub. |
|
|
This repository focuses on model architecture and training logic. A small sample or synthetic dataset (`sample_data/sample_small.csv`) is included for demos; the full dataset is not committed. |
|
|
|
|
|
## βΆοΈ Run locally |
|
|
|
|
|
1. Create a virtual environment and install dependencies: |
|
|
|
|
|
```powershell |
|
|
python -m venv .venv |
|
|
.\.venv\Scripts\Activate.ps1 |
|
|
pip install -r requirements.txt |
|
|
``` |
|
|
|
|
|
2. Run the Streamlit app: |
|
|
|
|
|
```powershell |
|
|
streamlit run app.py |
|
|
``` |
|
|
|
|
|
## Contact / Next steps |
|
|
|
|
|
If you want, I can generate a small sample CSV (e.g., 1k rows) that allows publishing the repo to GitHub safely. |
|
|
|
|
|
## π About |
|
|
|
|
|
Created for a university cybersecurity project to demonstrate the integration of traditional ML and LLMs in security operations. |
|
|
|