---
title: AI NIDS Student Project
emoji: 🛡️
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: 1.39.0
app_file: app.py
pinned: false
---

# 🛡️ AI-Based Network Intrusion Detection System (Student Project)

**Project Status:** Phase 1 – Pre-Production / Offline Prototype

This project demonstrates how to use **Machine Learning (Random Forest)** and **Generative AI (Groq)** to detect and explain network attacks (specifically DDoS).

## 🚀 How to Use

1. **Enter API Key:** Paste your Groq API key in the sidebar (optional, for AI explanations).
2. **Train Model:** Click the "Train Model Now" button. The system loads the `Friday-WorkingHours...` dataset automatically.
3. **Simulate:** Click "🎲 Capture Random Packet" to pick a real network packet from the test set.
4. **Analyze:** See whether the model flags it as **BENIGN** or **DDoS**, and ask Groq to explain why.

## 📂 Files

- `app.py`: The main Python application code.
- `requirements.txt`: List of libraries used.
- `Friday-WorkingHours-Afternoon-DDos.pcap_ISCX.csv`: The dataset (CIC-IDS2017 subset).

## 🔧 PHASE 0 — Foundation Hardening (completed)

This repository includes an incremental, production-aligned hardening of the original student project:

- Deterministic reproducibility (global seed, logging).
- Explicit data validation and feature checks.
- Class-imbalance handling via `class_weight='balanced'`.
- Stratified 5-fold cross-validation with per-fold metrics.
- Evaluation now reports precision, recall, F1, PR-AUC, ROC-AUC, and confusion matrices instead of accuracy alone.
- Artifacts saved to `models/` and `metrics/` (see below).

These changes are intentionally small and reversible — see `training_utils.py` for the training implementation.

## 📦 Artifacts (generated after training)

- `models/rf_model.joblib` — serialized RandomForest model (best fold).
- `metrics/training_metrics.json` — timestamped CV metrics including PR curve, seed, and feature list.
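The Phase 0 hardening steps above (global seed, `class_weight='balanced'`, stratified 5-fold CV with per-fold precision/recall/F1/PR-AUC/ROC-AUC) can be sketched as follows. This is a minimal illustration, not the actual `training_utils.py` implementation; the function name `cross_validate_rf` and the synthetic demo data are assumptions for the example.

```python
import json
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import (precision_score, recall_score, f1_score,
                             average_precision_score, roc_auc_score)

SEED = 42  # global seed for deterministic reproducibility

def cross_validate_rf(X, y, n_splits=5):
    """Stratified k-fold CV with a class-weighted RandomForest; returns per-fold metrics."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=SEED)
    fold_metrics = []
    for fold, (tr, te) in enumerate(skf.split(X, y)):
        clf = RandomForestClassifier(
            n_estimators=100, class_weight="balanced",  # handles class imbalance
            random_state=SEED, n_jobs=-1,
        )
        clf.fit(X[tr], y[tr])
        proba = clf.predict_proba(X[te])[:, 1]
        pred = (proba >= 0.5).astype(int)
        fold_metrics.append({
            "fold": fold,
            "precision": float(precision_score(y[te], pred, zero_division=0)),
            "recall": float(recall_score(y[te], pred, zero_division=0)),
            "f1": float(f1_score(y[te], pred, zero_division=0)),
            "pr_auc": float(average_precision_score(y[te], proba)),
            "roc_auc": float(roc_auc_score(y[te], proba)),
        })
    return fold_metrics

# Demo on synthetic, imbalanced data (a stand-in for the CIC-IDS2017 features)
rng = np.random.default_rng(SEED)
X = rng.normal(size=(600, 8))
y = (X[:, 0] + 0.5 * X[:, 1] > 1.2).astype(int)  # minority positive class
metrics = cross_validate_rf(X, y)
print(json.dumps(metrics[0], indent=2))
```

In the real pipeline the best fold's model would then be serialized to `models/rf_model.joblib` with `joblib.dump`, and `fold_metrics` written to `metrics/training_metrics.json`.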
## ⚠️ Dataset & Publishing

⚠️ **Dataset Note:** The full CIC-IDS2017 CSV (~96 MB) is intentionally excluded from GitHub; this repository focuses on model architecture and training logic. A small sample or synthetic dataset (`sample_data/sample_small.csv`) is included for demos, and the full dataset is not committed.

## ▶️ Run locally

1. Create a virtual environment and install dependencies:

```powershell
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt
```

2. Run the Streamlit app:

```powershell
streamlit run app.py
```

## Contact / Next steps

A small sample CSV (e.g., 1k rows) can be generated from the full dataset so the repository can be published to GitHub safely.

## 🎓 About

Created for a university cybersecurity project to demonstrate the integration of traditional ML and LLMs in security operations.
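The small sample CSV mentioned above could be produced with a per-class downsampling sketch like the one below. The helper name `make_sample`, the `Label` column name, and the toy DataFrame are assumptions for illustration; the real CIC-IDS2017 export may use slightly different column headers.

```python
import pandas as pd

def make_sample(df, label_col="Label", per_class=500, seed=42):
    """Downsample each class so the published sample stays small but keeps both labels."""
    return (
        df.groupby(label_col, group_keys=False)
          .apply(lambda g: g.sample(n=min(per_class, len(g)), random_state=seed))
          .reset_index(drop=True)
    )

# Toy frame standing in for the full ~96 MB CIC-IDS2017 CSV
toy = pd.DataFrame({
    "Flow Duration": range(2000),
    "Label": ["BENIGN"] * 1800 + ["DDoS"] * 200,
})
sample = make_sample(toy)
print(sample["Label"].value_counts().to_dict())  # at most 500 rows per class
# sample.to_csv("sample_data/sample_small.csv", index=False)
```

Sampling per class (rather than uniformly) keeps the minority DDoS rows represented even when the source data is heavily imbalanced.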