Taranpreet Singh
Phase 1: Offline NIDS prototype with CV, threshold tuning
98d799c

metadata
title: AI NIDS Student Project
emoji: 🛡️
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: 1.39.0
app_file: app.py
pinned: false

🛡️ AI-Based Network Intrusion Detection System (Student Project)

Project Status: Phase 1 – Pre-Production / Offline Prototype

This project demonstrates how to combine Machine Learning (a Random Forest classifier) with Generative AI (an LLM accessed through Groq) to detect and explain network attacks, specifically DDoS.

🚀 How to Use

  1. Enter API Key: Paste your Groq API key in the sidebar (optional, for AI explanations).
  2. Train Model: Click the "Train Model Now" button. The system loads the Friday-WorkingHours... dataset automatically.
  3. Simulate: Click "🎲 Capture Random Packet" to pick a real network packet from the test set.
  4. Analyze: See if the model flags it as BENIGN or DDoS, and ask Groq to explain why.
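
Steps 3 and 4 can be sketched outside the Streamlit UI as follows. This is a minimal illustration, not the code in app.py: the helper names `capture_random_packet` and `build_explanation_prompt`, the prompt wording, and the toy feature names are all assumptions, and the actual call to Groq is left out.

```python
import random


def capture_random_packet(test_rows, rng):
    """Pick one row (a dict of feature name -> value) from the held-out test set."""
    return rng.choice(test_rows)


def build_explanation_prompt(packet, predicted_label):
    """Format the row and the model's verdict as a plain-text prompt for an LLM."""
    feature_lines = "\n".join(f"- {name}: {value}" for name, value in packet.items())
    return (
        "You are assisting a network security analyst.\n"
        f"A Random Forest classifier labeled this flow as {predicted_label}.\n"
        "Flow features:\n"
        f"{feature_lines}\n"
        "Explain in two or three sentences which features could support this label."
    )


# Toy rows with made-up feature names and values, not taken from the dataset.
toy_test_rows = [
    {"flow_duration": 120, "total_fwd_packets": 3},
    {"flow_duration": 98000, "total_fwd_packets": 450},
]
packet = capture_random_packet(toy_test_rows, random.Random(42))
# In the real app the label would come from the trained model's prediction.
prompt = build_explanation_prompt(packet, predicted_label="DDoS")
print(prompt)
```

The resulting string is what would be sent to the LLM when an API key is provided; without a key, only the model's label is available.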

📂 Files

  • app.py: The main Python application code.
  • requirements.txt: List of libraries used.
  • Friday-WorkingHours-Afternoon-DDos.pcap_ISCX.csv: The dataset (CIC-IDS2017 subset).

🔧 PHASE 0: Foundation Hardening (completed)

This repository includes an incremental, production-aligned hardening of the original student project.

  • Deterministic reproducibility (global seed, logging).
  • Explicit data validation and feature checks.
  • Class-imbalance handling via class_weight='balanced'.
  • Stratified 5-fold cross-validation with per-fold metrics.
  • Accuracy replaced as the evaluation metric by precision, recall, F1, PR-AUC, ROC-AUC, and confusion matrices.
  • Artifacts saved to models/ and metrics/ (see below).

These changes are intentionally small and reversible; see training_utils.py for the training implementation.
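
For readers who want to see how the hardening items fit together, here is a minimal sketch using scikit-learn. It is not the contents of training_utils.py: the function name `cross_validate_rf`, the seed value, the number of trees, and the synthetic data are assumptions for illustration. PR-AUC is approximated with `average_precision_score`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    average_precision_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)
from sklearn.model_selection import StratifiedKFold

SEED = 42  # assumed value; the README only states that a global seed is used


def cross_validate_rf(X, y, n_splits=5, seed=SEED):
    """Return one metrics dict per fold for a class-weighted Random Forest."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    fold_metrics = []
    for fold, (train_idx, test_idx) in enumerate(skf.split(X, y), start=1):
        model = RandomForestClassifier(
            n_estimators=50,          # assumed size, kept small for the demo
            class_weight="balanced",  # reweights classes by inverse frequency
            random_state=seed,
        )
        model.fit(X[train_idx], y[train_idx])
        y_true = y[test_idx]
        y_pred = model.predict(X[test_idx])
        y_score = model.predict_proba(X[test_idx])[:, 1]  # P(class 1)
        fold_metrics.append({
            "fold": fold,
            "precision": precision_score(y_true, y_pred, zero_division=0),
            "recall": recall_score(y_true, y_pred, zero_division=0),
            "f1": f1_score(y_true, y_pred, zero_division=0),
            "pr_auc": average_precision_score(y_true, y_score),
            "roc_auc": roc_auc_score(y_true, y_score),
            "confusion_matrix": confusion_matrix(y_true, y_pred).tolist(),
        })
    return fold_metrics


# Synthetic, imbalanced stand-in for the real features (about 10% positives).
X_demo, y_demo = make_classification(
    n_samples=600, n_features=10, weights=[0.9, 0.1], random_state=SEED
)
cv_results = cross_validate_rf(X_demo, y_demo)
for m in cv_results:
    print(m["fold"], round(m["f1"], 3), round(m["pr_auc"], 3))
```

Stratified splitting keeps the class ratio roughly equal in every fold, which matters when attack rows are much rarer (or much more common) than benign ones. Fixing `random_state` on both the splitter and the model is what makes repeated runs comparable.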

📦 Artifacts (generated after training)

  • models/rf_model.joblib: serialized RandomForest model (best fold).
  • metrics/training_metrics.json: timestamped CV metrics including PR-curve, seed, feature list.
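
As a rough idea of what the metrics artifact could contain, the sketch below writes a timestamped JSON summary. The key names (`timestamp`, `seed`, `features`, `folds`, `pr_curve`), the helper name `save_training_metrics`, and the placeholder values are assumptions; the actual schema is whatever training_utils.py produces.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def save_training_metrics(fold_metrics, seed, features, out_dir="metrics"):
    """Write a timestamped JSON summary and return the path that was written."""
    out_path = Path(out_dir) / "training_metrics.json"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    payload = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "seed": seed,
        "features": list(features),
        "folds": fold_metrics,
    }
    out_path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return out_path


# Placeholder numbers for illustration only, not results from this project.
demo_folds = [
    {"fold": 1, "f1": 0.0, "pr_curve": {"precision": [1.0], "recall": [0.0]}},
]
# A temporary directory keeps the demo from touching the real metrics/ folder.
with tempfile.TemporaryDirectory() as tmp:
    path = save_training_metrics(
        demo_folds, seed=42, features=["feature_a", "feature_b"], out_dir=tmp
    )
    saved = json.loads(path.read_text(encoding="utf-8"))
print(sorted(saved))
```

Storing the seed and feature list next to the scores makes it possible to check later that a saved model and a metrics file came from the same run.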

⚠️ Dataset & Publishing

  • Dataset Note: The full CIC-IDS2017 CSV (~96 MB) is intentionally excluded from GitHub; this repository focuses on model architecture and training logic. A small sample or synthetic dataset (sample_data/sample_small.csv) is included for demos.
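
One way a small sample file could be derived from the full CSV is sketched below. This is an assumption about the approach, not a description of how sample_data/sample_small.csv was actually produced; the function name `write_sample_csv`, the default of 1000 rows, and the toy column names are hypothetical. Note that plain random sampling does not guarantee the original class ratio.

```python
import csv
import os
import random
import tempfile


def write_sample_csv(src_path, dst_path, n_rows=1000, seed=42):
    """Copy the header plus a seeded random sample of up to n_rows data rows."""
    rng = random.Random(seed)
    with open(src_path, newline="", encoding="utf-8") as src:
        reader = csv.reader(src)
        header = next(reader)
        # Reservoir sampling: memory use is bounded by n_rows, not file size.
        sample = []
        for i, row in enumerate(reader):
            if i < n_rows:
                sample.append(row)
            else:
                j = rng.randint(0, i)  # inclusive on both ends
                if j < n_rows:
                    sample[j] = row
    with open(dst_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        writer.writerow(header)
        writer.writerows(sample)
    return len(sample)


# Demo on a tiny generated file with made-up columns, not the real dataset.
with tempfile.TemporaryDirectory() as tmp:
    full = os.path.join(tmp, "full.csv")
    small = os.path.join(tmp, "sample_small.csv")
    with open(full, "w", newline="", encoding="utf-8") as f:
        w = csv.writer(f)
        w.writerow(["flow_duration", "label"])
        for i in range(50):
            w.writerow([i, "DDoS" if i % 5 == 0 else "BENIGN"])
    written = write_sample_csv(full, small, n_rows=10)
    with open(small, newline="", encoding="utf-8") as f:
        sample_rows = list(csv.reader(f))
print(written, len(sample_rows))
```

Because the sampler is seeded, rerunning it on the same source file yields the same sample, which keeps demo results stable across machines.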

▶️ Run locally

  1. Create a virtual environment and install dependencies (the activation command shown is for Windows PowerShell):
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt
  2. Run the Streamlit app:
streamlit run app.py

Contact / Next steps

A small sample CSV (e.g., 1k rows) can be generated so that the repo can be published to GitHub safely without the full dataset.

🎓 About

Created for a university cybersecurity project to demonstrate the integration of traditional ML and LLMs in security operations.