# Firewall Log Classifier
A machine learning system for automated classification of firewall log entries into four action categories: allow, deny, drop, and reset-both. Built as part of CSAI 801: Artificial Intelligence and Machine Learning.
**Live Application:** https://huggingface.co/spaces/yomnafarag95/Log_Classifier
---
## Overview
Enterprise firewalls generate thousands of log entries per hour, making manual review impractical. This project trains a tuned Random Forest classifier on real network traffic data to automate that review process, achieving 99.56% test accuracy across four action classes.
---
## Model Performance
| Model | Test Accuracy | Macro F1 |
|-------------------------|--------------|----------|
| Random Forest (baseline)| 98.32% | 0.981 |
| Logistic Regression | 99.75% | 0.997 |
| KNN | 99.23% | 0.990 |
| Random Forest (tuned) | **99.56%** | **0.803**|
Tuned hyperparameters: `n_estimators=200`, `max_depth=20`, `min_samples_split=2`
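
The Macro F1 column is the unweighted mean of the per-class F1 scores, so a rare class such as reset-both counts as much as allow. The sketch below computes it in plain Python to show how accuracy and Macro F1 can diverge when a minority class is misclassified. The toy labels are invented for illustration and are not project results.

```python
def per_class_f1(y_true, y_pred, label):
    """F1 for one class, treating it as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: F1 is defined as 0 here
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label that appears."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(per_class_f1(y_true, y_pred, lb) for lb in labels) / len(labels)


# Toy example (invented): 98 "allow" rows predicted correctly,
# 2 "reset-both" rows both mispredicted as "allow".
y_true = ["allow"] * 98 + ["reset-both"] * 2
y_pred = ["allow"] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
macro = macro_f1(y_true, y_pred)
print(f"accuracy={accuracy:.4f}  macro_f1={macro:.4f}")
```

In this toy case accuracy stays high while Macro F1 falls to roughly half, because the missed class contributes an F1 of zero to the average.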
---
## Dataset
- **Source:** UCI Machine Learning Repository, Internet Firewall Data
- **URL:** https://archive.ics.uci.edu/dataset/542/internet+firewall+data
- **Raw records:** 65,532
- **After deduplication:** 57,170
- **Class distribution:** allow (37,439) · drop (11,635) · deny (8,042) · reset-both (54)
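
The class counts above imply a strong imbalance. A quick check of the shares, using only the numbers listed in this section (the variable names are illustrative):

```python
# Class counts after deduplication, copied from the list above.
class_counts = {"allow": 37439, "drop": 11635, "deny": 8042, "reset-both": 54}

total = sum(class_counts.values())
shares = {label: count / total for label, count in class_counts.items()}

print(f"total records: {total}")
for label, share in shares.items():
    print(f"{label:<11} {share:8.3%}")

# Accuracy of a trivial model that always predicts the largest class.
majority_baseline = max(shares.values())
print(f"majority-class baseline accuracy: {majority_baseline:.3%}")
```

The counts sum to the 57,170 deduplicated records. reset-both makes up well under 0.1% of the data, which is why accuracy alone says little about how that class is handled.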
**Input features (11):**
| # | Feature |
|---|----------------------|
| 1 | Source Port |
| 2 | Destination Port |
| 3 | NAT Source Port |
| 4 | NAT Destination Port |
| 5 | Bytes |
| 6 | Bytes Sent |
| 7 | Bytes Received |
| 8 | Packets |
| 9 | Elapsed Time (sec) |
|10 | pkts_sent |
|11 | pkts_received |
---
## Preprocessing Pipeline
1. Duplicate removal (65,532 → 57,170 records)
2. Stratified 70/30 train/test split
3. SMOTE oversampling on training set to balance minority classes
4. StandardScaler normalization
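
The steps above can be sketched in plain Python to make the order of operations concrete. This is an illustration on invented toy rows, not the code in `retrain.py`, which presumably uses scikit-learn and imbalanced-learn. Step 3 (SMOTE) is omitted because it depends on imbalanced-learn; it would run on the training rows only, between the split and the scaling. All function names here are hypothetical.

```python
import random
import statistics
from collections import defaultdict


def deduplicate(rows):
    """Step 1: drop exact duplicates, keeping the first occurrence of each row."""
    return list(dict.fromkeys(rows))


def stratified_split(rows, test_fraction=0.3, seed=0):
    """Step 2: split class by class so train and test keep the same class mix."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[-1]].append(row)  # the label is the last element of a row
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_label):
        members = by_label[label]
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test


def fit_scaler(feature_rows):
    """Step 4 (fit): per-column mean and population standard deviation."""
    columns = list(zip(*feature_rows))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) or 1.0 for col in columns]  # avoid divide-by-zero
    return means, stds


def apply_scaler(feature_rows, means, stds):
    """Step 4 (transform): standardize rows with previously fitted statistics."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)]
            for row in feature_rows]


# Toy rows of the form (feature_a, feature_b, label); invented data.
toy_rows = [(float(i), float(i % 7), "allow") for i in range(70)]
toy_rows += [(float(i), 1.0, "deny") for i in range(100, 130)]
toy_rows += toy_rows[:10]  # inject 10 exact duplicates

unique_rows = deduplicate(toy_rows)
train_rows, test_rows = stratified_split(unique_rows)

train_features = [row[:-1] for row in train_rows]
test_features = [row[:-1] for row in test_rows]

# Fit the scaler on training data only, then reuse it for the test data.
means, stds = fit_scaler(train_features)
train_scaled = apply_scaler(train_features, means, stds)
test_scaled = apply_scaler(test_features, means, stds)
```

Fitting the scaler on the training rows alone keeps test-set statistics out of the model, which is the usual reason for splitting before scaling.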
---
## Try the Application
Paste any of the following lines into the application input and click Classify.
Each line contains 11 comma-separated values matching the feature order above.
**Allow**
```
51465,443,39975,443,3961,1595,2366,21,16,12,9
```
**Deny**
```
34086,25174,0,0,62,62,0,1,0,1,0
```
**Drop**
```
51125,445,0,0,66,66,0,1,0,1,0
```
**Reset-Both**
```
64461,31652,0,0,62,62,0,1,0,1,0
```
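
Each sample line maps onto the 11 input features in order. A minimal parsing sketch, assuming the application reads the values positionally; `parse_log_line` is a hypothetical helper and not necessarily how `app.py` handles input:

```python
# Feature order as listed in the "Input features" table.
FEATURE_NAMES = [
    "Source Port", "Destination Port", "NAT Source Port", "NAT Destination Port",
    "Bytes", "Bytes Sent", "Bytes Received", "Packets",
    "Elapsed Time (sec)", "pkts_sent", "pkts_received",
]


def parse_log_line(line):
    """Turn one comma-separated line into a {feature name: value} dict."""
    parts = [part.strip() for part in line.split(",")]
    if len(parts) != len(FEATURE_NAMES):
        raise ValueError(
            f"expected {len(FEATURE_NAMES)} values, got {len(parts)}"
        )
    return dict(zip(FEATURE_NAMES, (float(part) for part in parts)))


record = parse_log_line("51465,443,39975,443,3961,1595,2366,21,16,12,9")
for name, value in record.items():
    print(f"{name:<22} {value:g}")
```

In the Allow sample, Bytes equals Bytes Sent plus Bytes Received, and Packets equals pkts_sent plus pkts_received, which is a quick sanity check when typing a line by hand.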
---
## Run Locally
```bash
git clone https://github.com/yomnafarag95/Log_Classifier.git
cd Log_Classifier
pip install -r requirements.txt
streamlit run app.py
```
---
## Retrain the Model
```bash
pip install scikit-learn imbalanced-learn pandas joblib
python retrain.py
```
Outputs: `model.joblib`, `scaler.joblib`, `label_encoder.joblib`
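
The three output files can be combined to classify a single entry. The sketch below assumes the artifacts follow scikit-learn conventions (`scaler.transform`, `model.predict`, `label_encoder.inverse_transform`) and that the model expects scaled features in the order listed under Dataset. `load_artifacts` and `classify` are hypothetical helpers, not functions from this repository.

```python
from pathlib import Path


def load_artifacts(directory="."):
    """Load the three files written by retrain.py (requires joblib)."""
    import joblib  # third-party; imported here so classify() has no hard dependency

    base = Path(directory)
    model = joblib.load(base / "model.joblib")
    scaler = joblib.load(base / "scaler.joblib")
    label_encoder = joblib.load(base / "label_encoder.joblib")
    return model, scaler, label_encoder


def classify(features, model, scaler, label_encoder):
    """Scale one row of 11 feature values, predict, and decode the class name."""
    scaled = scaler.transform([features])  # 2D input: one row
    encoded = model.predict(scaled)
    return label_encoder.inverse_transform(encoded)[0]


# Example usage (assumes the .joblib files are in the current directory):
#   model, scaler, label_encoder = load_artifacts()
#   row = [51465, 443, 39975, 443, 3961, 1595, 2366, 21, 16, 12, 9]
#   print(classify(row, model, scaler, label_encoder))
```

Reusing the saved scaler matters here: applying a freshly fitted scaler at prediction time would shift the inputs away from the distribution the model was trained on.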
---
## Repository Structure
```
Log_Classifier/
├── app.py                  Streamlit web application
├── retrain.py              Model retraining script
├── model.joblib            Trained Random Forest model
├── scaler.joblib           Fitted StandardScaler
├── label_encoder.joblib    Label encoder for action classes
├── requirements.txt        Python dependencies
└── The_Report.pdf          Full project report
```
---
## Authors
Yasmeen Algendy, Yomna Algendy, Zahraa Mohamed
Supervisor: Dr. Marwa Elsayed
CSAI 801: Artificial Intelligence and Machine Learning