Firewall Log Classifier
A machine learning system for automated classification of firewall log entries into four action categories: allow, deny, drop, and reset-both. Built as part of CSAI 801: Artificial Intelligence and Machine Learning.
Live Application: https://huggingface.co/spaces/yomnafarag95/Log_Classifier
Overview
Enterprise firewalls generate thousands of log entries per hour, making manual review impractical. This project trains a tuned Random Forest classifier on real network traffic data to automate that review process, achieving 99.56% test accuracy across four action classes.
Model Performance
| Model | Test Accuracy | Macro F1 |
|---|---|---|
| Random Forest (baseline) | 98.32% | 0.981 |
| Logistic Regression | 99.75% | 0.997 |
| KNN | 99.23% | 0.990 |
| Random Forest (tuned) | 99.56% | 0.803 |
Tuned hyperparameters: n_estimators=200, max_depth=20, min_samples_split=2
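The table reports both accuracy and macro F1, and the two can diverge sharply when one class is rare. The sketch below uses made-up toy labels (not the project's data or results) to show how a model can score very high accuracy while macro F1 stays low, because macro F1 averages per-class F1 scores without weighting by class size. The `macro_f1` helper is a hypothetical stand-in written for illustration.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); define it as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data (assumption): 996 "allow" rows and 4 "reset-both" rows.
y_true = ["allow"] * 996 + ["reset-both"] * 4
# Hypothetical model: every "allow" is right, only 1 of 4 "reset-both" is caught.
y_pred = ["allow"] * 996 + ["reset-both"] + ["allow"] * 3

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred)

print(f"accuracy = {accuracy:.3f}")
print(f"macro F1 = {score:.3f}")
```

In this toy case three misclassified rows barely move accuracy but pull the rare class's F1 down to 0.4, which drags the macro average with it.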
Dataset
- Source: UCI Machine Learning Repository, Internet Firewall Data
- URL: https://archive.ics.uci.edu/dataset/542/internet+firewall+data
- Raw records: 65,532
- After deduplication: 57,170
- Class distribution: allow (37,439) · drop (11,635) · deny (8,042) · reset-both (54)
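To make the imbalance in the class distribution concrete, the short sketch below computes each class's share of the deduplicated data using the counts listed above. The 30% test fraction is taken from the split described in this README; the rounding of the expected test-set count is an approximation, not a figure from the project.

```python
# Class counts after deduplication, as listed above.
counts = {"allow": 37_439, "drop": 11_635, "deny": 8_042, "reset-both": 54}

total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}

# Ratio between the largest and the smallest class.
imbalance_ratio = max(counts.values()) / min(counts.values())

# Approximate number of reset-both rows that land in a 30% stratified test set.
expected_reset_both_in_test = round(counts["reset-both"] * 0.30)

for label, n in counts.items():
    print(f"{label:<11}{n:>7}  {shares[label]:7.3%}")
print(f"imbalance ratio: {imbalance_ratio:.0f}:1")
print(f"reset-both rows expected in test set: ~{expected_reset_both_in_test}")
```

With so few reset-both rows available for evaluation, a handful of misclassifications in that class can move class-averaged metrics noticeably.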
Input features (11):
| # | Feature |
|---|---|
| 1 | Source Port |
| 2 | Destination Port |
| 3 | NAT Source Port |
| 4 | NAT Destination Port |
| 5 | Bytes |
| 6 | Bytes Sent |
| 7 | Bytes Received |
| 8 | Packets |
| 9 | Elapsed Time (sec) |
| 10 | pkts_sent |
| 11 | pkts_received |
Preprocessing Pipeline
- Duplicate removal (65,532 → 57,170 records)
- Stratified 70/30 train/test split
- SMOTE oversampling on training set to balance minority classes
- StandardScaler standardization (zero mean, unit variance per feature)
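The steps above can be sketched with the standard library alone. This is a minimal illustration on made-up toy rows, not the code in retrain.py: the helper names (`deduplicate`, `stratified_split`, `fit_scaler`, `transform`) are hypothetical, fitting the scaler on the training rows only is an assumption, and the SMOTE step is left as a comment because it requires imbalanced-learn.

```python
import random
from collections import defaultdict
from statistics import mean, pstdev


def deduplicate(rows):
    """Drop exact duplicate rows, keeping the first occurrence of each."""
    return list(dict.fromkeys(rows))


def stratified_split(rows, label_of, test_size=0.30, seed=42):
    """Split each class separately so train and test keep the class proportions."""
    by_class = defaultdict(list)
    for row in rows:
        by_class[label_of(row)].append(row)
    rng = random.Random(seed)
    train, test = [], []
    for members in by_class.values():
        rng.shuffle(members)
        n_test = round(len(members) * test_size)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test


def fit_scaler(feature_rows):
    """Per-column mean and population standard deviation."""
    columns = list(zip(*feature_rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) or 1.0 for col in columns]  # avoid dividing by zero
    return means, stds


def transform(feature_rows, means, stds):
    """Standardize each value as (value - mean) / std."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)]
            for row in feature_rows]


# Toy rows (assumption): (bytes, packets, label) with invented values.
rows = [(60 + i, 1, "deny") for i in range(10)]
rows += [(1000 + 10 * i, 10 + i, "allow") for i in range(20)]
rows += rows[:5]  # inject five exact duplicates

unique = deduplicate(rows)
train, test = stratified_split(unique, label_of=lambda r: r[-1])
# SMOTE oversampling would be applied here, to the training rows only.
means, stds = fit_scaler([r[:-1] for r in train])
X_train = transform([r[:-1] for r in train], means, stds)
X_test = transform([r[:-1] for r in test], means, stds)

print(len(rows), "->", len(unique), "rows after deduplication")
print(len(train), "train /", len(test), "test")
```

Oversampling after the split, and only on the training portion, keeps synthetic samples out of the test set so the reported metrics reflect real records.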
Try the Application
Paste any of the following lines into the application input and click Classify. Each line contains 11 comma-separated values matching the feature order above.
Allow
51465,443,39975,443,3961,1595,2366,21,16,12,9
Deny
34086,25174,0,0,62,62,0,1,0,1,0
Drop
51125,445,0,0,66,66,0,1,0,1,0
Reset-Both
64461,31652,0,0,62,62,0,1,0,1,0
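For readers who want to check a line before pasting it, the sketch below parses one of the sample lines into named features using the feature order from the table above. The `parse_log_line` function is a hypothetical helper, not part of app.py, and treating every value as a non-negative integer is an assumption based on the sample lines.

```python
# Feature names in the order given by the input feature table.
FEATURES = [
    "Source Port", "Destination Port", "NAT Source Port", "NAT Destination Port",
    "Bytes", "Bytes Sent", "Bytes Received", "Packets",
    "Elapsed Time (sec)", "pkts_sent", "pkts_received",
]


def parse_log_line(line):
    """Turn a comma-separated line into a {feature name: value} dict."""
    parts = [p.strip() for p in line.split(",")]
    if len(parts) != len(FEATURES):
        raise ValueError(f"expected {len(FEATURES)} values, got {len(parts)}")
    values = [int(p) for p in parts]  # raises ValueError on non-numeric input
    if any(v < 0 for v in values):
        raise ValueError("feature values must be non-negative")
    return dict(zip(FEATURES, values))


# The "Allow" sample line from above.
record = parse_log_line("51465,443,39975,443,3961,1595,2366,21,16,12,9")
for name, value in record.items():
    print(f"{name:<22}{value}")
```

In the four sample lines, Bytes equals Bytes Sent plus Bytes Received and Packets equals pkts_sent plus pkts_received, which gives a quick sanity check for hand-written inputs.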
Run Locally
git clone https://github.com/yomnafarag95/Log_Classifier.git
cd Log_Classifier
pip install -r requirements.txt
streamlit run app.py
Retrain the Model
pip install scikit-learn imbalanced-learn pandas joblib
python retrain.py
Outputs: model.joblib, scaler.joblib, label_encoder.joblib
Repository Structure
Log_Classifier/
├── app.py                 Streamlit web application
├── retrain.py             Model retraining script
├── model.joblib           Trained Random Forest model
├── scaler.joblib          Fitted StandardScaler
├── label_encoder.joblib   Label encoder for action classes
├── requirements.txt       Python dependencies
└── The_Report.pdf         Full project report
Authors
Yasmeen Algendy, Yomna Algendy, Zahraa Mohamed
Supervisor: Dr. Marwa Elsayed
CSAI 801: Artificial Intelligence and Machine Learning