yomnafarag95 committed on
Commit 7877434 · verified · 1 Parent(s): 7377758

Upload README.md

Files changed (1)
  1. README.md +129 -9
README.md CHANGED
@@ -1,11 +1,131 @@
  ---
- title: Log Classifier
- emoji: 🚀
- colorFrom: red
- colorTo: red
- sdk: docker
- app_port: 7860
- tags:
- - streamlit
- pinned: false
  ---
+ # Firewall Log Classifier
+
+ A machine learning system for automated classification of firewall log entries into four action categories: allow, deny, drop, and reset-both. Built as part of CSAI 801: Artificial Intelligence and Machine Learning.
+
+ **Live Application:** https://huggingface.co/spaces/yomnafarag95/Log_Classifier
+
  ---
+
+ ## Overview
+
+ Enterprise firewalls generate thousands of log entries per hour, making manual review impractical. This project trains a tuned Random Forest classifier on real network traffic data to automate that review process, achieving 99.56% test accuracy across four action classes.
+
  ---
+
+ ## Model Performance
+
+ | Model                    | Test Accuracy | Macro F1  |
+ |--------------------------|---------------|-----------|
+ | Random Forest (baseline) | 98.32%        | 0.981     |
+ | Logistic Regression      | 99.75%        | 0.997     |
+ | KNN                      | 99.23%        | 0.990     |
+ | Random Forest (tuned)    | **99.56%**    | **0.803** |
+
+ Tuned hyperparameters: `n_estimators=200`, `max_depth=20`, `min_samples_split=2`
+
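As a rough sketch, the tuned hyperparameters listed above map onto scikit-learn's `RandomForestClassifier` as shown here. The `random_state` and `n_jobs` values are assumptions added for reproducibility and speed; the README does not state them, and the project's own training script may differ.

```python
from sklearn.ensemble import RandomForestClassifier

# The three hyperparameters come from the README. random_state and n_jobs
# are assumptions for this sketch, not values taken from the project.
tuned_rf = RandomForestClassifier(
    n_estimators=200,
    max_depth=20,
    min_samples_split=2,
    random_state=42,
    n_jobs=-1,
)

# Inspect the configuration before fitting.
params = tuned_rf.get_params()
print(params["n_estimators"], params["max_depth"], params["min_samples_split"])
```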
+ ---
+
+ ## Dataset
+
+ - **Source:** UCI Machine Learning Repository, Internet Firewall Data
+ - **URL:** https://archive.ics.uci.edu/dataset/542/internet+firewall+data
+ - **Raw records:** 65,532
+ - **After deduplication:** 57,170
+ - **Class distribution:** allow (37,439) · drop (11,635) · deny (8,042) · reset-both (54)
+
+ **Input features (11):**
+
+ | #  | Feature              |
+ |----|----------------------|
+ | 1  | Source Port          |
+ | 2  | Destination Port     |
+ | 3  | NAT Source Port      |
+ | 4  | NAT Destination Port |
+ | 5  | Bytes                |
+ | 6  | Bytes Sent           |
+ | 7  | Bytes Received       |
+ | 8  | Packets              |
+ | 9  | Elapsed Time (sec)   |
+ | 10 | pkts_sent            |
+ | 11 | pkts_received        |
+
+ ---
+
+ ## Preprocessing Pipeline
+
+ 1. Duplicate removal (65,532 → 57,170 records)
+ 2. Stratified 70/30 train/test split
+ 3. SMOTE oversampling on training set to balance minority classes
+ 4. StandardScaler normalization
+
+ ---
+
+ ## Try the Application
+
+ Paste any of the following lines into the application input and click Classify.
+ Each line contains 11 comma-separated values matching the feature order above.
+
+ **Allow**
+ ```
+ 51465,443,39975,443,3961,1595,2366,21,16,12,9
+ ```
+
+ **Deny**
+ ```
+ 34086,25174,0,0,62,62,0,1,0,1,0
+ ```
+
+ **Drop**
+ ```
+ 51125,445,0,0,66,66,0,1,0,1,0
+ ```
+
+ **Reset-Both**
+ ```
+ 64461,31652,0,0,62,62,0,1,0,1,0
+ ```
+
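To make the input format concrete, here is a minimal sketch of how one such line could be split into the 11 named features. `parse_log_line` is a hypothetical helper written for illustration and is not taken from `app.py`; the validation rules (integer, non-negative values) are assumptions.

```python
# Feature names in the order given in the Dataset section.
FEATURES = [
    "Source Port", "Destination Port", "NAT Source Port", "NAT Destination Port",
    "Bytes", "Bytes Sent", "Bytes Received", "Packets",
    "Elapsed Time (sec)", "pkts_sent", "pkts_received",
]


def parse_log_line(line):
    """Turn one comma-separated line into a feature dict, or raise ValueError."""
    parts = [p.strip() for p in line.split(",")]
    if len(parts) != len(FEATURES):
        raise ValueError(f"expected {len(FEATURES)} values, got {len(parts)}")
    values = [int(p) for p in parts]  # int() raises ValueError on non-numeric input
    if any(v < 0 for v in values):
        raise ValueError("values must be non-negative")
    return dict(zip(FEATURES, values))


row = parse_log_line("51465,443,39975,443,3961,1595,2366,21,16,12,9")
print(row["Destination Port"], row["Bytes"])  # 443 3961
```

In the Allow sample, Bytes (3961) equals Bytes Sent plus Bytes Received (1595 + 2366), and Packets (21) equals pkts_sent plus pkts_received (12 + 9), which is a quick way to sanity-check a hand-written line.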
+ ---
+
+ ## Run Locally
+
+ ```bash
+ git clone https://github.com/yomnafarag95/Log_Classifier.git
+ cd Log_Classifier
+ pip install -r requirements.txt
+ streamlit run app.py
+ ```
+
+ ---
+
+ ## Retrain the Model
+
+ ```bash
+ pip install scikit-learn imbalanced-learn pandas joblib
+ python retrain.py
+ ```
+
+ Outputs: `model.joblib`, `scaler.joblib`, `label_encoder.joblib`
+
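The three output files can presumably be loaded with `joblib.load` and chained as scaler, then model, then label encoder. The sketch below assumes that order and assumes the model predicts encoded class indices; `load_artifacts` and `classify` are hypothetical helpers, and the stand-in artifacts are trained on toy data so the example runs without the repository files.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder, StandardScaler


def load_artifacts(directory):
    """Load the three files that the README lists as retraining outputs."""
    directory = Path(directory)
    return (
        joblib.load(directory / "model.joblib"),
        joblib.load(directory / "scaler.joblib"),
        joblib.load(directory / "label_encoder.joblib"),
    )


def classify(values, model, scaler, encoder):
    """Scale one feature row, predict, and map the class index to its label."""
    row = np.asarray(values, dtype=float).reshape(1, -1)
    encoded = model.predict(scaler.transform(row))
    return encoder.inverse_transform(encoded)[0]


# Stand-in artifacts trained on toy two-feature data (not the real model).
X = np.array([[443, 3961], [445, 66], [80, 5000], [25174, 62]], dtype=float)
labels = ["allow", "drop", "allow", "drop"]
encoder = LabelEncoder().fit(labels)
scaler = StandardScaler().fit(X)
model = RandomForestClassifier(n_estimators=10, random_state=0).fit(
    scaler.transform(X), encoder.transform(labels)
)

with tempfile.TemporaryDirectory() as tmp:
    joblib.dump(model, Path(tmp) / "model.joblib")
    joblib.dump(scaler, Path(tmp) / "scaler.joblib")
    joblib.dump(encoder, Path(tmp) / "label_encoder.joblib")
    loaded = load_artifacts(tmp)
    prediction = classify([443, 3961], *loaded)

print(prediction)
```

With the real artifacts, `load_artifacts(".")` from the repository root followed by `classify` on an 11-value row would be the equivalent call, provided the assumptions above hold.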
+ ---
+
+ ## Repository Structure
+
+ ```
+ Log_Classifier/
+ ├── app.py                 Streamlit web application
+ ├── retrain.py             Model retraining script
+ ├── model.joblib           Trained Random Forest model
+ ├── scaler.joblib          Fitted StandardScaler
+ ├── label_encoder.joblib   Label encoder for action classes
+ ├── requirements.txt       Python dependencies
+ └── The_Report.pdf         Full project report
+ ```
+
+ ---
+
+ ## Authors
+
+ Yasmeen Algendy, Yomna Algendy, Zahraa Mohamed
+ Supervisor: Dr. Marwa Elsayed
+ CSAI 801: Artificial Intelligence and Machine Learning