Na-Rajan committed on
Commit dea39de · verified · 1 Parent(s): 2550d31

Update README.md

Files changed (1)
  1. README.md +90 -3
README.md CHANGED
@@ -1,3 +1,90 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ ---
+ Model Name
+ CyberAttackClassifier V1: a Random Forest model that classifies cybersecurity attacks from network and system log data.
+
+ 📖 Overview
+ CyberAttackClassifier V1 is a supervised machine learning model that classifies cybersecurity attacks from structured log and alert data. It is a Random Forest classifier trained on the 50 features that SelectKBest keeps out of 197 input features, and it scores about 0.998 on accuracy, precision, recall, and F1.
+
+ 🔍 Intended Uses
+ Threat Detection: Automatically classify incoming events or logs into known attack categories.
+
+ Security Monitoring: Add attack-type predictions to SIEM systems.
+
+ Feature Analysis: Identify the indicators and patterns most associated with each attack type.
+
+ Incident Response Prioritization: Categorize threats quickly to speed up triage.
+
+ 🧠 Model Architecture
+ | Attribute | Value |
+ | --- | --- |
+ | Model Type | Random Forest Classifier |
+ | Framework | scikit-learn |
+ | Input Shape (raw) | (100000, 197) |
+ | Input Shape (selected) | (100000, 50) |
+ | Feature Selection | SelectKBest (f_classif) |
+ | Categorical Imputation | 'Unknown' for missing values |
+ | Encoding | One-hot for categorical features |
+ | Scaling | StandardScaler for numerical features |
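The components listed in the table can be wired together as a single scikit-learn pipeline. The following is a minimal sketch, not the author's actual training code: the order of the steps, the toy data, the numeric column names (`Packet Length`, `Anomaly Scores`), the label rule, and `k=4` (instead of the 50 stated above) are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def build_pipeline(categorical_cols, numerical_cols, k=50):
    """Impute and one-hot encode categoricals, scale numerics, keep top-k features, fit a forest."""
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy="constant", fill_value="Unknown")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ])
    preprocess = ColumnTransformer([
        ("cat", categorical, categorical_cols),
        ("num", StandardScaler(), numerical_cols),
    ])
    return Pipeline([
        ("preprocess", preprocess),
        ("select", SelectKBest(score_func=f_classif, k=k)),
        ("forest", RandomForestClassifier(n_estimators=100, random_state=0)),
    ])


# Toy data (hypothetical values and numeric column names, for illustration only).
rng = np.random.default_rng(0)
n = 200
firewall = pd.Series(rng.choice(["Allowed", "Blocked"], size=n), dtype=object)
firewall = firewall.where(rng.random(n) > 0.1)  # roughly 10% missing, imputed as 'Unknown'
X = pd.DataFrame({
    "Firewall Logs": firewall,
    "IDS/IPS Alerts": pd.Series(rng.choice(["Low", "High"], size=n), dtype=object),
    "Packet Length": rng.normal(500.0, 100.0, size=n),
    "Anomaly Scores": rng.normal(50.0, 10.0, size=n),
})
# Hypothetical label rule so that the toy problem is learnable.
y = np.where(
    (X["IDS/IPS Alerts"] == "High") & (X["Anomaly Scores"] > 45.0),
    "Attack",
    "Benign",
)

pipeline = build_pipeline(
    categorical_cols=["Firewall Logs", "IDS/IPS Alerts"],
    numerical_cols=["Packet Length", "Anomaly Scores"],
    k=4,  # the model card states 50; the toy data has only a handful of encoded features
)
pipeline.fit(X, y)
predictions = pipeline.predict(X)
```

Keeping the selection step inside the pipeline means the F-test is fitted only on whatever data `fit` receives, which avoids leaking test-set information into feature selection when the pipeline is cross-validated.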
+ 📚 Training Details
+ Dataset Size: 100,000 samples
+
+ Missing Values: Imputed with 'Unknown' in object-type (categorical) columns
+
+ Feature Selection: Top 50 features selected using the ANOVA F-test (f_classif)
+
+ Train/Test Split: Standard hold-out split (e.g., 80/20, optionally stratified by attack type)
+
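The split is not specified beyond the example ratio given above. As one possibility, an 80/20 hold-out split stratified by class could look like the sketch below; the toy labels, the class names, and `random_state` are assumptions, not the settings used to train this model.

```python
from collections import Counter

import numpy as np
from sklearn.model_selection import train_test_split

# Toy labels with an imbalanced class mix (hypothetical class names).
y = np.array(["DDoS"] * 700 + ["Malware"] * 200 + ["Intrusion"] * 100)
X = np.arange(len(y)).reshape(-1, 1)  # placeholder feature matrix

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,    # 80/20 split
    stratify=y,       # keep class proportions the same in both parts
    random_state=42,
)

train_counts = Counter(y_train.tolist())
test_counts = Counter(y_test.tolist())
print("train:", dict(train_counts))
print("test: ", dict(test_counts))
```

Stratifying matters most when some attack types are rare, because a plain random split can leave the test set with very few examples of them.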
+ 📈 Evaluation Metrics
+ | Metric | Value |
+ | --- | --- |
+ | Accuracy | 0.9980 |
+ | Precision | ~0.9980 |
+ | Recall | ~0.9980 |
+ | F1-score | ~0.9980 |
+ ✅ Note: An accuracy of 0.9980 corresponds to roughly 2 misclassified samples per 1,000. These are aggregate scores; they do not show how individual attack types perform.
+
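The card does not say how precision, recall, and F1 were averaged across classes. The sketch below assumes support-weighted averaging and uses made-up labels (the class names are hypothetical) to show how the four numbers in the table are typically computed with scikit-learn.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Made-up ground truth and predictions for three hypothetical attack classes.
y_true = ["DDoS", "DDoS", "DDoS", "Malware", "Malware", "Intrusion"]
y_pred = ["DDoS", "DDoS", "Malware", "Malware", "Malware", "Intrusion"]

accuracy = accuracy_score(y_true, y_pred)

# average="weighted" weights each class's score by its number of true samples.
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted", zero_division=0
)

print(f"accuracy={accuracy:.4f} precision={precision:.4f} "
      f"recall={recall:.4f} f1={f1:.4f}")
```

Under weighted averaging, recall is always equal to accuracy, so near-identical values for the two would be expected if that is the averaging that was used.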
+ 📊 Confusion Matrix & Classification Report
+ Confusion Matrix: Dominant diagonal, meaning most samples of each class are predicted as that class
+
+ Classification Report: High precision, recall, and F1-scores for most attack classes
+
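The matrix and report themselves are not included here. A minimal sketch of how both can be produced with scikit-learn follows, using made-up labels and hypothetical class names; per-class recall is read off the matrix diagonal.

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

labels = ["DDoS", "Intrusion", "Malware"]  # hypothetical attack classes
y_true = ["DDoS", "DDoS", "DDoS", "Malware", "Malware", "Intrusion"]
y_pred = ["DDoS", "DDoS", "Malware", "Malware", "Malware", "Intrusion"]

# Rows are true classes, columns are predicted classes, both in `labels` order.
cm = confusion_matrix(y_true, y_pred, labels=labels)

# Diagonal entries are correct predictions; dividing by row sums gives per-class recall.
per_class_recall = np.diag(cm) / cm.sum(axis=1)
diagonal_share = np.trace(cm) / cm.sum()  # same value as overall accuracy

report = classification_report(y_true, y_pred, labels=labels, zero_division=0)
print(cm)
print(report)
```

Off-diagonal cells show which attack types get confused with each other, which the aggregate metrics cannot reveal.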
+ 🔍 Feature Importance
+ Top features are identified using the Random Forest's feature_importances_ attribute
+
+ Further analysis of the top 10–15 features is recommended to understand key attack indicators
+
+ Feature names are available by mapping the SelectKBest output back to the input columns
+
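One way to recover names for the selected features is to apply the `SelectKBest` support mask to the column names and pair the result with `feature_importances_`. The sketch below uses synthetic data from `make_classification` with placeholder column names (`feature_0`, `feature_1`, and so on); `k=10` and the forest settings are assumptions for illustration.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic stand-in for the preprocessed feature matrix.
X_arr, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, random_state=0
)
X = pd.DataFrame(X_arr, columns=[f"feature_{i}" for i in range(X_arr.shape[1])])

selector = SelectKBest(score_func=f_classif, k=10).fit(X, y)
selected_names = X.columns[selector.get_support()]  # boolean mask -> column names

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(selector.transform(X), y)

# Pair each selected feature name with its importance, highest first.
importances = (
    pd.Series(forest.feature_importances_, index=selected_names)
    .sort_values(ascending=False)
)
print(importances.head(10))
```

Impurity-based importances like these tend to favor high-cardinality features, so they are best read as a starting point for the follow-up analysis rather than a final ranking.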
+ 🚀 How to Use
+ python
+ from cyberattackclassifier import AttackModel
+
+ # Load the model; replace the placeholder with the actual repository ID.
+ model = AttackModel.load_pretrained("your-huggingface-username/cyberattackclassifier-v1")
+
+ # One event as a mapping from input field name to value.
+ input_data = {
+     "Firewall Logs": "Unknown",
+     "Proxy Information": "Blocked",
+     "IDS/IPS Alerts": "High",
+     # ... (remaining input fields)
+ }
+ prediction = model.predict(input_data)
+ ⚠️ Limitations
+ Imbalanced Data Risk: Rare attack types may be under-represented in the training data; check per-class support before relying on the aggregate metrics
+
+ Feature Drift: Model may degrade if log formats or attack patterns evolve
+
+ Interpretability: Random Forests are less interpretable than linear models; use feature-importance analysis to inspect which inputs drive predictions
+
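For the class-imbalance point, a quick look at the label distribution is often a useful first step. The sketch below uses made-up labels and hypothetical class names; the weight formula mirrors scikit-learn's `class_weight="balanced"` heuristic, `n_samples / (n_classes * class_count)`, which can be passed to `RandomForestClassifier` as one possible mitigation.

```python
from collections import Counter


def class_balance(labels):
    """Return the per-class share of samples and 'balanced' class weights."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    shares = {c: k / n_samples for c, k in counts.items()}
    # Same formula as scikit-learn's class_weight="balanced".
    weights = {c: n_samples / (n_classes * k) for c, k in counts.items()}
    return shares, weights


# Made-up label column: one dominant class and two rare ones.
labels = ["DDoS"] * 90 + ["Malware"] * 8 + ["Intrusion"] * 2
shares, weights = class_balance(labels)

for cls in sorted(shares, key=shares.get, reverse=True):
    print(f"{cls:<10} share={shares[cls]:.2f} weight={weights[cls]:.2f}")
```

A class with a very small share deserves its own row in the classification report, since a high overall accuracy can hide poor recall on it.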
+ 📄 License
+ MIT License, as declared by license: mit in the metadata block at the top of this README
+
+ 👀 Author
+ Created by [Your Name or Organization]
+
+ 📚 Recommendations for Open-Sourcing
+ Include the preprocessing pipeline (imputation, encoding, scaling)
+
+ Provide training and evaluation scripts
+
+ Share the feature importance analysis and feature-name mapping
+
+ Document the attack type taxonomy used in classification