---
license: mit
---
Model Name
CyberAttackClassifier V1 – A Random Forest-based model for classifying cybersecurity attacks using network and system log data.

📖 Overview
CyberAttackClassifier V1 is a supervised machine learning model that classifies cybersecurity attacks from structured log and alert data. It is a Random Forest classifier trained on a feature-selected dataset (the top 50 of 197 features) and reports accuracy, precision, recall, and F1-score of about 0.998.

🔍 Intended Uses
Threat Detection: Automatically classify incoming events or logs into known attack categories.

Security Monitoring: Enhance SIEM systems with predictive capabilities.

Feature Analysis: Identify key indicators and patterns associated with different attack types.

Incident Response Prioritization: Quickly assess and categorize threats for faster triage.

🧠 Model Architecture

| Attribute | Value |
| --- | --- |
| Model Type | Random Forest Classifier |
| Framework | scikit-learn |
| Input Shape (raw) | (100000, 197) |
| Input Shape (selected) | (100000, 50) |
| Feature Selection | SelectKBest (f_classif) |
| Categorical Imputation | 'Unknown' for missing values |
| Encoding | One-hot for categorical features |
| Scaling | StandardScaler for numerical features |

📚 Training Details
Dataset Size: 100,000 samples

Missing Values: Imputed with 'Unknown' in object-type (categorical) columns

Feature Selection: Top 50 features selected using ANOVA F-test

Train/Test Split: Standard hold-out split (e.g., 80/20, optionally stratified by attack class)
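
The preprocessing and training steps listed above can be sketched as a single scikit-learn pipeline. This is a minimal sketch, not the author's training script: the column names, the synthetic data, the 80/20 stratified split, and `k=5` (the card selects the top 50 of 197 features) are assumptions chosen so the example runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the real log data (hypothetical column names).
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "Packet Length": rng.normal(500, 100, n),
    "Anomaly Score": rng.uniform(0, 100, n),
    "Protocol": rng.choice(["TCP", "UDP", "ICMP"], n),
    "Firewall Logs": np.where(rng.random(n) < 0.3, None, "Log Data"),
})
y = np.where(df["Anomaly Score"] > 50, "DDoS", "Malware")  # toy labels

num_cols = ["Packet Length", "Anomaly Score"]
cat_cols = ["Protocol", "Firewall Logs"]

# Categorical imputation: missing values become the literal 'Unknown'.
df[cat_cols] = df[cat_cols].fillna("Unknown")

preprocess = ColumnTransformer([
    ("num", StandardScaler(), num_cols),                         # scale numeric features
    ("cat", OneHotEncoder(handle_unknown="ignore"), cat_cols),  # one-hot encode categoricals
])

pipeline = Pipeline([
    ("preprocess", preprocess),
    ("select", SelectKBest(f_classif, k=5)),  # ANOVA F-test; the card uses k=50
    ("clf", RandomForestClassifier(n_estimators=100, random_state=0)),
])

# Assumed split: 80/20, stratified by class.
X_train, X_test, y_train, y_test = train_test_split(
    df, y, test_size=0.2, stratify=y, random_state=0
)
pipeline.fit(X_train, y_train)
test_accuracy = pipeline.score(X_test, y_test)
print(f"Test accuracy on toy data: {test_accuracy:.3f}")
```

Keeping scaling, encoding, and feature selection inside the pipeline means they are fitted on the training split only, which avoids leaking test data into the selected features.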

📈 Evaluation Metrics

| Metric | Value |
| --- | --- |
| Accuracy | 0.9980 |
| Precision | ~0.9980 |
| Recall | ~0.9980 |
| F1-score | ~0.9980 |

✅ Note: These metrics indicate strong performance across all attack types, with minimal misclassifications.

📊 Confusion Matrix & Classification Report
Confusion Matrix: Dominant diagonal, indicating high true positive rates

Classification Report: High precision, recall, and F1-scores for most attack classes
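
For reference, the metrics, confusion matrix, and classification report described above can be produced with scikit-learn as sketched below. The labels are toy values and the attack class names are hypothetical, since the card does not document its taxonomy. Weighted averaging is also an assumption; the card does not say which averaging was used.

```python
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    precision_recall_fscore_support,
)

# Toy labels standing in for real test-set labels and model predictions.
y_true = ["DDoS", "DDoS", "Malware", "Malware", "Intrusion", "Intrusion"]
y_pred = ["DDoS", "DDoS", "Malware", "Intrusion", "Intrusion", "Intrusion"]
labels = ["DDoS", "Intrusion", "Malware"]  # fixes the row/column order

accuracy = accuracy_score(y_true, y_pred)

# Weighted averaging (an assumption) weights each class by its support.
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, labels=labels, average="weighted", zero_division=0
)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred, labels=labels)

print(f"accuracy={accuracy:.4f} precision={precision:.4f} "
      f"recall={recall:.4f} f1={f1:.4f}")
print(cm)
print(classification_report(y_true, y_pred, labels=labels, zero_division=0))
```

A dominant diagonal in `cm` corresponds to most samples being assigned to their true class; off-diagonal cells show which classes are confused with each other.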

🔍 Feature Importance
Top features identified using Random Forest’s feature_importances_ attribute

Further analysis of top 10–15 features recommended to understand key attack indicators

Feature names available via mapping from SelectKBest output
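
One way to recover feature names from the SelectKBest output and rank them by importance is sketched below. The data and the `feature_{i}` names are synthetic placeholders, and `k=8` is chosen only to fit the toy dataset.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic data with placeholder feature names.
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, random_state=0
)
feature_names = np.array([f"feature_{i}" for i in range(X.shape[1])])

# get_support() returns a boolean mask over the original columns,
# which maps the selected columns back to their names.
selector = SelectKBest(f_classif, k=8).fit(X, y)
selected_names = feature_names[selector.get_support()]

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(selector.transform(X), y)

# Pair each importance with its feature name and sort, largest first.
importances = pd.Series(
    clf.feature_importances_, index=selected_names
).sort_values(ascending=False)
print(importances.head(5))
```

Impurity-based importances such as `feature_importances_` can favor high-cardinality features, so scikit-learn's `permutation_importance` is a common cross-check.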

🚀 How to Use
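```python
# Load the model through the cyberattackclassifier package and classify one event.
from cyberattackclassifier import AttackModel

model = AttackModel.load_pretrained("your-huggingface-username/cyberattackclassifier-v1")

# A single event, keyed by feature name.
input_data = {
    "Firewall Logs": "Unknown",
    "Proxy Information": "Blocked",
    "IDS/IPS Alerts": "High",
    # ... remaining features
}
prediction = model.predict(input_data)
```

⚠️ Limitations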
Imbalanced Data Risk: Ensure attack types are well-represented in training data

Feature Drift: Model may degrade if log formats or attack patterns evolve

Interpretability: Random Forests are less interpretable than linear models; use feature importance tools
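
The first two limitations can be monitored with simple checks. The helpers below are a hypothetical sketch (the function names, column names, and toy data are not part of this model): one reports each class's share of the samples, the other lists category values in incoming logs that never appeared in the training data.

```python
import pandas as pd


def class_balance(labels: pd.Series) -> pd.Series:
    """Return each class's share of the samples, largest first."""
    return labels.value_counts(normalize=True)


def unseen_categories(train: pd.DataFrame, incoming: pd.DataFrame, columns) -> dict:
    """Map each column to the values in `incoming` that never occur in `train`."""
    report = {}
    for col in columns:
        new_values = set(incoming[col].dropna()) - set(train[col].dropna())
        if new_values:
            report[col] = sorted(new_values)
    return report


# Toy data with hypothetical class and column names.
shares = class_balance(pd.Series(["DDoS"] * 8 + ["Malware"] * 2))
train_logs = pd.DataFrame({"Protocol": ["TCP", "UDP"]})
incoming_logs = pd.DataFrame({"Protocol": ["TCP", "QUIC", None]})
drift = unseen_categories(train_logs, incoming_logs, ["Protocol"])

print(shares)
print(drift)
```

A heavily skewed `shares` result suggests checking per-class recall rather than overall accuracy, and a non-empty `drift` report is a cue to review the encoder's categories or retrain.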

📄 License
MIT License, as declared in the license field of this card's metadata

👀 Author
Created by [Your Name or Organization]

📚 Recommendations for Open-Sourcing
Include preprocessing pipeline (imputation, encoding, scaling)

Provide training and evaluation scripts

Share feature importance analysis and mapping

Document attack type taxonomy used in classification
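
As a sketch of the first recommendation, a fitted scikit-learn pipeline can be saved and restored with joblib so that preprocessing ships together with the classifier. The pipeline here is a reduced stand-in trained on synthetic data, and the file name is hypothetical.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Reduced stand-in for the full preprocessing + model pipeline.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("clf", RandomForestClassifier(n_estimators=50, random_state=0)),
]).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "cyberattack_pipeline.joblib"  # hypothetical file name
    joblib.dump(pipeline, path)    # serialize the fitted steps together
    restored = joblib.load(path)   # reload as one object

# The restored pipeline should reproduce the original predictions.
same_predictions = np.array_equal(pipeline.predict(X), restored.predict(X))
print(same_predictions)
```

joblib files are pickle-based, so they should only be loaded from trusted sources and with a compatible scikit-learn version.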