---
license: mit
---

## Model Name

CyberAttackClassifier V1: a Random Forest-based model for classifying cybersecurity attacks using network and system log data.

## Overview

CyberAttackClassifier V1 is a supervised machine learning model trained to classify various types of cybersecurity attacks based on structured log and alert data. It uses a Random Forest Classifier trained on a feature-selected dataset, achieving near-perfect performance across multiple evaluation metrics.

## Intended Uses

- Threat Detection: Automatically classify incoming events or logs into known attack categories.
- Security Monitoring: Enhance SIEM systems with predictive capabilities.
- Feature Analysis: Identify key indicators and patterns associated with different attack types.
- Incident Response Prioritization: Quickly assess and categorize threats for faster triage.

## Model Architecture

| Attribute | Value |
|---|---|
| Model Type | Random Forest Classifier |
| Framework | scikit-learn |
| Input Shape (raw) | (100000, 197) |
| Input Shape (selected) | (100000, 50) |
| Feature Selection | SelectKBest (`f_classif`) |
| Categorical Imputation | `'Unknown'` for missing values |
| Encoding | One-hot for categorical features |
| Scaling | StandardScaler for numerical features |
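
The table lists the preprocessing stages but not how they are chained. Below is a minimal sketch, assuming they form a single scikit-learn `Pipeline`. Categorical columns get imputation and one-hot encoding, numeric columns get `StandardScaler`, and the result then passes through `SelectKBest` and the Random Forest. The `build_pipeline` helper, column names, class labels, and hyperparameters (`n_estimators=100`, `random_state=42`) are hypothetical. The toy data has only eight encoded columns, so the demo uses `k=5` instead of 50.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def build_pipeline(categorical_cols, numeric_cols, k=50):
    """Hypothetical reconstruction of the stages in the architecture table."""
    categorical = Pipeline([
        # Replace missing categorical values with the literal string "Unknown"
        ("impute", SimpleImputer(strategy="constant", fill_value="Unknown")),
        # One-hot encode; categories unseen at fit time encode as all zeros
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ])
    preprocess = ColumnTransformer([
        ("cat", categorical, categorical_cols),
        ("num", StandardScaler(), numeric_cols),
    ])
    return Pipeline([
        ("preprocess", preprocess),
        # Keep the k encoded columns with the highest ANOVA F-scores
        ("select", SelectKBest(score_func=f_classif, k=k)),
        ("model", RandomForestClassifier(n_estimators=100, random_state=42)),
    ])


# Tiny synthetic demo; column names and labels are hypothetical
rng = np.random.default_rng(0)
n = 300
labels = rng.choice(["DDoS", "Malware", "Intrusion"], size=n)
df = pd.DataFrame({
    "Protocol": rng.choice(["TCP", "UDP", "ICMP"], size=n).astype(object),
    "Action": rng.choice(["Allowed", "Blocked"], size=n).astype(object),
    # Numeric feature loosely tied to the label so selection has some signal
    "Packet Length": (labels == "DDoS") * 500 + rng.normal(100, 20, size=n),
    "Anomaly Score": rng.normal(50, 10, size=n),
})
df.loc[rng.random(n) < 0.1, "Protocol"] = np.nan  # simulate missing values

pipe = build_pipeline(["Protocol", "Action"], ["Packet Length", "Anomaly Score"], k=5)
pipe.fit(df, labels)
print(pipe.predict(df.head(3)))
```

Keeping every step inside one `Pipeline` means the imputer, encoder, scaler, and selector are fitted only on training data. They are then applied identically at prediction time.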

## Training Details

- Dataset Size: 100,000 samples
- Missing Values: Imputed with `'Unknown'` in object-type (categorical) columns
- Feature Selection: Top 50 features selected using the ANOVA F-test (`SelectKBest` with `f_classif`)
- Train/Test Split: Standard hold-out split (e.g., 80/20, optionally stratified by attack type)
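
The card leaves the exact split unspecified. Below is a minimal sketch, assuming an 80/20 split stratified by attack type with scikit-learn's `train_test_split`. The labels and placeholder features are hypothetical.

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Hypothetical, deliberately imbalanced label distribution
y = ["DDoS"] * 60 + ["Malware"] * 30 + ["Intrusion"] * 10
X = [[i] for i in range(len(y))]  # placeholder features

# 80/20 split; stratify=y keeps class proportions equal in both partitions
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

print(Counter(y_train))  # per-class counts in the training set
print(Counter(y_test))   # per-class counts in the test set
```

With `stratify=y`, each class keeps its share in both partitions: 48/24/8 for training and 12/6/2 for testing here. This matters when some attack types are rare and could otherwise be missing from the test set.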

## Evaluation Metrics

| Metric | Value |
|---|---|
| Accuracy | 0.9980 |
| Precision | ~0.9980 |
| Recall | ~0.9980 |
| F1-score | ~0.9980 |

Note: These metrics indicate strong overall performance, with minimal misclassifications.
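
The card does not say how precision, recall, and F1 were averaged across classes. Below is a sketch assuming support-weighted averaging (`average="weighted"` in scikit-learn). It uses hypothetical toy labels rather than the real test set.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Hypothetical toy labels; the real evaluation uses the held-out test set
y_true = ["DDoS", "DDoS", "Malware", "Intrusion"]
y_pred = ["DDoS", "Malware", "Malware", "Intrusion"]

accuracy = accuracy_score(y_true, y_pred)
# average="weighted" weights each class's score by its support (an assumption)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted", zero_division=0
)

print(f"accuracy={accuracy:.4f} precision={precision:.4f} "
      f"recall={recall:.4f} f1={f1:.4f}")
# accuracy=0.7500 precision=0.8750 recall=0.7500 f1=0.7500
```

Under weighted averaging, recall equals accuracy by construction. Each class's recall is weighted by that class's share of samples, so the sum reduces to correct predictions divided by total samples. That may explain why recall in the table matches accuracy. Macro averaging (`average="macro"`) gives rare attack classes equal weight and is a useful complement.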

## Confusion Matrix & Classification Report

- Confusion Matrix: Dominant diagonal, indicating high true-positive rates
- Classification Report: High precision, recall, and F1-scores for most attack classes
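
A dominant diagonal means most samples fall in cells where the true and predicted attack types agree. Below is a minimal sketch that produces both artifacts with scikit-learn's `confusion_matrix` and `classification_report`, using hypothetical toy labels. The explicit `labels` list fixes the row and column order.

```python
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical toy labels standing in for the real test set
labels = ["DDoS", "Intrusion", "Malware"]
y_true = ["DDoS", "DDoS", "Malware", "Intrusion"]
y_pred = ["DDoS", "Malware", "Malware", "Intrusion"]

# Rows are true classes, columns are predicted classes (in `labels` order)
cm = confusion_matrix(y_true, y_pred, labels=labels)
print(cm)

# Per-class precision, recall, F1-score, and support
print(classification_report(y_true, y_pred, labels=labels, zero_division=0))
```

The single off-diagonal entry (row `DDoS`, column `Malware`) shows which pair of classes is confused. In the real report, such cells and the per-class rows are where weaker classes would show up.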

## Feature Importance

- Top features identified using the Random Forest's `feature_importances_` attribute
- Further analysis of the top 10–15 features is recommended to understand key attack indicators
- Feature names can be recovered by mapping the `SelectKBest` output (its `get_support()` mask) back to the encoded column names

## How to Use

```python
from cyberattackclassifier import AttackModel

model = AttackModel.load_pretrained("your-huggingface-username/cyberattackclassifier-v1")

# Raw log/alert fields for one event
input_data = {
    "Firewall Logs": "Unknown",
    "Proxy Information": "Blocked",
    "IDS/IPS Alerts": "High",
    # ... remaining input fields
}

prediction = model.predict(input_data)
```
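
The `cyberattackclassifier` package and the `AttackModel.load_pretrained` call above come from the card and are not part of scikit-learn. The model might instead be distributed as a plain fitted scikit-learn pipeline, which is an assumption. In that case, a raw record like `input_data` must be wrapped in a one-row pandas DataFrame with the training column order before calling `predict`. The sketch trains a tiny stand-in pipeline on hypothetical data. `predict_record`, the field values, and the class labels are illustrative only.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

FIELDS = ["Firewall Logs", "Proxy Information", "IDS/IPS Alerts"]

# Hypothetical training data; a real deployment would load a fitted pipeline
rng = np.random.default_rng(2)
n = 200
train = pd.DataFrame({
    "Firewall Logs": rng.choice(["Allowed", "Denied", "Unknown"], size=n),
    "Proxy Information": rng.choice(["Allowed", "Blocked"], size=n),
    "IDS/IPS Alerts": rng.choice(["Low", "Medium", "High"], size=n),
})
# Hypothetical labels derived from one field so the toy model has signal
y = np.where(train["IDS/IPS Alerts"] == "High", "Intrusion", "Malware")

pipe = make_pipeline(
    OneHotEncoder(handle_unknown="ignore"),  # unseen values encode as all zeros
    RandomForestClassifier(n_estimators=50, random_state=0),
).fit(train, y)


def predict_record(model, record):
    """Wrap one raw record (dict) as a one-row DataFrame and predict its class."""
    row = pd.DataFrame([record], columns=FIELDS)  # enforce training column order
    return model.predict(row)[0]


input_data = {
    "Firewall Logs": "Unknown",
    "Proxy Information": "Blocked",
    "IDS/IPS Alerts": "High",
}
print(predict_record(pipe, input_data))
```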

## Limitations

- Imbalanced Data Risk: Ensure that all attack types are well represented in the training data
- Feature Drift: The model may degrade if log formats or attack patterns evolve
- Interpretability: Random Forests are less interpretable than linear models; use feature-importance tools to understand their predictions
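
Both the imbalance and drift risks can be monitored with simple checks. In the minimal plain-Python sketch below, `class_shares` reports each attack type's share of the training labels. `unseen_rate` measures how often a categorical field takes values never seen in training, a crude drift signal. Both helpers, the labels, and the `Quarantined` proxy value are hypothetical.

```python
from collections import Counter


def class_shares(labels):
    """Fraction of samples per class, to spot under-represented attack types."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: count / total for cls, count in counts.items()}


def unseen_rate(train_values, new_values):
    """Share of new values never seen in training: a crude drift signal."""
    known = set(train_values)
    new_values = list(new_values)
    return sum(v not in known for v in new_values) / len(new_values)


# Hypothetical labels and log field values
y_train = ["DDoS"] * 90 + ["Malware"] * 9 + ["Intrusion"] * 1
print(class_shares(y_train))  # {'DDoS': 0.9, 'Malware': 0.09, 'Intrusion': 0.01}

train_proxy = ["Allowed", "Blocked", "Blocked", "Allowed"]
live_proxy = ["Blocked", "Quarantined", "Allowed", "Quarantined"]
print(unseen_rate(train_proxy, live_proxy))  # 0.5
```

A class with a tiny share, such as `Intrusion` at 1% here, is a candidate for more data, resampling, or `RandomForestClassifier(class_weight="balanced")`. A rising unseen rate suggests the log format has changed and the model should be re-validated.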

## License

MIT License

## Author

Created by [Your Name or Organization]

## Recommendations for Open-Sourcing

- Include the preprocessing pipeline (imputation, encoding, scaling)
- Provide training and evaluation scripts
- Share the feature importance analysis and feature-name mapping
- Document the attack type taxonomy used in classification
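
One straightforward way to ship the preprocessing pipeline is to persist the whole fitted pipeline as a single `joblib` artifact. This is an assumption; the card does not specify a format. It keeps imputation, encoding, scaling, selection, and the model together, and the `SelectKBest` selection can be exported alongside it. File names and the stand-in data are hypothetical.

```python
import json
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in data and pipeline
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 6))
y = (X[:, 0] > 0).astype(int)
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(f_classif, k=3)),
    ("model", RandomForestClassifier(n_estimators=50, random_state=0)),
]).fit(X, y)

out_dir = tempfile.mkdtemp()
# Persist preprocessing, selection, and model together as one artifact
model_path = os.path.join(out_dir, "pipeline.joblib")
joblib.dump(pipe, model_path)

# Record which input columns SelectKBest kept, for the feature-mapping docs
selected = [int(i) for i in pipe.named_steps["select"].get_support(indices=True)]
with open(os.path.join(out_dir, "selected_features.json"), "w") as f:
    json.dump({"selected_indices": selected}, f)

# Reload and confirm the restored pipeline predicts identically
restored = joblib.load(model_path)
print((restored.predict(X) == pipe.predict(X)).all())
```

Pickle-based files such as `.joblib` artifacts can execute code when loaded, so load them only from trusted sources. Also record the scikit-learn version used, since fitted estimators are not guaranteed to load across versions.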