---
license: mit
---
Model Name
CyberAttackClassifier V1: a Random Forest-based model for classifying cybersecurity attacks using network and system log data.
Overview
CyberAttackClassifier V1 is a supervised machine learning model trained to classify various types of cybersecurity attacks based on structured log and alert data. It uses a Random Forest Classifier trained on a feature-selected dataset and reports an accuracy of 0.9980, with precision, recall, and F1-score at approximately the same level.
Intended Uses
Threat Detection: Automatically classify incoming events or logs into known attack categories.
Security Monitoring: Enhance SIEM systems with predictive capabilities.
Feature Analysis: Identify key indicators and patterns associated with different attack types.
Incident Response Prioritization: Quickly assess and categorize threats for faster triage.
Model Architecture

| Attribute | Value |
| --- | --- |
| Model Type | Random Forest Classifier |
| Framework | scikit-learn |
| Input Shape (raw) | (100000, 197) |
| Input Shape (selected) | (100000, 50) |
| Feature Selection | SelectKBest (f_classif) |
| Categorical Imputation | 'Unknown' for missing values |
| Encoding | One-hot for categorical features |
| Scaling | StandardScaler for numerical features |
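
The preprocessing and model steps listed above can be chained into a single scikit-learn pipeline. The following is a minimal sketch under stated assumptions, not the original training code: the column names, the synthetic data, and `k=3` are illustrative (the card selects 50 features), and since the Random Forest hyperparameters are not documented, library defaults are used.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def build_pipeline(categorical_cols, numerical_cols, k=50):
    """Chain imputation, encoding, scaling, feature selection, and the classifier."""
    categorical = Pipeline([
        # Replace missing categorical values with the literal 'Unknown'.
        ("impute", SimpleImputer(strategy="constant", fill_value="Unknown")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ])
    preprocess = ColumnTransformer([
        ("cat", categorical, categorical_cols),
        ("num", StandardScaler(), numerical_cols),
    ])
    return Pipeline([
        ("preprocess", preprocess),
        # ANOVA F-test scores, keeping the k highest-scoring features.
        ("select", SelectKBest(score_func=f_classif, k=k)),
        ("clf", RandomForestClassifier(random_state=0)),
    ])


# Synthetic stand-in data; "Packet Length" is a hypothetical numerical column.
rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 2, size=n)
agrees = rng.random(n) < 0.85  # make the categorical signal noisy, not perfect
X = pd.DataFrame({
    "Proxy Information": pd.Series(
        np.where((y == 1) == agrees, "Blocked", "Allowed"), dtype=object
    ),
    "IDS/IPS Alerts": pd.Series(rng.choice(["High", "Low"], size=n), dtype=object),
    "Packet Length": rng.normal(loc=500 + 200 * y, scale=50, size=n),
})
X.loc[::10, "IDS/IPS Alerts"] = np.nan  # simulate missing log fields

pipe = build_pipeline(["Proxy Information", "IDS/IPS Alerts"], ["Packet Length"], k=3)
pipe.fit(X, y)
predictions = pipe.predict(X)
```

Keeping every step inside one `Pipeline` means the imputer, encoder, scaler, and selector are fitted on training data only and reapplied unchanged at prediction time.
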
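Training Details
Dataset Size: 100,000 samples
Missing Values: Imputed with the placeholder 'Unknown' in object-type (categorical) columns
Feature Selection: Top 50 features selected using the ANOVA F-test (SelectKBest with f_classif)
Train/Test Split: Standard hold-out split; the exact ratio and whether it was stratified are not specified (e.g., 80/20)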
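
Since the split itself is not documented, here is one plausible reading of it as a sketch: an 80/20 hold-out split with stratification on the label. The data is synthetic, and `test_size=0.2`, `random_state=42`, and the class proportions are assumptions made for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Synthetic stand-in: 1,000 rows, 50 features, three imbalanced classes.
X = rng.normal(size=(1000, 50))
y = np.repeat([0, 1, 2], [700, 200, 100])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Stratification keeps the per-class proportions the same in both partitions.
train_share = np.bincount(y_train) / len(y_train)
test_share = np.bincount(y_test) / len(y_test)
print("train:", train_share)
print("test: ", test_share)
```

Stratifying matters most when some attack types are rare, because a purely random split can leave a small class almost absent from the test set.
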
Evaluation Metrics

| Metric | Value |
| --- | --- |
| Accuracy | 0.9980 |
| Precision | ~0.9980 |
| Recall | ~0.9980 |
| F1-score | ~0.9980 |

Note: These metrics indicate strong performance across all attack types, with minimal misclassifications.
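
The card does not say how precision, recall, and F1-score were averaged across classes. As a hedged illustration, the sketch below computes the four metrics with scikit-learn using weighted averaging, which is an assumption. The labels and class names are made up for the example and are not the model's actual taxonomy.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Hypothetical true and predicted labels for eight events.
y_true = ["DDoS", "DDoS", "Malware", "Malware",
          "Intrusion", "Intrusion", "Intrusion", "DDoS"]
y_pred = ["DDoS", "DDoS", "Malware", "Intrusion",
          "Intrusion", "Intrusion", "Intrusion", "Malware"]

accuracy = accuracy_score(y_true, y_pred)  # 6 of 8 correct
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted", zero_division=0
)

print(f"accuracy={accuracy:.4f} precision={precision:.4f} "
      f"recall={recall:.4f} f1={f1:.4f}")
```

With weighted averaging, recall is always equal to accuracy, which would be consistent with the near-identical values in the table. Macro averaging would weight rare attack types more heavily and can give noticeably lower numbers.
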
Confusion Matrix & Classification Report
Confusion Matrix: Dominant diagonal, indicating high true positive rates
Classification Report: High precision, recall, and F1-scores for most attack classes
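
To see which attack classes fall short of the aggregate numbers, both artifacts can be regenerated from test-set predictions. This is a minimal sketch with hypothetical labels and class names. The actual matrix and report for this model are not included in the card.

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

labels = ["DDoS", "Intrusion", "Malware"]  # hypothetical class names
y_true = ["DDoS", "DDoS", "Malware", "Malware",
          "Intrusion", "Intrusion", "Intrusion", "DDoS"]
y_pred = ["DDoS", "DDoS", "Malware", "Intrusion",
          "Intrusion", "Intrusion", "Intrusion", "Malware"]

# Rows are true classes, columns are predicted classes, in `labels` order.
cm = confusion_matrix(y_true, y_pred, labels=labels)
print(cm)

# Per-class recall: diagonal entries divided by the row totals.
per_class_recall = cm.diagonal() / cm.sum(axis=1)
for name, value in zip(labels, per_class_recall):
    print(f"{name}: recall={value:.2f}")

report = classification_report(y_true, y_pred, labels=labels, zero_division=0)
print(report)
```

Off-diagonal cells show which pairs of classes are confused with each other, which the single accuracy figure cannot reveal.
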
Feature Importance
Top features identified using Random Forest's feature_importances_ attribute
Further analysis of the top 10 to 15 features recommended to understand key attack indicators
Feature names available via mapping from SelectKBest output
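
The mapping from SelectKBest output back to feature names can be sketched as follows. The data is synthetic, the `feature_{i}` names are placeholders, and `k=8` is chosen only to keep the example small (the card uses 50).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic stand-in for the encoded feature matrix.
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=5, random_state=0
)
feature_names = np.array([f"feature_{i}" for i in range(X.shape[1])])

selector = SelectKBest(score_func=f_classif, k=8).fit(X, y)
# get_support() returns a boolean mask over the original columns.
selected_names = feature_names[selector.get_support()]

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(selector.transform(X), y)

# Pair each selected feature with its importance, highest first.
order = np.argsort(clf.feature_importances_)[::-1]
ranking = [(str(selected_names[i]), float(clf.feature_importances_[i])) for i in order]
for name, score in ranking[:5]:
    print(f"{name}: {score:.3f}")
```

Impurity-based importances such as `feature_importances_` can favor high-cardinality features, so `sklearn.inspection.permutation_importance` is worth considering as a cross-check.
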
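How to Use
```python
# `cyberattackclassifier` and the repository ID below are placeholders from this card.
from cyberattackclassifier import AttackModel

# Replace the placeholder with the actual repository ID.
model = AttackModel.load_pretrained("your-huggingface-username/cyberattackclassifier-v1")

# A single event, keyed by log field name.
input_data = {
    "Firewall Logs": "Unknown",
    "Proxy Information": "Blocked",
    "IDS/IPS Alerts": "High",
    # ... (remaining fields elided)
}

prediction = model.predict(input_data)
```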
Limitations
Imbalanced Data Risk: Ensure attack types are well-represented in training data
Feature Drift: Model may degrade if log formats or attack patterns evolve
Interpretability: Random Forests are less interpretable than linear models; use feature importance tools
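
As a rough way to act on the imbalance risk, class shares can be checked before training. The helper below is a hypothetical sketch: the function name, the 5% threshold, and the class names are assumptions, not part of this model's tooling.

```python
from collections import Counter


def underrepresented_classes(labels, min_share=0.05):
    """Return {class: share} for classes below `min_share` of all samples."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {
        cls: count / total
        for cls, count in counts.items()
        if count / total < min_share
    }


# Hypothetical label column with one rare attack type.
labels = ["DDoS"] * 60 + ["Malware"] * 37 + ["Intrusion"] * 3
flagged = underrepresented_classes(labels)
print(flagged)  # {'Intrusion': 0.03}
```

If a class is flagged, options include collecting more examples or passing `class_weight="balanced"` to `RandomForestClassifier`. The same count, rerun periodically on fresh logs, also gives a crude signal of drift in the attack mix.
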
License
MIT License, as declared in this model card's metadata
Author
Created by [Your Name or Organization]
Recommendations for Open-Sourcing
Include preprocessing pipeline (imputation, encoding, scaling)
Provide training and evaluation scripts
Share feature importance analysis and mapping
Document the attack type taxonomy used in classification