Update README.md
README.md
CHANGED
|
@@ -1,3 +1,90 @@
|
|
| 1 |
-
---
|
| 2 |
-
license: mit
|
| 3 |
-
---
|
| 1 |
+
---
|
| 2 |
+
license: mit
|
| 3 |
+
---
|
| 4 |
+
Model Name
|
| 5 |
+
CyberAttackClassifier V1: a Random Forest-based model for classifying cybersecurity attacks using network and system log data.
|
| 6 |
+
|
| 7 |
+
Overview
|
| 8 |
+
CyberAttackClassifier V1 is a supervised machine learning model that classifies cybersecurity attacks from structured log and alert data. It is a Random Forest classifier trained on a feature-selected dataset (50 of 197 input columns) and reports approximately 0.998 accuracy, precision, recall, and F1-score.
|
| 9 |
+
|
| 10 |
+
Intended Uses
|
| 11 |
+
Threat Detection: Automatically classify incoming events or logs into known attack categories.
|
| 12 |
+
|
| 13 |
+
Security Monitoring: Enhance SIEM systems with predictive capabilities.
|
| 14 |
+
|
| 15 |
+
Feature Analysis: Identify key indicators and patterns associated with different attack types.
|
| 16 |
+
|
| 17 |
+
Incident Response Prioritization: Quickly assess and categorize threats for faster triage.
|
| 18 |
+
|
| 19 |
+
Model Architecture
|

| Attribute | Value |
|---|---|
| Model Type | Random Forest Classifier |
| Framework | scikit-learn |
| Input Shape (raw) | (100000, 197) |
| Input Shape (selected) | (100000, 50) |
| Feature Selection | SelectKBest (f_classif) |
| Categorical Imputation | 'Unknown' for missing values |
| Encoding | One-hot for categorical features |
| Scaling | StandardScaler for numerical features |

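
The preprocessing and model steps listed above could be wired together roughly as follows. This is a minimal sketch, not the author's training code: the column names, labels, and synthetic data are hypothetical, and `k=4` stands in for the 50 features the card reports because the toy data has only a handful of columns.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Small synthetic stand-in for the log data (hypothetical columns and labels).
rng = np.random.default_rng(0)
n_rows = 200
X = pd.DataFrame({
    "Packet Length": rng.normal(500, 100, n_rows),
    "Anomaly Score": rng.uniform(0, 100, n_rows),
    "Proxy Information": rng.choice(["Blocked", "Allowed"], n_rows),
    "IDS/IPS Alerts": rng.choice(["High", "Low"], n_rows),
})
# Blank out some categorical cells to mimic missing log fields.
X.loc[X.index % 5 == 0, "IDS/IPS Alerts"] = np.nan
y = rng.choice(["DDoS", "Intrusion", "Malware"], n_rows)

numeric_cols = ["Packet Length", "Anomaly Score"]
categorical_cols = ["Proxy Information", "IDS/IPS Alerts"]

# Categorical imputation: missing values become the literal category 'Unknown'.
X[categorical_cols] = X[categorical_cols].fillna("Unknown")

# Scale numeric columns and one-hot encode categorical columns.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

pipeline = Pipeline([
    ("preprocess", preprocess),
    # ANOVA F-test feature selection; the card reports k=50 on the real data.
    ("select", SelectKBest(score_func=f_classif, k=4)),
    ("forest", RandomForestClassifier(n_estimators=100, random_state=0)),
])

pipeline.fit(X, y)
predictions = pipeline.predict(X)
```

Keeping every step inside one `Pipeline` means the same imputation, encoding, scaling, and selection are applied at prediction time as at training time.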
| 29 |
+
Training Details
|
| 30 |
+
Dataset Size: 100,000 samples
|
| 31 |
+
|
| 32 |
+
Missing Values: Imputed with 'Unknown' in object-type (categorical) columns
|
| 33 |
+
|
| 34 |
+
Feature Selection: Top 50 features selected using ANOVA F-test
|
| 35 |
+
|
| 36 |
+
Train/Test Split: Held-out test split; the exact ratio is not stated (e.g., 80/20, optionally stratified by attack type)
|
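
Since the card does not state the split, the following is only an illustration of what an 80/20 stratified split looks like in scikit-learn. The labels and row indices are made up; `test_size=0.2` and `stratify` are assumptions, not confirmed training settings.

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Hypothetical imbalanced label set: 80% of one attack type, 20% of another.
labels = ["DDoS"] * 80 + ["Malware"] * 20
rows = list(range(len(labels)))  # stand-in for feature rows

# stratify=labels keeps each class's share the same in both parts.
rows_train, rows_test, y_train, y_test = train_test_split(
    rows, labels, test_size=0.2, stratify=labels, random_state=0
)

print(Counter(y_train))
print(Counter(y_test))
```

Stratifying matters most when some attack types are rare, because a plain random split can leave them almost absent from the test set.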
| 37 |
+
|
| 38 |
+
Evaluation Metrics
|

| Metric | Value |
|---|---|
| Accuracy | 0.9980 |
| Precision | ~0.9980 |
| Recall | ~0.9980 |
| F1-score | ~0.9980 |

| 44 |
+
Note: At 0.9980 accuracy, roughly 2 in every 1,000 evaluated samples are misclassified. Per-class precision, recall, and F1 values are not listed in this card.
|
| 45 |
+
|
| 46 |
+
Confusion Matrix & Classification Report
|
| 47 |
+
Confusion Matrix: Dominant diagonal, indicating high true positive rates
|
| 48 |
+
|
| 49 |
+
Classification Report: High precision, recall, and F1-scores for most attack classes
|
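
To make the link between a confusion matrix and the per-class report concrete, here is a small dependency-free sketch. The 3-class matrix is invented for illustration and is not the model's actual result.

```python
def accuracy(cm):
    """Share of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in cm)
    return sum(cm[k][k] for k in range(len(cm))) / total


def per_class_metrics(cm):
    """Return a (precision, recall, f1) tuple per class.

    cm[i][j] is the number of samples with true class i predicted as class j.
    """
    n = len(cm)
    results = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # column total minus diagonal
        fn = sum(cm[k]) - tp                       # row total minus diagonal
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results.append((precision, recall, f1))
    return results


# Made-up confusion matrix for three attack classes (illustration only).
cm = [
    [50, 2, 0],
    [1, 45, 4],
    [0, 3, 45],
]

overall = accuracy(cm)
report = per_class_metrics(cm)
```

A dominant diagonal pushes `tp` up relative to `fp` and `fn`, which is why it translates into high precision and recall for each class.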
| 50 |
+
|
| 51 |
+
Feature Importance
|
| 52 |
+
Top features identified using Random Forest's `feature_importances_` attribute
|
| 53 |
+
|
| 54 |
+
Further analysis of the top 10–15 features is recommended to understand key attack indicators
|
| 55 |
+
|
| 56 |
+
Feature names available via mapping from SelectKBest output
|
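
One way to recover feature names after selection and rank them by importance is sketched below. The data and the `feature_i` names are synthetic and hypothetical; only `SelectKBest.get_support()` and `feature_importances_` are real scikit-learn APIs.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic data: 8 hypothetical features, only two of which carry signal.
rng = np.random.default_rng(0)
feature_names = np.array([f"feature_{i}" for i in range(8)])
X = rng.normal(size=(300, 8))
y = (X[:, 2] + X[:, 5] > 0).astype(int)

# Select the top 4 features by ANOVA F-score, then map the mask back to names.
selector = SelectKBest(score_func=f_classif, k=4).fit(X, y)
selected_names = feature_names[selector.get_support()]

# Train on the selected columns only, as the model card describes.
forest = RandomForestClassifier(n_estimators=50, random_state=0)
forest.fit(selector.transform(X), y)

# Pair each selected name with its importance, highest first.
order = np.argsort(forest.feature_importances_)[::-1]
ranking = [
    (str(selected_names[i]), float(forest.feature_importances_[i]))
    for i in order
]

for name, importance in ranking:
    print(f"{name}: {importance:.3f}")
```

The importances refer to positions in the selected matrix, so the `get_support()` mask is what ties them back to the original column names.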
| 57 |
+
|
| 58 |
+
How to Use
|

```python
# Load the pretrained classifier (package and repository names as given by the author).
from cyberattackclassifier import AttackModel

model = AttackModel.load_pretrained("your-huggingface-username/cyberattackclassifier-v1")

# A single event, keyed by the feature names used during training.
input_data = {
    "Firewall Logs": "Unknown",
    "Proxy Information": "Blocked",
    "IDS/IPS Alerts": "High",
    # ... (remaining features elided in the original)
}

# Predict the attack category for this event.
prediction = model.predict(input_data)
```

| 70 |
+
Limitations
|
| 71 |
+
Imbalanced Data Risk: Ensure attack types are well-represented in training data
|
| 72 |
+
|
| 73 |
+
Feature Drift: Model may degrade if log formats or attack patterns evolve
|
| 74 |
+
|
| 75 |
+
Interpretability: Random Forests are less interpretable than linear models; use feature importance tools
|
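
A quick check for the imbalance risk noted above is to look at each attack type's share of the labels before training. The sketch below is illustrative: the labels are made up, and the 5% threshold is an arbitrary assumption rather than a value from this model card.

```python
from collections import Counter


def class_shares(labels):
    """Return each label's share of the dataset as a fraction of the total."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def underrepresented(labels, min_share=0.05):
    """List labels whose share falls below min_share (hypothetical threshold)."""
    shares = class_shares(labels)
    return sorted(label for label, share in shares.items() if share < min_share)


# Made-up label distribution for illustration.
labels = ["DDoS"] * 60 + ["Malware"] * 37 + ["Intrusion"] * 3

shares = class_shares(labels)
rare = underrepresented(labels)
```

Classes flagged this way are candidates for resampling or for `class_weight="balanced"` in scikit-learn's `RandomForestClassifier`.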
| 76 |
+
|
| 77 |
+
License
|
| 78 |
+
MIT License (matches the `license: mit` field in the metadata header)
|
| 79 |
+
|
| 80 |
+
Author
|
| 81 |
+
Created by [Your Name or Organization]
|
| 82 |
+
|
| 83 |
+
Recommendations for Open-Sourcing
|
| 84 |
+
Include preprocessing pipeline (imputation, encoding, scaling)
|
| 85 |
+
|
| 86 |
+
Provide training and evaluation scripts
|
| 87 |
+
|
| 88 |
+
Share feature importance analysis and mapping
|
| 89 |
+
|
| 90 |
+
Document attack type taxonomy used in classification
|