---
license: mit
---

# Model Name

CyberAttackClassifier V1: a Random Forest-based model for classifying cybersecurity attacks using network and system log data.

## 📖 Overview

CyberAttackClassifier V1 is a supervised machine learning model trained to classify various types of cybersecurity attacks based on structured log and alert data. It uses a Random Forest Classifier trained on a feature-selected dataset and reaches roughly 0.998 on accuracy, precision, recall, and F1-score.

## 🔍 Intended Uses

- **Threat Detection:** Automatically classify incoming events or logs into known attack categories.
- **Security Monitoring:** Enhance SIEM systems with predictive capabilities.
- **Feature Analysis:** Identify key indicators and patterns associated with different attack types.
- **Incident Response Prioritization:** Quickly assess and categorize threats for faster triage.

## 🧠 Model Architecture

| Attribute | Value |
|---|---|
| Model Type | Random Forest Classifier |
| Framework | scikit-learn |
| Input Shape (raw) | (100000, 197) |
| Input Shape (selected) | (100000, 50) |
| Feature Selection | SelectKBest (f_classif) |
| Categorical Imputation | 'Unknown' for missing values |
| Encoding | One-hot for categorical features |
| Scaling | StandardScaler for numerical features |

## 📚 Training Details

- **Dataset Size:** 100,000 samples
- **Missing Values:** Imputed in object-type columns
- **Feature Selection:** Top 50 features selected using the ANOVA F-test
- **Train/Test Split:** Standard split (e.g., 80/20 or stratified)

## 📈 Evaluation Metrics

| Metric | Value |
|---|---|
| Accuracy | 0.9980 |
| Precision | ~0.9980 |
| Recall | ~0.9980 |
| F1-score | ~0.9980 |

✅ Note: These metrics indicate strong performance across all attack types, with minimal misclassifications.
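The preprocessing, feature selection, and evaluation steps described above could be wired together roughly as follows. This is a minimal sketch, not the original training script: the data is synthetic, the numeric column names and attack labels are hypothetical, `k=5` stands in for the card's 50 selected features because the toy data has only a handful of columns, and the 80/20 stratified split and weighted metric averaging are assumptions the card does not confirm.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data; labels and numeric columns are hypothetical.
rng = np.random.default_rng(0)
n = 600
labels = np.array(["DDoS", "Intrusion", "Malware"])
class_index = rng.integers(0, 3, size=n)
y = labels[class_index]

def categorical_with_gaps(values):
    """Draw random categories and blank out about 10% as missing (NaN)."""
    col = rng.choice(values, size=n).astype(object)
    col[rng.random(n) < 0.1] = np.nan
    return col

X = pd.DataFrame({
    "Packet Length": rng.normal(500 + 200 * class_index, 50),
    "Anomaly Score": rng.normal(10 * class_index, 3),
    "Proxy Information": categorical_with_gaps(["Blocked", "Allowed"]),
    "IDS/IPS Alerts": categorical_with_gaps(["High", "Low"]),
})
numeric_cols = ["Packet Length", "Anomaly Score"]
categorical_cols = ["Proxy Information", "IDS/IPS Alerts"]

# Categorical: fill missing values with 'Unknown', then one-hot encode.
categorical = Pipeline([
    ("impute", SimpleImputer(strategy="constant", fill_value="Unknown")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])
preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric_cols),
    ("cat", categorical, categorical_cols),
])

model = Pipeline([
    ("preprocess", preprocess),
    ("select", SelectKBest(f_classif, k=5)),  # the card reports k=50
    ("forest", RandomForestClassifier(random_state=0)),
])

# Assumed split: 80/20, stratified by attack type.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, y_pred, average="weighted"  # averaging mode is an assumption
)
print(f"accuracy={accuracy:.4f} precision={precision:.4f} "
      f"recall={recall:.4f} f1={f1:.4f}")
```

Keeping imputation, encoding, scaling, and selection inside one `Pipeline` means they are fitted on the training split only, which avoids leaking test-set statistics into the reported metrics.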
## 📊 Confusion Matrix & Classification Report

- **Confusion Matrix:** Dominant diagonal, indicating high true positive rates
- **Classification Report:** High precision, recall, and F1-scores for most attack classes

## 🔍 Feature Importance

- Top features are identified using the Random Forest's `feature_importances_` attribute
- Further analysis of the top 10 to 15 features is recommended to understand key attack indicators
- Feature names are available via a mapping from the SelectKBest output

## 🚀 How to Use

```python
from cyberattackclassifier import AttackModel

# Load the pretrained model (replace the placeholder username with your own).
model = AttackModel.load_pretrained("your-huggingface-username/cyberattackclassifier-v1")

# One event, keyed by feature name; the remaining features are elided.
input_data = {
    "Firewall Logs": "Unknown",
    "Proxy Information": "Blocked",
    "IDS/IPS Alerts": "High",
    # ...
}

prediction = model.predict(input_data)
```

## ⚠️ Limitations

- **Imbalanced Data Risk:** Ensure attack types are well represented in the training data
- **Feature Drift:** The model may degrade if log formats or attack patterns evolve
- **Interpretability:** Random Forests are less interpretable than linear models; use feature importance tools

## 📄 License

MIT License, as declared in this card's metadata (`license: mit`).

## 👤 Author

Created by [Your Name or Organization]

## 📚 Recommendations for Open-Sourcing

- Include the preprocessing pipeline (imputation, encoding, scaling)
- Provide training and evaluation scripts
- Share the feature importance analysis and mapping
- Document the attack type taxonomy used in classification
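As a starting point for the feature importance analysis and mapping recommended above, the sketch below shows one way to recover feature names from a fitted `SelectKBest` and pair them with the forest's `feature_importances_`. The data is synthetic and the column names are hypothetical; the real model's 197 input columns and 50 selected features are not reproduced here.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic data: two informative columns and three noise columns (all hypothetical).
rng = np.random.default_rng(0)
n = 500
y = rng.integers(0, 3, size=n)
X = pd.DataFrame({
    "Anomaly Score": rng.normal(10 * y, 3),
    "Packet Length": rng.normal(500 + 100 * y, 50),
    "Noise A": rng.normal(size=n),
    "Noise B": rng.normal(size=n),
    "Noise C": rng.normal(size=n),
})

# Keep the top 3 features by ANOVA F-score (the card keeps the top 50).
selector = SelectKBest(f_classif, k=3).fit(X, y)

# get_support() returns a boolean mask over the input columns,
# which maps selected positions back to their original names.
selected_names = X.columns[selector.get_support()]

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(selector.transform(X), y)

# Pair each importance with its feature name and rank them.
importances = pd.Series(
    forest.feature_importances_, index=selected_names
).sort_values(ascending=False)
print(importances)
```

Publishing a ranked table like `importances` alongside the model would let users see which log fields drive predictions without retraining.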