Na-Rajan/CyberAttackClassifier

Na-Rajan committed on Jul 27, 2025

Commit dea39de · verified · 1 Parent(s): 2550d31

Update README.md

Files changed (1)

README.md +90 -3

@@ -1,3 +1,90 @@
----
-license: mit
----

+---
+license: mit
+---
+Model Name
+CyberAttackClassifier V1 – A Random Forest-based model for classifying cybersecurity attacks using network and system log data.
+📖 Overview
+CyberAttackClassifier V1 is a supervised machine learning model that classifies cybersecurity attacks from structured log and alert data. It is a Random Forest classifier trained on a feature-selected dataset (50 of 197 preprocessed features), with reported accuracy, precision, recall, and F1-score of about 0.998.
+🔍 Intended Uses
+Threat Detection: Automatically classify incoming events or logs into known attack categories.
+Security Monitoring: Enhance SIEM systems with predictive capabilities.
+Feature Analysis: Identify key indicators and patterns associated with different attack types.
+Incident Response Prioritization: Quickly assess and categorize threats for faster triage.
+🧠 Model Architecture
+| Attribute | Value |
+| --- | --- |
+| Model Type | Random Forest Classifier |
+| Framework | scikit-learn |
+| Input Shape (raw) | (100000, 197) |
+| Input Shape (selected) | (100000, 50) |
+| Feature Selection | SelectKBest (f_classif) |
+| Categorical Imputation | 'Unknown' for missing values |
+| Encoding | One-hot for categorical features |
+| Scaling | StandardScaler for numerical features |
+📚 Training Details
+Dataset Size: 100,000 samples
+Missing Values: Imputed with 'Unknown' in object-type (categorical) columns
+Feature Selection: Top 50 features selected using ANOVA F-test
+Train/Test Split: Exact ratio not recorded; a conventional hold-out split (e.g., 80/20, optionally stratified by attack type)
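The preprocessing and training steps listed above could be wired together roughly as follows. This is a minimal sketch, not the author's training script: the toy data, the column names, the attack labels, the 80/20 stratified split, and the forest hyperparameters are all assumptions, and `k` is lowered from the card's 50 because the toy data has only a handful of features.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def build_pipeline(numeric_cols, categorical_cols, k=50):
    """Impute + one-hot categoricals, scale numerics, keep top-k features, fit a forest."""
    categorical = Pipeline([
        # Missing categorical values become the literal category 'Unknown'.
        ("impute", SimpleImputer(strategy="constant", fill_value="Unknown")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ])
    preprocess = ColumnTransformer([
        ("num", StandardScaler(), numeric_cols),
        ("cat", categorical, categorical_cols),
    ])
    return Pipeline([
        ("preprocess", preprocess),
        ("select", SelectKBest(score_func=f_classif, k=k)),  # ANOVA F-test
        ("forest", RandomForestClassifier(n_estimators=100, random_state=42)),
    ])


# Hypothetical toy data standing in for the real 100,000-row dataset.
rng = np.random.default_rng(0)
n = 400
firewall = np.full(n, "Log Data", dtype=object)
firewall[rng.random(n) < 0.3] = np.nan  # simulate missing log entries
X = pd.DataFrame({
    "packet_length": rng.normal(500, 100, n),
    "anomaly_score": rng.uniform(0, 100, n),
    "protocol": rng.choice(["TCP", "UDP", "ICMP"], n),
    "firewall_logs": firewall,
})
# Hypothetical labels; the card does not list its attack taxonomy.
y = np.where(X["anomaly_score"] > 50, "DDoS", "Malware")

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

pipeline = build_pipeline(
    numeric_cols=["packet_length", "anomaly_score"],
    categorical_cols=["protocol", "firewall_logs"],
    k=5,  # the card uses k=50 on 197 features
)
pipeline.fit(X_train, y_train)
accuracy = pipeline.score(X_test, y_test)
print(f"hold-out accuracy on toy data: {accuracy:.3f}")
```

Keeping every step inside one `Pipeline` means the imputer, encoder, scaler, and selector are fitted on the training split only, so the test split does not leak into preprocessing.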
+📈 Evaluation Metrics
+| Metric | Value |
+| --- | --- |
+| Accuracy | 0.9980 |
+| Precision | ~0.9980 |
+| Recall | ~0.9980 |
+| F1-score | ~0.9980 |
+✅ Note: These aggregate metrics indicate strong overall performance with few misclassifications; they do not by themselves show how each individual attack type performs.
+📊 Confusion Matrix & Classification Report
+Confusion Matrix: Dominant diagonal, indicating high true positive rates
+Classification Report: High precision, recall, and F1-scores for most attack classes
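The aggregate scores, confusion matrix, and per-class report described above can be produced with `sklearn.metrics`. The sketch below uses a tiny hand-written set of labels and predictions; the class names are hypothetical, and weighted averaging is an assumption, since the card does not say how precision, recall, and F1 were averaged.

```python
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    precision_recall_fscore_support,
)

# Hypothetical attack classes and predictions, for illustration only.
labels = ["DDoS", "Intrusion", "Malware"]
y_true = ["DDoS", "DDoS", "Intrusion", "Intrusion",
          "Malware", "Malware", "Malware", "DDoS"]
y_pred = ["DDoS", "DDoS", "Intrusion", "Malware",
          "Malware", "Malware", "Malware", "DDoS"]

accuracy = accuracy_score(y_true, y_pred)

# Weighted averaging is an assumption; macro averaging treats rare classes equally.
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, labels=labels, average="weighted", zero_division=0
)

# Rows are true classes, columns are predicted classes, both in `labels` order.
cm = confusion_matrix(y_true, y_pred, labels=labels)

# Per-class recall: diagonal count divided by the number of true samples per class.
per_class_recall = cm.diagonal() / cm.sum(axis=1)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
print(cm)
print(classification_report(y_true, y_pred, labels=labels, zero_division=0))
```

Reading the per-class rows of the report alongside the matrix shows which attack types are confused with each other, which a single accuracy figure hides.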
+🔍 Feature Importance
+Top features identified using Random Forest’s feature_importances_ attribute
+Further analysis of top 10–15 features recommended to understand key attack indicators
+Feature names available via mapping from SelectKBest output
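Mapping the selector's output back to feature names and ranking the forest's importances might look like the following. This is an illustrative sketch on synthetic data: the `feature_{i}` names, the dataset, and `k=10` are assumptions, not the model's actual features.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic stand-in data; the real feature names come from the encoded log columns.
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=5, n_redundant=0, random_state=0
)
feature_names = np.array([f"feature_{i}" for i in range(X.shape[1])])

# get_support() returns a boolean mask over the original columns.
selector = SelectKBest(score_func=f_classif, k=10).fit(X, y)
selected_names = feature_names[selector.get_support()]

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(selector.transform(X), y)

# Pair each selected feature with its importance, highest first.
order = np.argsort(forest.feature_importances_)[::-1]
ranking = [
    (str(selected_names[i]), float(forest.feature_importances_[i])) for i in order
]

for name, score in ranking[:5]:
    print(f"{name}: {score:.3f}")
```

Impurity-based importances such as `feature_importances_` sum to 1 across the selected features, so each value reads as a relative share rather than an absolute effect size.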
+🚀 How to Use
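+python
+from cyberattackclassifier import AttackModel
+# Replace the placeholder repo ID with the actual model repository.
+model = AttackModel.load_pretrained("your-huggingface-username/cyberattackclassifier-v1")
+# One event, described by its log and alert fields.
+input_data = {
+    "Firewall Logs": "Unknown",
+    "Proxy Information": "Blocked",
+    "IDS/IPS Alerts": "High",
+    # ...
+}
+# Classify the event into an attack category.
+prediction = model.predict(input_data)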
+⚠️ Limitations
+Imbalanced Data Risk: Ensure attack types are well-represented in training data
+Feature Drift: Model may degrade if log formats or attack patterns evolve
+Interpretability: Random Forests are less interpretable than linear models; use feature importance tools
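For the imbalanced-data risk noted above, a quick check of class shares before training can flag under-represented attack types. The sketch below is one possible approach under stated assumptions: the labels, the 5% threshold, and the use of balanced class weights are illustrative choices, not something the card prescribes.

```python
from collections import Counter

import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical training labels with a deliberately skewed distribution.
y_train = np.array(["DDoS"] * 90 + ["Malware"] * 8 + ["Intrusion"] * 2)

# Share of each attack type in the training labels.
counts = Counter(y_train.tolist())
shares = {label: count / len(y_train) for label, count in counts.items()}

MIN_SHARE = 0.05  # arbitrary threshold for "under-represented"
rare = sorted(label for label, share in shares.items() if share < MIN_SHARE)
print("under-represented classes:", rare)

# 'balanced' weights are n_samples / (n_classes * class_count).
classes = np.unique(y_train)
weights = compute_class_weight("balanced", classes=classes, y=y_train)
class_weight = dict(zip(classes.tolist(), weights.tolist()))
print(class_weight)
```

The resulting dictionary can be passed as `class_weight` to `RandomForestClassifier`, or the shortcut `class_weight="balanced"` can be used to have scikit-learn compute the same weights internally.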
+📄 License
+MIT License, as declared in the card metadata.
+👤 Author
+Created by [Your Name or Organization]
+📚 Recommendations for Open-Sourcing
+Include preprocessing pipeline (imputation, encoding, scaling)
+Provide training and evaluation scripts
+Share feature importance analysis and mapping
+Document attack type taxonomy used in classification