Na-Rajan/CyberAttackClassifier

	---
	license: mit
	---
	Model Name
	CyberAttackClassifier V1 – A Random Forest-based model for classifying cybersecurity attacks using network and system log data.

	📖 Overview
	CyberAttackClassifier V1 is a supervised machine learning model trained to classify various types of cybersecurity attacks based on structured log and alert data. It uses a Random Forest Classifier trained on the top 50 features selected from the encoded data, and it reports approximately 0.998 accuracy, precision, recall, and F1-score on its test split.

	🔍 Intended Uses
	Threat Detection: Automatically classify incoming events or logs into known attack categories.

	Security Monitoring: Enhance SIEM systems with predictive capabilities.

	Feature Analysis: Identify key indicators and patterns associated with different attack types.

	Incident Response Prioritization: Quickly assess and categorize threats for faster triage.

	🧠 Model Architecture
| Attribute | Value |
|---|---|
| Model Type | Random Forest Classifier |
| Framework | scikit-learn |
| Input Shape (raw) | (100000, 197) |
| Input Shape (selected) | (100000, 50) |
| Feature Selection | SelectKBest (f_classif) |
| Categorical Imputation | 'Unknown' for missing values |
| Encoding | One-hot for categorical features |
| Scaling | StandardScaler for numerical features |

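The preprocessing and model steps listed in the table can be chained into a single scikit-learn `Pipeline`. The sketch below is one plausible arrangement, not the author's published training code: the function name `build_pipeline`, the column lists, and the Random Forest hyperparameters are assumptions, and numeric columns are assumed to have no missing values because the card only mentions imputing categorical ones.

```python
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def build_pipeline(categorical_cols, numeric_cols, k=50, random_state=42):
    """Hypothetical reconstruction of the card's preprocessing + model steps."""
    categorical = Pipeline([
        # Replace missing categorical values (NaN) with the literal string 'Unknown'
        ("impute", SimpleImputer(strategy="constant", fill_value="Unknown")),
        # One-hot encode; categories unseen during training become all-zero rows
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ])
    preprocess = ColumnTransformer(
        [
            ("cat", categorical, categorical_cols),
            # Numeric columns are assumed complete, so they are only scaled
            ("num", StandardScaler(), numeric_cols),
        ],
        sparse_threshold=0.0,  # always return a dense array
    )
    return Pipeline([
        ("preprocess", preprocess),
        # Keep the k encoded features with the highest ANOVA F-scores
        ("select", SelectKBest(score_func=f_classif, k=k)),
        ("forest", RandomForestClassifier(n_estimators=100, random_state=random_state)),
    ])
```

Calling `build_pipeline(cat_cols, num_cols).fit(X_train, y_train)` on a DataFrame whose columns match the two lists would run every step in the table end to end. Note that standard scaling does not change which splits a Random Forest finds (tree splits are unaffected by increasing affine rescaling of a feature), so the `StandardScaler` step here mainly mirrors the table.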
	📚 Training Details
	Dataset Size: 100,000 samples

	Missing Values: Imputed with 'Unknown' in object-type (categorical) columns

	Feature Selection: Top 50 features selected using ANOVA F-test

	Train/Test Split: A standard hold-out split; the exact ratio is not stated (for example, 80/20, optionally stratified by attack type)
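The ANOVA F-test behind `f_classif` scores each feature by comparing how far the class means sit from the overall mean against how much samples vary within their own class: `F = (SS_between / (k - 1)) / (SS_within / (n - k))` for `n` samples and `k` classes. The sketch below computes this by hand for a single feature so the selection criterion is concrete; the helper name `anova_f` is hypothetical, and scikit-learn's `f_classif` computes the same statistic for every column at once.

```python
import numpy as np


def anova_f(feature, labels):
    """One-way ANOVA F-statistic of a single numeric feature across classes."""
    feature = np.asarray(feature, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    n, k = feature.size, classes.size
    grand_mean = feature.mean()
    # Between-class sum of squares: spread of class means around the grand mean
    ss_between = sum(
        feature[labels == c].size * (feature[labels == c].mean() - grand_mean) ** 2
        for c in classes
    )
    # Within-class sum of squares: spread of samples around their own class mean
    ss_within = sum(
        ((feature[labels == c] - feature[labels == c].mean()) ** 2).sum()
        for c in classes
    )
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

A feature whose values shift strongly between attack types gets a large F and is more likely to survive `SelectKBest(f_classif, k=50)`.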

	📈 Evaluation Metrics
| Metric | Value |
|---|---|
| Accuracy | 0.9980 |
| Precision | ~0.9980 |
| Recall | ~0.9980 |
| F1-score | ~0.9980 |

	✅ Note: These aggregate scores indicate very few misclassifications overall. Because averaged metrics can mask weaker classes, per-class results should be checked in the classification report below.
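The card does not say how the approximate precision, recall, and F1 values were averaged across attack classes. A minimal sketch, assuming support-weighted averaging (`average="weighted"` in scikit-learn), which is one common choice for multi-class summaries; the helper name `summarize_metrics` is hypothetical.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support


def summarize_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision/recall/F1 (assumed averaging scheme)."""
    # zero_division=0 avoids warnings for classes that are never predicted
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="weighted", zero_division=0
    )
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

With support-weighted averaging, recall always equals accuracy (each class's recall is weighted by its share of samples, which sums to correct predictions over total predictions), which would be consistent with the matching values in the table.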

	📊 Confusion Matrix & Classification Report
	Confusion Matrix: Dominant diagonal, indicating high true positive rates

	Classification Report: High precision, recall, and F1-scores for most attack classes
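A "dominant diagonal" can be turned into per-class numbers: dividing each diagonal entry of the confusion matrix by its row total gives that class's recall (true positive rate). The sketch below shows this alongside scikit-learn's text report; the helper name `diagnose` and the attack labels used with it are illustrative assumptions.

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix


def diagnose(y_true, y_pred, labels):
    """Confusion matrix, per-class recall, and a text classification report."""
    # Rows = true class, columns = predicted class, both in `labels` order
    cm = confusion_matrix(y_true, y_pred, labels=labels)
    row_totals = cm.sum(axis=1)
    # Diagonal / row total = recall for each class;
    # classes with no true samples get 0.0 instead of dividing by zero
    recall = np.divide(
        np.diag(cm), row_totals,
        out=np.zeros(len(labels), dtype=float), where=row_totals > 0,
    )
    report = classification_report(y_true, y_pred, labels=labels, zero_division=0)
    return cm, dict(zip(labels, recall.tolist())), report
```

Classes whose recall falls noticeably below the rest are the ones the aggregate ~0.998 figures could be hiding.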

	🔍 Feature Importance
	Top features identified using Random Forest’s feature_importances_ attribute

	Further analysis of the top 10–15 features is recommended to understand key attack indicators

	Feature names can be recovered by mapping SelectKBest's selected-column mask (get_support()) back to the encoded column names
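The mapping from SelectKBest output back to feature names can be sketched as follows, assuming the forest was fitted on the selector's output. The helper name `top_features` is hypothetical; when the encoded names come from a fitted `ColumnTransformer`, recent scikit-learn versions can supply them via its `get_feature_names_out()` method.

```python
import numpy as np


def top_features(selector, forest, feature_names, n=15):
    """Pair a fitted forest's importances with the names of the columns SelectKBest kept.

    selector: fitted SelectKBest; forest: RandomForestClassifier fitted on the
    selector's output; feature_names: names of the columns fed into the selector.
    """
    # Boolean mask over the input columns -> names in the order the forest saw them
    kept = np.asarray(feature_names)[selector.get_support()]
    importances = forest.feature_importances_
    # Indices of the n largest importances, highest first
    order = np.argsort(importances)[::-1][:n]
    return [(str(kept[i]), float(importances[i])) for i in order]
```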

	🚀 How to Use
```python
from cyberattackclassifier import AttackModel

model = AttackModel.load_pretrained("your-huggingface-username/cyberattackclassifier-v1")
input_data = {
    "Firewall Logs": "Unknown",
    "Proxy Information": "Blocked",
    "IDS/IPS Alerts": "High",
    # ... remaining feature columns
}
prediction = model.predict(input_data)
```
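The `cyberattackclassifier` package and `AttackModel` loader above are not part of scikit-learn. If the model is instead distributed as a serialized scikit-learn pipeline (an assumption; the card does not name a file or format), a single record could be classified roughly as below. `save_model`, `load_model`, `predict_record`, and any file name you pass them are hypothetical.

```python
import joblib
import pandas as pd


def save_model(pipeline, path):
    """Serialize a fitted scikit-learn pipeline (hypothetical distribution format)."""
    joblib.dump(pipeline, path)


def load_model(path):
    """Load a pipeline saved with save_model.

    Only load files from trusted sources: joblib/pickle files can run code on load.
    """
    return joblib.load(path)


def predict_record(pipeline, record):
    """Classify one log/alert record given as a {column name: value} dict.

    The record must contain every column the pipeline was trained on; a one-row
    DataFrame keeps the column names that the preprocessing step selects by.
    """
    frame = pd.DataFrame([record])
    return pipeline.predict(frame)[0]
```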
	⚠️ Limitations
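	Imbalanced Data Risk: Attack types that are rare in the training data may be classified less reliably even when overall accuracy is high; ensure all attack types are well represented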

	Feature Drift: Model may degrade if log formats or attack patterns evolve

	Interpretability: Random Forests are less interpretable than linear models; use feature importance tools
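One way to check the imbalance risk before training is to inspect each attack type's share of the labels. In this sketch the helper name `class_balance` and the 1% default threshold are assumptions; if rare classes are flagged, scikit-learn's `RandomForestClassifier(class_weight="balanced")` is one existing option for reweighting them.

```python
from collections import Counter


def class_balance(labels, min_share=0.01):
    """Return each class's share of the labels and the classes below `min_share`."""
    counts = Counter(labels)
    total = sum(counts.values())
    shares = {label: count / total for label, count in counts.items()}
    # Classes under the (hypothetical) threshold are candidates for more data or reweighting
    rare = sorted(label for label, share in shares.items() if share < min_share)
    return shares, rare
```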

	📄 License
	MIT License (as declared in the metadata header)

	👤 Author
	Created by Na-Rajan

	📚 Recommendations for Open-Sourcing
	Include preprocessing pipeline (imputation, encoding, scaling)

	Provide training and evaluation scripts

	Share feature importance analysis and mapping

	Document attack type taxonomy used in classification