---
license: mit
tags:
- log-analysis
- anomaly-detection
- bert
- cybersecurity
- multiclass-classification
language:
- en
datasets:
- custom-log-dataset
metrics:
- f1
- accuracy
pipeline_tag: text-classification
---

# Log Anomaly Detection Models

This repository contains trained models for the **Log Anomaly Detection System**, which classifies system logs into 7 categories: normal operation plus six anomaly types.

## Available Models

### BERT-based Models

- **DANN-BERT** (`models/DANN-BERT-Log-Anomaly-Detection/`) - Domain-Adversarial Neural Network
- **LoRA-BERT** (`models/LoRA-BERT-Log-Anomaly-Detection/`) - Low-Rank Adaptation
- **Hybrid-BERT** (`models/Hybrid-BERT-Log-Anomaly-Detection/`) - BERT + Template Features

### Traditional ML Models

- **XGBoost** (`models/XGBoost-Log-Anomaly-Detection/`) - Gradient Boosting Classifier
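
DANN-BERT is listed only as a Domain-Adversarial Neural Network. In the standard DANN formulation, the key component is a gradient reversal layer: it is the identity in the forward pass and multiplies the gradient by -λ in the backward pass, so the shared encoder is pushed toward features that a domain classifier cannot separate. The sketch below shows that layer in PyTorch under the assumption of a fixed scalar λ; the names `GradientReversal` and `grad_reverse` are hypothetical, and how this repository wires the layer into its model is not documented here.

```python
import torch


class GradientReversal(torch.autograd.Function):
    """Hypothetical gradient reversal layer for DANN-style training."""

    @staticmethod
    def forward(ctx, x, lambd):
        # Identity in the forward pass; remember the scaling factor for backward.
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Flip and scale the gradient flowing back into the encoder.
        # lambd is a plain float, so it receives no gradient (None).
        return grad_output.neg() * ctx.lambd, None


def grad_reverse(x, lambd=1.0):
    return GradientReversal.apply(x, lambd)


# Usage: encoder features -> grad_reverse -> domain classifier
features = torch.ones(2, 4, requires_grad=True)
domain_score = grad_reverse(features, lambd=0.5).sum()
domain_score.backward()
print(features.grad)  # every entry is -0.5
```

The domain classifier itself trains normally; only the gradient reaching the encoder is reversed, which is what makes the objective adversarial.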

## Model Performance

| Model | F1-Score (Macro) | Accuracy | Parameters |
|-------|-----------------|----------|------------|
| Hybrid-BERT | **92.8%** | **94.3%** | 110M |
| DANN-BERT | 90.3% | 92.1% | 110M |
| LoRA-BERT | 88.7% | 90.5% | 1.5M (trainable) |
| XGBoost | 88.5% | 91.2% | - |
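
The table reports macro-averaged F1 next to accuracy. Macro F1 computes F1 for each class separately and takes the unweighted mean, so a rare category counts as much as a frequent one, while accuracy is driven by the most common classes. The dependency-free sketch below implements the standard definitions for illustration; the repository does not state exactly how its scores were computed, and the toy labels are made up.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # F1 = 2TP / (2TP + FP + FN); the denominator is > 0 for any label in `labels`
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores)


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy example using class IDs as listed under Classification Categories
y_true = [0, 0, 1, 1, 2]
y_pred = [0, 1, 1, 1, 2]
print(round(accuracy(y_true, y_pred), 3))  # 0.8
print(round(macro_f1(y_true, y_pred), 3))  # 0.822
```

Here class 0 has F1 = 2/3 because one Normal log was misclassified, which pulls the macro average below the per-class scores of 0.8 and 1.0.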

## Classification Categories

1. **Normal** (0): Benign operations
2. **Security Anomaly** (1): Authentication failures, unauthorized access
3. **System Failure** (2): Crashes, kernel panics
4. **Performance Issue** (3): Timeouts, slow responses
5. **Network Anomaly** (4): Connection errors, packet loss
6. **Config Error** (5): Misconfigurations, invalid settings
7. **Hardware Issue** (6): Disk failures, memory errors
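
The integer in parentheses after each category is presumably the class index produced by the models' argmax. The sketch below turns that list into lookup tables in the style of the `id2label`/`label2id` fields of Hugging Face model configs, plus a small decoder for one row of softmax probabilities. Whether the saved models embed such a mapping is not stated, so `ID2LABEL`, `LABEL2ID`, and `decode` are hypothetical names.

```python
# Class IDs and names as listed in the Classification Categories section
ID2LABEL = {
    0: "Normal",
    1: "Security Anomaly",
    2: "System Failure",
    3: "Performance Issue",
    4: "Network Anomaly",
    5: "Config Error",
    6: "Hardware Issue",
}
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}


def decode(probabilities):
    """Return (label, probability) for the highest-scoring class.

    `probabilities` is a sequence of 7 per-class scores, for example one row of
    a softmax output converted with `.tolist()`.
    """
    if len(probabilities) != len(ID2LABEL):
        raise ValueError(f"expected {len(ID2LABEL)} scores, got {len(probabilities)}")
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return ID2LABEL[best], probabilities[best]


print(decode([0.02, 0.90, 0.01, 0.02, 0.02, 0.02, 0.01]))  # ('Security Anomaly', 0.9)
```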

## Usage

### Download Models

```python
from huggingface_hub import hf_hub_download

# Download the Hybrid-BERT model
model_path = hf_hub_download(
    repo_id="krishnas4415/log-anomaly-detection-models",
    filename="models/Hybrid-BERT-Log-Anomaly-Detection/pytorch_model.pt",
)

# Download the XGBoost model
xgb_path = hf_hub_download(
    repo_id="krishnas4415/log-anomaly-detection-models",
    filename="models/XGBoost-Log-Anomaly-Detection/best_mod.pkl",
)
```
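
### Load and Use Models

```python
import pickle

import torch
from transformers import AutoTokenizer

# Load the BERT model (model_path comes from the download step).
# The .pt file is treated as a full pickled model, so its class definition must be
# importable. PyTorch >= 2.6 defaults to weights_only=True, which rejects such files;
# weights_only=False runs pickle, so only use it on files you trust.
model = torch.load(model_path, map_location="cpu", weights_only=False)
model.eval()  # disable dropout for inference
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Load the XGBoost model (also a pickle: load trusted files only)
with open(xgb_path, "rb") as f:
    xgb_model = pickle.load(f)

# Example prediction with the BERT model
log_text = "Apr 15 12:34:56 server sshd[1234]: Failed password for admin"
inputs = tokenizer(log_text, return_tensors="pt", max_length=128, truncation=True, padding=True)

with torch.no_grad():
    # Assumes the model accepts tokenizer outputs and returns an object with a
    # `.logits` field; Hybrid-BERT may also expect template features.
    outputs = model(**inputs)
    probabilities = torch.softmax(outputs.logits, dim=-1)
    predicted_class = torch.argmax(probabilities, dim=-1).item()  # class ID 0-6
```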

## Training Data

- **Sources**: 16 log types (Apache, SSH, Hadoop, HDFS, Linux, Windows, etc.)
- **Size**: ~32,000 labeled logs
- **Classes**: 7 categories (Normal plus six anomaly types)
- **Features**: BERT embeddings + template features + statistical features
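
The feature list mentions template and statistical features without describing how templates are extracted. Dedicated log parsers such as Drain build templates by clustering similar lines; the sketch below swaps in a much simpler regex-masking approach, assuming that variable fields (syslog timestamps, process IDs, IPv4 addresses, numbers) are replaced with placeholders, plus a few hypothetical per-line statistics. It illustrates the idea only and is not the project's feature pipeline; `MASKS`, `to_template`, and `stat_features` are illustrative names.

```python
import re

# Hypothetical masking rules, applied in order (most specific first)
MASKS = [
    (re.compile(r"^[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}"), "<TS>"),  # syslog timestamp
    (re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "<IP>"),                 # IPv4 address
    (re.compile(r"\[\d+\]"), "[<PID>]"),                                   # process ID
    (re.compile(r"\b\d+\b"), "<NUM>"),                                     # any other number
]


def to_template(line):
    """Replace variable fields with placeholders to get a coarse log template."""
    for pattern, placeholder in MASKS:
        line = pattern.sub(placeholder, line)
    return line


def stat_features(line):
    """A few simple per-line statistics (an illustrative choice of features)."""
    return {
        "length": len(line),
        "num_tokens": len(line.split()),
        "digit_ratio": sum(c.isdigit() for c in line) / max(len(line), 1),
    }


log = "Apr 15 12:34:56 server sshd[1234]: Failed password for admin"
print(to_template(log))  # <TS> server sshd[<PID>]: Failed password for admin
print(stat_features(log)["num_tokens"])  # 9
```

Masking makes lines that differ only in variable fields share one template, which is what lets template-level features generalize across hosts and timestamps.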

## Related Links

- **Main Project**: [Log Anomaly Detection System](https://github.com/krishnasharma4415/log-anomaly-detection)
- **Live Demo**: [Frontend Application](https://log-anomaly-frontend.vercel.app)
- **API**: [Backend API](https://log-anomaly-api.onrender.com)

## Citation

```bibtex
@misc{log-anomaly-detection-2024,
  title={Log Anomaly Detection System},
  author={Krishna Sharma},
  year={2024},
  url={https://github.com/krishnasharma4415/log-anomaly-detection}
}
```

## License

MIT License. See the LICENSE file for details.