---
license: mit
tags:
- log-analysis
- anomaly-detection
- bert
- cybersecurity
- multiclass-classification
language:
- en
datasets:
- custom-log-dataset
metrics:
- f1
- accuracy
pipeline_tag: text-classification
---

# Log Anomaly Detection Models

This repository contains trained models for the **Log Anomaly Detection System**, which classifies system log lines into 7 categories: one normal class and six anomaly types.

## 🤖 Available Models

### BERT-based Models
- **DANN-BERT** (`models/DANN-BERT-Log-Anomaly-Detection/`) - Domain-Adversarial Neural Network
- **LoRA-BERT** (`models/LoRA-BERT-Log-Anomaly-Detection/`) - Low-Rank Adaptation
- **Hybrid-BERT** (`models/Hybrid-BERT-Log-Anomaly-Detection/`) - BERT + Template Features
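
LoRA-BERT trains far fewer weights than the other BERT variants (1.5M trainable vs. 110M in the performance table). The NumPy sketch below illustrates the general LoRA idea: a frozen weight matrix `W` plus a trainable low-rank update `B @ A` scaled by `alpha / r`. The class name `LoRALinear`, the rank of 8 and `alpha=16` are illustrative assumptions, not the configuration used to train the released model.

```python
import numpy as np


class LoRALinear:
    """Frozen dense layer plus a trainable low-rank update: y = W x + (alpha / r) * B (A x)."""

    def __init__(self, weight: np.ndarray, rank: int = 8, alpha: float = 16.0, seed: int = 0):
        d_out, d_in = weight.shape
        rng = np.random.default_rng(seed)
        self.weight = weight                               # frozen pretrained weight, never updated
        self.A = rng.normal(0.0, 0.01, size=(rank, d_in))  # trainable, small random init
        self.B = np.zeros((d_out, rank))                   # trainable, zero init so the update starts at 0
        self.scale = alpha / rank

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # x: (batch, d_in) -> (batch, d_out)
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def trainable_params(self) -> int:
        # Only A and B are trained; W stays frozen
        return self.A.size + self.B.size


# Hypothetical example: a BERT-base-sized 768x768 projection with rank 8
rng = np.random.default_rng(42)
W = rng.normal(size=(768, 768))
layer = LoRALinear(W, rank=8)
x = rng.normal(size=(2, 768))
print(layer.trainable_params(), W.size)  # 12288 589824
print(np.allclose(layer(x), x @ W.T))    # True, because B starts at zero
```

Because `B` is zero-initialized, the adapted layer initially reproduces the pretrained one exactly, and training only moves the `r * (d_in + d_out)` adapter weights.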

### Traditional ML Models
- **XGBoost** (`models/XGBoost-Log-Anomaly-Detection/`) - Gradient Boosting Classifier

## 📊 Model Performance

| Model | F1-Score (Macro) | Accuracy | Parameters |
|-------|-----------------|----------|------------|
| Hybrid-BERT | **92.8%** | **94.3%** | 110M |
| DANN-BERT | 90.3% | 92.1% | 110M |
| LoRA-BERT | 88.7% | 90.5% | 1.5M (trainable) |
| XGBoost | 88.5% | 91.2% | - |
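
Macro F1 is the unweighted mean of the per-class F1 scores, so every category counts equally regardless of how many examples it has, whereas accuracy is dominated by the most frequent classes. A minimal pure-Python sketch of both metrics follows; the function names are illustrative, and scikit-learn offers equivalents (`f1_score(..., average="macro")` and `accuracy_score`).

```python
def accuracy(y_true: list[int], y_pred: list[int]) -> float:
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: list[int], y_pred: list[int]) -> float:
    """Unweighted mean of per-class F1 over labels present in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        scores.append(2 * tp / (2 * tp + fp + fn))  # F1 = 2TP / (2TP + FP + FN)
    return sum(scores) / len(scores)


y_true = [0, 0, 1, 1, 2]
y_pred = [0, 1, 1, 1, 2]
print(accuracy(y_true, y_pred))            # 0.8
print(round(macro_f1(y_true, y_pred), 4))  # 0.8222
```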

## 🎯 Classification Categories

1. **Normal** (0): Benign operations
2. **Security Anomaly** (1): Authentication failures, unauthorized access
3. **System Failure** (2): Crashes, kernel panics
4. **Performance Issue** (3): Timeouts, slow responses
5. **Network Anomaly** (4): Connection errors, packet loss
6. **Config Error** (5): Misconfigurations, invalid settings
7. **Hardware Issue** (6): Disk failures, memory errors
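
The models output a class index, so a lookup table is needed to turn predictions into readable category names. The sketch below encodes the IDs listed above; `ID2LABEL`, `LABEL2ID` and `decode` are hypothetical helpers, not files shipped in this repository. With the PyTorch example in the Usage section, a single input's probabilities could be passed as `predictions[0].tolist()`.

```python
# Label IDs as listed in the Classification Categories section of this card
ID2LABEL = {
    0: "Normal",
    1: "Security Anomaly",
    2: "System Failure",
    3: "Performance Issue",
    4: "Network Anomaly",
    5: "Config Error",
    6: "Hardware Issue",
}
LABEL2ID = {label: idx for idx, label in ID2LABEL.items()}


def decode(probs: list[float]) -> tuple[str, float]:
    """Return (category name, probability) for the highest-scoring class."""
    if len(probs) != len(ID2LABEL):
        raise ValueError(f"expected {len(ID2LABEL)} probabilities, got {len(probs)}")
    best = max(range(len(probs)), key=lambda i: probs[i])
    return ID2LABEL[best], probs[best]


print(decode([0.02, 0.90, 0.01, 0.03, 0.02, 0.01, 0.01]))  # ('Security Anomaly', 0.9)
```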

## 🚀 Usage

### Download Models

```python
from huggingface_hub import hf_hub_download

# Download the Hybrid-BERT checkpoint
model_path = hf_hub_download(
    repo_id="krishnas4415/log-anomaly-detection-models",
    filename="models/Hybrid-BERT-Log-Anomaly-Detection/pytorch_model.pt",
)

# Download the XGBoost model
xgb_path = hf_hub_download(
    repo_id="krishnas4415/log-anomaly-detection-models",
    filename="models/XGBoost-Log-Anomaly-Detection/best_mod.pkl",
)
```

### Load and Use Models

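```python
import pickle

import torch
from transformers import AutoTokenizer

# Load the BERT checkpoint. This assumes the .pt file stores a full pickled nn.Module.
# PyTorch >= 2.6 defaults to weights_only=True, which rejects such files, so it is
# disabled here; only do this for checkpoints you trust.
model = torch.load(model_path, map_location="cpu", weights_only=False)
model.eval()  # disable dropout for inference
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Load the XGBoost model (pickle: only load files you trust)
with open(xgb_path, "rb") as f:
    xgb_model = pickle.load(f)

# Example prediction with the BERT model (assumes its output exposes .logits)
log_text = "Apr 15 12:34:56 server sshd[1234]: Failed password for admin"
inputs = tokenizer(log_text, return_tensors="pt", max_length=128, truncation=True, padding=True)

with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.softmax(outputs.logits, dim=-1)
    predicted_class = torch.argmax(predictions, dim=-1).item()  # label ID 0-6
```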

## 📚 Training Data

- **Sources**: 16 log types (Apache, SSH, Hadoop, HDFS, Linux, Windows, etc.)
- **Size**: ~32,000 labeled logs
- **Classes**: 7 (Normal plus 6 anomaly categories)
- **Features**: BERT embeddings + template features + statistical features
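
The card does not specify how the template features are derived. A common first step is to mask the variable parts of a log line (timestamps, process IDs, IP addresses, numbers) so that lines emitted by the same logging statement collapse to one template. The regex rules and the `to_template` function below are a simplified, hypothetical illustration of that idea, not the project's actual parser; dedicated log parsers such as Drain handle this more robustly.

```python
import re

# Hypothetical masking rules, applied in order (most specific first)
MASKS = [
    (re.compile(r"^[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} "), ""),  # syslog timestamp prefix
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}(?::\d+)?\b"), "<IP>"),    # IPv4 address, optional port
    (re.compile(r"\[\d+\]"), "[<PID>]"),                              # process ID in brackets
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),                     # hexadecimal values
    (re.compile(r"\b\d+\b"), "<NUM>"),                                # any remaining standalone number
]


def to_template(line: str) -> str:
    """Replace variable fields with placeholder tokens."""
    for pattern, token in MASKS:
        line = pattern.sub(token, line)
    return line.strip()


logs = [
    "Apr 15 12:34:56 server sshd[1234]: Failed password for admin",
    "Apr 15 12:35:02 server sshd[5678]: Failed password for admin",
]
print({to_template(line) for line in logs})  # {'server sshd[<PID>]: Failed password for admin'}
```

Both lines map to the same template, so features such as a template ID or template frequency can be computed per group rather than per raw line.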

## 🔗 Related Links

- **Main Project**: [Log Anomaly Detection System](https://github.com/krishnasharma4415/log-anomaly-detection)
- **Live Demo**: [Frontend Application](https://log-anomaly-frontend.vercel.app)
- **API**: [Backend API](https://log-anomaly-api.onrender.com)

## 📄 Citation

```bibtex
@misc{log-anomaly-detection-2024,
  title={Log Anomaly Detection System},
  author={Krishna Sharma},
  year={2024},
  url={https://github.com/krishnasharma4415/log-anomaly-detection}
}
```

## 📝 License

MIT License - see LICENSE file for details.