---

license: mit
tags:
- log-analysis
- anomaly-detection
- bert
- cybersecurity
- multiclass-classification
language:
- en
datasets:
- custom-log-dataset
metrics:
- f1
- accuracy
pipeline_tag: text-classification
---


# DANN-BERT-Log-Anomaly-Detection

This model is part of the **Log Anomaly Detection System**, which classifies system logs into 7 categories: one normal class and six anomaly types.

## Model Description

DANN-BERT-Log-Anomaly-Detection is a Domain-Adversarial Neural Network BERT model fine-tuned for multi-class log anomaly detection. It can classify logs from 16+ different sources (Apache, SSH, Hadoop, etc.) into 7 categories:

1. **Normal** (0): Benign operations
2. **Security Anomaly** (1): Authentication failures, unauthorized access
3. **System Failure** (2): Crashes, kernel panics
4. **Performance Issue** (3): Timeouts, slow responses
5. **Network Anomaly** (4): Connection errors, packet loss
6. **Config Error** (5): Misconfigurations, invalid settings
7. **Hardware Issue** (6): Disk failures, memory errors
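
The index-to-category mapping above can be kept in code as a small lookup table. This is a minimal sketch: the names `ID2LABEL`, `LABEL2ID`, and `label_for` are illustrative and not part of the released model, and the label strings are the class names listed in this card.

```python
# Class indices and names as listed in this model card.
# ID2LABEL, LABEL2ID and label_for are illustrative names, not part of the model.
ID2LABEL = {
    0: "normal",
    1: "security_anomaly",
    2: "system_failure",
    3: "performance_issue",
    4: "network_anomaly",
    5: "config_error",
    6: "hardware_issue",
}

# Reverse lookup: category name -> class index.
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}


def label_for(class_id: int) -> str:
    """Return the category name for a predicted class index."""
    if class_id not in ID2LABEL:
        raise ValueError(f"unknown class index: {class_id}")
    return ID2LABEL[class_id]


print(label_for(1))
```

A predicted index from the model (for example, the result of an `argmax` over the 7 class scores, converted to a Python `int`) can be passed to `label_for` to get a readable category name.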


## Performance Metrics

- **F1-Score (Macro)**: 0.903
- **Accuracy**: 0.921
- **Model Type**: Domain-Adversarial Neural Network BERT
- **Classes**: 7 (normal, security_anomaly, system_failure, performance_issue, network_anomaly, config_error, hardware_issue)
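
To make the two headline metrics concrete, here is a minimal pure-Python sketch of how accuracy and macro-averaged F1 are computed. The function names and the toy labels are made up for illustration; this is not the evaluation code or data behind the numbers reported above. It assumes a class with no true or predicted samples contributes an F1 of 0.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the true class."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred, num_classes=7):
    """Unweighted mean of the per-class F1 scores."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    scores = []
    for c in range(num_classes):
        denom = 2 * tp[c] + fp[c] + fn[c]
        # Assumption: a class with no samples and no predictions scores 0.
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / num_classes


# Made-up toy labels for illustration only (two classes, imbalanced).
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 1, 0]
print(accuracy(y_true, y_pred))
print(macro_f1(y_true, y_pred, num_classes=2))
```

In this toy case accuracy is 5/6 while macro F1 is 7/9, because the single error falls on the rarer class and macro averaging weights every class equally. That is one way a macro F1 can sit below accuracy on an imbalanced log dataset.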


## Usage

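```python
import torch
from transformers import AutoTokenizer

# Load the model. This assumes model.pt is a fully pickled nn.Module (not a
# state_dict), so the model's class definition must be importable here.
# weights_only=False is needed on recent PyTorch releases to unpickle a full
# module; only use it with files you trust.
model = torch.load('model.pt', map_location='cpu', weights_only=False)
model.eval()  # disable dropout for inference

tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')

# Example usage
log_text = "Apr 15 12:34:56 server sshd[1234]: Failed password for admin"
inputs = tokenizer(log_text, return_tensors='pt', max_length=128, truncation=True, padding=True)

with torch.no_grad():
    outputs = model(**inputs)
    # Assumes the model's forward pass returns an object with a .logits attribute
    predictions = torch.softmax(outputs.logits, dim=-1)
    predicted_class = torch.argmax(predictions, dim=-1)  # class index 0-6
```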

## Training Data

- **Sources**: 16 log types (Apache, SSH, Hadoop, HDFS, Linux, Windows, etc.)
- **Size**: ~32,000 labeled logs
- **Classes**: 7 anomaly categories
- **Features**: BERT embeddings + template features + statistical features

## Citation

```bibtex
@misc{log-anomaly-detection-2024,
  title={Log Anomaly Detection System},
  author={Krishna Sharma},
  year={2024},
  url={https://github.com/krishnasharma4415/log-anomaly-detection}
}
```

## License

MIT License - see LICENSE file for details.