---
license: apache-2.0
base_model: distilbert-base-uncased
tags:
- generated_from_trainer
metrics:
- accuracy
- f1
model-index:
- name: distilbert-base-uncased-logline-v3
  results: []
---
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->
# distilbert-base-uncased-logline-v3
This model is a fine-tuned version of [distilbert-base-uncased](https://huggingface.co/distilbert-base-uncased) on the [AIT Log Data Set V2.0](https://zenodo.org/records/5789064)<sup>1</sup>.
It achieves the following results on the evaluation set:
- Loss: 0.0022
- Accuracy: 0.9995
- F1: 0.9994
## Model description
This model is intended for text classification of log files for network intrusion detection. The Python package that runs this model is available at https://github.com/Isaacwilliam4/INSyT.
As described on the dataset's Zenodo page, the training data covers the following log types: Apache access and error logs, authentication logs, DNS logs, VPN logs, audit logs, Suricata logs, network traffic packet captures, Horde logs, Exim logs, syslog, and system monitoring logs.
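The following is a minimal sketch of classifying individual log lines with the Transformers `pipeline` API, independent of the INSyT package. The repository ID is a placeholder because this card does not state the Hub namespace, and `load_classifier` and `classify_lines` are hypothetical helper names. Whether predictions come back as readable label names or as generic `LABEL_<n>` strings depends on the `id2label` mapping stored in the model config, which this card does not show.

```python
from typing import Callable, List, Tuple

# Placeholder (assumption): replace <namespace> with the account hosting this model.
MODEL_ID = "<namespace>/distilbert-base-uncased-logline-v3"


def load_classifier(model_id: str = MODEL_ID):
    """Build a text-classification pipeline (downloads weights on first use)."""
    from transformers import pipeline  # imported lazily: heavy dependency

    return pipeline("text-classification", model=model_id)


def classify_lines(classifier: Callable, lines: List[str]) -> List[Tuple[str, str, float]]:
    """Classify each log line and return (line, label, score) triples."""
    # truncation=True clips lines longer than the model's maximum input length.
    predictions = classifier(lines, truncation=True)
    return [(line, p["label"], float(p["score"])) for line, p in zip(lines, predictions)]
```

With `transformers` installed and a valid repository ID, `classify_lines(load_classifier(), lines)` would return one triple per input line, where `lines` is a list of raw log strings.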
## Labels
| Label | Label Name |
|-------|---------------------------------------------------------------------|
| 0 | attacker:dnsteal:dnsteal-dropped |
| 1 | attacker:dnsteal:dnsteal-received |
| 2 | attacker:dnsteal:exfiltration-service |
| 3 | attacker_change_user:escalate |
| 4 | attacker_change_user:escalate:escalated_command:escalated_sudo_command |
| 5 | attacker_http:dirb:foothold |
| 6 | attacker_http:foothold:service_scan |
| 7 | attacker_http:foothold:webshell_cmd |
| 8 | attacker_http:foothold:webshell_upload |
| 9 | attacker_http:foothold:wpscan |
| 10 | attacker_vpn:escalate |
| 11 | attacker_vpn:foothold |
| 12 | benign |
| 13 | crack_passwords:escalate |
| 14 | dirb:foothold |
| 15 | dns_scan:foothold |
| 16 | escalate:escalated_command:escalated_sudo_command |
| 17 | escalate:escalated_command:escalated_sudo_command:escalated_sudo_session |
| 18 | escalate:webshell_cmd |
| 19 | foothold:network_scan |
| 20 | foothold:service_scan |
| 21 | foothold:traceroute |
| 22 | foothold:wpscan |
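For downstream code, the table above can be expressed as a lookup list. This is a sketch: the helper names `label_tags` and `is_attack` are hypothetical, and reading the colon-separated parts of a label as individual tags is an assumption about the dataset's labeling scheme rather than something this card states.

```python
from typing import List

# Index in this list == label ID from the table above.
ID2LABEL: List[str] = [
    "attacker:dnsteal:dnsteal-dropped",
    "attacker:dnsteal:dnsteal-received",
    "attacker:dnsteal:exfiltration-service",
    "attacker_change_user:escalate",
    "attacker_change_user:escalate:escalated_command:escalated_sudo_command",
    "attacker_http:dirb:foothold",
    "attacker_http:foothold:service_scan",
    "attacker_http:foothold:webshell_cmd",
    "attacker_http:foothold:webshell_upload",
    "attacker_http:foothold:wpscan",
    "attacker_vpn:escalate",
    "attacker_vpn:foothold",
    "benign",
    "crack_passwords:escalate",
    "dirb:foothold",
    "dns_scan:foothold",
    "escalate:escalated_command:escalated_sudo_command",
    "escalate:escalated_command:escalated_sudo_command:escalated_sudo_session",
    "escalate:webshell_cmd",
    "foothold:network_scan",
    "foothold:service_scan",
    "foothold:traceroute",
    "foothold:wpscan",
]

BENIGN_ID = ID2LABEL.index("benign")


def label_tags(label_id: int) -> List[str]:
    """Split a label name into its colon-separated parts (assumed to be tags)."""
    return ID2LABEL[label_id].split(":")


def is_attack(label_id: int) -> bool:
    """Treat every label other than 'benign' as attack-related."""
    return label_id != BENIGN_ID
```

Collapsing the 23 classes into a binary benign/attack decision with `is_attack` is one possible way to consume the predictions; the card itself reports metrics only for the multi-class task.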
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 3
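To illustrate what `lr_scheduler_type: linear` implies for the learning rate, here is a small sketch of a linear decay schedule in plain Python. The total of 18822 steps is the final step count in the training results table; the warmup length is not listed in this card, so zero warmup steps is an assumption, and `linear_lr` is a hypothetical helper rather than a library function.

```python
BASE_LR = 2e-05       # learning_rate from the list above
TOTAL_STEPS = 18822   # final step in the training results table
WARMUP_STEPS = 0      # assumption: the card does not list a warmup setting


def linear_lr(step: int,
              base_lr: float = BASE_LR,
              total_steps: int = TOTAL_STEPS,
              warmup_steps: int = WARMUP_STEPS) -> float:
    """Learning rate at `step` under linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        # Ramp up linearly from 0 to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr to 0 over the remaining steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)
```

Under these assumptions the rate starts at 2e-05, is halved by the midpoint of training (step 9411), and reaches zero at step 18822.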
### Training results
| Training Loss | Epoch | Step | Validation Loss | Accuracy | F1 |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|:------:|
| 0.0435 | 1.0 | 6274 | 0.0120 | 0.9965 | 0.9965 |
| 0.0059 | 2.0 | 12548 | 0.0032 | 0.9993 | 0.9992 |
| 0.0023 | 3.0 | 18822 | 0.0022 | 0.9995 | 0.9994 |
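The step counts in this table allow a rough estimate of the training set size, which the card does not state directly. The sketch below assumes a single device, no gradient accumulation, and a final partial batch that is kept rather than dropped; if any of these do not hold, the bounds change. `example_count_bounds` is a hypothetical helper name.

```python
from typing import Tuple

STEPS_PER_EPOCH = 6274  # step count at epoch 1.0 in the table above
BATCH_SIZE = 32         # train_batch_size from the hyperparameters
NUM_EPOCHS = 3


def example_count_bounds(steps_per_epoch: int, batch_size: int) -> Tuple[int, int]:
    """Return (lower, upper) bounds on the number of training examples per epoch."""
    # Upper bound: every batch, including the last one, is full.
    upper = steps_per_epoch * batch_size
    # Lower bound: the last batch holds a single example.
    lower = (steps_per_epoch - 1) * batch_size + 1
    return lower, upper


lower, upper = example_count_bounds(STEPS_PER_EPOCH, BATCH_SIZE)
print(f"Estimated training examples: between {lower} and {upper}")
```

The same numbers are consistent with the rest of the table: 6274 steps per epoch over 3 epochs gives the 18822 total steps shown in the last row.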
## Test results
| Test Loss | Test Accuracy | Test F1 |
|:------------|:--------------|:----------|
| 0.0020 | 0.9994 | 0.9994 |
## Five Fold Cross Validation Mean Test Confusion Matrix

### Framework versions
- Transformers 4.38.2
- Pytorch 2.0.0+cu117
- Datasets 2.18.0
- Tokenizers 0.15.1
### Citations
[1] M. Landauer, F. Skopik, M. Frank, W. Hotwagner, M. Wurzenberger, and A. Rauber, “AIT Log Data Set V2.0”. Zenodo, Feb. 24, 2022. doi: 10.5281/zenodo.5789064.