---
license: apache-2.0
base_model: distilbert-base-uncased
tags:
- generated_from_trainer
metrics:
- accuracy
- f1
model-index:
- name: distilbert-base-uncased-logline-v3
  results: []
---
# distilbert-base-uncased-logline-v3

This model is a fine-tuned version of [distilbert-base-uncased](https://huggingface.co/distilbert-base-uncased) on the [AIT Log Data Set V2.0](https://zenodo.org/records/5789064)<sup>1</sup>.
It achieves the following results on the evaluation set:
- Loss: 0.0022
- Accuracy: 0.9995
- F1: 0.9994

## Model description

This model performs text classification on log files for network intrusion detection. The Python package that runs it is available at https://github.com/Isaacwilliam4/INSyT.

As listed on the dataset's page, the model was trained on the following log types: Apache access and error logs, authentication logs, DNS logs, VPN logs, audit logs, Suricata logs, network traffic packet captures, horde logs, exim logs, syslog, and system monitoring logs.
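
For orientation, the following is a minimal inference sketch built on the `transformers` text-classification pipeline. It assumes each input sample is a single log line, which this card does not state explicitly, and `model_id` is a placeholder for the Hub repository ID or local path of this checkpoint. The helper names `read_log_lines` and `classify_log_lines` are illustrative and are not part of the INSyT package.

```python
from typing import List


def read_log_lines(text: str) -> List[str]:
    """Split raw log text into stripped, non-empty lines (one sample per line)."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def classify_log_lines(lines: List[str], model_id: str, batch_size: int = 32):
    """Classify each log line with a text-classification pipeline.

    `model_id` is a placeholder (assumption): pass the Hub repository ID or a
    local directory containing this checkpoint. The import is deferred so that
    `read_log_lines` can be used without transformers installed.
    """
    from transformers import pipeline

    classifier = pipeline("text-classification", model=model_id)
    # truncation=True keeps long lines within the model's maximum input length.
    return classifier(lines, batch_size=batch_size, truncation=True)
```

Each returned item is a dictionary with a `label` and a `score`. Whether `label` holds a descriptive name or a generic `LABEL_<n>` string depends on the checkpoint's configuration.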

## Labels

| Label | Label Name                                                               |
|-------|--------------------------------------------------------------------------|
| 0     | attacker:dnsteal:dnsteal-dropped                                         |
| 1     | attacker:dnsteal:dnsteal-received                                        |
| 2     | attacker:dnsteal:exfiltration-service                                    |
| 3     | attacker_change_user:escalate                                            |
| 4     | attacker_change_user:escalate:escalated_command:escalated_sudo_command   |
| 5     | attacker_http:dirb:foothold                                              |
| 6     | attacker_http:foothold:service_scan                                      |
| 7     | attacker_http:foothold:webshell_cmd                                      |
| 8     | attacker_http:foothold:webshell_upload                                   |
| 9     | attacker_http:foothold:wpscan                                            |
| 10    | attacker_vpn:escalate                                                    |
| 11    | attacker_vpn:foothold                                                    |
| 12    | benign                                                                   |
| 13    | crack_passwords:escalate                                                 |
| 14    | dirb:foothold                                                            |
| 15    | dns_scan:foothold                                                        |
| 16    | escalate:escalated_command:escalated_sudo_command                        |
| 17    | escalate:escalated_command:escalated_sudo_command:escalated_sudo_session |
| 18    | escalate:webshell_cmd                                                    |
| 19    | foothold:network_scan                                                    |
| 20    | foothold:service_scan                                                    |
| 21    | foothold:traceroute                                                      |
| 22    | foothold:wpscan                                                          |
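
When post-processing predictions, it can help to have the table above as a lookup. The sketch below copies the 23 label names in index order and adds a few helpers. The helper names are illustrative, and splitting on `:` assumes that the colon separates individual tags within a combined label, which this card does not confirm.

```python
from typing import List

# Label names in index order, copied from the table above.
LABEL_NAMES: List[str] = [
    "attacker:dnsteal:dnsteal-dropped",
    "attacker:dnsteal:dnsteal-received",
    "attacker:dnsteal:exfiltration-service",
    "attacker_change_user:escalate",
    "attacker_change_user:escalate:escalated_command:escalated_sudo_command",
    "attacker_http:dirb:foothold",
    "attacker_http:foothold:service_scan",
    "attacker_http:foothold:webshell_cmd",
    "attacker_http:foothold:webshell_upload",
    "attacker_http:foothold:wpscan",
    "attacker_vpn:escalate",
    "attacker_vpn:foothold",
    "benign",
    "crack_passwords:escalate",
    "dirb:foothold",
    "dns_scan:foothold",
    "escalate:escalated_command:escalated_sudo_command",
    "escalate:escalated_command:escalated_sudo_command:escalated_sudo_session",
    "escalate:webshell_cmd",
    "foothold:network_scan",
    "foothold:service_scan",
    "foothold:traceroute",
    "foothold:wpscan",
]

BENIGN = "benign"


def label_name(label_id: int) -> str:
    """Return the label name for a class index."""
    return LABEL_NAMES[label_id]


def label_tags(label_id: int) -> List[str]:
    """Split a combined label into its tags (assumes ':' is the separator)."""
    return LABEL_NAMES[label_id].split(":")


def is_attack(label_id: int) -> bool:
    """True for every class except `benign`."""
    return LABEL_NAMES[label_id] != BENIGN
```

For example, `label_tags(11)` yields `["attacker_vpn", "foothold"]`, and `is_attack(12)` is `False` because index 12 is the `benign` class.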

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 3
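
To make the `linear` scheduler concrete, here is a small sketch of the learning rate as a function of the optimizer step. It assumes no warmup steps, since none are listed above, and takes the total of 18822 steps from the training results table. The function name `linear_lr` is illustrative, not a library API.

```python
def linear_lr(
    step: int,
    base_lr: float = 2e-5,
    total_steps: int = 18822,
    warmup_steps: int = 0,  # assumption: no warmup is listed in the card
) -> float:
    """Learning rate under linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        # Ramp up linearly from 0 to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr to 0 over the remaining steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Learning rate at the start of training and at each epoch boundary.
for step in (0, 6274, 12548, 18822):
    print(step, linear_lr(step))
```

Under these assumptions the rate starts at 2e-05, falls to two thirds of that after the first epoch, one third after the second, and reaches zero at the final step.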

### Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy | F1     |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|:------:|
| 0.0435        | 1.0   | 6274  | 0.0120          | 0.9965   | 0.9965 |
| 0.0059        | 2.0   | 12548 | 0.0032          | 0.9993   | 0.9992 |
| 0.0023        | 3.0   | 18822 | 0.0022          | 0.9995   | 0.9994 |
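
The step counts give a rough idea of the training set size, which the card does not state. The sketch below bounds it from the 6274 steps per epoch and the train batch size of 32. It assumes a single device, no gradient accumulation, and that the final partial batch is kept; if any of these does not hold, the bounds change. The function name is illustrative.

```python
from typing import Tuple


def dataset_size_bounds(steps_per_epoch: int, batch_size: int) -> Tuple[int, int]:
    """Smallest and largest sample counts that yield `steps_per_epoch` batches.

    Assumes one optimizer step per batch and that the last, possibly
    partial, batch is not dropped.
    """
    smallest = (steps_per_epoch - 1) * batch_size + 1
    largest = steps_per_epoch * batch_size
    return smallest, largest


low, high = dataset_size_bounds(steps_per_epoch=6274, batch_size=32)
print(low, high)  # 200737 200768
```

Under those assumptions, the training split would hold roughly 200,000 samples.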

## Test results

| Test Loss   | Test Accuracy | Test F1   |
|:------------|:--------------|:----------|
| 0.0020      | 0.9994        | 0.9994    |
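
## Five-fold cross-validation mean test confusion matrix

![Mean test confusion matrix over five cross-validation folds](https://cdn-uploads.huggingface.co/production/uploads/65fc9ce5fbeb2ecc7dfdb0da/Kgc7HEy8FpXxZ-WH7wnyi.png)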

### Framework versions

- Transformers 4.38.2
- PyTorch 2.0.0+cu117
- Datasets 2.18.0
- Tokenizers 0.15.1

### Citations

[1] M. Landauer, F. Skopik, M. Frank, W. Hotwagner, M. Wurzenberger, and A. Rauber, “AIT Log Data Set V2.0”. Zenodo, Feb. 24, 2022. doi: 10.5281/zenodo.5789064.