---
license: mit
metrics:
- accuracy
- matthews_correlation
pipeline_tag: text-classification
datasets:
- 19kmunz/iot-23-preprocessed-minimumcolumns
widget:
- text: "8081 tcp S0 2 80 0"
  example_title: "malicious, label_1"
- text: "37215 tcp S0 2 80 0"
  example_title: "malicious, label_1"
- text: "67 udp S0 11 3608 0"
  example_title: "Benign, label_0"
- text: "0 icmp OTH 9 844 0"
  example_title: "Benign, label_0"
---

## introduction

This is an undergraduate course project in computer security.

The task is to fine-tune a pretrained language model to detect malicious network flows.

## base model

bert-base-uncased

## dataset

19kmunz/iot-23-preprocessed-minimumcolumns

## example prompt

```markdown
8081 tcp S0 2 80 0
37215 tcp S0 2 80 0
52869 tcp S0 2 80 0
8080 tcp S0 2 80 0
80 tcp S0 2 80 0
```

The inputs above are "malicious", which corresponds to "label_1".

```markdown
67 udp S0 11 3608 0
0 icmp OTH 9 844 0
136 icmp OTH 3 216 0
0 icmp OTH 8 648 0
134 icmp OTH 2 96 0
```

The inputs above are "Benign", which corresponds to "label_0".

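
The prompts above are single lines of space-separated flow fields. A minimal sketch of how one flow record could be turned into such a line, and how a predicted label could be mapped back to a readable name, is shown here. The field names in `FIELD_ORDER` are guesses based on the example prompts, not confirmed column names from the dataset, and both helper functions are hypothetical.

```python
from typing import Mapping, Sequence

# Assumed field order, inferred from the example prompts. These names are
# illustrative guesses, not confirmed column names from the dataset.
FIELD_ORDER: Sequence[str] = (
    "resp_port",
    "proto",
    "conn_state",
    "orig_pkts",
    "orig_bytes",
    "resp_bytes",
)

# Label names as used in this model card.
LABEL_NAMES = {"label_0": "Benign", "label_1": "malicious"}


def flow_to_text(flow: Mapping[str, object]) -> str:
    """Join the flow fields into one space-separated prompt line."""
    return " ".join(str(flow[name]) for name in FIELD_ORDER)


def label_to_name(label: str) -> str:
    """Map a predicted label id to its name, ignoring letter case."""
    return LABEL_NAMES[label.lower()]


flow = {
    "resp_port": 8081,
    "proto": "tcp",
    "conn_state": "S0",
    "orig_pkts": 2,
    "orig_bytes": 80,
    "resp_bytes": 0,
}
print(flow_to_text(flow))
print(label_to_name("label_1"))
```

The lookup is case-insensitive because a classifier may report the label as `LABEL_1` rather than `label_1`.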

## accuracy

```markdown
       Training Loss  Valid. Loss  Valid. Accur.
epoch
1           0.288545     0.190351       0.929988
2           0.147658     0.154426       0.943510
3           0.108059     0.173112       0.943510
4           0.092468     0.161035       0.947416
```

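
The training log can be read in two ways, depending on whether validation loss or validation accuracy is used to choose an epoch. The sketch below copies the numbers by hand from the table and compares the two criteria. The helper names are illustrative, and the model card does not say which epoch's checkpoint was actually published.

```python
# Metrics copied from the training log:
# epoch -> (training loss, validation loss, validation accuracy)
history = {
    1: (0.288545, 0.190351, 0.929988),
    2: (0.147658, 0.154426, 0.943510),
    3: (0.108059, 0.173112, 0.943510),
    4: (0.092468, 0.161035, 0.947416),
}


def best_epoch_by_loss(log):
    """Return the epoch with the lowest validation loss."""
    return min(log, key=lambda epoch: log[epoch][1])


def best_epoch_by_accuracy(log):
    """Return the epoch with the highest validation accuracy."""
    return max(log, key=lambda epoch: log[epoch][2])


print("lowest validation loss: epoch", best_epoch_by_loss(history))
print("highest validation accuracy: epoch", best_epoch_by_accuracy(history))
```

The two criteria disagree here: validation loss is lowest at epoch 2, while validation accuracy is highest at epoch 4.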

## MCC score: 0.816

"MCC" refers to the Matthews correlation coefficient, a metric commonly used to assess the quality of predictions in binary classification problems.

The MCC ranges from -1 to 1: 1 means perfect predictions, 0 means predictions no better than random chance, and -1 means predictions that are completely inverted relative to the true labels.

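
To make the range concrete, here is a small sketch that computes the MCC from the four confusion-matrix counts using the standard definition. The counts in the usage lines are made-up illustrative values, not this model's results, and the function name is hypothetical.

```python
import math


def mcc_from_counts(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Common convention: report 0 when a row or column of the
        # confusion matrix is empty and the ratio is undefined.
        return 0.0
    return (tp * tn - fp * fn) / denominator


print(mcc_from_counts(tp=50, tn=50, fp=0, fn=0))    # every prediction correct
print(mcc_from_counts(tp=25, tn=25, fp=25, fn=25))  # chance-level predictions
print(mcc_from_counts(tp=0, tn=0, fp=50, fn=50))    # every prediction inverted
```

For real evaluations, `sklearn.metrics.matthews_corrcoef` computes the same quantity directly from label arrays.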

An MCC of 0.816 is a good result. Being close to 1, it indicates that the model's predictions agree strongly with the true labels on this binary classification task. Because the MCC takes all four confusion-matrix counts into account, it complements the validation accuracy reported above.