File size: 9,243 Bytes
98c6413
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
---
library_name: peft
license: mit
base_model: microsoft/phi-2
tags:
- base_model:adapter:microsoft/phi-2
- lora
- transformers
metrics:
- accuracy
model-index:
- name: phi2-lora-malicious-classifier
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# phi2-lora-malicious-classifier

This model is a fine-tuned version of [microsoft/phi-2](https://huggingface.co/microsoft/phi-2) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.4627
- Accuracy: 0.8476
- Precision Weighted: 0.8428
- Recall Weighted: 0.8476
- F1 Weighted: 0.8440
- Mcc: 0.7515
- Balanced Accuracy: 0.7994
- Per Class: {'jailbreaking': {'TP': 259, 'FP': 97, 'FN': 130, 'TN': 1443, 'FNR': 0.3341902313624679, 'FPR': 0.06298701298701298, 'Specificity': 0.937012987012987}, 'prompt injection': {'TP': 434, 'FP': 97, 'FN': 136, 'TN': 1262, 'FNR': 0.23859649122807017, 'FPR': 0.07137601177336277, 'Specificity': 0.9286239882266373}, 'unharmful': {'TP': 942, 'FP': 100, 'FN': 28, 'TN': 859, 'FNR': 0.0288659793814433, 'FPR': 0.10427528675703858, 'Specificity': 0.8957247132429614}}

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Use adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 10

### Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy | Precision Weighted | Recall Weighted | F1 Weighted | Mcc    | Balanced Accuracy | Per Class                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|:------------------:|:---------------:|:-----------:|:------:|:-----------------:|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|
| 0.7931        | 1.0   | 1107  | 0.7796          | 0.6910   | 0.6758             | 0.6910          | 0.6750      | 0.4860 | 0.6086            | {'jailbreaking': {'TP': 148, 'FP': 141, 'FN': 241, 'TN': 1399, 'FNR': 0.6195372750642674, 'FPR': 0.09155844155844156, 'Specificity': 0.9084415584415585}, 'prompt injection': {'TP': 309, 'FP': 145, 'FN': 261, 'TN': 1214, 'FNR': 0.45789473684210524, 'FPR': 0.10669610007358352, 'Specificity': 0.8933038999264165}, 'unharmful': {'TP': 876, 'FP': 310, 'FN': 94, 'TN': 649, 'FNR': 0.09690721649484536, 'FPR': 0.3232533889468196, 'Specificity': 0.6767466110531803}}   |
| 0.5884        | 2.0   | 2214  | 0.5420          | 0.8015   | 0.7916             | 0.8015          | 0.7899      | 0.6742 | 0.7269            | {'jailbreaking': {'TP': 188, 'FP': 94, 'FN': 201, 'TN': 1446, 'FNR': 0.5167095115681234, 'FPR': 0.06103896103896104, 'Specificity': 0.938961038961039}, 'prompt injection': {'TP': 411, 'FP': 97, 'FN': 159, 'TN': 1262, 'FNR': 0.2789473684210526, 'FPR': 0.07137601177336277, 'Specificity': 0.9286239882266373}, 'unharmful': {'TP': 947, 'FP': 192, 'FN': 23, 'TN': 767, 'FNR': 0.023711340206185566, 'FPR': 0.20020855057351408, 'Specificity': 0.799791449426486}}      |
| 0.462         | 3.0   | 3321  | 0.5065          | 0.8186   | 0.8101             | 0.8186          | 0.8101      | 0.7026 | 0.7531            | {'jailbreaking': {'TP': 208, 'FP': 90, 'FN': 181, 'TN': 1450, 'FNR': 0.4652956298200514, 'FPR': 0.05844155844155844, 'Specificity': 0.9415584415584416}, 'prompt injection': {'TP': 430, 'FP': 103, 'FN': 140, 'TN': 1256, 'FNR': 0.24561403508771928, 'FPR': 0.07579102281089035, 'Specificity': 0.9242089771891097}, 'unharmful': {'TP': 941, 'FP': 157, 'FN': 29, 'TN': 802, 'FNR': 0.029896907216494847, 'FPR': 0.16371220020855057, 'Specificity': 0.8362877997914494}}  |
| 0.4606        | 4.0   | 4428  | 0.4729          | 0.8305   | 0.8248             | 0.8305          | 0.8267      | 0.7233 | 0.7784            | {'jailbreaking': {'TP': 242, 'FP': 105, 'FN': 147, 'TN': 1435, 'FNR': 0.37789203084832906, 'FPR': 0.06818181818181818, 'Specificity': 0.9318181818181818}, 'prompt injection': {'TP': 430, 'FP': 119, 'FN': 140, 'TN': 1240, 'FNR': 0.24561403508771928, 'FPR': 0.0875643855776306, 'Specificity': 0.9124356144223694}, 'unharmful': {'TP': 930, 'FP': 103, 'FN': 40, 'TN': 856, 'FNR': 0.041237113402061855, 'FPR': 0.10740354535974973, 'Specificity': 0.8925964546402503}} |
| 0.4217        | 5.0   | 5535  | 0.4688          | 0.8351   | 0.8295             | 0.8351          | 0.8311      | 0.7309 | 0.7831            | {'jailbreaking': {'TP': 245, 'FP': 104, 'FN': 144, 'TN': 1436, 'FNR': 0.37017994858611825, 'FPR': 0.06753246753246753, 'Specificity': 0.9324675324675324}, 'prompt injection': {'TP': 430, 'FP': 106, 'FN': 140, 'TN': 1253, 'FNR': 0.24561403508771928, 'FPR': 0.07799852832965416, 'Specificity': 0.9220014716703459}, 'unharmful': {'TP': 936, 'FP': 108, 'FN': 34, 'TN': 851, 'FNR': 0.03505154639175258, 'FPR': 0.11261730969760167, 'Specificity': 0.8873826903023984}} |
| 0.445         | 6.0   | 6642  | 0.4465          | 0.8434   | 0.8390             | 0.8434          | 0.8402      | 0.7449 | 0.7953            | {'jailbreaking': {'TP': 259, 'FP': 107, 'FN': 130, 'TN': 1433, 'FNR': 0.3341902313624679, 'FPR': 0.06948051948051948, 'Specificity': 0.9305194805194805}, 'prompt injection': {'TP': 428, 'FP': 98, 'FN': 142, 'TN': 1261, 'FNR': 0.24912280701754386, 'FPR': 0.07211184694628403, 'Specificity': 0.927888153053716}, 'unharmful': {'TP': 940, 'FP': 97, 'FN': 30, 'TN': 862, 'FNR': 0.030927835051546393, 'FPR': 0.10114702815432743, 'Specificity': 0.8988529718456726}}    |
| 0.3633        | 7.0   | 7749  | 0.4604          | 0.8460   | 0.8409             | 0.8460          | 0.8422      | 0.7487 | 0.7968            | {'jailbreaking': {'TP': 253, 'FP': 90, 'FN': 136, 'TN': 1450, 'FNR': 0.3496143958868895, 'FPR': 0.05844155844155844, 'Specificity': 0.9415584415584416}, 'prompt injection': {'TP': 440, 'FP': 106, 'FN': 130, 'TN': 1253, 'FNR': 0.22807017543859648, 'FPR': 0.07799852832965416, 'Specificity': 0.9220014716703459}, 'unharmful': {'TP': 939, 'FP': 101, 'FN': 31, 'TN': 858, 'FNR': 0.031958762886597936, 'FPR': 0.10531803962460896, 'Specificity': 0.894681960375391}}   |
| 0.3249        | 8.0   | 8856  | 0.4670          | 0.8450   | 0.8408             | 0.8450          | 0.8417      | 0.7475 | 0.7976            | {'jailbreaking': {'TP': 262, 'FP': 106, 'FN': 127, 'TN': 1434, 'FNR': 0.3264781491002571, 'FPR': 0.06883116883116883, 'Specificity': 0.9311688311688312}, 'prompt injection': {'TP': 427, 'FP': 93, 'FN': 143, 'TN': 1266, 'FNR': 0.25087719298245614, 'FPR': 0.0684326710816777, 'Specificity': 0.9315673289183223}, 'unharmful': {'TP': 941, 'FP': 100, 'FN': 29, 'TN': 859, 'FNR': 0.029896907216494847, 'FPR': 0.10427528675703858, 'Specificity': 0.8957247132429614}}   |
| 0.3672        | 9.0   | 9963  | 0.4610          | 0.8471   | 0.8421             | 0.8471          | 0.8434      | 0.7505 | 0.7980            | {'jailbreaking': {'TP': 255, 'FP': 96, 'FN': 134, 'TN': 1444, 'FNR': 0.3444730077120823, 'FPR': 0.06233766233766234, 'Specificity': 0.9376623376623376}, 'prompt injection': {'TP': 438, 'FP': 99, 'FN': 132, 'TN': 1260, 'FNR': 0.23157894736842105, 'FPR': 0.0728476821192053, 'Specificity': 0.9271523178807947}, 'unharmful': {'TP': 941, 'FP': 100, 'FN': 29, 'TN': 859, 'FNR': 0.029896907216494847, 'FPR': 0.10427528675703858, 'Specificity': 0.8957247132429614}}    |
| 0.4548        | 10.0  | 11070 | 0.4627          | 0.8476   | 0.8428             | 0.8476          | 0.8440      | 0.7515 | 0.7994            | {'jailbreaking': {'TP': 259, 'FP': 97, 'FN': 130, 'TN': 1443, 'FNR': 0.3341902313624679, 'FPR': 0.06298701298701298, 'Specificity': 0.937012987012987}, 'prompt injection': {'TP': 434, 'FP': 97, 'FN': 136, 'TN': 1262, 'FNR': 0.23859649122807017, 'FPR': 0.07137601177336277, 'Specificity': 0.9286239882266373}, 'unharmful': {'TP': 942, 'FP': 100, 'FN': 28, 'TN': 859, 'FNR': 0.0288659793814433, 'FPR': 0.10427528675703858, 'Specificity': 0.8957247132429614}}      |


### Framework versions

- PEFT 0.17.1
- Transformers 4.53.3
- Pytorch 2.6.0+cu124
- Datasets 4.3.0
- Tokenizers 0.21.4