hurtmongoose committed on
Commit 98c6413 · verified · 1 Parent(s): 226b1f4

Phi-2 LoRA fine-tuned for 3-class adversarial prompt detection

Files changed (3)
  1. README.md +80 -0
  2. eval_metrics.json +42 -0
  3. overall_metrics.csv +2 -0
README.md ADDED
@@ -0,0 +1,80 @@
+ ---
+ library_name: peft
+ license: mit
+ base_model: microsoft/phi-2
+ tags:
+ - base_model:adapter:microsoft/phi-2
+ - lora
+ - transformers
+ metrics:
+ - accuracy
+ model-index:
+ - name: phi2-lora-malicious-classifier
+   results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # phi2-lora-malicious-classifier
+
+ This model is a fine-tuned version of [microsoft/phi-2](https://huggingface.co/microsoft/phi-2) on an unknown dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 0.4627
+ - Accuracy: 0.8476
+ - Precision Weighted: 0.8428
+ - Recall Weighted: 0.8476
+ - F1 Weighted: 0.8440
+ - Mcc: 0.7515
+ - Balanced Accuracy: 0.7994
+ - Per Class: {'jailbreaking': {'TP': 259, 'FP': 97, 'FN': 130, 'TN': 1443, 'FNR': 0.3341902313624679, 'FPR': 0.06298701298701298, 'Specificity': 0.937012987012987}, 'prompt injection': {'TP': 434, 'FP': 97, 'FN': 136, 'TN': 1262, 'FNR': 0.23859649122807017, 'FPR': 0.07137601177336277, 'Specificity': 0.9286239882266373}, 'unharmful': {'TP': 942, 'FP': 100, 'FN': 28, 'TN': 859, 'FNR': 0.0288659793814433, 'FPR': 0.10427528675703858, 'Specificity': 0.8957247132429614}}
+
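The derived rates in the Per Class entry follow directly from the confusion counts. As a sanity check, a minimal pure-Python sketch (counts copied from the evaluation results above) that re-derives FNR, FPR, specificity, and balanced accuracy:

```python
# Per-class one-vs-rest confusion counts, copied from the final evaluation above.
per_class = {
    "jailbreaking":     {"TP": 259, "FP": 97,  "FN": 130, "TN": 1443},
    "prompt injection": {"TP": 434, "FP": 97,  "FN": 136, "TN": 1262},
    "unharmful":        {"TP": 942, "FP": 100, "FN": 28,  "TN": 859},
}

def rates(c):
    """Derive the reported per-class rates from raw confusion counts."""
    fnr = c["FN"] / (c["FN"] + c["TP"])          # miss rate
    fpr = c["FP"] / (c["FP"] + c["TN"])          # false-alarm rate
    specificity = c["TN"] / (c["TN"] + c["FP"])  # = 1 - FPR
    recall = c["TP"] / (c["TP"] + c["FN"])       # = 1 - FNR
    return fnr, fpr, specificity, recall

# Balanced accuracy is the unweighted mean of per-class recalls.
balanced_accuracy = sum(rates(c)[3] for c in per_class.values()) / len(per_class)
print(f"balanced accuracy: {balanced_accuracy:.4f}")
```

Balanced accuracy (0.7994) sits below the plain accuracy (0.8476) precisely because it weights each class equally, so the weaker jailbreaking class (FNR ≈ 0.33) drags it down.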
+ ## Model description
+
+ A LoRA adapter (PEFT) fine-tuned on [microsoft/phi-2](https://huggingface.co/microsoft/phi-2) for 3-class adversarial prompt detection: each input prompt is classified as `jailbreaking`, `prompt injection`, or `unharmful`.
+
+ ## Intended uses & limitations
+
+ Intended as a screening classifier for adversarial prompts (jailbreaking and prompt injection) ahead of a downstream model. Note the remaining miss rates on the evaluation set: the jailbreaking class still has an FNR of about 0.33 and prompt injection about 0.24.
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 2e-05
+ - train_batch_size: 8
+ - eval_batch_size: 8
+ - seed: 42
+ - optimizer: adamw_torch (betas=(0.9, 0.999), epsilon=1e-08, no additional optimizer arguments)
+ - lr_scheduler_type: linear
+ - lr_scheduler_warmup_ratio: 0.1
+ - num_epochs: 10
+
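With warmup_ratio 0.1 over 11,070 total optimizer steps (per the training-results table), the linear schedule warms up for the first 1,107 steps and then decays to zero. A sketch of that shape, assuming the conventional linear-with-warmup behavior (this mirrors, not calls, the Hugging Face scheduler):

```python
# Linear warmup + linear decay. Step counts and peak LR come from the card:
# 11,070 total steps (10 epochs x 1,107 steps), warmup_ratio 0.1, lr 2e-05.
def linear_lr(step, total_steps=11070, warmup_ratio=0.1, peak_lr=2e-05):
    warmup_steps = int(total_steps * warmup_ratio)  # 1107 steps = 1 epoch here
    if step < warmup_steps:
        return peak_lr * step / warmup_steps  # ramp 0 -> peak
    # after warmup: decay peak -> 0 over the remaining steps
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)

# The peak learning rate is reached right at the end of the first epoch.
print(linear_lr(1107), linear_lr(11070))
```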
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss | Accuracy | Precision Weighted | Recall Weighted | F1 Weighted | Mcc | Balanced Accuracy | Per Class |
+ |:-------------:|:-----:|:-----:|:---------------:|:--------:|:------------------:|:---------------:|:-----------:|:------:|:-----------------:|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|
+ | 0.7931 | 1.0 | 1107 | 0.7796 | 0.6910 | 0.6758 | 0.6910 | 0.6750 | 0.4860 | 0.6086 | {'jailbreaking': {'TP': 148, 'FP': 141, 'FN': 241, 'TN': 1399, 'FNR': 0.6195372750642674, 'FPR': 0.09155844155844156, 'Specificity': 0.9084415584415585}, 'prompt injection': {'TP': 309, 'FP': 145, 'FN': 261, 'TN': 1214, 'FNR': 0.45789473684210524, 'FPR': 0.10669610007358352, 'Specificity': 0.8933038999264165}, 'unharmful': {'TP': 876, 'FP': 310, 'FN': 94, 'TN': 649, 'FNR': 0.09690721649484536, 'FPR': 0.3232533889468196, 'Specificity': 0.6767466110531803}} |
+ | 0.5884 | 2.0 | 2214 | 0.5420 | 0.8015 | 0.7916 | 0.8015 | 0.7899 | 0.6742 | 0.7269 | {'jailbreaking': {'TP': 188, 'FP': 94, 'FN': 201, 'TN': 1446, 'FNR': 0.5167095115681234, 'FPR': 0.06103896103896104, 'Specificity': 0.938961038961039}, 'prompt injection': {'TP': 411, 'FP': 97, 'FN': 159, 'TN': 1262, 'FNR': 0.2789473684210526, 'FPR': 0.07137601177336277, 'Specificity': 0.9286239882266373}, 'unharmful': {'TP': 947, 'FP': 192, 'FN': 23, 'TN': 767, 'FNR': 0.023711340206185566, 'FPR': 0.20020855057351408, 'Specificity': 0.799791449426486}} |
+ | 0.462 | 3.0 | 3321 | 0.5065 | 0.8186 | 0.8101 | 0.8186 | 0.8101 | 0.7026 | 0.7531 | {'jailbreaking': {'TP': 208, 'FP': 90, 'FN': 181, 'TN': 1450, 'FNR': 0.4652956298200514, 'FPR': 0.05844155844155844, 'Specificity': 0.9415584415584416}, 'prompt injection': {'TP': 430, 'FP': 103, 'FN': 140, 'TN': 1256, 'FNR': 0.24561403508771928, 'FPR': 0.07579102281089035, 'Specificity': 0.9242089771891097}, 'unharmful': {'TP': 941, 'FP': 157, 'FN': 29, 'TN': 802, 'FNR': 0.029896907216494847, 'FPR': 0.16371220020855057, 'Specificity': 0.8362877997914494}} |
+ | 0.4606 | 4.0 | 4428 | 0.4729 | 0.8305 | 0.8248 | 0.8305 | 0.8267 | 0.7233 | 0.7784 | {'jailbreaking': {'TP': 242, 'FP': 105, 'FN': 147, 'TN': 1435, 'FNR': 0.37789203084832906, 'FPR': 0.06818181818181818, 'Specificity': 0.9318181818181818}, 'prompt injection': {'TP': 430, 'FP': 119, 'FN': 140, 'TN': 1240, 'FNR': 0.24561403508771928, 'FPR': 0.0875643855776306, 'Specificity': 0.9124356144223694}, 'unharmful': {'TP': 930, 'FP': 103, 'FN': 40, 'TN': 856, 'FNR': 0.041237113402061855, 'FPR': 0.10740354535974973, 'Specificity': 0.8925964546402503}} |
+ | 0.4217 | 5.0 | 5535 | 0.4688 | 0.8351 | 0.8295 | 0.8351 | 0.8311 | 0.7309 | 0.7831 | {'jailbreaking': {'TP': 245, 'FP': 104, 'FN': 144, 'TN': 1436, 'FNR': 0.37017994858611825, 'FPR': 0.06753246753246753, 'Specificity': 0.9324675324675324}, 'prompt injection': {'TP': 430, 'FP': 106, 'FN': 140, 'TN': 1253, 'FNR': 0.24561403508771928, 'FPR': 0.07799852832965416, 'Specificity': 0.9220014716703459}, 'unharmful': {'TP': 936, 'FP': 108, 'FN': 34, 'TN': 851, 'FNR': 0.03505154639175258, 'FPR': 0.11261730969760167, 'Specificity': 0.8873826903023984}} |
+ | 0.445 | 6.0 | 6642 | 0.4465 | 0.8434 | 0.8390 | 0.8434 | 0.8402 | 0.7449 | 0.7953 | {'jailbreaking': {'TP': 259, 'FP': 107, 'FN': 130, 'TN': 1433, 'FNR': 0.3341902313624679, 'FPR': 0.06948051948051948, 'Specificity': 0.9305194805194805}, 'prompt injection': {'TP': 428, 'FP': 98, 'FN': 142, 'TN': 1261, 'FNR': 0.24912280701754386, 'FPR': 0.07211184694628403, 'Specificity': 0.927888153053716}, 'unharmful': {'TP': 940, 'FP': 97, 'FN': 30, 'TN': 862, 'FNR': 0.030927835051546393, 'FPR': 0.10114702815432743, 'Specificity': 0.8988529718456726}} |
+ | 0.3633 | 7.0 | 7749 | 0.4604 | 0.8460 | 0.8409 | 0.8460 | 0.8422 | 0.7487 | 0.7968 | {'jailbreaking': {'TP': 253, 'FP': 90, 'FN': 136, 'TN': 1450, 'FNR': 0.3496143958868895, 'FPR': 0.05844155844155844, 'Specificity': 0.9415584415584416}, 'prompt injection': {'TP': 440, 'FP': 106, 'FN': 130, 'TN': 1253, 'FNR': 0.22807017543859648, 'FPR': 0.07799852832965416, 'Specificity': 0.9220014716703459}, 'unharmful': {'TP': 939, 'FP': 101, 'FN': 31, 'TN': 858, 'FNR': 0.031958762886597936, 'FPR': 0.10531803962460896, 'Specificity': 0.894681960375391}} |
+ | 0.3249 | 8.0 | 8856 | 0.4670 | 0.8450 | 0.8408 | 0.8450 | 0.8417 | 0.7475 | 0.7976 | {'jailbreaking': {'TP': 262, 'FP': 106, 'FN': 127, 'TN': 1434, 'FNR': 0.3264781491002571, 'FPR': 0.06883116883116883, 'Specificity': 0.9311688311688312}, 'prompt injection': {'TP': 427, 'FP': 93, 'FN': 143, 'TN': 1266, 'FNR': 0.25087719298245614, 'FPR': 0.0684326710816777, 'Specificity': 0.9315673289183223}, 'unharmful': {'TP': 941, 'FP': 100, 'FN': 29, 'TN': 859, 'FNR': 0.029896907216494847, 'FPR': 0.10427528675703858, 'Specificity': 0.8957247132429614}} |
+ | 0.3672 | 9.0 | 9963 | 0.4610 | 0.8471 | 0.8421 | 0.8471 | 0.8434 | 0.7505 | 0.7980 | {'jailbreaking': {'TP': 255, 'FP': 96, 'FN': 134, 'TN': 1444, 'FNR': 0.3444730077120823, 'FPR': 0.06233766233766234, 'Specificity': 0.9376623376623376}, 'prompt injection': {'TP': 438, 'FP': 99, 'FN': 132, 'TN': 1260, 'FNR': 0.23157894736842105, 'FPR': 0.0728476821192053, 'Specificity': 0.9271523178807947}, 'unharmful': {'TP': 941, 'FP': 100, 'FN': 29, 'TN': 859, 'FNR': 0.029896907216494847, 'FPR': 0.10427528675703858, 'Specificity': 0.8957247132429614}} |
+ | 0.4548 | 10.0 | 11070 | 0.4627 | 0.8476 | 0.8428 | 0.8476 | 0.8440 | 0.7515 | 0.7994 | {'jailbreaking': {'TP': 259, 'FP': 97, 'FN': 130, 'TN': 1443, 'FNR': 0.3341902313624679, 'FPR': 0.06298701298701298, 'Specificity': 0.937012987012987}, 'prompt injection': {'TP': 434, 'FP': 97, 'FN': 136, 'TN': 1262, 'FNR': 0.23859649122807017, 'FPR': 0.07137601177336277, 'Specificity': 0.9286239882266373}, 'unharmful': {'TP': 942, 'FP': 100, 'FN': 28, 'TN': 859, 'FNR': 0.0288659793814433, 'FPR': 0.10427528675703858, 'Specificity': 0.8957247132429614}} |
+
+
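One thing the table makes visible: the checkpoint with the lowest validation loss (epoch 6, 0.4465) is not the one with the best weighted F1 (epoch 10, 0.8440). A small sketch, with the epoch/loss/F1 triples copied from the table, of selecting a checkpoint by each criterion:

```python
# (epoch, validation loss, weighted F1) copied from the training-results table.
history = [
    (1.0, 0.7796, 0.6750), (2.0, 0.5420, 0.7899), (3.0, 0.5065, 0.8101),
    (4.0, 0.4729, 0.8267), (5.0, 0.4688, 0.8311), (6.0, 0.4465, 0.8402),
    (7.0, 0.4604, 0.8422), (8.0, 0.4670, 0.8417), (9.0, 0.4610, 0.8434),
    (10.0, 0.4627, 0.8440),
]

best_by_loss = min(history, key=lambda row: row[1])  # lowest validation loss
best_by_f1 = max(history, key=lambda row: row[2])    # highest weighted F1

print(best_by_loss[0], best_by_f1[0])  # 6.0 10.0
```

Depending on whether calibration (loss) or classification quality (F1) matters more for a deployment, either checkpoint is a defensible pick; the card ships the epoch-10 weights.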
+ ### Framework versions
+
+ - PEFT 0.17.1
+ - Transformers 4.53.3
+ - Pytorch 2.6.0+cu124
+ - Datasets 4.3.0
+ - Tokenizers 0.21.4
eval_metrics.json ADDED
@@ -0,0 +1,42 @@
+ {
+   "eval_loss": 0.462687611579895,
+   "eval_accuracy": 0.8475894245723172,
+   "eval_precision_weighted": 0.8428169632185296,
+   "eval_recall_weighted": 0.8475894245723172,
+   "eval_f1_weighted": 0.8440311242475405,
+   "eval_MCC": 0.751470722356998,
+   "eval_balanced_accuracy": 0.7994490993426729,
+   "eval_per_class": {
+     "jailbreaking": {
+       "TP": 259,
+       "FP": 97,
+       "FN": 130,
+       "TN": 1443,
+       "FNR": 0.3341902313624679,
+       "FPR": 0.06298701298701298,
+       "Specificity": 0.937012987012987
+     },
+     "prompt injection": {
+       "TP": 434,
+       "FP": 97,
+       "FN": 136,
+       "TN": 1262,
+       "FNR": 0.23859649122807017,
+       "FPR": 0.07137601177336277,
+       "Specificity": 0.9286239882266373
+     },
+     "unharmful": {
+       "TP": 942,
+       "FP": 100,
+       "FN": 28,
+       "TN": 859,
+       "FNR": 0.0288659793814433,
+       "FPR": 0.10427528675703858,
+       "Specificity": 0.8957247132429614
+     }
+   },
+   "eval_runtime": 68.6103,
+   "eval_samples_per_second": 28.115,
+   "eval_steps_per_second": 3.527,
+   "epoch": 10.0
+ }
overall_metrics.csv ADDED
@@ -0,0 +1,2 @@
+ eval_loss,eval_accuracy,eval_precision_weighted,eval_recall_weighted,eval_f1_weighted,eval_MCC,eval_balanced_accuracy,eval_runtime,eval_samples_per_second,eval_steps_per_second,epoch
+ 0.462687611579895,0.8475894245723172,0.8428169632185296,0.8475894245723172,0.8440311242475405,0.751470722356998,0.7994490993426729,68.6103,28.115,3.527,10.0