Phi-2 LoRA fine-tuned for 3-class adversarial prompt detection
- README.md +80 -0
- eval_metrics.json +42 -0
- overall_metrics.csv +2 -0
README.md
ADDED
@@ -0,0 +1,80 @@
---
library_name: peft
license: mit
base_model: microsoft/phi-2
tags:
- base_model:adapter:microsoft/phi-2
- lora
- transformers
metrics:
- accuracy
model-index:
- name: phi2-lora-malicious-classifier
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# phi2-lora-malicious-classifier

This model is a fine-tuned version of [microsoft/phi-2](https://huggingface.co/microsoft/phi-2) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.4627
- Accuracy: 0.8476
- Precision Weighted: 0.8428
- Recall Weighted: 0.8476
- F1 Weighted: 0.8440
- MCC: 0.7515
- Balanced Accuracy: 0.7994

Per-class confusion counts and rates on the evaluation set:

| Class | TP | FP | FN | TN | FNR | FPR | Specificity |
|:---|---:|---:|---:|---:|---:|---:|---:|
| jailbreaking | 259 | 97 | 130 | 1443 | 0.3342 | 0.0630 | 0.9370 |
| prompt injection | 434 | 97 | 136 | 1262 | 0.2386 | 0.0714 | 0.9286 |
| unharmful | 942 | 100 | 28 | 859 | 0.0289 | 0.1043 | 0.8957 |

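The reported rates follow directly from the confusion counts; a short sketch (plain Python, no model needed) that recomputes them, which can be useful for sanity-checking a re-run evaluation:

```python
# Recompute the per-class rates and overall metrics from the confusion counts
# reported in this card. Each evaluation sample is a TP for exactly one class,
# so overall accuracy = (sum of TPs) / (total samples).
counts = {
    "jailbreaking":     {"TP": 259, "FP": 97,  "FN": 130, "TN": 1443},
    "prompt injection": {"TP": 434, "FP": 97,  "FN": 136, "TN": 1262},
    "unharmful":        {"TP": 942, "FP": 100, "FN": 28,  "TN": 859},
}

def rates(c):
    # FNR = FN / (FN + TP); FPR = FP / (FP + TN); specificity = TN / (TN + FP)
    return {
        "FNR": c["FN"] / (c["FN"] + c["TP"]),
        "FPR": c["FP"] / (c["FP"] + c["TN"]),
        "Specificity": c["TN"] / (c["TN"] + c["FP"]),
    }

# Balanced accuracy is the unweighted mean of per-class recall (TP / (TP + FN)).
recalls = [c["TP"] / (c["TP"] + c["FN"]) for c in counts.values()]
balanced_accuracy = sum(recalls) / len(recalls)

# Total evaluation samples: TP + FP + FN + TN is the same for every class.
n_samples = sum(counts["jailbreaking"].values())
accuracy = sum(c["TP"] for c in counts.values()) / n_samples
```

Running this reproduces the card's headline numbers (accuracy 0.8476, balanced accuracy 0.7994) from the raw counts alone.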
## Model description

More information needed

## Intended uses & limitations

More information needed

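A minimal inference sketch with `transformers` and `peft`. The adapter repo id passed to `load_classifier` and the id→label order in `ID2LABEL` are assumptions for illustration; verify both against this repo's `config.json` (`id2label`) before relying on them.

```python
# Minimal inference sketch. ID2LABEL below is an ASSUMED label order --
# check the adapter's config.json (id2label) before trusting it.
ID2LABEL = {0: "jailbreaking", 1: "prompt injection", 2: "unharmful"}

def label_from_logits(logits):
    """Map a 3-way logit vector to its class name via argmax."""
    return ID2LABEL[max(range(len(logits)), key=lambda i: logits[i])]

def load_classifier(adapter_repo):
    """Load microsoft/phi-2 with this LoRA adapter for sequence classification.

    Requires: pip install torch transformers peft
    adapter_repo is a placeholder for wherever this adapter is hosted.
    """
    from transformers import AutoModelForSequenceClassification, AutoTokenizer
    from peft import PeftModel

    base = AutoModelForSequenceClassification.from_pretrained(
        "microsoft/phi-2", num_labels=3
    )
    model = PeftModel.from_pretrained(base, adapter_repo)
    tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
    return model, tokenizer
```

At inference time, tokenize a prompt, run the model, and pass `outputs.logits[0].tolist()` to `label_from_logits`.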
## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 10

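The hyperparameters above map onto a `TrainingArguments`/`LoraConfig` sketch like the one below. Note that the LoRA rank, alpha, dropout, and target modules are not recorded in this card, so those `LoraConfig` values are placeholders, not the settings actually used.

```python
# Configuration sketch of the training setup. Learning rate, batch sizes, seed,
# optimizer, scheduler, warmup ratio, and epochs come from the card; every
# LoraConfig value is a PLACEHOLDER (the actual adapter settings are unrecorded).
from peft import LoraConfig, TaskType
from transformers import TrainingArguments

lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=16,                                            # placeholder
    lora_alpha=32,                                   # placeholder
    lora_dropout=0.05,                               # placeholder
    target_modules=["q_proj", "k_proj", "v_proj"],   # placeholder
)

training_args = TrainingArguments(
    output_dir="phi2-lora-malicious-classifier",
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",        # betas=(0.9, 0.999), epsilon=1e-08 are defaults
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    num_train_epochs=10,
)
```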
### Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy | Precision Weighted | Recall Weighted | F1 Weighted | MCC | Balanced Accuracy | Per Class |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| 0.7931 | 1.0 | 1107 | 0.7796 | 0.6910 | 0.6758 | 0.6910 | 0.6750 | 0.4860 | 0.6086 | {'jailbreaking': {'TP': 148, 'FP': 141, 'FN': 241, 'TN': 1399, 'FNR': 0.6195372750642674, 'FPR': 0.09155844155844156, 'Specificity': 0.9084415584415585}, 'prompt injection': {'TP': 309, 'FP': 145, 'FN': 261, 'TN': 1214, 'FNR': 0.45789473684210524, 'FPR': 0.10669610007358352, 'Specificity': 0.8933038999264165}, 'unharmful': {'TP': 876, 'FP': 310, 'FN': 94, 'TN': 649, 'FNR': 0.09690721649484536, 'FPR': 0.3232533889468196, 'Specificity': 0.6767466110531803}} |
| 0.5884 | 2.0 | 2214 | 0.5420 | 0.8015 | 0.7916 | 0.8015 | 0.7899 | 0.6742 | 0.7269 | {'jailbreaking': {'TP': 188, 'FP': 94, 'FN': 201, 'TN': 1446, 'FNR': 0.5167095115681234, 'FPR': 0.06103896103896104, 'Specificity': 0.938961038961039}, 'prompt injection': {'TP': 411, 'FP': 97, 'FN': 159, 'TN': 1262, 'FNR': 0.2789473684210526, 'FPR': 0.07137601177336277, 'Specificity': 0.9286239882266373}, 'unharmful': {'TP': 947, 'FP': 192, 'FN': 23, 'TN': 767, 'FNR': 0.023711340206185566, 'FPR': 0.20020855057351408, 'Specificity': 0.799791449426486}} |
| 0.462 | 3.0 | 3321 | 0.5065 | 0.8186 | 0.8101 | 0.8186 | 0.8101 | 0.7026 | 0.7531 | {'jailbreaking': {'TP': 208, 'FP': 90, 'FN': 181, 'TN': 1450, 'FNR': 0.4652956298200514, 'FPR': 0.05844155844155844, 'Specificity': 0.9415584415584416}, 'prompt injection': {'TP': 430, 'FP': 103, 'FN': 140, 'TN': 1256, 'FNR': 0.24561403508771928, 'FPR': 0.07579102281089035, 'Specificity': 0.9242089771891097}, 'unharmful': {'TP': 941, 'FP': 157, 'FN': 29, 'TN': 802, 'FNR': 0.029896907216494847, 'FPR': 0.16371220020855057, 'Specificity': 0.8362877997914494}} |
| 0.4606 | 4.0 | 4428 | 0.4729 | 0.8305 | 0.8248 | 0.8305 | 0.8267 | 0.7233 | 0.7784 | {'jailbreaking': {'TP': 242, 'FP': 105, 'FN': 147, 'TN': 1435, 'FNR': 0.37789203084832906, 'FPR': 0.06818181818181818, 'Specificity': 0.9318181818181818}, 'prompt injection': {'TP': 430, 'FP': 119, 'FN': 140, 'TN': 1240, 'FNR': 0.24561403508771928, 'FPR': 0.0875643855776306, 'Specificity': 0.9124356144223694}, 'unharmful': {'TP': 930, 'FP': 103, 'FN': 40, 'TN': 856, 'FNR': 0.041237113402061855, 'FPR': 0.10740354535974973, 'Specificity': 0.8925964546402503}} |
| 0.4217 | 5.0 | 5535 | 0.4688 | 0.8351 | 0.8295 | 0.8351 | 0.8311 | 0.7309 | 0.7831 | {'jailbreaking': {'TP': 245, 'FP': 104, 'FN': 144, 'TN': 1436, 'FNR': 0.37017994858611825, 'FPR': 0.06753246753246753, 'Specificity': 0.9324675324675324}, 'prompt injection': {'TP': 430, 'FP': 106, 'FN': 140, 'TN': 1253, 'FNR': 0.24561403508771928, 'FPR': 0.07799852832965416, 'Specificity': 0.9220014716703459}, 'unharmful': {'TP': 936, 'FP': 108, 'FN': 34, 'TN': 851, 'FNR': 0.03505154639175258, 'FPR': 0.11261730969760167, 'Specificity': 0.8873826903023984}} |
| 0.445 | 6.0 | 6642 | 0.4465 | 0.8434 | 0.8390 | 0.8434 | 0.8402 | 0.7449 | 0.7953 | {'jailbreaking': {'TP': 259, 'FP': 107, 'FN': 130, 'TN': 1433, 'FNR': 0.3341902313624679, 'FPR': 0.06948051948051948, 'Specificity': 0.9305194805194805}, 'prompt injection': {'TP': 428, 'FP': 98, 'FN': 142, 'TN': 1261, 'FNR': 0.24912280701754386, 'FPR': 0.07211184694628403, 'Specificity': 0.927888153053716}, 'unharmful': {'TP': 940, 'FP': 97, 'FN': 30, 'TN': 862, 'FNR': 0.030927835051546393, 'FPR': 0.10114702815432743, 'Specificity': 0.8988529718456726}} |
| 0.3633 | 7.0 | 7749 | 0.4604 | 0.8460 | 0.8409 | 0.8460 | 0.8422 | 0.7487 | 0.7968 | {'jailbreaking': {'TP': 253, 'FP': 90, 'FN': 136, 'TN': 1450, 'FNR': 0.3496143958868895, 'FPR': 0.05844155844155844, 'Specificity': 0.9415584415584416}, 'prompt injection': {'TP': 440, 'FP': 106, 'FN': 130, 'TN': 1253, 'FNR': 0.22807017543859648, 'FPR': 0.07799852832965416, 'Specificity': 0.9220014716703459}, 'unharmful': {'TP': 939, 'FP': 101, 'FN': 31, 'TN': 858, 'FNR': 0.031958762886597936, 'FPR': 0.10531803962460896, 'Specificity': 0.894681960375391}} |
| 0.3249 | 8.0 | 8856 | 0.4670 | 0.8450 | 0.8408 | 0.8450 | 0.8417 | 0.7475 | 0.7976 | {'jailbreaking': {'TP': 262, 'FP': 106, 'FN': 127, 'TN': 1434, 'FNR': 0.3264781491002571, 'FPR': 0.06883116883116883, 'Specificity': 0.9311688311688312}, 'prompt injection': {'TP': 427, 'FP': 93, 'FN': 143, 'TN': 1266, 'FNR': 0.25087719298245614, 'FPR': 0.0684326710816777, 'Specificity': 0.9315673289183223}, 'unharmful': {'TP': 941, 'FP': 100, 'FN': 29, 'TN': 859, 'FNR': 0.029896907216494847, 'FPR': 0.10427528675703858, 'Specificity': 0.8957247132429614}} |
| 0.3672 | 9.0 | 9963 | 0.4610 | 0.8471 | 0.8421 | 0.8471 | 0.8434 | 0.7505 | 0.7980 | {'jailbreaking': {'TP': 255, 'FP': 96, 'FN': 134, 'TN': 1444, 'FNR': 0.3444730077120823, 'FPR': 0.06233766233766234, 'Specificity': 0.9376623376623376}, 'prompt injection': {'TP': 438, 'FP': 99, 'FN': 132, 'TN': 1260, 'FNR': 0.23157894736842105, 'FPR': 0.0728476821192053, 'Specificity': 0.9271523178807947}, 'unharmful': {'TP': 941, 'FP': 100, 'FN': 29, 'TN': 859, 'FNR': 0.029896907216494847, 'FPR': 0.10427528675703858, 'Specificity': 0.8957247132429614}} |
| 0.4548 | 10.0 | 11070 | 0.4627 | 0.8476 | 0.8428 | 0.8476 | 0.8440 | 0.7515 | 0.7994 | {'jailbreaking': {'TP': 259, 'FP': 97, 'FN': 130, 'TN': 1443, 'FNR': 0.3341902313624679, 'FPR': 0.06298701298701298, 'Specificity': 0.937012987012987}, 'prompt injection': {'TP': 434, 'FP': 97, 'FN': 136, 'TN': 1262, 'FNR': 0.23859649122807017, 'FPR': 0.07137601177336277, 'Specificity': 0.9286239882266373}, 'unharmful': {'TP': 942, 'FP': 100, 'FN': 28, 'TN': 859, 'FNR': 0.0288659793814433, 'FPR': 0.10427528675703858, 'Specificity': 0.8957247132429614}} |

### Framework versions

- PEFT 0.17.1
- Transformers 4.53.3
- Pytorch 2.6.0+cu124
- Datasets 4.3.0
- Tokenizers 0.21.4
eval_metrics.json
ADDED
@@ -0,0 +1,42 @@
{
  "eval_loss": 0.462687611579895,
  "eval_accuracy": 0.8475894245723172,
  "eval_precision_weighted": 0.8428169632185296,
  "eval_recall_weighted": 0.8475894245723172,
  "eval_f1_weighted": 0.8440311242475405,
  "eval_MCC": 0.751470722356998,
  "eval_balanced_accuracy": 0.7994490993426729,
  "eval_per_class": {
    "jailbreaking": {
      "TP": 259,
      "FP": 97,
      "FN": 130,
      "TN": 1443,
      "FNR": 0.3341902313624679,
      "FPR": 0.06298701298701298,
      "Specificity": 0.937012987012987
    },
    "prompt injection": {
      "TP": 434,
      "FP": 97,
      "FN": 136,
      "TN": 1262,
      "FNR": 0.23859649122807017,
      "FPR": 0.07137601177336277,
      "Specificity": 0.9286239882266373
    },
    "unharmful": {
      "TP": 942,
      "FP": 100,
      "FN": 28,
      "TN": 859,
      "FNR": 0.0288659793814433,
      "FPR": 0.10427528675703858,
      "Specificity": 0.8957247132429614
    }
  },
  "eval_runtime": 68.6103,
  "eval_samples_per_second": 28.115,
  "eval_steps_per_second": 3.527,
  "epoch": 10.0
}
overall_metrics.csv
ADDED
@@ -0,0 +1,2 @@
eval_loss,eval_accuracy,eval_precision_weighted,eval_recall_weighted,eval_f1_weighted,eval_MCC,eval_balanced_accuracy,eval_runtime,eval_samples_per_second,eval_steps_per_second,epoch
0.462687611579895,0.8475894245723172,0.8428169632185296,0.8475894245723172,0.8440311242475405,0.751470722356998,0.7994490993426729,68.6103,28.115,3.527,10.0