yonad2008 committed on
Commit 682b2bb · verified · 1 parent: 1c7d973

Delete README.md with huggingface_hub

Files changed (1)
  1. README.md +0 -58
README.md DELETED
@@ -1,58 +0,0 @@
- ---
- language: en
- tags:
- - jailbreak-detection
- - deberta-v3
- - text-classification
- model-index:
- - name: predict_llama2_7b
-   results:
-   - task:
-       type: text-classification
-       name: Jailbreak Detection
-     metrics:
-     - name: F1
-       type: f1
-       value: 0.9388
-     - name: PR-AUC
-       type: pr_auc
-       value: 0.9507
-     - name: ROC-AUC
-       type: roc_auc
-       value: 0.9745
-     - name: Precision
-       type: precision
-       value: 0.9583
-     - name: Recall
-       type: recall
-       value: 0.9200
- ---
- # Jailbreak Prediction Model: llama2:7b
-
- Fine-tuned DeBERTa-v3-base for detecting unsafe/jailbreak prompts in multi-turn conversations.
-
- ## Evaluation Results (best fold: 2)
-
- | Metric         | Value  |
- |----------------|--------|
- | F1             | 0.9388 |
- | PR-AUC         | 0.9507 |
- | ROC-AUC        | 0.9745 |
- | Precision      | 0.9583 |
- | Recall         | 0.9200 |
- | Best Threshold | 0.50   |
-
- ## Training Details
-
- - **Base model**: `microsoft/deberta-v3-base`
- - **Target model**: `llama2:7b`
- - **Datasets**: HarmBench
- - **K-Folds**: 5
- - **Epochs**: 5
- - **Learning Rate**: 2e-05
- - **Max Length**: 512
- - **Input format**: turns only
-
- ## Dataset Size (before turn expansion)
-
- Original rows (after cleaning and balancing): 750 (unsafe: 124, safe: 626)
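The deleted model card reports precision, recall and F1 for the best fold at a 0.50 decision threshold. As a quick consistency check, F1 is the harmonic mean of precision and recall, and the reported figures agree: 2 × 0.9583 × 0.9200 / (0.9583 + 0.9200) ≈ 0.9388. The minimal sketch below reproduces that check and shows how a score threshold turns classifier probabilities into unsafe/safe labels. It is not the author's evaluation code; the function names `f1_from_pr` and `metrics_at_threshold` and the example scores are hypothetical.

```python
# Sketch only: relates the card's precision / recall / F1 figures and shows
# how a 0.50 threshold binarizes scores. Not the original evaluation code;
# function names and example data are hypothetical.


def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def metrics_at_threshold(scores, labels, threshold=0.5):
    """Label a score as unsafe (1) when score >= threshold; return (precision, recall, f1)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_from_pr(precision, recall)


if __name__ == "__main__":
    # Reported best-fold values from the card: P = 0.9583, R = 0.9200.
    print(round(f1_from_pr(0.9583, 0.9200), 4))  # → 0.9388

    # Hypothetical scores and gold labels (1 = unsafe/jailbreak, 0 = safe).
    scores = [0.91, 0.62, 0.48, 0.10, 0.75, 0.30]
    labels = [1, 1, 1, 0, 0, 0]
    print(metrics_at_threshold(scores, labels, threshold=0.5))
```

Because the metrics are recomputed from raw counts, the same helper can be rerun at other thresholds to see the precision/recall trade-off that the card's "Best Threshold" row summarizes.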