yonad2008 committed on
Commit a0e22b0 · verified · 1 Parent(s): 9d704a3

Upload GNN turn-level model artifacts

Files changed (3)
  1. README.md +58 -0
  2. gnn_homo_payload.pt +3 -0
  3. metadata.json +22 -0
README.md ADDED
@@ -0,0 +1,58 @@
+ ---
+ language: en
+ tags:
+ - gnn
+ - jailbreak-detection
+ - text-classification
+ model-index:
+ - name: predict_gnn_phi4_14b
+   results:
+   - task:
+       type: text-classification
+       name: Jailbreak Detection
+     metrics:
+     - name: F1
+       type: f1
+       value: 0.9274
+     - name: PR-AUC
+       type: pr_auc
+       value: 0.9636
+     - name: ROC-AUC
+       type: roc_auc
+       value: 0.9700
+     - name: Precision
+       type: precision
+       value: 0.9345
+     - name: Recall
+       type: recall
+       value: 0.9257
+ ---
+ # GNN Jailbreak Prediction Model (phi4:14b)
+
+ Homogeneous GNN classifier for unsafe/jailbreak likelihood in multi-turn conversations.
+
+ ## Evaluation Results
+
+ | Metric         | Value  |
+ |----------------|--------|
+ | F1             | 0.9274 |
+ | PR-AUC         | 0.9636 |
+ | ROC-AUC        | 0.9700 |
+ | Precision      | 0.9345 |
+ | Recall         | 0.9257 |
+ | Best Threshold | 0.500  |
+
+ ## Training Details
+
+ - **Target model**: `phi4:14b`
+ - **Datasets**: harmbench
+ - **Split column**: `goal`
+ - **Seed**: `42`
+ - **Sentence model**: `sentence-transformers/all-MiniLM-L6-v2`
+ - **Hidden channels**: `128`
+ - **Num layers**: `2`
+ - **Dropout**: `0.3`
+
+ ## Dataset Size (training samples)
+
+ Prepared turn-level samples: 395
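The README above reports precision, recall, and F1 at a decision threshold of 0.5. The following is a minimal sketch of how threshold-based metrics of this kind can be computed from per-turn scores. It is not part of the commit. It assumes binary labels (1 = unsafe), positive-class metrics, and a `>=` comparison against the threshold; the README states none of these. The helper name `binary_metrics` and the scores and labels are made up for illustration.

```python
def binary_metrics(scores, labels, threshold=0.5):
    """Precision, recall and F1 for the positive class at a given threshold.

    Assumption: a score >= threshold counts as a positive (unsafe) prediction.
    """
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical per-turn scores and ground-truth labels (not from this commit).
scores = [0.91, 0.72, 0.48, 0.10, 0.65, 0.30]
labels = [1, 1, 1, 0, 0, 0]
metrics = binary_metrics(scores, labels, threshold=0.5)
```

Moving the threshold trades precision against recall, which is presumably why the README lists it next to the other metrics.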
gnn_homo_payload.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1a3de3797a8ff4c91fa839a1c2bdebcad1c7255f1b6ea21a8e729b7dbc5b4f66
+ size 974405
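The three lines above are a Git LFS pointer: the repository stores only the SHA-256 digest and byte size of `gnn_homo_payload.pt`, not the weights themselves. The sketch below is not part of the commit. It shows one way to check a downloaded copy against that pointer using only the standard library; the function names are hypothetical, and the local file path in the usage comment is an assumption.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def matches_pointer(path, pointer_text):
    """Return True if the file's size and SHA-256 digest match the pointer."""
    fields = parse_lfs_pointer(pointer_text)
    algo, _, expected_hex = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    data = Path(path).read_bytes()
    return (
        len(data) == int(fields["size"])
        and hashlib.sha256(data).hexdigest() == expected_hex
    )


# Pointer contents copied from the diff above.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:1a3de3797a8ff4c91fa839a1c2bdebcad1c7255f1b6ea21a8e729b7dbc5b4f66
size 974405
"""

# Usage, assuming the payload was downloaded to the working directory:
# matches_pointer("gnn_homo_payload.pt", POINTER)
```

Reading the whole file into memory is acceptable here because the payload is under 1 MB; a larger artifact would call for chunked hashing.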
metadata.json ADDED
@@ -0,0 +1,22 @@
+ {
+   "csv": "/home/digayona/multi_turn_jailbreak_RL/GNN/turns_table_llama3_8b_harmbench.csv",
+   "target_model": "phi4:14b",
+   "threshold": 0.5,
+   "sentence_model_name": "sentence-transformers/all-MiniLM-L6-v2",
+   "n_rows": 395,
+   "n_models": 1,
+   "split_col": "goal",
+   "seed": 42,
+   "model_kwargs": {
+     "hidden_channels": 128,
+     "num_layers": 2,
+     "dropout": 0.3
+   },
+   "test_metrics": {
+     "roc_auc": 0.9700266193433895,
+     "pr_auc": 0.9635752681366716,
+     "f1": 0.9274120884668552,
+     "precision": 0.9345029239766081,
+     "recall": 0.9257142857142856
+   }
+ }
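The `test_metrics` in `metadata.json` are the full-precision versions of the values in the README table. The sketch below is not part of the commit. It checks that each stored metric rounds to the README figure, and it also computes the harmonic mean of the stored precision and recall for comparison with the stored F1. The variable and function names are hypothetical; the numbers are copied from the two files above.

```python
import json

# Full-precision metrics, copied from metadata.json in this commit.
METADATA = json.loads("""
{
  "threshold": 0.5,
  "test_metrics": {
    "roc_auc": 0.9700266193433895,
    "pr_auc": 0.9635752681366716,
    "f1": 0.9274120884668552,
    "precision": 0.9345029239766081,
    "recall": 0.9257142857142856
  }
}
""")

# Rounded values, copied from the README table in this commit.
README_TABLE = {
    "f1": 0.9274,
    "pr_auc": 0.9636,
    "roc_auc": 0.9700,
    "precision": 0.9345,
    "recall": 0.9257,
}


def rounded_mismatches(metrics, table, ndigits=4):
    """Return {name: (rounded_metric, table_value)} for entries that disagree."""
    return {
        name: (round(metrics[name], ndigits), value)
        for name, value in table.items()
        if round(metrics[name], ndigits) != value
    }


test_metrics = METADATA["test_metrics"]
mismatches = rounded_mismatches(test_metrics, README_TABLE)

# Harmonic mean of the stored precision and recall, for comparison with "f1".
p, r = test_metrics["precision"], test_metrics["recall"]
harmonic = 2 * p * r / (p + r)
```

With these inputs `mismatches` is empty, so the README and the metadata agree. The harmonic mean comes out at about 0.930, slightly above the stored F1 of about 0.927. That gap would be consistent with metrics averaged over classes rather than computed for the positive class alone, but the commit does not say which averaging was used.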