yonad2008 committed
Commit 2fc3e40 · verified · 1 Parent(s): 50f0fbd

Upload XGBoost TF-IDF model artifacts

Files changed (3)
  1. README.md +60 -0
  2. best_threshold.txt +1 -0
  3. xgboost_tfidf_model.joblib +3 -0
README.md ADDED
@@ -0,0 +1,60 @@
+ ---
+ language: en
+ tags:
+ - xgboost
+ - jailbreak-detection
+ - text-classification
+ model-index:
+ - name: predict_xgb_phi4_14b
+   results:
+   - task:
+       type: text-classification
+       name: Jailbreak Detection
+     metrics:
+     - name: F1
+       type: f1
+       value: 0.2807
+     - name: PR-AUC
+       type: pr_auc
+       value: 0.2896
+     - name: ROC-AUC
+       type: roc_auc
+       value: 0.7231
+     - name: Precision
+       type: precision
+       value: 0.2500
+     - name: Recall
+       type: recall
+       value: 0.3200
+ ---
+ # XGBoost Jailbreak Prediction Model: phi4:14b
+
+ XGBoost + TF-IDF (+ optional TruncatedSVD) classifier for unsafe/jailbreak likelihood in multi-turn conversations.
+
+ ## Evaluation Results (best fold: 1)
+
+ | Metric         | Value  |
+ |----------------|--------|
+ | F1             | 0.2807 |
+ | PR-AUC         | 0.2896 |
+ | ROC-AUC        | 0.7231 |
+ | Precision      | 0.2500 |
+ | Recall         | 0.3200 |
+ | Best Threshold | 0.20   |
+
+
45
+ ## Training Details
46
+
47
+ - **Target model**: `phi4:14b`
48
+ - **Datasets**: harmful_behaviors
49
+ - **K-Folds**: 5
50
+ - **Input format**: single turn: category + strategy_name + one TURN line
51
+ - **TF-IDF ngram_range**: `(1, 1)`
52
+ - **TF-IDF max_features**: `120000`
53
+ - **TruncatedSVD**: enabled `True`, requested `n_components=1024`
54
+ - **XGBoost n_estimators**: `971`
55
+ - **XGBoost learning_rate**: `0.045325359791945935`
56
+ - **XGBoost max_depth**: `7`
57
+
58
+ ## Dataset Size (training samples)
59
+
60
+ Prepared turn-level samples: 1611 (unsafe: 119, safe: 1492)
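
The figures in this README can be sanity-checked from the reported values alone. The sketch below recomputes F1 from the stated precision and recall, and derives the unsafe-class rate from the sample counts. That rate is roughly the PR-AUC a random scorer would achieve, which gives the reported 0.2896 some context. It assumes the best-fold validation split has about the same class balance as the full prepared set, which the README does not state. The negative-to-positive ratio is shown only because it is a common starting value for XGBoost's `scale_pos_weight`; whether this model used it is not stated.

```python
# Values copied from the README in this commit.
precision, recall = 0.2500, 0.3200
n_unsafe, n_safe = 119, 1492

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

# Share of unsafe samples. A random scorer's PR-AUC is close to this rate
# (assumption: the validation fold has a similar class balance).
positive_rate = n_unsafe / (n_unsafe + n_safe)

# Negative-to-positive ratio. Often used as a starting point for
# scale_pos_weight; the README does not say whether it was applied here.
neg_pos_ratio = n_safe / n_unsafe

print(f"F1 from P/R:        {f1:.4f}")
print(f"Unsafe-class rate:  {positive_rate:.4f}")
print(f"Safe/unsafe ratio:  {neg_pos_ratio:.2f}")
```

The recomputed F1 matches the reported 0.2807, so the precision, recall, and F1 rows are mutually consistent.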
best_threshold.txt ADDED
@@ -0,0 +1 @@
+ 0.20
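
A stored threshold like this is typically applied to the classifier's predicted unsafe probability. The following is a minimal sketch of that step, not the author's inference code: the helper names `parse_threshold` and `apply_threshold` are hypothetical, the example scores are made up, and the choice of `>=` rather than `>` at the boundary is an assumption the commit does not settle.

```python
def parse_threshold(text):
    """Parse the contents of best_threshold.txt (a single float)."""
    return float(text.strip())


def apply_threshold(probabilities, threshold):
    """Label a score as unsafe (1) when it reaches the threshold.

    Assumption: the boundary is inclusive; the commit does not specify this.
    """
    return [int(p >= threshold) for p in probabilities]


threshold = parse_threshold("0.20\n")  # same content as best_threshold.txt
scores = [0.05, 0.20, 0.35]            # made-up unsafe probabilities
labels = apply_threshold(scores, threshold)
print(labels)
```

With the real artifact, the scores would presumably come from the positive-class column of `predict_proba` on the loaded model, assuming the joblib file holds a scikit-learn-compatible estimator or pipeline.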
xgboost_tfidf_model.joblib ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d0843932e96785ae59f1116fc2c4c48174cbae5240e0d1686969980a1629f53f
+ size 24083944
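
The three lines above are a Git LFS pointer, not the model itself: they record the hash and byte size of the real file. Because joblib files are pickle-based and can execute code when loaded, it can be worth checking a downloaded copy against the pointer first. The sketch below shows one way to do that; the function names are hypothetical and the local file path is left to the caller.

```python
import hashlib

# Pointer text copied from this commit.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:d0843932e96785ae59f1116fc2c4c48174cbae5240e0d1686969980a1629f53f
size 24083944
"""


def parse_lfs_pointer(text):
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the pointer's size and hash."""
    hasher = hashlib.new(pointer["algo"])
    size = 0
    with open(path, "rb") as handle:
        # Read in chunks so a ~24 MB file is not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER)
print(pointer["algo"], pointer["size"])
```

Calling `matches_pointer("xgboost_tfidf_model.joblib", pointer)` on a local download would then confirm the file is the one this commit refers to before it is passed to `joblib.load`.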