Upload XGBoost TF-IDF model artifacts


Files changed (3)

README.md +60 -0
best_threshold.txt +1 -0
xgboost_tfidf_model.joblib +3 -0

README.md ADDED

	@@ -0,0 +1,60 @@

+---
+language: en
+tags:
+  - xgboost
+  - jailbreak-detection
+  - text-classification
+model-index:
+  - name: predict_xgb_phi4_14b
+    results:
+      - task:
+          type: text-classification
+          name: Jailbreak Detection
+        metrics:
+          - name: F1
+            type: f1
+            value: 0.2807
+          - name: PR-AUC
+            type: pr_auc
+            value: 0.2896
+          - name: ROC-AUC
+            type: roc_auc
+            value: 0.7231
+          - name: Precision
+            type: precision
+            value: 0.2500
+          - name: Recall
+            type: recall
+            value: 0.3200
+---
+# XGBoost Jailbreak Prediction Model: phi4:14b
+XGBoost classifier on TF-IDF features (optionally reduced with TruncatedSVD) that scores the likelihood of unsafe/jailbreak content in multi-turn conversations, one turn at a time.
+## Evaluation Results (best of 5 folds: fold 1)
+| Metric         | Value  |
+|----------------|--------|
+| F1             | 0.2807 |
+| PR-AUC         | 0.2896 |
+| ROC-AUC        | 0.7231 |
+| Precision      | 0.2500 |
+| Recall         | 0.3200 |
+| Best Threshold | 0.20   |
+## Training Details
+- **Target model**: `phi4:14b`
+- **Datasets**: harmful_behaviors
+- **K-Folds**: 5
+- **Input format**: one turn per sample, built from the category, the `strategy_name`, and a single `TURN` line
+- **TF-IDF ngram_range**: `(1, 1)`
+- **TF-IDF max_features**: `120000`
+- **TruncatedSVD**: enabled (`True`), `n_components=1024` requested
+- **XGBoost n_estimators**: `971`
+- **XGBoost learning_rate**: `0.045325359791945935`
+- **XGBoost max_depth**: `7`
+## Dataset Size (training samples)
+Prepared turn-level samples: 1611 (unsafe: 119, safe: 1492)
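The README's numbers can be cross-checked with a few lines of arithmetic. F1 is the harmonic mean of precision and recall, so 2PR/(P+R) with P = 0.25 and R = 0.32 should reproduce the reported 0.2807. The sample counts also show how imbalanced the data is: about 7.4% of turns are unsafe. An uninformative scorer has a PR-AUC of roughly the positive-class prevalence, which puts the reported PR-AUC of 0.2896 in context; this assumes the evaluation fold has about the same class balance as the full set, which the card does not state.

```python
# Reported values copied from the model card above.
precision = 0.2500
recall = 0.3200
reported_f1 = 0.2807

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

# Class balance of the prepared turn-level samples.
n_unsafe, n_safe = 119, 1492
n_total = n_unsafe + n_safe
prevalence = n_unsafe / n_total  # approx. PR-AUC of a random scorer (assuming similar fold balance)

print(f"F1 from P/R: {f1:.4f} (reported {reported_f1})")
print(f"total samples: {n_total}")
print(f"unsafe prevalence: {prevalence:.4f}")
```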

best_threshold.txt ADDED

	@@ -0,0 +1 @@


+0.20
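`best_threshold.txt` stores the decision threshold (0.20) chosen on the best fold. A minimal sketch of how a consumer might read it and turn unsafe-class probabilities into labels follows. The function names are hypothetical, and two points are assumptions: that a score equal to the threshold counts as unsafe (the card does not say `>=` vs `>`), and that the joblib artifact is an object exposing `predict_proba()` (its structure is not documented in this commit).

```python
from __future__ import annotations

from pathlib import Path


def load_threshold(path: str | Path) -> float:
    """Read the single decision threshold stored in best_threshold.txt."""
    return float(Path(path).read_text(encoding="utf-8").strip())


def apply_threshold(probs: list[float], threshold: float) -> list[int]:
    """Map unsafe-class probabilities to labels: 1 = unsafe, 0 = safe.

    Assumption: a score at or above the threshold is treated as unsafe.
    """
    return [1 if p >= threshold else 0 for p in probs]


def load_model(path: str | Path):
    """Hypothetical loader for xgboost_tfidf_model.joblib.

    Assumes the file holds a pickled object (e.g. a scikit-learn Pipeline)
    with predict_proba(); requires joblib, scikit-learn and xgboost.
    """
    import joblib  # imported lazily so the rest of this sketch is stdlib-only
    return joblib.load(path)


if __name__ == "__main__":
    print(apply_threshold([0.05, 0.20, 0.73], 0.20))  # [0, 1, 1]
```

A threshold this far below 0.5 is consistent with the heavy class imbalance: it trades precision for recall on the rare unsafe class.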

xgboost_tfidf_model.joblib ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d0843932e96785ae59f1116fc2c4c48174cbae5240e0d1686969980a1629f53f
+size 24083944
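The model file itself is stored with Git LFS, so the diff shows only a pointer: `oid` is the SHA-256 of the real file's contents and `size` is its length in bytes (24,083,944 bytes, about 24 MB). Because a `.joblib` file is a pickle, and unpickling can execute arbitrary code, it is worth confirming that a downloaded copy matches the pointer before loading it. The stdlib-only sketch below does that; the function names are hypothetical helpers, not part of Git LFS or the Hugging Face tooling.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict[str, str]:
    """Parse a Git LFS pointer ("key value" per line) into a dict."""
    fields = {}
    for line in text.splitlines():
        if line.strip():
            key, _, value = line.partition(" ")
            fields[key] = value
    return fields


def verify_lfs_object(path: str | Path, pointer_text: str) -> bool:
    """Check a downloaded file against the pointer's sha256 oid and byte size."""
    fields = parse_lfs_pointer(pointer_text)
    algo, _, expected_hex = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    expected_size = int(fields["size"])

    path = Path(path)
    if path.stat().st_size != expected_size:
        return False  # cheap size check before hashing
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Hash in 1 MiB chunks so the whole artifact is never held in memory.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex
```

For this commit, `pointer_text` would be the three pointer lines shown above, and the call would be `verify_lfs_object("xgboost_tfidf_model.joblib", pointer_text)`.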