---
language: en
tags:
- gnn
- jailbreak-detection
- text-classification
model-index:
- name: predict_gnn_gemma4_26b
  results:
  - task:
      type: text-classification
      name: Jailbreak Detection
    metrics:
    - name: F1
      type: f1
      value: 0.8783
    - name: PR-AUC
      type: pr_auc
      value: 0.9738
    - name: ROC-AUC
      type: roc_auc
      value: 0.9697
    - name: Precision
      type: precision
      value: 0.8554
    - name: Recall
      type: recall
      value: 0.9344
---

# GNN Jailbreak Prediction Model (gemma4:26b)

Homogeneous GNN classifier for unsafe/jailbreak likelihood in multi-turn conversations.

## Evaluation Results

| Metric         | Value  |
|----------------|--------|
| F1             | 0.8783 |
| PR-AUC         | 0.9738 |
| ROC-AUC        | 0.9697 |
| Precision      | 0.8554 |
| Recall         | 0.9344 |
| Best Threshold | 0.310  |

## Training Details

- **Target model**: `gemma4:26b`
- **Datasets**: harmbench
- **Split column**: `goal`
- **Seed**: `42`
- **Sentence model**: `sentence-transformers/all-MiniLM-L6-v2`
- **Hidden channels**: `128`
- **Num layers**: `2`
- **Dropout**: `0.3`

## Dataset Size (training samples)

Prepared turn-level samples: 517
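
## Example: Applying the Best Threshold

The evaluation results report a best threshold of 0.310. The sketch below shows one way that threshold could be applied to per-turn unsafe probabilities and how precision, recall, and F1 follow from the resulting flags. It is a minimal illustration, not the card's actual evaluation code: the function names `flag_turns` and `precision_recall_f1`, the inclusive comparison (`>=`), and the sample scores and labels are all assumptions made up for this example.

```python
from typing import List, Tuple

# "Best Threshold" value taken from the evaluation results.
BEST_THRESHOLD = 0.310


def flag_turns(scores: List[float], threshold: float = BEST_THRESHOLD) -> List[bool]:
    """Flag each turn whose predicted unsafe probability reaches the threshold.

    Assumption: the comparison is inclusive (>=); the card does not say.
    """
    return [score >= threshold for score in scores]


def precision_recall_f1(preds: List[bool], labels: List[bool]) -> Tuple[float, float, float]:
    """Compute binary precision, recall, and F1 for the positive (unsafe) class."""
    tp = sum(1 for p, y in zip(preds, labels) if p and y)
    fp = sum(1 for p, y in zip(preds, labels) if p and not y)
    fn = sum(1 for p, y in zip(preds, labels) if not p and y)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1


# Hypothetical model outputs and ground-truth labels, for illustration only.
example_scores = [0.05, 0.31, 0.62, 0.29, 0.90]
example_labels = [False, True, True, True, False]

example_flags = flag_turns(example_scores)
example_metrics = precision_recall_f1(example_flags, example_labels)
print(example_flags)
print(example_metrics)
```

A threshold well below 0.5 flags more turns as unsafe, which trades some precision for higher recall.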