---
language: en
tags:
- gnn
- jailbreak-detection
- text-classification
model-index:
- name: predict_gnn_gemma4_e4b
  results:
  - task:
      type: text-classification
      name: Jailbreak Detection
    metrics:
    - name: F1
      type: f1
      value: 0.8937
    - name: PR-AUC
      type: pr_auc
      value: 0.9358
    - name: ROC-AUC
      type: roc_auc
      value: 0.9513
    - name: Precision
      type: precision
      value: 0.9589
    - name: Recall
      type: recall
      value: 0.8379
---

# GNN Jailbreak Prediction Model (gemma4:e4b)

A homogeneous graph neural network (GNN) classifier that estimates how likely a multi-turn conversation is to be unsafe or a jailbreak.

## Evaluation Results

| Metric         | Value  |
|----------------|--------|
| F1             | 0.8937 |
| PR-AUC         | 0.9358 |
| ROC-AUC        | 0.9513 |
| Precision      | 0.9589 |
| Recall         | 0.8379 |
| Best Threshold | 0.750  |

PR-AUC and ROC-AUC do not depend on a decision threshold; F1, precision and recall do.

## Training Details

- **Target model**: `gemma4:e4b`
- **Datasets**: harmbench
- **Split column**: `goal`
- **Seed**: `42`
- **Sentence model**: `sentence-transformers/all-MiniLM-L6-v2`
- **Hidden channels**: `128`
- **Num layers**: `2`
- **Dropout**: `0.3`

## Dataset Size (training samples)

Prepared turn-level samples: 425
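## Example: Applying the Decision Threshold

The Evaluation Results table reports a best threshold of 0.750 next to threshold-dependent metrics. The sketch below shows how such a threshold turns predicted jailbreak scores into labels, and how precision, recall and F1 follow from the resulting counts. The helper name `metrics_at_threshold`, the `>=` comparison, the convention that label `1` means unsafe/jailbreak, and the toy scores are all assumptions for illustration; none of them come from this model's evaluation code.

```python
from typing import Sequence


def metrics_at_threshold(
    scores: Sequence[float], labels: Sequence[int], threshold: float = 0.75
) -> dict:
    """Binarize scores at `threshold`, then compute precision, recall and F1.

    Hypothetical helper: assumes label 1 = unsafe/jailbreak and that a
    score >= threshold is predicted positive.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    tp = fp = fn = 0
    for score, label in zip(scores, labels):
        pred = 1 if score >= threshold else 0
        if pred == 1 and label == 1:
            tp += 1  # correctly flagged
        elif pred == 1 and label == 0:
            fp += 1  # safe conversation flagged as unsafe
        elif pred == 0 and label == 1:
            fn += 1  # unsafe conversation missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall at this one threshold.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy data for illustration only (not from this model's evaluation).
toy_scores = [0.95, 0.80, 0.76, 0.60, 0.40, 0.10]
toy_labels = [1, 1, 0, 1, 0, 0]
print(metrics_at_threshold(toy_scores, toy_labels))
print(metrics_at_threshold(toy_scores, toy_labels, threshold=0.5))
```

Raising the threshold generally trades recall for precision, which is consistent with the high-precision, lower-recall profile in the table; the exact trade-off for this model depends on its own score distribution.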
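## Example: Splitting by `goal`

The card lists `goal` as the split column. A common reason to split on such a column is to keep every turn-level sample derived from the same HarmBench goal in a single partition, so that evaluation goals are never seen during training. Whether this model's pipeline does exactly that is an assumption. The sketch below shows a deterministic group split seeded with `42`; the function name `group_split`, the `test_fraction` parameter and the dict-per-sample layout are hypothetical.

```python
import random
from collections import defaultdict


def group_split(samples, group_key="goal", test_fraction=0.2, seed=42):
    """Split samples so that all samples sharing `group_key` land in one partition.

    Hypothetical sketch: samples are dicts; whole groups (not individual
    samples) are shuffled with a seeded RNG and assigned to train or test.
    """
    groups = defaultdict(list)
    for sample in samples:
        groups[sample[group_key]].append(sample)
    keys = sorted(groups)  # sort first so the seeded shuffle is reproducible
    random.Random(seed).shuffle(keys)
    n_test = max(1, round(len(keys) * test_fraction))
    test_keys = set(keys[:n_test])
    train = [s for k in keys if k not in test_keys for s in groups[k]]
    test = [s for k in keys if k in test_keys for s in groups[k]]
    return train, test


# Usage with toy turn-level samples (illustrative only).
samples = [{"goal": f"g{i % 5}", "turn": i} for i in range(20)]
train, test = group_split(samples, test_fraction=0.4)
# No goal appears in both partitions.
print({s["goal"] for s in train} & {s["goal"] for s in test})
```

Splitting on groups rather than on individual turns avoids leakage between near-duplicate turns of the same conversation goal, at the cost of less control over the exact train/test sample ratio.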