balidea-attack-detector-es-gl-v1
Multilingual safeguard classifier fine-tuned with PEFT/LoRA. It belongs to the balidea-peft campaign (2026-05-10) family of Spanish (es) and Galician (gl) safeguard models for clinical conversational systems.
Status: v1-beta
⚠️ Beta deployment notice
Galician precision is below the production gate. On the held-out clinical benchmark, attack-positive GL precision is 0.56 at the model's best operating point, while ES precision is 0.83 and passes the gate. Put GL traffic behind deterministic and keyword filters rather than trusting this model standalone. ES traffic is closer to production-ready.
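The layering recommended above could look like the following minimal sketch. Everything except the 0.9550 threshold is an assumption: the card does not publish its deterministic or keyword filters, so the patterns, the `decide` function, and the "block" / "review" / "allow" labels are hypothetical. The model score is passed in as a plain number so the sketch runs without loading the model.

```python
import re

# Hypothetical keyword patterns; the real filter list is not part of this card.
KEYWORD_PATTERNS = [
    re.compile(r"ignora (todas )?as instrucións", re.IGNORECASE),     # gl
    re.compile(r"ignora (todas )?las instrucciones", re.IGNORECASE),  # es
]

THRESHOLD = 0.9550  # production decision threshold from this card


def keyword_hit(text: str) -> bool:
    """True if any deterministic keyword pattern matches the text."""
    return any(p.search(text) for p in KEYWORD_PATTERNS)


def decide(text: str, lang: str, attack_prob: float) -> str:
    """Return "block", "review", or "allow" (hypothetical policy labels)."""
    if keyword_hit(text):
        return "block"  # the deterministic filter fires regardless of the model
    if attack_prob < THRESHOLD:
        return "allow"
    # A model-only flag on GL text is routed to review rather than trusted
    # standalone, because GL precision is below the production gate.
    return "review" if lang == "gl" else "block"
```

Sending model-only GL flags to review is one possible policy; a deployment could equally log them or require a second signal before blocking.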
Production decision threshold
Use T = 0.9550 for inference. This threshold was recalibrated against the held-out production_benchmark_v2.csv slice, with a 200-resample bootstrap stability check.
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

tok = AutoTokenizer.from_pretrained("JMasr/balidea-attack-detector-es-gl-v1")
model = AutoModelForSequenceClassification.from_pretrained("JMasr/balidea-attack-detector-es-gl-v1").eval()

texts = ["your text here"]
with torch.no_grad():
    enc = tok(texts, return_tensors="pt", truncation=True, padding=True, max_length=512)
    # Softmax over the class logits; column 1 is the positive-class probability
    probs = torch.softmax(model(**enc).logits, dim=-1)[:, 1]
# Apply the production decision threshold
predicted_positive = (probs >= 0.9550).tolist()
```
Threshold 95% CI from bootstrap: [0.7125, 0.9550]
ROC-AUC on full benchmark slice: 0.8509 (threshold-free)
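A bootstrap interval for a decision threshold can be sketched as follows. This is not the campaign's calibration code: the selection rule (maximise positive-class F1 over the observed scores), the percentile method, and the toy scores and labels are all assumptions made for illustration. Only the resample count of 200 comes from the card.

```python
import random


def f1_at(scores, labels, t):
    """Positive-class F1 when predicting positive for score >= t."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(scores, labels):
    # Assumed selection rule: the observed score that maximises F1.
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: f1_at(scores, labels, t))


def bootstrap_threshold_ci(scores, labels, n_boot=200, alpha=0.05, seed=42):
    """Percentile interval of the best threshold over bootstrap resamples."""
    rng = random.Random(seed)
    n = len(scores)
    thresholds = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        thresholds.append(
            best_threshold([scores[i] for i in idx], [labels[i] for i in idx])
        )
    thresholds.sort()
    lo = thresholds[int((alpha / 2) * (n_boot - 1))]
    hi = thresholds[int((1 - alpha / 2) * (n_boot - 1))]
    return lo, hi


# Toy data for illustration only; not taken from the benchmark.
toy_scores = [0.05, 0.20, 0.40, 0.60, 0.97, 0.98, 0.99, 0.30, 0.96, 0.10]
toy_labels = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]
lo, hi = bootstrap_threshold_ci(toy_scores, toy_labels)
print(f"95% CI for the threshold: [{lo}, {hi}]")
```

A wide interval, such as the one reported above, indicates that the selected threshold is sensitive to which benchmark rows happen to be sampled.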
Benchmark v2 holdout metrics (at production threshold)
| Metric | Value |
|---|---|
| F1 | 0.8571 |
| Recall (positive) | 1.0000 |
| Precision (positive) | 0.7500 |
| FPR | 0.2857 |
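The F1 value in this table is the harmonic mean of the positive-class precision and recall listed next to it. A short check, where the helper name `f1_from` is chosen here for illustration:

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


print(round(f1_from(0.75, 1.0), 4))  # 0.8571
```

Applying the same formula to the own-test figures gives lower values than the F1 listed there (for example, 0.8981 from precision 0.8460 and recall 0.9571), so that F1 is presumably averaged differently, such as macro or weighted. The card does not state which.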
Own held-out test metrics (at training-time calibrated threshold 0.9875)
| Metric | Value |
|---|---|
| F1 | 0.9470 |
| Recall (positive) | 0.9571 |
| Precision (positive) | 0.8460 |
| ROC-AUC | 0.9930 |
Per-language metrics (own test, calibrated threshold)
| Metric | ES | GL |
|---|---|---|
| F1 | 0.9579 | 0.9362 |
| Recall+ | 0.9586 | 0.9554 |
| Precision+ | 0.8827 | 0.8109 |
| ROC-AUC | 0.9945 | 0.9914 |
Model details
| Field | Value |
|---|---|
| Base model | protect_ai-deberta-v3 |
| Adapter | LoRA rank=32, α=64, dropout=0.1 |
| Target modules | query_proj, key_proj, value_proj, dense |
| Languages | es, gl |
| Loss | cross_entropy |
| Epochs | 20 |
| Learning rate | 0.00015 |
| Seed | 42 |
| Dataset slug | neuralchemy-attack_balidea-malign-attack-es_attack-local-upgrade-pos-vs-neuralchemy-benign_balidea-malign-benign-es_medquad-qa_squad-qa_alpaca-instructions_attack-local-upgrade-neg-es+gl-7b50e61df076 |
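For readers who want to reproduce a comparable adapter setup, the hyperparameters in the table map onto a PEFT `LoraConfig` roughly as shown below. This is a sketch, not the campaign's training configuration: `task_type=TaskType.SEQ_CLS` is an assumption based on the model being a sequence classifier, and the base checkpoint is deliberately not loaded here because the card gives only its short name.

```python
from peft import LoraConfig, TaskType

lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,  # assumption: sequence classification head
    r=32,                        # LoRA rank
    lora_alpha=64,               # scaling numerator; alpha / r = 2
    lora_dropout=0.1,
    target_modules=["query_proj", "key_proj", "value_proj", "dense"],
)
```

With rank 32 and α = 64, the adapter update is scaled by α / r = 2 under the default LoRA scaling.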
Training campaign
This model is part of the 2026-05-10 campaign run. The campaign rebuilt the benchmark from scratch (contamination-free), recovered labels for ~75k Spanish + Galician translated rows of mental-health text, and used Claude-authored, clinical-style native ES/GL seeds. See the project repository for the full campaign report.
License
Apache 2.0 (model artifact). Base model and training data carry their own licenses; consult the upstream sources before commercial deployment.