metadata
language: es
license: mit
library_name: transformers
tags:
- spam-detection
- sms
- text-classification
- beto
- bert
- spanish
- pytorch
datasets:
- sms_spam
metrics:
- accuracy
- f1
- precision
- recall
base_model: dccuchile/bert-base-spanish-wwm-cased
pipeline_tag: text-classification
widget:
- text: ¡FELICIDADES! Ganaste un premio de $1000. Haz clic aquí para reclamarlo
  example_title: Spam - Premio falso
- text: >-
    ¡Increíble! Ha ganado un viaje con todos los gastos pagados a Cancún.
    Llame al 1-800-VIAJES
  example_title: Spam - Oferta fraudulenta
- text: 'URGENTE: Su cuenta ha sido suspendida. Haga clic aquí para reactivarla'
  example_title: Spam - Phishing bancario
- text: Hola mamá, llegaré tarde a casa. Nos vemos en la cena
  example_title: Legítimo - Mensaje familiar
- text: Buenos días, confirmo la reunión de mañana a las 3pm
  example_title: Legítimo - Mensaje de trabajo
model-index:
- name: spamvision-beto
  results:
  - task:
      type: text-classification
      name: Text Classification
    dataset:
      name: Spanish SMS Spam Detection
      type: sms_spam
    metrics:
    - type: accuracy
      value: 0.962
      name: Accuracy
    - type: f1
      value: 0.951
      name: F1 Score
    - type: precision
      value: 0.948
      name: Precision
    - type: recall
      value: 0.955
      name: Recall
🛡️ SpamVision BETO - Spanish SMS Spam Detector
📖 Model Description
SpamVision BETO is a BERT model fine-tuned to detect spam in Spanish-language SMS messages. Built on BETO (a BERT model pre-trained on a Spanish corpus), it reaches 96.2% accuracy on its test dataset when distinguishing legitimate messages from spam.
This model is part of the SpamVision project, a hybrid AI system that combines rule-based filtering (AFD) with deep learning to improve spam detection performance.
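The card does not say how the rule-based stage and the model are combined, so the following is only a minimal sketch of one possible arrangement: rules decide the clear-cut cases and the model decides the rest. The patterns, thresholds, and the `model_spam_prob` callable are all hypothetical and are not taken from the SpamVision project.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical rule set: illustrative patterns only, not the project's actual rules.
SPAM_PATTERNS: List[re.Pattern] = [
    re.compile(r"\b(ganaste|ha ganado)\b", re.IGNORECASE),
    re.compile(r"\b(haz|haga) clic\b", re.IGNORECASE),
]


def rule_hits(text: str) -> int:
    """Count how many rule patterns match the message."""
    return sum(1 for pattern in SPAM_PATTERNS if pattern.search(text))


def hybrid_is_spam(
    text: str,
    model_spam_prob: Callable[[str], float],
    rule_threshold: int = 2,      # assumed value
    prob_threshold: float = 0.5,  # assumed value
) -> Tuple[bool, str]:
    """Return (is_spam, decided_by).

    Messages matching enough rules are flagged without calling the model;
    everything else is decided by the model's spam probability.
    """
    if rule_hits(text) >= rule_threshold:
        return True, "rules"
    return model_spam_prob(text) >= prob_threshold, "model"
```

Here `model_spam_prob` stands in for any function that returns the fine-tuned model's spam probability for a message, which keeps the sketch independent of how the model is loaded.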
Key Features
- 🎯 High Accuracy: 96.2% on the test dataset
- ⚡ Fast Inference: < 200 ms per message
- 🇪🇸 Spanish-optimized: Fine-tuned on Spanish SMS data
- 📱 SMS-focused: Optimized for short messages (< 160 characters)
- 🔄 Production-ready: Used in a real-world mobile app
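The latency figure above depends on hardware and batch size, so it is worth measuring locally. The sketch below wraps a `transformers` text-classification pipeline and times it per message. The helper names `build_classifier` and `mean_latency_ms` are illustrative, and the model's Hub repository ID is not stated in this card, so it is left as an argument.

```python
import time
from typing import Callable, Dict, List

MAX_SEQ_LEN = 128  # "Max Sequence Length" as stated in this card

Classifier = Callable[[List[str]], List[Dict]]


def build_classifier(model_id: str) -> Classifier:
    """Wrap a transformers text-classification pipeline for the given model ID."""
    from transformers import pipeline  # imported lazily; requires transformers and torch

    clf = pipeline("text-classification", model=model_id)

    def run(texts: List[str]) -> List[Dict]:
        # Truncate inputs to the card's stated maximum sequence length.
        return clf(texts, truncation=True, max_length=MAX_SEQ_LEN)

    return run


def mean_latency_ms(run: Classifier, texts: List[str], repeats: int = 5) -> float:
    """Average wall-clock milliseconds per message over several timed passes."""
    run(texts)  # warm-up pass, excluded from the timing
    start = time.perf_counter()
    for _ in range(repeats):
        run(texts)
    elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / (repeats * len(texts))
```

A typical call would be `mean_latency_ms(build_classifier(model_id), sample_messages)`, where `model_id` is the repository ID under which this model is published and `sample_messages` is a list of SMS texts. The label names returned by the pipeline depend on the model's configuration and are not documented here.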
Model Architecture
- Base Model: dccuchile/bert-base-spanish-wwm-cased
- Parameters: ~110M
- Layers: 12 transformer encoder layers
- Hidden Size: 768
- Max Sequence Length: 128 tokens
- Vocabulary Size: 31,002 tokens
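The ~110M figure can be roughly reproduced from the values listed above. The sketch below counts the weights of a standard BERT encoder. The feed-forward size (3072), the number of position embeddings (512), and the two segment types are the usual BERT-base defaults and are assumptions here, since the card does not list them.

```python
def bert_param_count(
    vocab_size: int,
    hidden: int,
    layers: int,
    intermediate: int = 3072,   # assumed BERT-base default
    max_positions: int = 512,   # assumed BERT-base default
    type_vocab: int = 2,        # assumed BERT-base default
) -> int:
    """Count weights and biases of a standard BERT encoder with pooler."""
    layer_norm = 2 * hidden  # scale and bias
    # Token, position, and segment embeddings, followed by one LayerNorm.
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm
    # Query, key, value, and output projections, followed by one LayerNorm.
    attention = 4 * (hidden * hidden + hidden) + layer_norm
    # Two dense layers (up and down projection), followed by one LayerNorm.
    feed_forward = (
        (hidden * intermediate + intermediate)
        + (intermediate * hidden + hidden)
        + layer_norm
    )
    pooler = hidden * hidden + hidden
    return embeddings + layers * (attention + feed_forward) + pooler


total = bert_param_count(vocab_size=31_002, hidden=768, layers=12)
print(f"{total / 1e6:.1f}M parameters")
```

Under these assumptions the count lands just under 110 million, consistent with the figure in the list. A two-class classification head would add another 768 × 2 + 2 = 1,538 parameters on top of the encoder.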