---
language: es
license: mit
library_name: transformers
tags:
- spam-detection
- sms
- text-classification
- beto
- bert
- spanish
- pytorch
datasets:
- sms_spam
metrics:
- accuracy
- f1
- precision
- recall
base_model: dccuchile/bert-base-spanish-wwm-cased
pipeline_tag: text-classification
widget:
- text: "¡FELICIDADES! Ganaste un premio de $1000. Haz clic aquí para reclamarlo"
  example_title: "Spam - Premio falso"
- text: "¡Increíble! Ha ganado un viaje con todos los gastos pagados a Cancún. Llame al 1-800-VIAJES"
  example_title: "Spam - Oferta fraudulenta"
- text: "URGENTE: Su cuenta ha sido suspendida. Haga clic aquí para reactivarla"
  example_title: "Spam - Phishing bancario"
- text: "Hola mamá, llegaré tarde a casa. Nos vemos en la cena"
  example_title: "Legítimo - Mensaje familiar"
- text: "Buenos días, confirmo la reunión de mañana a las 3pm"
  example_title: "Legítimo - Mensaje de trabajo"
model-index:
- name: spamvision-beto
  results:
  - task:
      type: text-classification
      name: Text Classification
    dataset:
      name: Spanish SMS Spam Detection
      type: sms_spam
    metrics:
    - type: accuracy
      value: 0.962
      name: Accuracy
    - type: f1
      value: 0.951
      name: F1 Score
    - type: precision
      value: 0.948
      name: Precision
    - type: recall
      value: 0.955
      name: Recall
---

# 🛡️ SpamVision BETO - Spanish SMS Spam Detector

<div align="center">
<img src="https://img.shields.io/badge/Language-Spanish-green" alt="Spanish">
<img src="https://img.shields.io/badge/Accuracy-96.2%25-blue" alt="Accuracy">
<img src="https://img.shields.io/badge/F1--Score-95.1%25-orange" alt="F1">
<img src="https://img.shields.io/badge/License-MIT-yellow" alt="License">
</div>

## 📖 Model Description

**SpamVision BETO** is a BERT model fine-tuned to detect spam in Spanish SMS messages. It is built on [BETO](https://github.com/dccuchile/beto), a BERT model pre-trained on a Spanish corpus, and reaches **96.2% accuracy** on its test dataset when separating legitimate messages from spam.

This model is part of the [SpamVision project](https://github.com/tu-usuario/spamvision-api), a hybrid system that combines rule-based filtering (AFD) with deep learning to detect spam.

### Key Features

- 🎯 **High Accuracy**: 96.2% on the test dataset
- ⚡ **Fast Inference**: < 200 ms per message
- 🇪🇸 **Spanish-optimized**: Fine-tuned on Spanish SMS data
- 📱 **SMS-focused**: Optimized for short messages (< 160 characters)
- 🔄 **Production-ready**: Used in a real-world mobile app

### Model Architecture

- **Base Model**: `dccuchile/bert-base-spanish-wwm-cased`
- **Parameters**: ~110M
- **Layers**: 12 transformer encoder layers
- **Hidden Size**: 768
- **Max Sequence Length**: 128 tokens
- **Vocabulary Size**: 31,002 tokens
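
As a rough sanity check, the ~110M figure can be recovered from the other numbers in this list. The sketch below is plain arithmetic, not a measurement of the released checkpoint. The vocabulary size, hidden size, and layer count come from the list above; the feed-forward size (3,072), position-embedding table size (512), token-type vocabulary (2), and two output labels are assumptions based on the usual BERT-base defaults and a binary spam/legitimate head. The 128-token limit above is a truncation setting and does not change the parameter count.

```python
# Back-of-the-envelope parameter count for a BERT-base encoder with a
# two-class classification head. Values marked "assumed" are the usual
# BERT-base defaults, not figures stated in this model card.

VOCAB_SIZE = 31_002        # from the model card
HIDDEN_SIZE = 768          # from the model card
NUM_LAYERS = 12            # from the model card
INTERMEDIATE_SIZE = 3_072  # assumed
MAX_POSITIONS = 512        # assumed size of the position-embedding table
TYPE_VOCAB_SIZE = 2        # assumed
NUM_LABELS = 2             # assumed: spam vs. legitimate


def linear(n_in: int, n_out: int) -> int:
    """Weights plus biases of a dense layer."""
    return n_in * n_out + n_out


def layer_norm(size: int) -> int:
    """Scale and shift vectors of a LayerNorm."""
    return 2 * size


# Token, position, and token-type embedding tables plus their LayerNorm.
embeddings = (
    VOCAB_SIZE * HIDDEN_SIZE
    + MAX_POSITIONS * HIDDEN_SIZE
    + TYPE_VOCAB_SIZE * HIDDEN_SIZE
    + layer_norm(HIDDEN_SIZE)
)

# One transformer encoder layer: self-attention block, then feed-forward block.
encoder_layer = (
    4 * linear(HIDDEN_SIZE, HIDDEN_SIZE)  # query, key, value, output projections
    + layer_norm(HIDDEN_SIZE)
    + linear(HIDDEN_SIZE, INTERMEDIATE_SIZE)
    + linear(INTERMEDIATE_SIZE, HIDDEN_SIZE)
    + layer_norm(HIDDEN_SIZE)
)

pooler = linear(HIDDEN_SIZE, HIDDEN_SIZE)
classifier = linear(HIDDEN_SIZE, NUM_LABELS)

total = embeddings + NUM_LAYERS * encoder_layer + pooler + classifier
print(f"{total:,} parameters (~{total / 1e6:.0f}M)")
```

Under these assumptions the 12 encoder layers account for roughly 85M parameters and the embedding tables for roughly 24M, which is why the total sits near 110M.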

---

## 🚀 Quick Start

### Installation
