metadata
language: es
license: mit
library_name: transformers
tags:
  - spam-detection
  - sms
  - text-classification
  - beto
  - bert
  - spanish
  - pytorch
datasets:
  - sms_spam
metrics:
  - accuracy
  - f1
  - precision
  - recall
base_model: dccuchile/bert-base-spanish-wwm-cased
pipeline_tag: text-classification
widget:
  - text: ¡FELICIDADES! Ganaste un premio de $1000. Haz clic aquí para reclamarlo
    example_title: Spam - Premio falso
  - text: >-
      ¡Increíble! Ha ganado un viaje con todos los gastos pagados a Cancún.
      Llame al 1-800-VIAJES
    example_title: Spam - Oferta fraudulenta
  - text: 'URGENTE: Su cuenta ha sido suspendida. Haga clic aquí para reactivarla'
    example_title: Spam - Phishing bancario
  - text: Hola mamá, llegaré tarde a casa. Nos vemos en la cena
    example_title: Legítimo - Mensaje familiar
  - text: Buenos días, confirmo la reunión de mañana a las 3pm
    example_title: Legítimo - Mensaje de trabajo
model-index:
  - name: spamvision-beto
    results:
      - task:
          type: text-classification
          name: Text Classification
        dataset:
          name: Spanish SMS Spam Detection
          type: sms_spam
        metrics:
          - type: accuracy
            value: 0.962
            name: Accuracy
          - type: f1
            value: 0.951
            name: F1 Score
          - type: precision
            value: 0.948
            name: Precision
          - type: recall
            value: 0.955
            name: Recall

🛡️ SpamVision BETO - Spanish SMS Spam Detector


📖 Model Description

SpamVision BETO is a BERT model fine-tuned to detect spam in Spanish-language SMS messages. It is built on BETO (a BERT model pretrained on a Spanish corpus) and reaches 96.2% accuracy on its test set when distinguishing legitimate messages from spam.
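The F1 score reported in the model-index metadata is consistent with the reported precision (P = 0.948) and recall (R = 0.955), because F1 is their harmonic mean:

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 0.948 \times 0.955}{0.948 + 0.955}
    = \frac{1.81068}{1.903}
    \approx 0.951
```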

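This model is part of the SpamVision project, a hybrid AI system that combines rule-based filtering built on a deterministic finite automaton (AFD, from the Spanish "autómata finito determinista") with deep learning, with the aim of maximizing spam detection performance.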
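The card does not publish SpamVision's AFD rules or how the two stages are combined. The following is a minimal sketch of one plausible arrangement, assuming a keyword automaton runs first and the model is consulted only when no rule fires. The keyword trie (a DFA whose accepting states mark complete keywords), the function names, the 0.5 threshold, and the "spam"/"legitimate" labels are all hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

# Hypothetical sketch: the card does not publish SpamVision's actual AFD rules.
Dfa = Tuple[List[Dict[str, int]], Set[int]]


def build_keyword_dfa(keywords: Iterable[str]) -> Dfa:
    """Build a trie, i.e. a DFA whose accepting states mark complete keywords."""
    transitions: List[Dict[str, int]] = [{}]  # state 0 is the start state
    accepting: Set[int] = set()
    for word in keywords:
        state = 0
        for ch in word.lower():
            if ch not in transitions[state]:
                transitions.append({})
                transitions[state][ch] = len(transitions) - 1
            state = transitions[state][ch]
        accepting.add(state)
    return transitions, accepting


def rule_match(text: str, dfa: Dfa) -> bool:
    """Run the DFA from every start position; True if any keyword occurs."""
    transitions, accepting = dfa
    text = text.lower()
    for start in range(len(text)):
        state = 0
        for ch in text[start:]:
            next_state = transitions[state].get(ch)
            if next_state is None:
                break  # no transition: no keyword starts at this position
            state = next_state
            if state in accepting:
                return True
    return False


def hybrid_classify(text: str, dfa: Dfa, spam_probability: Callable[[str], float],
                    threshold: float = 0.5) -> str:
    # Stage 1: cheap deterministic rules; a hit skips the model entirely.
    if rule_match(text, dfa):
        return "spam"
    # Stage 2: defer to the fine-tuned model's spam probability.
    return "spam" if spam_probability(text) >= threshold else "legitimate"


def stub_model(text: str) -> float:
    return 0.1  # stand-in for the BETO classifier's spam probability


dfa = build_keyword_dfa(["ganaste", "haz clic aquí", "urgente"])
print(hybrid_classify("¡FELICIDADES! Ganaste un premio", dfa, stub_model))  # spam
print(hybrid_classify("Hola mamá, llegaré tarde", dfa, stub_model))  # legitimate
```

Running the rules first keeps obvious spam off the slower neural path. The trade-off is that a rule false positive can never be overridden by the model.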

Key Features

  • 🎯 High Accuracy: 96.2% on the test set
  • Fast Inference: < 200 ms per message
  • 🇪🇸 Spanish-optimized: Fine-tuned on Spanish SMS data
  • 📱 SMS-focused: Optimized for short messages (< 160 characters)
  • 🔄 Production-ready: Used in a real-world mobile app
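The card does not state the hardware behind the < 200 ms figure. Below is a minimal sketch for checking per-message latency on your own setup: each message is timed individually (batch size 1) after a short untimed warm-up. `per_message_latency_ms` and the stub classifier are hypothetical placeholders. In practice `classify` would wrap the loaded model, for example a `transformers` text-classification pipeline.

```python
import statistics
import time
from typing import Callable, Dict, Iterable


def per_message_latency_ms(classify: Callable[[str], object],
                           messages: Iterable[str], warmup: int = 3) -> Dict[str, float]:
    """Time `classify` on each message individually and summarize in milliseconds."""
    msgs = list(messages)
    if not msgs:
        raise ValueError("need at least one message to time")
    for message in msgs[:warmup]:
        classify(message)  # untimed warm-up (lazy initialization, caches)
    timings = []
    for message in msgs:
        start = time.perf_counter()
        classify(message)
        timings.append((time.perf_counter() - start) * 1000.0)
    return {
        "n": len(timings),
        "median_ms": statistics.median(timings),
        "max_ms": max(timings),
    }


def stub_classifier(text: str) -> str:
    return "legitimate"  # hypothetical stand-in; swap in the real model here


stats = per_message_latency_ms(stub_classifier, ["Hola", "URGENTE: Su cuenta", "Nos vemos"])
print(stats)
```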

Model Architecture

  • Base Model: dccuchile/bert-base-spanish-wwm-cased
  • Parameters: ~110M
  • Layers: 12 transformer encoder layers
  • Hidden Size: 768
  • Max Sequence Length: 128 tokens
  • Vocabulary Size: 31,002 tokens
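As a sanity check on the ~110M figure, this sketch counts the weights of a BERT-base sequence classifier from the hyperparameters listed above. It assumes standard BERT-base values that the card does not list: a feed-forward size of 3,072, a 512-entry position table, and 2 token types. It also assumes a two-label head (spam vs. legitimate). The real checkpoint may differ slightly.

```python
def linear_params(n_in: int, n_out: int) -> int:
    """Weights plus bias of a dense layer."""
    return n_in * n_out + n_out


def bert_classifier_params(
    vocab_size: int = 31_002,
    hidden: int = 768,
    layers: int = 12,
    intermediate: int = 3_072,  # assumption: standard BERT-base FFN size
    max_positions: int = 512,   # assumption: standard BERT position table
    type_vocab: int = 2,        # assumption: segment A/B embeddings
    num_labels: int = 2,        # assumption: spam vs. legitimate
) -> int:
    layer_norm = 2 * hidden  # scale (gamma) + shift (beta)
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm
    # Self-attention: Q, K, V and output projections, then LayerNorm.
    attention = 4 * linear_params(hidden, hidden) + layer_norm
    # Feed-forward: expand to `intermediate`, project back, then LayerNorm.
    feed_forward = (linear_params(hidden, intermediate)
                    + linear_params(intermediate, hidden) + layer_norm)
    encoder = layers * (attention + feed_forward)
    pooler = linear_params(hidden, hidden)          # [CLS] pooler
    classifier = linear_params(hidden, num_labels)  # classification head
    return embeddings + encoder + pooler + classifier


total = bert_classifier_params()
print(f"{total:,} parameters")  # 109,852,418 parameters
```

Roughly 24M of these parameters sit in the embedding tables and about 85M in the 12 encoder layers, which is where the "~110M" comes from.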

🚀 Quick Start

Installation