metadata
title: SMS Classifier API
emoji: 📱
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false

SMS Classifier API

REST API that classifies SMS messages into nine operational categories using a multilingual DistilBERT model fine-tuned on a synthetic bilingual dataset (Spanish and English).

Live demo: https://cmeneses99-sms-classifier-api.hf.space

Categories

Category              Description
--------------------  ----------------------------------------------
transaction           Payment confirmations, debits and transfers
otp_verification      One-time codes for identity verification
promotion_offer       Discounts, coupons and merchant offers
security_alert        Unrecognized access and suspicious activity
delivery_logistics    Shipment status and order tracking
appointment_reminder  Medical and dental appointment reminders
customer_service      Tickets, claims and support updates
spam_advertising      Fraudulent messages and misleading advertising
billing_reminder      Pending invoices and payment due dates
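A client that consumes the API, or a Pydantic schema, can restrict the label set to the nine values in the table above. The following is a minimal sketch that assumes a Python 3.11 client. The `Category` enum and the `parse_category` helper are hypothetical, and app/core/category_meta.py may store the labels differently. Member order follows the table and does not reflect the model's label ids.

```python
from enum import Enum


# Hypothetical client-side mirror of the category table; not taken from the repo.
class Category(str, Enum):
    TRANSACTION = "transaction"
    OTP_VERIFICATION = "otp_verification"
    PROMOTION_OFFER = "promotion_offer"
    SECURITY_ALERT = "security_alert"
    DELIVERY_LOGISTICS = "delivery_logistics"
    APPOINTMENT_REMINDER = "appointment_reminder"
    CUSTOMER_SERVICE = "customer_service"
    SPAM_ADVERTISING = "spam_advertising"
    BILLING_REMINDER = "billing_reminder"


def parse_category(label: str) -> Category:
    """Convert a raw label from an API response; raises ValueError if unknown."""
    return Category(label)


if __name__ == "__main__":
    cat = parse_category("otp_verification")
    print(cat is Category.OTP_VERIFICATION, len(Category))
```

The enum mixes in `str`, so members compare equal to the raw strings in the JSON responses. An unexpected label fails loudly with `ValueError`.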

Tech stack

  • Python 3.11 + FastAPI + Uvicorn
  • DistilBERT (distilbert-base-multilingual-cased) via HuggingFace Transformers
  • PyTorch (CPU-only in production)
  • Pydantic v2 for validation
  • Docker for containerization
  • Hugging Face Spaces for deployment
  • Hugging Face Hub for model hosting

Project structure

app/
├── main.py                      # App entry point
├── utils.py                     # normalize(), read_static()
├── core/                        # Shared infrastructure
│   ├── cache.py                 # Thread-safe LRU cache
│   ├── model_loader.py          # Downloads and loads the model at startup
│   ├── schemas.py               # Pydantic models
│   └── category_meta.py         # Category metadata
├── services/
│   └── classifier.py            # Inference logic + LRU cache integration
├── api/                         # JSON endpoints
│   ├── inference.py             # POST /classify, POST /classify/batch
│   └── meta.py                  # GET /health, GET /api/categories
├── web/                         # HTML endpoints
│   └── pages.py                 # UI routes
└── templates/                   # HTML files
    ├── home.html
    ├── index.html
    ├── batch.html
    └── categories.html
training/
├── config.py                    # Hyperparameters
├── generate_dataset.py          # Generates training/data/sms_dataset.csv
├── train.py                     # Fine-tuning script
└── eval_report.py               # Per-category metrics report
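core/cache.py is described as a thread-safe LRU cache, and /health reports cache stats. The sketch below illustrates that idea under one assumption: an `OrderedDict` guarded by a single `threading.Lock`. The class name, the `maxsize` default and the `stats()` keys are hypothetical and do not come from the repository. In the service, cache keys would presumably be the output of `normalize()`.

```python
import threading
from collections import OrderedDict
from typing import Any, Dict, Hashable, Optional


class ThreadSafeLRUCache:
    """Bounded least-recently-used cache; every operation holds one lock."""

    def __init__(self, maxsize: int = 1024) -> None:  # hypothetical default
        if maxsize <= 0:
            raise ValueError("maxsize must be positive")
        self._maxsize = maxsize
        self._data: "OrderedDict[Hashable, Any]" = OrderedDict()
        self._lock = threading.Lock()
        self._hits = 0
        self._misses = 0

    def get(self, key: Hashable) -> Optional[Any]:
        """Return the cached value, or None on a miss (so never store None)."""
        with self._lock:
            if key not in self._data:
                self._misses += 1
                return None
            self._data.move_to_end(key)  # mark as most recently used
            self._hits += 1
            return self._data[key]

    def put(self, key: Hashable, value: Any) -> None:
        """Insert or refresh an entry, evicting the oldest one when full."""
        with self._lock:
            self._data[key] = value
            self._data.move_to_end(key)
            if len(self._data) > self._maxsize:
                self._data.popitem(last=False)  # drop least recently used

    def stats(self) -> Dict[str, int]:
        """Snapshot of size and hit/miss counters, e.g. for a health endpoint."""
        with self._lock:
            return {
                "size": len(self._data),
                "maxsize": self._maxsize,
                "hits": self._hits,
                "misses": self._misses,
            }


if __name__ == "__main__":
    cache = ThreadSafeLRUCache(maxsize=2)
    cache.put("a", 1)
    cache.put("b", 2)
    cache.get("a")     # "a" becomes the most recently used entry
    cache.put("c", 3)  # over capacity: evicts "b"
    print(cache.get("b"), cache.stats())
```

A single lock keeps each get/put atomic. Model inference is far slower than a dict lookup, so contention on the lock is likely negligible.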

Run locally

Requirements

  • Python 3.11+
  • A trained model in ./model/ (see Train the model below)
# Create virtual environment
python -m venv .venv
.venv\Scripts\activate        # Windows
source .venv/bin/activate     # Linux/Mac

# Install dependencies
pip install -r requirements.txt
pip install torch --index-url https://download.pytorch.org/whl/cpu

# Start API
uvicorn app.main:app --reload

API available at http://localhost:8000

With Docker

docker compose up --build

Train the model

pip install -r requirements-training.txt

cd training
python generate_dataset.py   # generates training/data/sms_dataset.csv
python train.py              # fine-tuning → saves model to ./model/
python eval_report.py        # per-category metrics report
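This README describes eval_report.py only as a per-category metrics report. Its implementation is not shown, and it may well rely on scikit-learn's `classification_report`. The following is a dependency-free sketch of the usual per-class precision, recall and F1. The function name `per_category_metrics` is hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence


def per_category_metrics(
    y_true: Sequence[str], y_pred: Sequence[str]
) -> Dict[str, Dict[str, float]]:
    """Precision, recall, F1 and support for each label seen in y_true or y_pred."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp: Counter = Counter()
    fp: Counter = Counter()
    fn: Counter = Counter()
    for true_label, pred_label in zip(y_true, y_pred):
        if true_label == pred_label:
            tp[true_label] += 1
        else:
            fp[pred_label] += 1  # predicted this label, but wrongly
            fn[true_label] += 1  # this true label was missed
    support = Counter(y_true)
    report: Dict[str, Dict[str, float]] = {}
    for label in sorted(set(y_true) | set(y_pred)):
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        f1 = (
            2 * precision * recall / (precision + recall)
            if precision + recall
            else 0.0
        )
        report[label] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "support": float(support[label]),
        }
    return report


if __name__ == "__main__":
    y_true = ["otp_verification", "otp_verification", "transaction"]
    y_pred = ["otp_verification", "transaction", "transaction"]
    for label, m in per_category_metrics(y_true, y_pred).items():
        print(f"{label:<20} P={m['precision']:.2f} R={m['recall']:.2f} "
              f"F1={m['f1']:.2f} n={int(m['support'])}")
```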

Endpoints

Method  Route            Description
------  ---------------  ------------------------------------
GET     /                Home page with API description
GET     /classify        Interactive single classifier (UI)
GET     /classify/batch  Batch classifier (UI)
GET     /categories      Categories view with examples
POST    /classify        Classify one message (JSON)
POST    /classify/batch  Classify multiple messages (JSON)
GET     /api/categories  List categories (JSON)
GET     /health          Service status and cache stats

POST /classify

curl -X POST http://localhost:8000/classify \
  -H "Content-Type: application/json" \
  -d '{"text": "Your OTP code is 482910. Do not share it."}'
Example response:

{
  "text": "Your OTP code is 482910. Do not share it.",
  "prediction": {
    "category": "otp_verification",
    "confidence": 0.9821
  },
  "top_3": [
    { "category": "otp_verification", "confidence": 0.9821 },
    { "category": "security_alert",   "confidence": 0.0091 },
    { "category": "customer_service", "confidence": 0.0044 }
  ],
  "cached": false
}
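The response above reports a top prediction plus `top_3` confidences. The example values suggest that the service applies a softmax to the classifier logits and rounds the results to four decimals. Under that assumption, the post-processing can be sketched in plain Python. `top_k_from_logits` and `build_response` are hypothetical names, and the real code in services/classifier.py may differ.

```python
import math
from typing import Any, Dict, List, Sequence


def top_k_from_logits(
    logits: Sequence[float], labels: Sequence[str], k: int = 3
) -> List[Dict[str, Any]]:
    """Softmax the logits and return the k most probable labels."""
    if len(logits) != len(labels) or not logits:
        raise ValueError("expected one logit per label")
    m = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    ranked = sorted(
        zip(labels, (e / total for e in exps)), key=lambda pair: pair[1], reverse=True
    )
    return [{"category": c, "confidence": round(p, 4)} for c, p in ranked[:k]]


def build_response(text: str, logits: Sequence[float], labels: Sequence[str],
                   cached: bool = False) -> Dict[str, Any]:
    """Assemble a dict shaped like the POST /classify example response."""
    top = top_k_from_logits(logits, labels, k=3)
    return {"text": text, "prediction": top[0], "top_3": top, "cached": cached}


if __name__ == "__main__":
    labels = ["otp_verification", "security_alert", "customer_service", "transaction"]
    logits = [4.1, -0.2, -1.0, -1.5]  # made-up values for illustration
    print(build_response("Your OTP code is 482910.", logits, labels))
```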

POST /classify/batch

curl -X POST http://localhost:8000/classify/batch \
  -H "Content-Type: application/json" \
  -d '{"texts": ["Your OTP code is 482910.", "Your card was charged $45 at Amazon."]}'
Example response:

{
  "results": [...],
  "total": 2,
  "from_cache": 0
}
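The following is a small standard-library client for the batch endpoint. It assumes the API runs at http://localhost:8000, as in the local setup. `build_batch_request` and `classify_batch` are hypothetical helpers and are not part of the repository. Building the request separately from sending it keeps the payload easy to inspect or test.

```python
import json
import urllib.request
from typing import List


def build_batch_request(
    texts: List[str], base_url: str = "http://localhost:8000"
) -> urllib.request.Request:
    """Prepare (but do not send) a POST /classify/batch request."""
    body = json.dumps({"texts": texts}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/classify/batch",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify_batch(
    texts: List[str], base_url: str = "http://localhost:8000", timeout: float = 30.0
) -> dict:
    """Send the request and decode the JSON body (requires the API to be running)."""
    with urllib.request.urlopen(build_batch_request(texts, base_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    req = build_batch_request(
        ["Your OTP code is 482910.", "Your card was charged $45 at Amazon."]
    )
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

With the server running, `classify_batch([...])` returns the decoded JSON object shown above, including `total` and `from_cache`.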

Deploy on Hugging Face Spaces

  1. Create a Space at huggingface.co/new-space with SDK: Docker
  2. Push the code to the Space repo:
    git remote add hfspace https://USER:TOKEN@huggingface.co/spaces/USER/SPACE-NAME
    git push hfspace main
    
  3. HF Spaces detects the Dockerfile automatically and builds the image
  4. On startup, the model is downloaded from the Hugging Face Hub (~520 MB, first time only)

The model is hosted at huggingface.co/cmeneses99/sms-classifier.