---
title: SMS Classifier API
emoji: 📱
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
---
SMS Classifier API
REST API that classifies SMS messages into nine operational categories using a multilingual DistilBERT model fine-tuned on a synthetic bilingual dataset (Spanish and English).
Live demo: https://cmeneses99-sms-classifier-api.hf.space
Categories
| Category | Description |
|---|---|
| `transaction` | Payment confirmations, debits and transfers |
| `otp_verification` | One-time codes for identity verification |
| `promotion_offer` | Discounts, coupons and merchant offers |
| `security_alert` | Unrecognized access and suspicious activity |
| `delivery_logistics` | Shipment status and order tracking |
| `appointment_reminder` | Medical and dental appointment reminders |
| `customer_service` | Tickets, claims and support updates |
| `spam_advertising` | Fraudulent messages and misleading advertising |
| `billing_reminder` | Pending invoices and payment due dates |
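Each API response ranks these categories by confidence. The sketch below shows one way raw model scores could be turned into a ranked `top_3` list, assuming a softmax over one logit per category. The label order in `CATEGORIES`, the helper names, and the example logits are illustrative assumptions, not taken from the project code.

```python
import math

# Label order is an assumption; the real order lives in the trained model's config.
CATEGORIES = [
    "transaction", "otp_verification", "promotion_offer", "security_alert",
    "delivery_logistics", "appointment_reminder", "customer_service",
    "spam_advertising", "billing_reminder",
]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rank_categories(logits, k=3):
    """Return the k most likely categories with rounded confidences."""
    if len(logits) != len(CATEGORIES):
        raise ValueError("expected one logit per category")
    probs = softmax(logits)
    ranked = sorted(zip(CATEGORIES, probs), key=lambda pair: pair[1], reverse=True)
    return [{"category": name, "confidence": round(p, 4)} for name, p in ranked[:k]]


# Made-up logits, for illustration only.
example_logits = [0.1, 6.0, 0.2, 1.3, 0.0, -0.5, 0.6, 0.3, 0.1]
top_3 = rank_categories(example_logits)
print([entry["category"] for entry in top_3])
```

The first entry of the ranked list plays the role of `prediction`; the remaining entries fill out `top_3`.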
Tech stack
- Python 3.11 + FastAPI + Uvicorn
- DistilBERT (`distilbert-base-multilingual-cased`) via Hugging Face Transformers
- PyTorch (CPU-only in production)
- Pydantic v2 for validation
- Docker for containerization
- Hugging Face Spaces for deployment
- Hugging Face Hub for model hosting
Project structure
```
app/
├── main.py                  # App entry point
├── utils.py                 # normalize(), read_static()
├── core/                    # Shared infrastructure
│   ├── cache.py             # Thread-safe LRU cache
│   ├── model_loader.py      # Downloads and loads the model at startup
│   ├── schemas.py           # Pydantic models
│   └── category_meta.py     # Category metadata
├── services/
│   └── classifier.py        # Inference logic + LRU cache integration
├── api/                     # JSON endpoints
│   ├── inference.py         # POST /classify, POST /classify/batch
│   └── meta.py              # GET /health, GET /api/categories
├── web/                     # HTML endpoints
│   └── pages.py             # UI routes
└── templates/               # HTML files
    ├── home.html
    ├── index.html
    ├── batch.html
    └── categories.html
training/
├── config.py                # Hyperparameters
├── generate_dataset.py      # Generates training/data/sms_dataset.csv
├── train.py                 # Fine-tuning script
└── eval_report.py           # Per-category metrics report
```
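The structure mentions a thread-safe LRU cache and a `normalize()` helper. As a rough illustration of that idea (not the project's actual `cache.py` or `utils.py`), here is a minimal sketch built on `collections.OrderedDict` and a lock. The class name, the `max_size` parameter, the hit/miss counters, and the normalization rules are all assumptions.

```python
import threading
from collections import OrderedDict


def normalize(text: str) -> str:
    """Hypothetical cache-key normalization: collapse whitespace and lowercase."""
    return " ".join(text.split()).lower()


class LRUCache:
    """Least-recently-used cache guarded by a single lock."""

    def __init__(self, max_size: int = 1024):
        self._data = OrderedDict()
        self._lock = threading.Lock()
        self._max_size = max_size
        self.hits = 0
        self.misses = 0

    def get(self, key):
        with self._lock:
            if key in self._data:
                self._data.move_to_end(key)  # mark as most recently used
                self.hits += 1
                return self._data[key]
            self.misses += 1
            return None

    def put(self, key, value):
        with self._lock:
            self._data[key] = value
            self._data.move_to_end(key)
            if len(self._data) > self._max_size:
                self._data.popitem(last=False)  # evict the least recently used entry


cache = LRUCache(max_size=2)
cache.put(normalize("Your  OTP is 1234"), "otp_verification")
cache.put(normalize("Card charged $45"), "transaction")
cache.get(normalize("your otp is 1234"))  # hit: the OTP entry becomes most recent
cache.put(normalize("Package shipped"), "delivery_logistics")  # evicts the card entry
```

Holding one lock around every read and write keeps the ordering and the counters consistent when several request threads share the cache, at the cost of serializing access.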
Run locally
Requirements
- Python 3.11+
- Trained model in `./model/` (see the training section)
```bash
# Create virtual environment
python -m venv .venv
.venv\Scripts\activate      # Windows
source .venv/bin/activate   # Linux/Mac

# Install dependencies
pip install -r requirements.txt
pip install torch --index-url https://download.pytorch.org/whl/cpu

# Start API
uvicorn app.main:app --reload
```
API available at http://localhost:8000
With Docker
```bash
docker compose up --build
```
Train the model
```bash
pip install -r requirements-training.txt
cd training
python generate_dataset.py   # generates training/data/sms_dataset.csv
python train.py              # fine-tuning → saves model to ./model/
python eval_report.py        # per-category metrics report
```
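For readers unfamiliar with per-category metrics, the sketch below computes precision, recall, and F1 for each label from two parallel lists of true and predicted categories. It is a standalone illustration of the idea, not the contents of `eval_report.py`; the function name and the sample labels are assumptions.

```python
from collections import Counter


def per_category_report(y_true, y_pred):
    """Compute precision, recall, F1 and support for every label seen."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            tp[truth] += 1
        else:
            fp[guess] += 1  # predicted this label, but it was wrong
            fn[truth] += 1  # missed the true label
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "support": actual,
        }
    return report


# Tiny made-up example, for illustration only.
y_true = ["transaction", "transaction", "otp_verification", "spam_advertising"]
y_pred = ["transaction", "spam_advertising", "otp_verification", "spam_advertising"]
report = per_category_report(y_true, y_pred)
for label, metrics in report.items():
    print(label, metrics)
```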
Endpoints
| Method | Route | Description |
|---|---|---|
| GET | `/` | Home with API description |
| GET | `/classify` | Interactive single classifier (UI) |
| GET | `/classify/batch` | Batch classifier (UI) |
| GET | `/categories` | Categories view with examples |
| POST | `/classify` | Classify one message (JSON) |
| POST | `/classify/batch` | Classify multiple messages (JSON) |
| GET | `/api/categories` | List categories (JSON) |
| GET | `/health` | Service status and cache stats |
POST /classify
```bash
curl -X POST http://localhost:8000/classify \
  -H "Content-Type: application/json" \
  -d '{"text": "Your OTP code is 482910. Do not share it."}'
```

```json
{
  "text": "Your OTP code is 482910. Do not share it.",
  "prediction": {
    "category": "otp_verification",
    "confidence": 0.9821
  },
  "top_3": [
    { "category": "otp_verification", "confidence": 0.9821 },
    { "category": "security_alert", "confidence": 0.0091 },
    { "category": "customer_service", "confidence": 0.0044 }
  ],
  "cached": false
}
```
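The same request can be sent from Python. This is a minimal standard-library sketch, assuming the API is running at the local default address; the helper names `build_classify_request` and `classify` and the `timeout` parameter are illustrative and not part of the project.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # local default; swap in the deployed URL if needed


def build_classify_request(text: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST /classify request without sending it."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/classify",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify(text: str, base_url: str = BASE_URL, timeout: float = 10.0) -> dict:
    """Send the request and return the decoded JSON response."""
    request = build_classify_request(text, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `classify("Your OTP code is 482910. Do not share it.")["prediction"]["category"]` reads the predicted label from the response.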
POST /classify/batch
```bash
curl -X POST http://localhost:8000/classify/batch \
  -H "Content-Type: application/json" \
  -d '{"texts": ["Your OTP code is 482910.", "Your card was charged $45 at Amazon."]}'
```

```json
{
  "results": [...],
  "total": 2,
  "from_cache": 0
}
```
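The `total` and `from_cache` fields read like simple aggregates over the per-message results. A sketch of how such a summary could be assembled, assuming each entry in `results` carries a `cached` flag like the single-message response does (the function name and the sample data are hypothetical):

```python
def summarize_batch(results):
    """Wrap per-message results with the batch-level counters."""
    return {
        "results": results,
        "total": len(results),
        # Count entries whose result was served from the cache.
        "from_cache": sum(1 for item in results if item.get("cached")),
    }


# Made-up per-message results, for illustration only.
sample_results = [
    {"text": "Your OTP code is 482910.", "cached": True},
    {"text": "Your card was charged $45 at Amazon.", "cached": False},
]
summary = summarize_batch(sample_results)
print(summary["total"], summary["from_cache"])
```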
Deploy on Hugging Face Spaces
- Create a Space at huggingface.co/new-space with SDK: Docker
- Push the code to the Space repo:

  ```bash
  git remote add hfspace https://USER:TOKEN@huggingface.co/spaces/USER/SPACE-NAME
  git push hfspace main
  ```

- HF Spaces detects the `Dockerfile` automatically and builds the image
- On startup, the model is downloaded from HF Hub (~520 MB, first time only)
Model hosted at huggingface.co/cmeneses99/sms-classifier.