---
title: SMS Classifier API
emoji: 📱
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
---
# SMS Classifier API
REST API that classifies SMS messages into operational categories using **multilingual DistilBERT** fine-tuned on a synthetic bilingual dataset (ES + EN).
**Live demo:** https://cmeneses99-sms-classifier-api.hf.space
## Categories
| Category | Description |
| ---------------------- | ---------------------------------------------------- |
| `transaction` | Payment confirmations, debits and transfers |
| `otp_verification` | One-time codes for identity verification |
| `promotion_offer` | Discounts, coupons and merchant offers |
| `security_alert` | Unrecognized access and suspicious activity |
| `delivery_logistics` | Shipment status and order tracking |
| `appointment_reminder` | Medical and dental appointment reminders |
| `customer_service` | Tickets, claims and support updates |
| `spam_advertising` | Fraudulent messages and misleading advertising |
| `billing_reminder` | Pending invoices and payment due dates |
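
The `confidence` and `top_3` fields returned by the API are consistent with a softmax over one logit per category. The sketch below illustrates that idea in plain Python. It assumes the label order matches the table above, which is a guess: the actual id-to-label mapping is defined by the trained model's config. `CATEGORIES`, `softmax` and `top_k` are illustrative names, not part of the project.

```python
import math
from typing import List, Sequence, Tuple

# Assumption: label order follows the table above; the real id-to-label
# mapping lives in the model config and may differ.
CATEGORIES = [
    "transaction", "otp_verification", "promotion_offer", "security_alert",
    "delivery_logistics", "appointment_reminder", "customer_service",
    "spam_advertising", "billing_reminder",
]


def softmax(logits: Sequence[float]) -> List[float]:
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits: Sequence[float], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k most probable (category, confidence) pairs."""
    if len(logits) != len(CATEGORIES):
        raise ValueError("expected one logit per category")
    probs = softmax(logits)
    ranked = sorted(zip(CATEGORIES, probs), key=lambda pair: pair[1], reverse=True)
    return [(name, round(p, 4)) for name, p in ranked[:k]]
```
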
## Tech stack
- **Python 3.11** + **FastAPI** + **Uvicorn**
- **DistilBERT** (`distilbert-base-multilingual-cased`) via HuggingFace Transformers
- **PyTorch** (CPU-only in production)
- **Pydantic v2** for validation
- **Docker** for containerization
- **Hugging Face Spaces** for deployment
- **Hugging Face Hub** for model hosting
## Project structure
```
app/
├── main.py                  # App entry point
├── utils.py                 # normalize(), read_static()
├── core/                    # Shared infrastructure
│   ├── cache.py             # Thread-safe LRU cache
│   ├── model_loader.py      # Downloads and loads the model at startup
│   ├── schemas.py           # Pydantic models
│   └── category_meta.py     # Category metadata
├── services/
│   └── classifier.py        # Inference logic + LRU cache integration
├── api/                     # JSON endpoints
│   ├── inference.py         # POST /classify, POST /classify/batch
│   └── meta.py              # GET /health, GET /api/categories
├── web/                     # HTML endpoints
│   └── pages.py             # UI routes
└── templates/               # HTML files
    ├── home.html
    ├── index.html
    ├── batch.html
    └── categories.html
training/
├── config.py                # Hyperparameters
├── generate_dataset.py      # Generates training/data/sms_dataset.csv
├── train.py                 # Fine-tuning script
└── eval_report.py           # Per-category metrics report
```
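For readers unfamiliar with the pattern behind `core/cache.py`, here is a minimal sketch of a thread-safe LRU cache built on `OrderedDict` and a lock. It is an illustration under assumptions, not the project's implementation: the class name, the `max_size` default and the hit/miss counters are all hypothetical.

```python
import threading
from collections import OrderedDict
from typing import Generic, Hashable, Optional, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


class LRUCache(Generic[K, V]):
    """Hypothetical stand-in for app/core/cache.py (not the project's code)."""

    def __init__(self, max_size: int = 1024) -> None:
        if max_size < 1:
            raise ValueError("max_size must be at least 1")
        self._max_size = max_size
        self._items: "OrderedDict[K, V]" = OrderedDict()
        self._lock = threading.Lock()  # guards every read and write
        self.hits = 0
        self.misses = 0

    def get(self, key: K) -> Optional[V]:
        with self._lock:
            if key not in self._items:
                self.misses += 1
                return None
            self._items.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._items[key]

    def put(self, key: K, value: V) -> None:
        with self._lock:
            self._items[key] = value
            self._items.move_to_end(key)
            if len(self._items) > self._max_size:
                self._items.popitem(last=False)  # evict least recently used

    def __len__(self) -> int:
        with self._lock:
            return len(self._items)
```

A single lock keeps the sketch simple. Under heavy concurrent load a real service might prefer a finer-grained design.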
## Run locally
### Requirements
- Python 3.11+
- Trained model in `./model/` (see [Train the model](#train-the-model))
```bash
# Create virtual environment
python -m venv .venv
# Activate it (run only the line for your OS)
.venv\Scripts\activate      # Windows
source .venv/bin/activate   # Linux/Mac
# Install dependencies
pip install -r requirements.txt
pip install torch --index-url https://download.pytorch.org/whl/cpu
# Start API
uvicorn app.main:app --reload
```
API available at `http://localhost:8000`
### With Docker
```bash
docker compose up --build
```
## Train the model
```bash
pip install -r requirements-training.txt
cd training
python generate_dataset.py # generates training/data/sms_dataset.csv
python train.py # fine-tuning → saves model to ./model/
python eval_report.py # per-category metrics report
```
## Endpoints
| Method | Route | Description |
| ------ | ----------------- | ---------------------------------- |
| `GET` | `/` | Home with API description |
| `GET` | `/classify` | Interactive single classifier (UI) |
| `GET` | `/classify/batch` | Batch classifier (UI) |
| `GET` | `/categories` | Categories view with examples |
| `POST` | `/classify` | Classify one message (JSON) |
| `POST` | `/classify/batch` | Classify multiple messages (JSON) |
| `GET` | `/api/categories` | List categories (JSON) |
| `GET` | `/health` | Service status and cache stats |
### POST /classify
```bash
curl -X POST http://localhost:8000/classify \
-H "Content-Type: application/json" \
-d '{"text": "Your OTP code is 482910. Do not share it."}'
```
```json
{
"text": "Your OTP code is 482910. Do not share it.",
"prediction": {
"category": "otp_verification",
"confidence": 0.9821
},
"top_3": [
{ "category": "otp_verification", "confidence": 0.9821 },
{ "category": "security_alert", "confidence": 0.0091 },
{ "category": "customer_service", "confidence": 0.0044 }
],
"cached": false
}
```
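The same call can be made from Python. The sketch below uses only the standard library and assumes the local base URL from the "Run locally" section. The helper names (`build_classify_request`, `classify`, `best_label`) are illustrative and not part of the project.

```python
import json
import urllib.request
from typing import Any, Dict

BASE_URL = "http://localhost:8000"  # local default from the "Run locally" section


def build_classify_request(text: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST /classify request without sending it."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/classify",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify(text: str, base_url: str = BASE_URL, timeout: float = 10.0) -> Dict[str, Any]:
    """Send the request and decode the JSON response."""
    request = build_classify_request(text, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def best_label(result: Dict[str, Any]) -> str:
    """Format the top prediction from a response shaped like the one above."""
    prediction = result["prediction"]
    return f'{prediction["category"]} ({prediction["confidence"]:.2%})'
```

With the API running, `best_label(classify("Your OTP code is 482910. Do not share it."))` would return the predicted category together with its confidence as a percentage.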
### POST /classify/batch
```bash
curl -X POST http://localhost:8000/classify/batch \
-H "Content-Type: application/json" \
-d '{"texts": ["Your OTP code is 482910.", "Your card was charged $45 at Amazon."]}'
```
```json
{
"results": [...],
"total": 2,
"from_cache": 0
}
```
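The `from_cache` counter suggests that the batch endpoint only runs the model on texts it has not seen before. The sketch below shows one way that could work. It rests on several assumptions: a plain `dict` stands in for the LRU cache, `normalize` is a guess at what `utils.normalize()` does, `predict_many` is a hypothetical callable wrapping the model, and the shape of each `results` item is a placeholder because the response above elides it.

```python
from typing import Callable, Dict, List

Prediction = Dict[str, object]


def normalize(text: str) -> str:
    """Hypothetical cache key: collapse whitespace and lowercase."""
    return " ".join(text.split()).lower()


def classify_batch(
    texts: List[str],
    cache: Dict[str, Prediction],
    predict_many: Callable[[List[str]], List[Prediction]],
) -> Dict[str, object]:
    """Serve cached predictions and run the model only on cache misses."""
    keys = [normalize(t) for t in texts]
    # Unique uncached keys, in first-seen order, so the model runs once per text.
    misses = list(dict.fromkeys(k for k in keys if k not in cache))
    from_cache = sum(1 for k in keys if k in cache)
    fresh = dict(zip(misses, predict_many(misses))) if misses else {}
    cache.update(fresh)
    # Placeholder item shape: the real one is not shown in this README.
    results = [
        {"text": t, "prediction": cache[k], "cached": k not in fresh}
        for t, k in zip(texts, keys)
    ]
    return {"results": results, "total": len(texts), "from_cache": from_cache}
```

Deduplicating misses before calling the model means repeated messages inside one batch cost a single forward pass.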
## Deploy on Hugging Face Spaces
1. Create a Space at [huggingface.co/new-space](https://huggingface.co/new-space) with SDK: **Docker**
2. Push the code to the Space repo:
```bash
git remote add hfspace https://USER:TOKEN@huggingface.co/spaces/USER/SPACE-NAME
git push hfspace main
```
3. HF Spaces detects the `Dockerfile` automatically and builds the image
4. On startup, the model is downloaded from HF Hub (~520 MB, first time only)
Model hosted at [huggingface.co/cmeneses99/sms-classifier](https://huggingface.co/cmeneses99/sms-classifier).
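
A "download on first start only" step like the one described above could look roughly like this. It is a sketch under assumptions, not the contents of `core/model_loader.py`: it treats a `config.json` in the target directory as proof that the model is already present, and the function names are hypothetical. `snapshot_download` and the `Auto*` classes are real `huggingface_hub` and `transformers` APIs, imported lazily so the presence check works without them.

```python
from pathlib import Path

REPO_ID = "cmeneses99/sms-classifier"  # model repo named above
MODEL_DIR = Path("./model")            # local path from the "Run locally" section


def model_is_present(model_dir: Path = MODEL_DIR) -> bool:
    """True when the directory already holds a model config (assumed marker)."""
    return (model_dir / "config.json").is_file()


def ensure_model(model_dir: Path = MODEL_DIR, repo_id: str = REPO_ID) -> Path:
    """Download the model snapshot only if it is not already on disk."""
    if not model_is_present(model_dir):
        # Imported lazily so the presence check works without the dependency.
        from huggingface_hub import snapshot_download

        snapshot_download(repo_id=repo_id, local_dir=str(model_dir))
    return model_dir


def load_model(model_dir: Path = MODEL_DIR):
    """Load tokenizer and classifier with Transformers (assumed layout)."""
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    path = str(ensure_model(model_dir))
    tokenizer = AutoTokenizer.from_pretrained(path)
    model = AutoModelForSequenceClassification.from_pretrained(path)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model
```
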