Commit 1a4e259
Parent(s): f9ac587
Add live URL, architecture doc and usage guide; remove render.yaml
- ARCHITECTURE.md +81 -0
- README.md +14 -7
- USAGE.md +138 -0
- render.yaml +0 -9
ARCHITECTURE.md
ADDED
@@ -0,0 +1,81 @@
+# Architecture
+
+## Deployment Overview
+
+```
+GitHub (source code)
+    │
+    └─► Hugging Face Spaces (Docker runtime)
+            │  builds & runs the FastAPI container
+            │
+            ├─► on startup: pulls model from HF Hub
+            │       huggingface.co/cmeneses99/sms-classifier
+            │       (model.safetensors, tokenizer, config, ~520MB)
+            │
+            └─► serves API on port 7860
+                    https://cmeneses99-sms-classifier-api.hf.space
+
+cron-job.org ──GET /health every 10min──► HF Spaces (keep-alive)
+```
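The keep-alive arrow in the diagram amounts to a periodic `GET /health`. As a rough sketch of what an equivalent ping could look like from Python, assuming the payload carries the `status` and `model_loaded` fields shown in the usage guide (`is_healthy` and `ping` are hypothetical helper names, not part of the repo):

```python
import json
from urllib.request import urlopen

# URL taken from the diagram; the helpers below are illustrative only.
HEALTH_URL = "https://cmeneses99-sms-classifier-api.hf.space/health"


def is_healthy(payload: dict) -> bool:
    """Return True when the payload reports a running service with a loaded model."""
    return payload.get("status") == "ok" and payload.get("model_loaded") is True


def ping(url: str = HEALTH_URL, timeout: float = 30.0) -> bool:
    """Send one GET request to the health endpoint and interpret the JSON body."""
    with urlopen(url, timeout=timeout) as resp:
        return is_healthy(json.load(resp))
```

Calling `ping()` from any scheduler at the 10-minute interval would play the same role as the cron-job.org entry; the generous timeout is a guess to allow for a cold start.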
+
+## Request Flow
+
+```
+Client
+  │
+  ▼
+FastAPI (routers/)
+  │
+  ├── pages.py      → HTML responses (/, /classify, /classify/batch, /categories)
+  ├── inference.py  → POST /classify, POST /classify/batch
+  └── meta.py       → GET /health, GET /api/categories
+  │
+  ▼
+services/classifier.py
+  │
+  ├── LRU Cache (cache.py) ──hit──► return cached response
+  │
+  └── miss ──► model_loader.py (HuggingFace pipeline)
+                  └── distilbert-base-multilingual-cased (fine-tuned)
+                        └── top_k=3 predictions → PredictResponse
+```
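The hit/miss branch in the flow above can be sketched in a few lines. This is only an illustration under stated assumptions: the cache is a plain `dict`, `score_fn` stands in for the HuggingFace pipeline and returns a mapping of category to score, and the whitespace-collapsing key is a guess at what `normalize()` does.

```python
from typing import Callable, Dict


def classify(text: str, cache: dict, score_fn: Callable[[str], Dict[str, float]], top_k: int = 3) -> dict:
    """Return the cached result on a hit; otherwise score the text and cache the top_k labels."""
    key = " ".join(text.split())  # assumed normalization; the real normalize() may differ
    if key in cache:
        return {"text": text, **cache[key], "cached": True}

    scores = score_fn(text)  # hypothetical stand-in for the model call
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top_k]
    top = [{"category": name, "confidence": conf} for name, conf in ranked]

    cache[key] = {"prediction": top[0], "top_3": top}
    return {"text": text, **cache[key], "cached": False}
```

Keeping the model call behind a function argument makes the cache behaviour easy to exercise without loading the ~520MB model.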
+
+## Model
+
+| Detail | Value |
+|---|---|
+| Base model | `distilbert-base-multilingual-cased` |
+| Task | Sequence classification |
+| Categories | 9 |
+| Training data | 3,150 synthetic examples (350/category, ES + EN) |
+| Training | 5 epochs, fine-tuned with HuggingFace Trainer API |
+| Runtime | CPU-only (PyTorch CPU build) |
+| Cache | LRU, max 512 entries, thread-safe |
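The last row describes a thread-safe LRU cache capped at 512 entries. One common way to build that in Python is an `OrderedDict` guarded by a lock; the class below is a minimal sketch along those lines, not the contents of `cache.py`, and the `LRUCache` name, its methods, and the hit/miss counters are assumptions.

```python
import threading
from collections import OrderedDict


class LRUCache:
    """Least-recently-used cache; every operation runs under a single lock."""

    def __init__(self, max_size: int = 512):
        self._data = OrderedDict()
        self._lock = threading.Lock()
        self._max_size = max_size
        self.hits = 0
        self.misses = 0

    def get(self, key):
        with self._lock:
            if key in self._data:
                self._data.move_to_end(key)  # mark as most recently used
                self.hits += 1
                return self._data[key]
            self.misses += 1
            return None

    def put(self, key, value):
        with self._lock:
            self._data[key] = value
            self._data.move_to_end(key)
            if len(self._data) > self._max_size:
                self._data.popitem(last=False)  # evict the least recently used entry
```

A single lock keeps the sketch simple; under CPU-only inference the model call is likely to dominate latency, so contention on the cache is unlikely to matter.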
+
+## Project Structure
+
+```
+app/
+├── main.py              # Lifespan + router registration
+├── model_loader.py      # Downloads model from HF Hub on startup
+├── schemas.py           # Pydantic v2 request/response models
+├── category_meta.py     # Labels, colors, examples per category
+├── cache.py             # Thread-safe LRU cache
+├── utils.py             # normalize(), read_static()
+├── routers/
+│   ├── pages.py         # HTML routes
+│   ├── inference.py     # Classification endpoints
+│   └── meta.py          # Health + categories endpoints
+├── services/
+│   └── classifier.py    # Inference logic with cache integration
+└── static/
+    ├── home.html
+    ├── index.html       # Single classifier UI
+    ├── batch.html       # Batch classifier UI
+    └── categories.html
+training/
+├── config.py
+├── generate_dataset.py
+├── train.py
+└── eval_report.py
+```
README.md
CHANGED
@@ -12,6 +12,8 @@ pinned: false
 
 REST API for classifying SMS messages into categories using **DistilBERT multilingual**, fine-tuned on a synthetic multilingual dataset (ES + EN).
 
+**Live demo:** https://cmeneses99-sms-classifier-api.hf.space
+
 ## Categories
 
 | Category | Description |
@@ -33,7 +35,8 @@ REST API for classifying SMS messages into categories using **DistilBERT multili
 - **PyTorch** (CPU-only in production)
 - **Pydantic v2** for validation
 - **Docker** for containerization
-- **
+- **Hugging Face Spaces** for deployment
+- **Hugging Face Hub** for model hosting
 
 ## Project structure
 
@@ -150,11 +153,15 @@ curl -X POST http://localhost:8000/classify/batch \
 }
 ```
 
-## Deploy on
+## Deploy on Hugging Face Spaces
 
-
+1. Create a Space at [huggingface.co/new-space](https://huggingface.co/new-space) with SDK: **Docker**
+2. Push the code to the Space repo:
+   ```bash
+   git remote add hfspace https://USER:TOKEN@huggingface.co/spaces/USER/SPACE-NAME
+   git push hfspace main
+   ```
+3. HF Spaces detects the `Dockerfile` automatically and runs the build
+4. On startup, the model is downloaded from HF Hub (~520MB, first time only)
 
-
-2. On [Render.com](https://render.com): New → Web Service → connect the repo
-3. Render detects `render.yaml` automatically
-4. The first deploy takes ~5 min (Docker image ~700MB)
+The model is hosted at [huggingface.co/cmeneses99/sms-classifier](https://huggingface.co/cmeneses99/sms-classifier).
USAGE.md
ADDED
@@ -0,0 +1,138 @@
+# Usage Guide
+
+Base URL: `https://cmeneses99-sms-classifier-api.hf.space`
+
+---
+
+## Via Browser (UI)
+
+### Home
+Open `https://cmeneses99-sms-classifier-api.hf.space` to see a description of the API, with all available endpoints and sample responses. From there, use the buttons to navigate to the other views.
+
+---
+
+### Classify a message
+1. Click **"Clasificador Simple"** on the home page (or go directly to `/classify`)
+2. Type the message in the text field
+3. Click **"Clasificar"** or press **Enter**
+4. The result shows the detected category, the confidence level, and the top 3 most likely categories
+5. If the same text was submitted before, the **"caché activo"** badge appears
+
+---
+
+### Classify multiple messages
+1. Click **"Clasificador por Lotes"** on the home page (or go directly to `/classify/batch`)
+2. Enter one message per line in the text area
+3. The live counter shows how many messages you have entered (max. 50)
+4. Click **"Clasificar todo"**
+5. Results appear one by one, each with its category and confidence
+6. The summary bar at the bottom shows how many results came from the cache
+
+---
+
+### View available categories
+1. Click **"Categorías"** on the home page (or go directly to `/categories`)
+2. Each category shows its description and one example each in Spanish and English
+
+---
+
+## Via API (curl)
+
+### Classify a message
+
+```bash
+curl -X POST https://cmeneses99-sms-classifier-api.hf.space/classify \
+  -H "Content-Type: application/json" \
+  -d '{"text": "Tu código OTP es 482910. No lo compartas."}'
+```
+
+```json
+{
+  "text": "Tu código OTP es 482910. No lo compartas.",
+  "prediction": { "category": "otp_verification", "confidence": 0.9821 },
+  "top_3": [
+    { "category": "otp_verification", "confidence": 0.9821 },
+    { "category": "security_alert", "confidence": 0.0091 },
+    { "category": "customer_service", "confidence": 0.0044 }
+  ],
+  "cached": false
+}
+```
+
+**Limit:** max. 512 characters per message.
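For callers working from Python rather than curl, the same request can be put together with the standard library. The sketch below assumes the endpoint and JSON shape from the example above; `build_classify_request` and `classify` are hypothetical helper names, and the client-side length check simply mirrors the documented 512-character limit.

```python
import json
from urllib.request import Request, urlopen

BASE_URL = "https://cmeneses99-sms-classifier-api.hf.space"
MAX_CHARS = 512  # documented per-message limit


def build_classify_request(text: str) -> Request:
    """Build the POST /classify request, rejecting over-long messages before sending."""
    if len(text) > MAX_CHARS:
        raise ValueError(f"message has {len(text)} characters; the limit is {MAX_CHARS}")
    body = json.dumps({"text": text}).encode("utf-8")
    return Request(
        f"{BASE_URL}/classify",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify(text: str, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response."""
    with urlopen(build_classify_request(text), timeout=timeout) as resp:
        return json.load(resp)
```

Checking the length locally avoids a round trip for inputs the server is documented to reject.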
+
+---
+
+### Classify multiple messages
+
+```bash
+curl -X POST https://cmeneses99-sms-classifier-api.hf.space/classify/batch \
+  -H "Content-Type: application/json" \
+  -d '{
+    "texts": [
+      "Se debitó $45.000 en Falabella.",
+      "Your package will arrive tomorrow between 2-4pm.",
+      "Pay your bill today and avoid penalties."
+    ]
+  }'
+```
+
+```json
+{
+  "results": [
+    { "text": "Se debitó $45.000 en Falabella.", "prediction": { "category": "transaction", "confidence": 0.97 }, "top_3": [...], "cached": false },
+    { "text": "Your package will arrive tomorrow...", "prediction": { "category": "delivery_logistics", "confidence": 0.95 }, "top_3": [...], "cached": false },
+    { "text": "Pay your bill today...", "prediction": { "category": "billing_reminder", "confidence": 0.91 }, "top_3": [...], "cached": false }
+  ],
+  "total": 3,
+  "from_cache": 0
+}
+```
+
+**Limit:** max. 50 messages per request.
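A client holding more than 50 messages has to split them across several requests. One simple way to do that is shown below; `chunk_texts` is a hypothetical helper, not part of the API.

```python
from typing import List

MAX_BATCH = 50  # documented per-request limit


def chunk_texts(texts: List[str], size: int = MAX_BATCH) -> List[List[str]]:
    """Split texts into consecutive batches of at most `size` items, preserving order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [texts[i:i + size] for i in range(0, len(texts), size)]
```

Each resulting batch can then be sent as the `texts` array of its own `/classify/batch` request.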
+
+---
+
+### List categories
+
+```bash
+curl https://cmeneses99-sms-classifier-api.hf.space/api/categories
+```
+
+```json
+["transaction", "otp_verification", "promotion_offer", "security_alert",
+ "delivery_logistics", "appointment_reminder", "customer_service",
+ "spam_advertising", "billing_reminder"]
+```
+
+---
+
+### Health check
+
+```bash
+curl https://cmeneses99-sms-classifier-api.hf.space/health
+```
+
+```json
+{
+  "status": "ok",
+  "model_loaded": true,
+  "cache": { "hits": 12, "misses": 5, "hit_rate": 0.71, "size": 5 }
+}
+```
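The sample `cache` values are consistent with `hit_rate` being hits divided by total lookups: 12 / (12 + 5) ≈ 0.71. A small sketch of that calculation follows; rounding to two decimals and returning 0.0 when there have been no lookups are assumptions, since the server code is not shown here.

```python
def hit_rate(hits: int, misses: int) -> float:
    """Fraction of lookups served from cache, rounded to two decimals."""
    total = hits + misses
    if total == 0:
        return 0.0  # assumed convention when there have been no lookups
    return round(hits / total, 2)
```

With the values from the sample response, `hit_rate(12, 5)` evaluates to `0.71`.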
+
+---
+
+## Categories
+
+| Category | Examples |
+|---|---|
+| `transaction` | "Se debitó $45.000 en Falabella" / "Payment of $120 confirmed" |
+| `otp_verification` | "Tu código OTP es 482910" / "Your verification code is 774321" |
+| `promotion_offer` | "30% de descuento este fin de semana" / "Exclusive offer just for you" |
+| `security_alert` | "Acceso no reconocido desde Berlín" / "Failed login attempt detected" |
+| `delivery_logistics` | "Tu pedido está en camino" / "Your package will arrive tomorrow" |
+| `appointment_reminder` | "Recordatorio: cita médica mañana a las 10am" / "Dental appointment confirmed" |
+| `customer_service` | "Tu ticket #4821 fue resuelto" / "Your case has been escalated" |
+| `spam_advertising` | "Ganaste un premio, haz clic aquí" / "You have been selected for a reward" |
+| `billing_reminder` | "Tu factura vence el 15 de mayo" / "Pay your bill today and avoid penalties" |
render.yaml
DELETED
@@ -1,9 +0,0 @@
-services:
-  - type: web
-    name: sms-classifier-api
-    runtime: docker
-    plan: starter
-    healthCheckPath: /health
-    envVars:
-      - key: PORT
-        value: 8000