---
title: SAFE Challenge Example Submission
emoji: π
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
license: apache-2.0
---
# ML Inference Service
FastAPI service for serving ML models over HTTP. Comes with ResNet-18 for image classification out of the box, but you can swap in any model you want.
## Quick Start
**Local development:**
```bash
# Install dependencies
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
# Download the example model
bash scripts/model_download.bash
# Run it
uvicorn main:app --reload
```
Server runs on `http://127.0.0.1:8000`. Check `/docs` for the interactive API documentation.
**Docker:**
```bash
# Build
docker build -t ml-inference-service:test .
# Run
docker run -d --name ml-inference-test -p 8000:8000 ml-inference-service:test
# Check logs
docker logs -f ml-inference-test
# Stop
docker stop ml-inference-test && docker rm ml-inference-test
```
## Testing the API
```bash
# Using curl
curl -X POST http://localhost:8000/predict \
-H "Content-Type: application/json" \
-d '{
"image": {
"mediaType": "image/jpeg",
"data": "<base64-encoded-image>"
}
}'
```
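If you prefer Python, here is a minimal client sketch. It assumes `requests` is installed and that a local `cat.jpg` exists; both are illustrative, not part of the repo:
```python
# Minimal client sketch (hypothetical helper script, not included in this repo)
import base64
import requests

# Read and base64-encode a local image ("cat.jpg" is a placeholder path)
with open("cat.jpg", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("ascii")

payload = {"image": {"mediaType": "image/jpeg", "data": encoded}}
resp = requests.post("http://localhost:8000/predict", json=payload, timeout=30)
resp.raise_for_status()
print(resp.json())  # e.g. {"prediction": "tiger cat", ...}
```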
Example response:
```json
{
"prediction": "tiger cat",
"confidence": 0.394,
"predicted_label": 282,
"model": "microsoft/resnet-18",
"mediaType": "image/jpeg"
}
```
## Project Structure
```
ml-inference-service/
├── main.py                    # Entry point
├── app/
│   ├── core/
│   │   ├── app.py             # App factory, config, DI, lifecycle
│   │   └── logging.py         # Logging setup
│   ├── api/
│   │   ├── models.py          # Request/response schemas
│   │   ├── controllers.py     # Business logic
│   │   └── routes/
│   │       └── prediction.py  # POST /predict
│   └── services/
│       ├── base.py            # Abstract InferenceService class
│       └── inference.py       # ResNet implementation
├── models/
│   └── microsoft/
│       └── resnet-18/         # Model weights and config
├── scripts/
│   ├── model_download.bash
│   ├── generate_test_datasets.py
│   └── test_datasets.py
├── Dockerfile                 # Multi-stage build
├── .env.example               # Environment config template
└── requirements.txt
```
The key design decision here is that `app/core/app.py` consolidates everything: config, dependency injection, lifecycle, and the app factory. This avoids the mess of managing global state across multiple files.
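As a rough sketch of what that consolidation can look like (the `create_app` factory name and the details here are illustrative, not copied from the actual file):
```python
# Illustrative shape of app/core/app.py -- the real file may differ
from contextlib import asynccontextmanager
from fastapi import FastAPI

from app.services.inference import ResNetInferenceService


@asynccontextmanager
async def lifespan(app: FastAPI):
    # Build and load the model service once at startup...
    service = ResNetInferenceService(model_name="microsoft/resnet-18")
    await service.load_model()
    app.state.inference_service = service  # simple DI via app state
    yield
    # ...and release resources on shutdown if needed


def create_app() -> FastAPI:
    app = FastAPI(title="ML Inference Service", lifespan=lifespan)
    # Routers would be registered here, e.g. app.include_router(prediction.router)
    return app
```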
## How to Plug In Your Own Model
The whole service is built around one abstract base class: `InferenceService`. Implement it for your model, and everything else just works.
### Step 1: Create Your Service Class
```python
# app/services/your_model_service.py
import asyncio

from app.services.base import InferenceService
from app.api.models import ImageRequest, PredictionResponse


class YourModelService(InferenceService[ImageRequest, PredictionResponse]):
    def __init__(self, model_name: str):
        self.model_name = model_name
        self.model_path = f"models/{model_name}"
        self.model = None
        self._is_loaded = False

    async def load_model(self) -> None:
        """Load your model here. Called once at startup."""
        # load_your_model is a placeholder -- replace with your loading logic
        self.model = load_your_model(self.model_path)
        self._is_loaded = True

    async def predict(self, request: ImageRequest) -> PredictionResponse:
        """Run inference. Offload heavy work to a thread pool."""
        return await asyncio.to_thread(self._predict_sync, request)

    def _predict_sync(self, request: ImageRequest) -> PredictionResponse:
        """Actual inference happens here."""
        # decode_base64_image is a placeholder -- replace with your decoding logic
        image = decode_base64_image(request.image.data)
        result = self.model(image)
        return PredictionResponse(
            prediction=result.label,
            confidence=result.confidence,
            predicted_label=result.class_id,
            model=self.model_name,
            mediaType=request.image.mediaType,
        )

    @property
    def is_loaded(self) -> bool:
        return self._is_loaded
```
**Important:** Use `asyncio.to_thread()` to run CPU-heavy inference in a background thread. This keeps the server responsive while your model is working.
### Step 2: Register Your Service
Open `app/core/app.py` and find the lifespan function:
```python
# Change this line:
service = ResNetInferenceService(model_name="microsoft/resnet-18")
# To this:
service = YourModelService(model_name="your-org/your-model")
```
That's it. The `/predict` endpoint now serves your model.
### Model Files
Put your model files under `models/` with the full org/model structure:
```
models/
└── your-org/
    └── your-model/
        ├── config.json
        ├── weights.bin
        └── (other files)
```
No renaming, no dropping the org prefix; it just mirrors the Hugging Face structure.
## Configuration
Settings are managed via environment variables or a `.env` file. See `.env.example` for all available options.
**Default values:**
- `APP_NAME`: "ML Inference Service"
- `APP_VERSION`: "0.1.0"
- `DEBUG`: false
- `HOST`: "0.0.0.0"
- `PORT`: 8000
- `MODEL_NAME`: "microsoft/resnet-18"
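These defaults presumably map onto a settings class; a hedged sketch using `pydantic-settings` (the actual class in `app/core/app.py` may look different):
```python
# Illustrative only -- assumes pydantic-settings; the real Settings class may differ
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    # protected_namespaces=() silences pydantic's warning about the model_ prefix
    model_config = SettingsConfigDict(env_file=".env", protected_namespaces=())

    app_name: str = "ML Inference Service"
    app_version: str = "0.1.0"
    debug: bool = False
    host: str = "0.0.0.0"
    port: int = 8000
    model_name: str = "microsoft/resnet-18"
```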
**To customize:**
```bash
# Copy the example
cp .env.example .env
# Edit values
vim .env
```
Or set environment variables directly:
```bash
export MODEL_NAME="google/vit-base-patch16-224"
uvicorn main:app --reload
```
## Deployment
**Development:**
```bash
uvicorn main:app --reload
```
**Production:**
```bash
gunicorn main:app -w 4 -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:8000
```
The service runs on CPU by default. For GPU inference, install CUDA-enabled PyTorch and modify your service to move tensors to the GPU device.
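A hedged sketch of that device handoff inside a custom `_predict_sync` (assumes PyTorch, a loaded `model` module, and a preprocessed `inputs` tensor):
```python
# Sketch only: move the model and inputs to GPU when one is available
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device).eval()

with torch.no_grad():
    inputs = inputs.to(device)            # e.g. a preprocessed image tensor
    logits = model(inputs)
    probs = torch.softmax(logits, dim=-1)  # matches the confidence field above
```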
**Docker:**
- Multi-stage build keeps the image small
- Runs as non-root user (`appuser`)
- Python dependencies installed in user site-packages
- Model files baked into the image
## What Happens When You Start the Server
```
INFO: Starting ML Inference Service...
INFO: Initializing ResNet service: models/microsoft/resnet-18
INFO: Loading model from models/microsoft/resnet-18
INFO: Model loaded: 1000 classes
INFO: Startup completed successfully
INFO: Uvicorn running on http://0.0.0.0:8000
```
If you see "Model directory not found", check that your model files exist at the expected path with the full org/model structure.
## API Reference
**Endpoint:** `POST /predict`
**Request:**
```json
{
"image": {
"mediaType": "image/jpeg", // or "image/png"
"data": "<base64-encoded-image>"
}
}
```
**Response:**
```json
{
"prediction": "string", // Human-readable label
"confidence": 0.0, // Softmax probability
"predicted_label": 0, // Numeric class index
"model": "org/model-name", // Model identifier
"mediaType": "image/jpeg" // Echoed from request
}
```
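The schemas in `app/api/models.py` presumably mirror this shape. `ImageRequest` and `PredictionResponse` appear in the plug-in example above; the nested `ImagePayload` name is a guess:
```python
# Illustrative Pydantic models -- the actual app/api/models.py may differ
from pydantic import BaseModel


class ImagePayload(BaseModel):
    mediaType: str  # "image/jpeg" or "image/png"
    data: str       # base64-encoded image bytes


class ImageRequest(BaseModel):
    image: ImagePayload


class PredictionResponse(BaseModel):
    prediction: str       # human-readable label
    confidence: float     # softmax probability
    predicted_label: int  # numeric class index
    model: str            # model identifier
    mediaType: str        # echoed from the request
```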
**Docs:**
- Swagger UI: `http://localhost:8000/docs`
- ReDoc: `http://localhost:8000/redoc`
- OpenAPI JSON: `http://localhost:8000/openapi.json`
## PyArrow Test Datasets
We've included a test dataset system for validating your model. It generates 100 standardized test datasets covering normal inputs, edge cases, performance benchmarks, and model comparisons.
### Generate Datasets
```bash
python scripts/generate_test_datasets.py
```
This creates:
- `scripts/test_datasets/*.parquet` - Test data (images, requests, expected responses)
- `scripts/test_datasets/*_metadata.json` - Human-readable descriptions
- `scripts/test_datasets/datasets_summary.json` - Overview of all datasets
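To inspect a generated dataset directly, a small sketch using `pyarrow` (the exact file name is illustrative):
```python
# Peek at one generated dataset -- the file name here is a placeholder
import pyarrow.parquet as pq

table = pq.read_table("scripts/test_datasets/standard_test_001.parquet")
print(table.schema)    # column names and types
print(table.num_rows)  # number of test samples

# Convert to Python dicts for row-wise access
for row in table.to_pylist()[:3]:
    print(row.keys())
```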
### Run Tests
```bash
# Start your service first
uvicorn main:app --reload
# Quick test (5 samples per dataset)
python scripts/test_datasets.py --quick
# Full validation
python scripts/test_datasets.py
# Test specific category
python scripts/test_datasets.py --category edge_case
```
### Dataset Categories (25 datasets each)
**1. Standard Tests** (`standard_test_*.parquet`)
- Normal images: random patterns, shapes, gradients
- Common sizes: 224x224, 256x256, 299x299, 384x384
- Formats: JPEG, PNG
- Purpose: Baseline validation
**2. Edge Cases** (`edge_case_*.parquet`)
- Tiny images (32x32, 1x1)
- Huge images (2048x2048)
- Extreme aspect ratios (1000x50)
- Corrupted data, malformed requests
- Purpose: Test error handling
**3. Performance Benchmarks** (`performance_test_*.parquet`)
- Batch sizes: 1, 5, 10, 25, 50, 100 images
- Latency and throughput tracking
- Purpose: Performance profiling
**4. Model Comparisons** (`model_comparison_*.parquet`)
- Same inputs across different architectures
- Models: ResNet-18/50, ViT, ConvNext, Swin
- Purpose: Cross-model benchmarking
### Test Output
```
DATASET TESTING SUMMARY
============================================================
Datasets tested: 100
Successful datasets: 95
Failed datasets: 5
Total samples: 1,247
Overall success rate: 87.3%
Test duration: 45.2s
Performance:
Avg latency: 123.4ms
Median latency: 98.7ms
p95 latency: 342.1ms
Max latency: 2,341.0ms
Requests/sec: 27.6
Category breakdown:
standard: 25 datasets, 94.2% avg success
edge_case: 25 datasets, 76.8% avg success
performance: 25 datasets, 91.1% avg success
model_comparison: 25 datasets, 89.3% avg success
```
## Common Issues
**Port 8000 already in use:**
```bash
# Find what's using it
lsof -i :8000
# Or just use a different port
uvicorn main:app --port 8080
```
**Model not loading:**
- Check the path: models should be in `models/<org>/<model-name>/`
- Make sure you ran `bash scripts/model_download.bash`
- Check logs for the exact error
**Slow inference:**
- Inference runs on CPU by default
- For GPU: install CUDA PyTorch and modify service to use GPU device
- Consider using smaller models or quantization (see the sketch below)
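
A minimal quantization sketch using PyTorch's dynamic quantization; note it mainly speeds up `Linear` layers, so gains on conv-heavy models like ResNet are limited and worth measuring:
```python
# Dynamic quantization sketch -- mostly benefits Linear layers, so benchmark it
import torch

quantized_model = torch.quantization.quantize_dynamic(
    model,              # your loaded torch.nn.Module
    {torch.nn.Linear},  # layer types to quantize
    dtype=torch.qint8,
)
```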
## License
Apache 2.0