---
title: SAFE Challenge Example Submission
emoji: 🔒
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
license: apache-2.0
---
# ML Inference Service

FastAPI service for serving ML models over HTTP. Comes with ResNet-18 for image classification out of the box, but you can swap in any model you want.

## Quick Start

**Local development:**

```bash
# Install dependencies
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# Download the example model
bash scripts/model_download.bash

# Run it
uvicorn main:app --reload
```

Server runs on `http://127.0.0.1:8000`. Check `/docs` for the interactive API documentation.

**Docker:**

```bash
# Build
docker build -t ml-inference-service:test .

# Run
docker run -d --name ml-inference-test -p 8000:8000 ml-inference-service:test

# Check logs
docker logs -f ml-inference-test

# Stop
docker stop ml-inference-test && docker rm ml-inference-test
```
## Testing the API

```bash
# Using curl
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{
    "image": {
      "mediaType": "image/jpeg",
      "data": "<base64-encoded-image>"
    }
  }'
```
Example response:

```json
{
  "prediction": "tiger cat",
  "confidence": 0.394,
  "predicted_label": 282,
  "model": "microsoft/resnet-18",
  "mediaType": "image/jpeg"
}
```
## Project Structure

```
ml-inference-service/
├── main.py                      # Entry point
├── app/
│   ├── core/
│   │   ├── app.py               # App factory, config, DI, lifecycle
│   │   └── logging.py           # Logging setup
│   ├── api/
│   │   ├── models.py            # Request/response schemas
│   │   ├── controllers.py       # Business logic
│   │   └── routes/
│   │       └── prediction.py    # POST /predict
│   └── services/
│       ├── base.py              # Abstract InferenceService class
│       └── inference.py         # ResNet implementation
├── models/
│   └── microsoft/
│       └── resnet-18/           # Model weights and config
├── scripts/
│   ├── model_download.bash
│   ├── generate_test_datasets.py
│   └── test_datasets.py
├── Dockerfile                   # Multi-stage build
├── .env.example                 # Environment config template
└── requirements.txt
```
The key design decision here is that `app/core/app.py` consolidates everything: config, dependency injection, lifecycle, and the app factory. This avoids the mess of managing global state across multiple files.
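The file itself isn't reproduced in this README, but the pattern it describes looks roughly like the sketch below. Everything beyond `ResNetInferenceService` and the model name is illustrative:

```python
# Sketch of a consolidated app factory with a lifespan hook (illustrative, not the repo's exact code).
from contextlib import asynccontextmanager
from fastapi import FastAPI

from app.services.inference import ResNetInferenceService  # path per the project structure above

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Load the model once at startup and stash the service on app.state,
    # so route handlers can fetch it without global variables.
    service = ResNetInferenceService(model_name="microsoft/resnet-18")
    await service.load_model()
    app.state.inference_service = service
    yield
    # Teardown (freeing model resources) would go here.

def create_app() -> FastAPI:
    app = FastAPI(title="ML Inference Service", lifespan=lifespan)
    # Routers get wired up here, e.g. app.include_router(prediction_router)
    return app
```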
## How to Plug In Your Own Model

The whole service is built around one abstract base class: `InferenceService`. Implement it for your model, and everything else just works.
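The base class isn't shown in this README, but from how it's used in Step 1 below, `app/services/base.py` plausibly looks something like this sketch:

```python
# Plausible shape of app/services/base.py (reconstructed from usage, not verbatim).
from abc import ABC, abstractmethod
from typing import Generic, TypeVar

RequestT = TypeVar("RequestT")
ResponseT = TypeVar("ResponseT")

class InferenceService(ABC, Generic[RequestT, ResponseT]):
    @abstractmethod
    async def load_model(self) -> None:
        """Load model weights. Called once at startup."""

    @abstractmethod
    async def predict(self, request: RequestT) -> ResponseT:
        """Run inference on a single request."""

    @property
    @abstractmethod
    def is_loaded(self) -> bool:
        """Whether the model is ready to serve traffic."""
```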
### Step 1: Create Your Service Class

```python
# app/services/your_model_service.py
from app.services.base import InferenceService
from app.api.models import ImageRequest, PredictionResponse
import asyncio


class YourModelService(InferenceService[ImageRequest, PredictionResponse]):
    def __init__(self, model_name: str):
        self.model_name = model_name
        self.model_path = f"models/{model_name}"
        self.model = None
        self._is_loaded = False

    async def load_model(self) -> None:
        """Load your model here. Called once at startup."""
        self.model = load_your_model(self.model_path)  # placeholder: your loading code
        self._is_loaded = True

    async def predict(self, request: ImageRequest) -> PredictionResponse:
        """Run inference. Offload heavy work to a thread pool."""
        return await asyncio.to_thread(self._predict_sync, request)

    def _predict_sync(self, request: ImageRequest) -> PredictionResponse:
        """Actual inference happens here."""
        image = decode_base64_image(request.image.data)  # placeholder: your decoding code
        result = self.model(image)
        return PredictionResponse(
            prediction=result.label,
            confidence=result.confidence,
            predicted_label=result.class_id,
            model=self.model_name,
            mediaType=request.image.mediaType,
        )

    @property
    def is_loaded(self) -> bool:
        return self._is_loaded
```
**Important:** Use `asyncio.to_thread()` to run CPU-heavy inference in a background thread. This keeps the server responsive while your model is working.
### Step 2: Register Your Service

Open `app/core/app.py` and find the lifespan function:

```python
# Change this line:
service = ResNetInferenceService(model_name="microsoft/resnet-18")

# To this:
service = YourModelService(model_name="your-org/your-model")
```

That's it. The `/predict` endpoint now serves your model.
### Model Files

Put your model files under `models/` with the full org/model structure:

```
models/
└── your-org/
    └── your-model/
        ├── config.json
        ├── weights.bin
        └── (other files)
```

No renaming, no dropping the org prefix: it just mirrors the Hugging Face structure.
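If your model lives on the Hugging Face Hub, one way to populate that layout is `huggingface_hub.snapshot_download` (an assumption on our part; the bundled `scripts/model_download.bash` may do it differently):

```python
# Download a Hub model into the models/<org>/<model-name>/ layout.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="your-org/your-model",           # hypothetical model ID
    local_dir="models/your-org/your-model",  # mirrors the org/model structure
)
```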
## Configuration

Settings are managed via environment variables or a `.env` file. See `.env.example` for all available options.

**Default values:**

- `APP_NAME`: "ML Inference Service"
- `APP_VERSION`: "0.1.0"
- `DEBUG`: false
- `HOST`: "0.0.0.0"
- `PORT`: 8000
- `MODEL_NAME`: "microsoft/resnet-18"
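These defaults map naturally onto a pydantic-settings style config class; here's a sketch of what that plausibly looks like (the actual implementation in `app/core/app.py` may differ):

```python
# Illustrative settings object; field defaults match the list above.
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    # protected_namespaces=() silences pydantic's warning about the model_ field prefix
    model_config = SettingsConfigDict(env_file=".env", protected_namespaces=())

    app_name: str = "ML Inference Service"
    app_version: str = "0.1.0"
    debug: bool = False
    host: str = "0.0.0.0"
    port: int = 8000
    model_name: str = "microsoft/resnet-18"
```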
**To customize:**

```bash
# Copy the example
cp .env.example .env

# Edit values
vim .env
```

Or set environment variables directly:

```bash
export MODEL_NAME="google/vit-base-patch16-224"
uvicorn main:app --reload
```
## Deployment

**Development:**

```bash
uvicorn main:app --reload
```

**Production:**

```bash
gunicorn main:app -w 4 -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:8000
```

The service runs on CPU by default. For GPU inference, install CUDA-enabled PyTorch and modify your service to move tensors to the GPU device.
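Device selection in your service could look like this sketch (assumes PyTorch; the helper is illustrative):

```python
# Run inference on GPU when available, falling back to CPU.
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

def run_inference(model: torch.nn.Module, inputs: torch.Tensor) -> torch.Tensor:
    # Model and inputs must live on the same device; no_grad skips autograd overhead.
    model = model.to(device).eval()
    with torch.no_grad():
        return model(inputs.to(device))
```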
**Docker:**

- Multi-stage build keeps the image small
- Runs as non-root user (`appuser`)
- Python dependencies installed in user site-packages
- Model files baked into the image

## What Happens When You Start the Server

```
INFO: Starting ML Inference Service...
INFO: Initializing ResNet service: models/microsoft/resnet-18
INFO: Loading model from models/microsoft/resnet-18
INFO: Model loaded: 1000 classes
INFO: Startup completed successfully
INFO: Uvicorn running on http://0.0.0.0:8000
```

If you see "Model directory not found", check that your model files exist at the expected path with the full org/model structure.
## API Reference

**Endpoint:** `POST /predict`

**Request:**

```json
{
  "image": {
    "mediaType": "image/jpeg",  // or "image/png"
    "data": "<base64-encoded-image>"
  }
}
```

**Response:**

```json
{
  "prediction": "string",      // Human-readable label
  "confidence": 0.0,           // Softmax probability
  "predicted_label": 0,        // Numeric class index
  "model": "org/model-name",   // Model identifier
  "mediaType": "image/jpeg"    // Echoed from request
}
```

**Docs:**

- Swagger UI: `http://localhost:8000/docs`
- ReDoc: `http://localhost:8000/redoc`
- OpenAPI JSON: `http://localhost:8000/openapi.json`
## PyArrow Test Datasets

We've included a test dataset system for validating your model. It generates 100 standardized test datasets covering normal inputs, edge cases, performance benchmarks, and model comparisons.

### Generate Datasets

```bash
python scripts/generate_test_datasets.py
```

This creates:

- `scripts/test_datasets/*.parquet` - Test data (images, requests, expected responses)
- `scripts/test_datasets/*_metadata.json` - Human-readable descriptions
- `scripts/test_datasets/datasets_summary.json` - Overview of all datasets
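To peek at a generated dataset without running the full test harness, plain PyArrow works (the filename here is illustrative):

```python
# Inspect a generated parquet dataset: schema and row count.
import pyarrow.parquet as pq

table = pq.read_table("scripts/test_datasets/standard_test_001.parquet")  # hypothetical filename
print(table.schema)
print(f"{table.num_rows} rows")
```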
### Run Tests

```bash
# Start your service first
uvicorn main:app --reload

# Quick test (5 samples per dataset)
python scripts/test_datasets.py --quick

# Full validation
python scripts/test_datasets.py

# Test specific category
python scripts/test_datasets.py --category edge_case
```

### Dataset Categories (25 datasets each)

**1. Standard Tests** (`standard_test_*.parquet`)

- Normal images: random patterns, shapes, gradients
- Common sizes: 224x224, 256x256, 299x299, 384x384
- Formats: JPEG, PNG
- Purpose: Baseline validation

**2. Edge Cases** (`edge_case_*.parquet`)

- Tiny images (32x32, 1x1)
- Huge images (2048x2048)
- Extreme aspect ratios (1000x50)
- Corrupted data, malformed requests
- Purpose: Test error handling

**3. Performance Benchmarks** (`performance_test_*.parquet`)

- Batch sizes: 1, 5, 10, 25, 50, 100 images
- Latency and throughput tracking
- Purpose: Performance profiling

**4. Model Comparisons** (`model_comparison_*.parquet`)

- Same inputs across different architectures
- Models: ResNet-18/50, ViT, ConvNext, Swin
- Purpose: Cross-model benchmarking
### Test Output

```
DATASET TESTING SUMMARY
============================================================
Datasets tested: 100
Successful datasets: 95
Failed datasets: 5
Total samples: 1,247
Overall success rate: 87.3%
Test duration: 45.2s

Performance:
  Avg latency: 123.4ms
  Median latency: 98.7ms
  p95 latency: 342.1ms
  Max latency: 2,341.0ms
  Requests/sec: 27.6

Category breakdown:
  standard: 25 datasets, 94.2% avg success
  edge_case: 25 datasets, 76.8% avg success
  performance: 25 datasets, 91.1% avg success
  model_comparison: 25 datasets, 89.3% avg success
```
## Common Issues

**Port 8000 already in use:**

```bash
# Find what's using it
lsof -i :8000

# Or just use a different port
uvicorn main:app --port 8080
```

**Model not loading:**

- Check the path: models should be in `models/<org>/<model-name>/`
- Make sure you ran `bash scripts/model_download.bash`
- Check logs for the exact error

**Slow inference:**

- Inference runs on CPU by default
- For GPU: install CUDA PyTorch and modify your service to use the GPU device
- Consider using smaller models or quantization (sketched below)
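Dynamic quantization is one low-effort option, sketched here under the assumption you're on PyTorch (it mainly speeds up `Linear` layers, so gains vary by architecture):

```python
# Swap Linear layers for int8 dynamic-quantized equivalents (CPU inference).
import torch

def quantize_for_cpu(model: torch.nn.Module) -> torch.nn.Module:
    return torch.quantization.quantize_dynamic(
        model, {torch.nn.Linear}, dtype=torch.qint8
    )
```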
## License

Apache 2.0