---
title: jobin-dsri
emoji: 🧪
colorFrom: blue
colorTo: green
sdk: docker
app_port: 8000
pinned: false
license: apache-2.0
---

# ML Inference Service

A FastAPI service for serving ML models over HTTP. It ships with ResNet-18 for image classification out of the box, but you can swap in any model you want.

## Quick Start

**Install `uv`:** https://docs.astral.sh/uv/getting-started/installation/

**Local development:**

```bash
# Install dependencies
make setup
source venv/bin/activate

# Download the example model
make download

# Run it
make serve
```

In a second terminal:

```bash
# Process an example input
./prompt.sh cat.json
```

The server runs on `http://127.0.0.1:8000`. Check `/docs` for the interactive API documentation.

**Docker:**

```bash
# Build
make docker-build

# Run
make docker-run
```

## Testing the API

```bash
# Using curl
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{
    "image": {
      "mediaType": "image/jpeg",
      "data": "<base64-encoded image data>"
    }
  }'
```

Example response:

```json
{
  "logprobs": [-0.859380304813385, -1.2701971530914307, -2.1918208599090576, -1.69235098361969],
  "localizationMask": {
    "mediaType": "image/png",
    "data": "iVBORw0KGgoAAAANSUhEUgAAA8AAAAKDAQAAAAD9Fl5AAAAAu0lEQVR4nO3NsREAMAgDMWD/nZMVKEwn1T5/FQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAMCl3g5f+HC24TRhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWEAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAj70gwKsTlmdBwAAAABJRU5ErkJggg=="
  }
}
```

## Project Structure

```
example-submission/
├── main.py                      # Entry point
├── app/
│   ├── core/
│   │   ├── app.py               # <= INSTANTIATE YOUR DETECTOR HERE
│   │   └── logging.py           # Logging setup
│   ├── api/
│   │   ├── models.py            # Request/response schemas
│   │   ├── controllers.py       # Business logic
│   │   └── routes/
│   │       └── prediction.py    # POST /predict
│   └── services/
│       ├── base.py              # <= YOUR DETECTOR IMPLEMENTS THIS INTERFACE
│       └── inference.py         # Example service based on ResNet-18
├── models/
│   └── microsoft/
│       └── resnet-18/           # Model weights and config
├── scripts/
│   ├── model_download.bash
│   ├── generate_test_datasets.py
│   └── test_datasets.py
├── Dockerfile
├── .env.example                 # Environment config template
├── cat.json                     # An example /predict request object
├── makefile
├── prompt.sh                    # Script that makes a /predict request
├── requirements.in
├── requirements.txt
└── response.json                # An example /predict response object
```

## How to Plug In Your Own Model

To integrate your model, implement the `InferenceService` abstract class defined in `app/services/base.py`. You can follow the example implementation in `app/services/inference.py`, which is based on ResNet-18. After implementing the required interface, instantiate your model in the `lifespan()` function in `app/core/app.py`, replacing the `ResNetInferenceService` instance.

### Step 1: Create Your Service Class

```python
# app/services/your_model_service.py
from app.services.base import InferenceService
from app.api.models import ImageRequest, PredictionResponse


class YourModelService(InferenceService[ImageRequest, PredictionResponse]):
    def __init__(self, model_name: str):
        self.model_name = model_name
        self.model_path = f"models/{model_name}"
        self.model = None
        self._is_loaded = False

    def load_model(self) -> None:
        """Load your model here. Called once at startup."""
        self.model = load_your_model(self.model_path)  # your loading logic
        self._is_loaded = True

    def predict(self, request: ImageRequest) -> PredictionResponse:
        """Actual inference happens here."""
        image = decode_base64_image(request.image.data)
        result = self.model(image)
        logprobs = ...  # derive log-probabilities from `result`
        mask = ...      # optional localization mask, or None
        return PredictionResponse(
            logprobs=logprobs,
            localizationMask=mask,
        )

    @property
    def is_loaded(self) -> bool:
        return self._is_loaded
```
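Note that `load_your_model` and `decode_base64_image` above are placeholders for your own code, not helpers shipped with this repo. As a reference point, a minimal sketch of the decoding step, assuming Pillow is installed and `request.image.data` carries plain base64 without a `data:` URI prefix:

```python
import base64
import io

from PIL import Image


def decode_base64_image(data: str) -> Image.Image:
    """Decode a base64 string into an RGB PIL image."""
    raw = base64.b64decode(data)
    return Image.open(io.BytesIO(raw)).convert("RGB")
```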
### Step 2: Register Your Service

Open `app/core/app.py` and find the lifespan function:

```python
# Change this line:
service = ResNetInferenceService(model_name="microsoft/resnet-18")

# To this:
service = YourModelService(...)
```

That's it. The `/predict` endpoint now serves your model.

### Model Files

Put your model files under the `models/` directory:

```
models/
└── your-org/
    └── your-model/
        ├── config.json
        ├── weights.bin
        └── (other files)
```

## Configuration

Settings are managed via environment variables or a `.env` file. See `.env.example` for all available options.

**Default values:**

- `APP_NAME`: "ML Inference Service"
- `APP_VERSION`: "0.1.0"
- `DEBUG`: false
- `HOST`: "0.0.0.0"
- `PORT`: 8000
- `MODEL_NAME`: "microsoft/resnet-18"

**To customize:**

```bash
# Copy the example
cp .env.example .env

# Edit values
vim .env
```

Or set environment variables directly:

```bash
export MODEL_NAME="google/vit-base-patch16-224"
uvicorn main:app --reload
```

## Deployment

**Development:**

```bash
uvicorn main:app --reload
```

**Production:**

```bash
gunicorn main:app -w 4 -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:8000
```

The service runs on CPU by default. For GPU inference, install CUDA-enabled PyTorch and modify your service to move tensors to the GPU device.

**Docker:**

- Multi-stage build keeps the image small
- Runs as a non-root user (`appuser`)
- Python dependencies installed in user site-packages
- Model files baked into the image

## What Happens When You Start the Server

```
INFO: Starting ML Inference Service...
INFO: Initializing ResNet service: models/microsoft/resnet-18
INFO: Loading model from models/microsoft/resnet-18
INFO: Model loaded: 1000 classes
INFO: Startup completed successfully
INFO: Uvicorn running on http://0.0.0.0:8000
```

If you see "Model directory not found", check that your model files exist at the expected path with the full org/model structure.

## API Reference

**Endpoint:** `POST /predict`

**Request:**

```json
{
  "image": {
    "mediaType": "image/jpeg",            // or "image/png"
    "data": "<base64-encoded image data>"
  }
}
```

**Response:**

```json
{
  "logprobs": [float],                    // Log-probabilities of each label
  "localizationMask": {                   // [Optional] binary mask
    "mediaType": "image/png",             // Always png
    "data": "<base64-encoded PNG data>"   // Image data
  }
}
```

**Docs:**

- Swagger UI: `http://localhost:8000/docs`
- ReDoc: `http://localhost:8000/redoc`
- OpenAPI JSON: `http://localhost:8000/openapi.json`

## PyArrow Test Datasets

We've included a test dataset system for validating your model. It generates 100 standardized test datasets covering normal inputs, edge cases, performance benchmarks, and model comparisons.
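Each dataset is a plain Parquet file, so you can also inspect one directly with `pyarrow`. A minimal sketch, assuming you've already generated the datasets (next section); the exact column layout is whatever `generate_test_datasets.py` writes:

```python
from pathlib import Path

import pyarrow.parquet as pq

# Print the shape of every generated dataset.
for path in sorted(Path("scripts/test_datasets").glob("*.parquet")):
    table = pq.read_table(path)
    print(f"{path.name}: {table.num_rows} rows, columns: {table.column_names}")
```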
### Generate Datasets

```bash
python scripts/generate_test_datasets.py
```

This creates:

- `scripts/test_datasets/*.parquet` - Test data (images, requests, expected responses)
- `scripts/test_datasets/*_metadata.json` - Human-readable descriptions
- `scripts/test_datasets/datasets_summary.json` - Overview of all datasets

### Run Tests

```bash
# Start your service first
make serve
```

In another terminal:

```bash
# Quick test (5 samples per dataset)
python scripts/test_datasets.py --quick

# Full validation
python scripts/test_datasets.py

# Test a specific category
python scripts/test_datasets.py --category edge_case
```

### Dataset Categories (25 datasets each)

**1. Standard Tests** (`standard_test_*.parquet`)

- Normal images: random patterns, shapes, gradients
- Common sizes: 224x224, 256x256, 299x299, 384x384
- Formats: JPEG, PNG
- Purpose: Baseline validation

**2. Edge Cases** (`edge_case_*.parquet`)

- Tiny images (32x32, 1x1)
- Huge images (2048x2048)
- Extreme aspect ratios (1000x50)
- Corrupted data, malformed requests
- Purpose: Test error handling

**3. Performance Benchmarks** (`performance_test_*.parquet`)

- Batch sizes: 1, 5, 10, 25, 50, 100 images
- Latency and throughput tracking
- Purpose: Performance profiling

**4. Model Comparisons** (`model_comparison_*.parquet`)

- Same inputs across different architectures
- Models: ResNet-18/50, ViT, ConvNeXt, Swin
- Purpose: Cross-model benchmarking

### Test Output

```
DATASET TESTING SUMMARY
============================================================
Datasets tested:      100
Successful datasets:  95
Failed datasets:      5
Total samples:        1,247
Overall success rate: 87.3%
Test duration:        45.2s

Performance:
  Avg latency:    123.4ms
  Median latency: 98.7ms
  p95 latency:    342.1ms
  Max latency:    2,341.0ms
  Requests/sec:   27.6

Category breakdown:
  standard:          25 datasets, 94.2% avg success
  edge_case:         25 datasets, 76.8% avg success
  performance:       25 datasets, 91.1% avg success
  model_comparison:  25 datasets, 89.3% avg success
```

## Common Issues

**Port 8000 already in use:**

```bash
# Find what's using it
lsof -i :8000

# Or just use a different port
uvicorn main:app --port 8080
```

**Model not loading:**

- Check the path: models should be in `models/<org>/<model>/`
- If you're running the example ResNet-based model, make sure you ran `make download` to fetch the model weights
- Check the logs for the exact error

**Slow inference:**

- Inference runs on CPU by default
- For GPU: install CUDA PyTorch and modify your service to use the GPU device
- Consider using smaller models or quantization

## License

Apache 2.0