---
license: apache-2.0
---
# ML Inference Service
FastAPI service for serving ML models over HTTP. Comes with ResNet-18 for image classification out of the box, but you can swap in any model you want.
## Quick Start
**Install `uv`:**
https://docs.astral.sh/uv/getting-started/installation/
**Local development:**
```bash
# Install dependencies
make setup
source venv/bin/activate
# Download the example model
make download
# Run it
make serve
```
In a second terminal:
```bash
# Process an example input
./prompt.sh cat.json
```
Server runs on `http://127.0.0.1:8000`. Check `/docs` for the interactive API documentation.
**Docker:**
```bash
# Build
make docker-build
# Run
make docker-run
```
## Testing the API
```bash
# Using curl
curl -X POST http://localhost:8000/predict \
-H "Content-Type: application/json" \
-d '{
"image": {
"mediaType": "image/jpeg",
"data": "<base64-encoded-image>"
}
}'
```
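The same request can be made from Python. A minimal sketch using the `requests` library (assumed installed); `cat.jpg` is a placeholder for any JPEG or PNG on disk:
```python
# Sketch of a Python client for POST /predict, equivalent to the curl call above.
# "cat.jpg" is a placeholder; substitute any local JPEG or PNG.
import base64
import requests

with open("cat.jpg", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("ascii")

response = requests.post(
    "http://localhost:8000/predict",
    json={"image": {"mediaType": "image/jpeg", "data": encoded}},
    timeout=30,
)
response.raise_for_status()
print(response.json()["logprobs"])
```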
Example response:
```json
{
  "logprobs": [-0.859380304813385, -1.2701971530914307, -2.1918208599090576, -1.69235098361969],
  "localizationMask": {
    "mediaType": "image/png",
    "data": "iVBORw0KGgoAAAANSUhEUgAAA8AAAAKDAQAAAAD9Fl5AAAAAu0lEQVR4nO3NsREAMAgDMWD/nZMVKEwn1T5/FQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAMCl3g5f+HC24TRhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWEAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAj70gwKsTlmdBwAAAABJRU5ErkJggg=="
  }
}
```
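The `localizationMask` is a base64-encoded PNG. Continuing from the Python client above, one way to decode and inspect it (assuming Pillow is installed):
```python
# Decode the base64 localizationMask from a /predict response into a PIL image.
# Assumes Pillow is installed and `response` is the object from the client sketch above.
import base64
import io

from PIL import Image

mask_b64 = response.json()["localizationMask"]["data"]
mask = Image.open(io.BytesIO(base64.b64decode(mask_b64)))
mask.save("mask.png")           # write the mask out for viewing
print(mask.size, mask.mode)     # e.g. (960, 640) '1' for a binary mask
```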
## Project Structure
```
example-submission/
├── main.py                       # Entry point
├── app/
│   ├── core/
│   │   ├── app.py                # <= INSTANTIATE YOUR DETECTOR HERE
│   │   └── logging.py            # Logging setup
│   ├── api/
│   │   ├── models.py             # Request/response schemas
│   │   ├── controllers.py        # Business logic
│   │   └── routes/
│   │       └── prediction.py     # POST /predict
│   └── services/
│       ├── base.py               # <= YOUR DETECTOR IMPLEMENTS THIS INTERFACE
│       └── inference.py          # Example service based on ResNet-18
├── models/
│   └── microsoft/
│       └── resnet-18/            # Model weights and config
├── scripts/
│   ├── model_download.bash
│   ├── generate_test_datasets.py
│   └── test_datasets.py
├── Dockerfile
├── .env.example                  # Environment config template
├── cat.json                      # An example /predict request object
├── makefile
├── prompt.sh                     # Script that makes a /predict request
├── requirements.in
├── requirements.txt
└── response.json                 # An example /predict response object
```
## How to Plug In Your Own Model
To integrate your model, implement the `InferenceService` abstract class defined in `app/services/base.py`. You can follow the example implementation in `app/services/inference.py`, which is based on ResNet-18. After implementing the required interface, instantiate your model in the `lifespan()` function in `app/core/app.py`, replacing the `ResNetInferenceService` instance.
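For orientation, the interface in `app/services/base.py` looks roughly like the sketch below. This is a paraphrase to show the shape of the contract, not the exact file; check the source for the authoritative definition.
```python
# Rough sketch of the InferenceService interface; see app/services/base.py
# in this repository for the actual definition.
from abc import ABC, abstractmethod
from typing import Generic, TypeVar

RequestT = TypeVar("RequestT")
ResponseT = TypeVar("ResponseT")

class InferenceService(ABC, Generic[RequestT, ResponseT]):
    @abstractmethod
    def load_model(self) -> None:
        """Load model weights. Called once at startup."""

    @abstractmethod
    def predict(self, request: RequestT) -> ResponseT:
        """Run inference on a single request."""

    @property
    @abstractmethod
    def is_loaded(self) -> bool:
        """Whether the model is ready to serve."""
```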
### Step 1: Create Your Service Class
```python
# app/services/your_model_service.py
from app.services.base import InferenceService
from app.api.models import ImageRequest, PredictionResponse


class YourModelService(InferenceService[ImageRequest, PredictionResponse]):
    def __init__(self, model_name: str):
        self.model_name = model_name
        self.model_path = f"models/{model_name}"
        self.model = None
        self._is_loaded = False

    def load_model(self) -> None:
        """Load your model here. Called once at startup."""
        self.model = load_your_model(self.model_path)
        self._is_loaded = True

    def predict(self, request: ImageRequest) -> PredictionResponse:
        """Actual inference happens here."""
        image = decode_base64_image(request.image.data)
        result = self.model(image)
        logprobs = ...  # map `result` to per-label log-probabilities
        mask = ...      # optional localization mask, or None
        return PredictionResponse(
            logprobs=logprobs,
            localizationMask=mask,
        )

    @property
    def is_loaded(self) -> bool:
        return self._is_loaded
```
### Step 2: Register Your Service
Open `app/core/app.py` and find the lifespan function:
```python
# Change this line:
service = ResNetInferenceService(model_name="microsoft/resnet-18")
# To this:
service = YourModelService(...)
```
That's it. The `/predict` endpoint now serves your model.
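In context, that change sits inside the FastAPI lifespan handler. It might look roughly like the sketch below; the actual function in `app/core/app.py` may differ in detail:
```python
# Sketch of the lifespan wiring in app/core/app.py; details may differ
# from the actual file in this repository.
from contextlib import asynccontextmanager

from fastapi import FastAPI

from app.services.your_model_service import YourModelService

@asynccontextmanager
async def lifespan(app: FastAPI):
    service = YourModelService(model_name="your-org/your-model")
    service.load_model()            # fail fast if weights are missing
    app.state.service = service     # request handlers read the service from app state
    yield

app = FastAPI(lifespan=lifespan)
```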
### Model Files
Put your model files under the `models/` directory:
```
models/
└── your-org/
    └── your-model/
        ├── config.json
        ├── weights.bin
        └── (other files)
```
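If your model is a Hugging Face checkpoint stored locally, the body of `load_model()` might look like this (a sketch assuming the `transformers` package; adapt if your model uses another framework):
```python
# Sketch: a load_model() implementation for a locally stored Hugging Face
# checkpoint, meant to drop into the YourModelService class from Step 1.
# Assumes the `transformers` package is installed.
from transformers import AutoImageProcessor, AutoModelForImageClassification

def load_model(self) -> None:
    """Load weights and preprocessing config from self.model_path."""
    self.processor = AutoImageProcessor.from_pretrained(self.model_path)
    self.model = AutoModelForImageClassification.from_pretrained(self.model_path)
    self.model.eval()  # disable dropout etc. for inference
    self._is_loaded = True
```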
## Configuration
Settings are managed via environment variables or a `.env` file. See `.env.example` for all available options.
**Default values** (see the settings sketch after this list):
- `APP_NAME`: "ML Inference Service"
- `APP_VERSION`: "0.1.0"
- `DEBUG`: false
- `HOST`: "0.0.0.0"
- `PORT`: 8000
- `MODEL_NAME`: "microsoft/resnet-18"
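These defaults typically map onto a settings object. A minimal sketch using `pydantic-settings`, which is an assumption about how the service reads configuration; check the actual config module for the real definition:
```python
# Sketch of how the defaults above could be read from the environment or a
# .env file; assumes pydantic-settings, which may differ from the actual
# config module in this repository.
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    # protected_namespaces=() lets us use a field named "model_name"
    # without tripping pydantic's reserved "model_" prefix warning.
    model_config = SettingsConfigDict(env_file=".env", protected_namespaces=())

    app_name: str = "ML Inference Service"
    app_version: str = "0.1.0"
    debug: bool = False
    host: str = "0.0.0.0"
    port: int = 8000
    model_name: str = "microsoft/resnet-18"

settings = Settings()  # environment variables override the defaults above
```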
**To customize:**
```bash
# Copy the example
cp .env.example .env
# Edit values
vim .env
```
Or set environment variables directly:
```bash
export MODEL_NAME="google/vit-base-patch16-224"
uvicorn main:app --reload
```
## Deployment
**Development:**
```bash
uvicorn main:app --reload
```
**Production:**
```bash
gunicorn main:app -w 4 -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:8000
```
The service runs on CPU by default. For GPU inference, install CUDA-enabled PyTorch and modify your service to move tensors to the GPU device.
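For example, a GPU-aware service might pick a device at startup and move tensors onto it per request. A minimal sketch, assuming PyTorch:
```python
# Sketch of device handling for GPU inference, assuming PyTorch.
import torch

def pick_device() -> torch.device:
    # Prefer CUDA when available; fall back to CPU.
    return torch.device("cuda" if torch.cuda.is_available() else "cpu")

def run_on_device(model: torch.nn.Module, inputs: torch.Tensor) -> torch.Tensor:
    device = pick_device()
    model = model.to(device)       # in practice, do this once in load_model()
    with torch.no_grad():          # no gradients needed during inference
        return model(inputs.to(device))
```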
**Docker:**
- Multi-stage build keeps the image small
- Runs as non-root user (`appuser`)
- Python dependencies installed in user site-packages
- Model files baked into the image
## What Happens When You Start the Server
```
INFO: Starting ML Inference Service...
INFO: Initializing ResNet service: models/microsoft/resnet-18
INFO: Loading model from models/microsoft/resnet-18
INFO: Model loaded: 1000 classes
INFO: Startup completed successfully
INFO: Uvicorn running on http://0.0.0.0:8000
```
If you see "Model directory not found", check that your model files exist at the expected path with the full org/model structure.
## API Reference
**Endpoint:** `POST /predict`
**Request:**
```json
{
"image": {
"mediaType": "image/jpeg", // or "image/png"
"data": "<base64 string>"
}
}
```
**Response:**
```json
{
"logprobs": [float], // Log-probabilities of each label
"localizationMask": { // [Optional] binary mask
"mediaType": "image/png", // Always png
"data": "<base64 string>" // Image data
}
}
```
**Docs:**
- Swagger UI: `http://localhost:8000/docs`
- ReDoc: `http://localhost:8000/redoc`
- OpenAPI JSON: `http://localhost:8000/openapi.json`
## PyArrow Test Datasets
We've included a test dataset system for validating your model. It generates 100 standardized test datasets covering normal inputs, edge cases, performance benchmarks, and model comparisons.
### Generate Datasets
```bash
python scripts/generate_test_datasets.py
```
This creates:
- `scripts/test_datasets/*.parquet` - Test data (images, requests, expected responses)
- `scripts/test_datasets/*_metadata.json` - Human-readable descriptions
- `scripts/test_datasets/datasets_summary.json` - Overview of all datasets
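To peek at a generated dataset, you can read it back with PyArrow. The file name below is illustrative (one match of `standard_test_*.parquet`), and the columns depend on the generator; check the accompanying `*_metadata.json` for the actual schema:
```python
# Inspect a generated test dataset with PyArrow. The file name is
# illustrative; pick any *.parquet under scripts/test_datasets/.
import pyarrow.parquet as pq

table = pq.read_table("scripts/test_datasets/standard_test_0.parquet")
print(table.schema)                    # column names and types
print(table.num_rows, "rows")
print(table.slice(0, 1).to_pylist())   # first test case as a Python dict
```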
### Run Tests
```bash
# Start your service first
make serve
```
In another terminal:
```bash
# Quick test (5 samples per dataset)
python scripts/test_datasets.py --quick
# Full validation
python scripts/test_datasets.py
# Test specific category
python scripts/test_datasets.py --category edge_case
```
### Dataset Categories (25 datasets each)
**1. Standard Tests** (`standard_test_*.parquet`)
- Normal images: random patterns, shapes, gradients
- Common sizes: 224x224, 256x256, 299x299, 384x384
- Formats: JPEG, PNG
- Purpose: Baseline validation
**2. Edge Cases** (`edge_case_*.parquet`)
- Tiny images (32x32, 1x1)
- Huge images (2048x2048)
- Extreme aspect ratios (1000x50)
- Corrupted data, malformed requests
- Purpose: Test error handling
**3. Performance Benchmarks** (`performance_test_*.parquet`)
- Batch sizes: 1, 5, 10, 25, 50, 100 images
- Latency and throughput tracking
- Purpose: Performance profiling
**4. Model Comparisons** (`model_comparison_*.parquet`)
- Same inputs across different architectures
- Models: ResNet-18/50, ViT, ConvNext, Swin
- Purpose: Cross-model benchmarking
### Test Output
```
DATASET TESTING SUMMARY
============================================================
Datasets tested: 100
Successful datasets: 95
Failed datasets: 5
Total samples: 1,247
Overall success rate: 87.3%
Test duration: 45.2s
Performance:
Avg latency: 123.4ms
Median latency: 98.7ms
p95 latency: 342.1ms
Max latency: 2,341.0ms
Requests/sec: 27.6
Category breakdown:
standard: 25 datasets, 94.2% avg success
edge_case: 25 datasets, 76.8% avg success
performance: 25 datasets, 91.1% avg success
model_comparison: 25 datasets, 89.3% avg success
```
## Common Issues
**Port 8000 already in use:**
```bash
# Find what's using it
lsof -i :8000
# Or just use a different port
uvicorn main:app --port 8080
```
**Model not loading:**
- Check the path: models should be in `models/<org>/<model-name>/`
- If you're trying to run the example ResNet-based model, make sure you ran `make download` to fetch the model weights.
- Check logs for the exact error
**Slow inference:**
- Inference runs on CPU by default
- For GPU: install CUDA PyTorch and modify service to use GPU device
- Consider using smaller models or quantization (see the sketch below)
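As a starting point for quantization, PyTorch's dynamic quantization shrinks linear layers with a single call. A sketch; whether it helps depends on the architecture (convolution-heavy nets like ResNet benefit less than linear/attention-heavy ones):
```python
# Sketch: dynamic quantization of linear layers for CPU inference,
# assuming PyTorch. Measure accuracy and latency before adopting it.
import torch

def quantize_for_cpu(model: torch.nn.Module) -> torch.nn.Module:
    return torch.ao.quantization.quantize_dynamic(
        model, {torch.nn.Linear}, dtype=torch.qint8
    )
```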
## License
Apache 2.0