|
|
--- |
|
|
title: jobin-dsri |
|
|
emoji: 🧪 |
|
|
colorFrom: blue |
|
|
colorTo: green |
|
|
sdk: docker |
|
|
app_port: 8000 |
|
|
pinned: false |
|
|
license: apache-2.0 |
|
|
--- |
|
|
|
|
|
# ML Inference Service |
|
|
|
|
|
FastAPI service for serving ML models over HTTP. Comes with ResNet-18 for image classification out of the box, but you can swap in any model you want. |
|
|
|
|
|
## Quick Start |
|
|
|
|
|
**Install `uv`:** |
|
|
https://docs.astral.sh/uv/getting-started/installation/ |
|
|
|
|
|
**Local development:** |
|
|
```bash |
|
|
# Install dependencies |
|
|
make setup |
|
|
source venv/bin/activate |
|
|
|
|
|
# Download the example model |
|
|
make download |
|
|
|
|
|
# Run it |
|
|
make serve |
|
|
``` |
|
|
|
|
|
In a second terminal: |
|
|
```bash |
|
|
# Process an example input |
|
|
./prompt.sh cat.json |
|
|
``` |
|
|
|
|
|
Server runs on `http://127.0.0.1:8000`. Check `/docs` for the interactive API documentation. |
|
|
|
|
|
**Docker:** |
|
|
```bash |
|
|
# Build |
|
|
make docker-build |
|
|
|
|
|
# Run |
|
|
make docker-run |
|
|
``` |
|
|
|
|
|
## Testing the API |
|
|
|
|
|
```bash
# Using curl
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{
        "image": {
          "mediaType": "image/jpeg",
          "data": "<base64-encoded-image>"
        }
      }'
```
|
|
|
|
|
Example response: |
|
|
```json
{
  "logprobs": [-0.859380304813385, -1.2701971530914307, -2.1918208599090576, -1.69235098361969],
  "localizationMask": {
    "mediaType": "image/png",
    "data": "iVBORw0KGgoAAAANSUhEUgAAA8AAAAKDAQAAAAD9Fl5AAAAAu0lEQVR4nO3NsREAMAgDMWD/nZMVKEwn1T5/FQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAMCl3g5f+HC24TRhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWEAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAj70gwKsTlmdBwAAAABJRU5ErkJggg=="
  }
}
```
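The `localizationMask.data` field is a base64-encoded PNG. A minimal sketch of how a client might decode and sanity-check it (the `decode_mask` helper is ours for illustration, not part of this repo):

```python
import base64

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_mask(response: dict) -> bytes:
    """Decode the base64 PNG localization mask from a /predict response."""
    mask = response.get("localizationMask")
    if mask is None:
        raise ValueError("response contains no localizationMask")
    raw = base64.b64decode(mask["data"])
    if not raw.startswith(PNG_SIGNATURE):
        raise ValueError("decoded mask is not a PNG image")
    return raw
```

The returned bytes can be written straight to a `.png` file or opened with any image library.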
|
|
|
|
|
## Project Structure |
|
|
|
|
|
```
example-submission/
├── main.py                      # Entry point
├── app/
│   ├── core/
│   │   ├── app.py               # <= INSTANTIATE YOUR DETECTOR HERE
│   │   └── logging.py           # Logging setup
│   ├── api/
│   │   ├── models.py            # Request/response schemas
│   │   ├── controllers.py       # Business logic
│   │   └── routes/
│   │       └── prediction.py    # POST /predict
│   └── services/
│       ├── base.py              # <= YOUR DETECTOR IMPLEMENTS THIS INTERFACE
│       └── inference.py         # Example service based on ResNet-18
├── models/
│   └── microsoft/
│       └── resnet-18/           # Model weights and config
├── scripts/
│   ├── model_download.bash
│   ├── generate_test_datasets.py
│   └── test_datasets.py
├── Dockerfile
├── .env.example                 # Environment config template
├── cat.json                     # An example /predict request object
├── makefile
├── prompt.sh                    # Script that makes a /predict request
├── requirements.in
├── requirements.txt
└── response.json                # An example /predict response object
```
|
|
|
|
|
## How to Plug In Your Own Model |
|
|
|
|
|
To integrate your model, implement the `InferenceService` abstract class defined in `app/services/base.py`. You can follow the example implementation in `app/services/inference.py`, which is based on ResNet-18. After implementing the required interface, instantiate your model in the `lifespan()` function in `app/core/app.py`, replacing the `ResNetInferenceService` instance. |
|
|
|
|
|
### Step 1: Create Your Service Class |
|
|
|
|
|
```python
# app/services/your_model_service.py
from app.services.base import InferenceService
from app.api.models import ImageRequest, PredictionResponse


class YourModelService(InferenceService[ImageRequest, PredictionResponse]):
    def __init__(self, model_name: str):
        self.model_name = model_name
        self.model_path = f"models/{model_name}"
        self.model = None
        self._is_loaded = False

    def load_model(self) -> None:
        """Load your model here. Called once at startup."""
        self.model = load_your_model(self.model_path)
        self._is_loaded = True

    def predict(self, request: ImageRequest) -> PredictionResponse:
        """Actual inference happens here."""
        image = decode_base64_image(request.image.data)
        result = self.model(image)

        logprobs = ...  # per-label log-probabilities derived from `result`
        mask = ...      # optional localization mask (base64-encoded PNG)

        return PredictionResponse(
            logprobs=logprobs,
            localizationMask=mask,
        )

    @property
    def is_loaded(self) -> bool:
        return self._is_loaded
```
|
|
|
|
|
### Step 2: Register Your Service |
|
|
|
|
|
Open `app/core/app.py` and find the lifespan function: |
|
|
|
|
|
```python |
|
|
# Change this line: |
|
|
service = ResNetInferenceService(model_name="microsoft/resnet-18") |
|
|
|
|
|
# To this: |
|
|
service = YourModelService(...) |
|
|
``` |
|
|
|
|
|
That's it. The `/predict` endpoint now serves your model. |
|
|
|
|
|
### Model Files |
|
|
|
|
|
Put your model files under the `models/` directory: |
|
|
|
|
|
```
models/
└── your-org/
    └── your-model/
        ├── config.json
        ├── weights.bin
        └── (other files)
```
|
|
|
|
|
## Configuration |
|
|
|
|
|
Settings are managed via environment variables or a `.env` file. See `.env.example` for all available options. |
|
|
|
|
|
**Default values:** |
|
|
- `APP_NAME`: "ML Inference Service" |
|
|
- `APP_VERSION`: "0.1.0" |
|
|
- `DEBUG`: false |
|
|
- `HOST`: "0.0.0.0" |
|
|
- `PORT`: 8000 |
|
|
- `MODEL_NAME`: "microsoft/resnet-18" |
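The resolution order is environment variable first, then default. A stdlib-only sketch of that precedence (the `get_settings` helper and its dict shape are illustrative, not the project's actual settings API):

```python
import os

# Defaults mirroring the table above.
DEFAULTS = {
    "APP_NAME": "ML Inference Service",
    "APP_VERSION": "0.1.0",
    "DEBUG": "false",
    "HOST": "0.0.0.0",
    "PORT": "8000",
    "MODEL_NAME": "microsoft/resnet-18",
}


def get_settings() -> dict:
    """Resolve each setting from the environment, falling back to its default."""
    raw = {key: os.environ.get(key, default) for key, default in DEFAULTS.items()}
    raw["DEBUG"] = raw["DEBUG"].lower() == "true"  # coerce to bool
    raw["PORT"] = int(raw["PORT"])                 # coerce to int
    return raw
```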
|
|
|
|
|
**To customize:** |
|
|
```bash |
|
|
# Copy the example |
|
|
cp .env.example .env |
|
|
|
|
|
# Edit values |
|
|
vim .env |
|
|
``` |
|
|
|
|
|
Or set environment variables directly: |
|
|
```bash |
|
|
export MODEL_NAME="google/vit-base-patch16-224" |
|
|
uvicorn main:app --reload |
|
|
``` |
|
|
|
|
|
## Deployment |
|
|
|
|
|
**Development:** |
|
|
```bash |
|
|
uvicorn main:app --reload |
|
|
``` |
|
|
|
|
|
**Production:** |
|
|
```bash |
|
|
gunicorn main:app -w 4 -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:8000 |
|
|
``` |
|
|
|
|
|
The service runs on CPU by default. For GPU inference, install CUDA-enabled PyTorch and modify your service to move tensors to the GPU device. |
|
|
|
|
|
**Docker:** |
|
|
- Multi-stage build keeps the image small |
|
|
- Runs as non-root user (`appuser`) |
|
|
- Python dependencies installed in user site-packages |
|
|
- Model files baked into the image |
|
|
|
|
|
## What Happens When You Start the Server |
|
|
|
|
|
``` |
|
|
INFO: Starting ML Inference Service... |
|
|
INFO: Initializing ResNet service: models/microsoft/resnet-18 |
|
|
INFO: Loading model from models/microsoft/resnet-18 |
|
|
INFO: Model loaded: 1000 classes |
|
|
INFO: Startup completed successfully |
|
|
INFO: Uvicorn running on http://0.0.0.0:8000 |
|
|
``` |
|
|
|
|
|
If you see "Model directory not found", check that your model files exist at the expected path with the full org/model structure. |
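That check can be scripted; a small sketch using only the standard library (the `check_model_path` helper name is ours, not part of the repo):

```python
from pathlib import Path


def check_model_path(model_name: str, models_root: str = "models") -> Path:
    """Verify the full <models_root>/<org>/<model-name> directory exists."""
    path = Path(models_root) / model_name  # e.g. models/microsoft/resnet-18
    if not path.is_dir():
        raise FileNotFoundError(f"Model directory not found: {path}")
    return path
```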
|
|
|
|
|
## API Reference |
|
|
|
|
|
**Endpoint:** `POST /predict` |
|
|
|
|
|
**Request:** |
|
|
```json
{
  "image": {
    "mediaType": "image/jpeg",  // or "image/png"
    "data": "<base64 string>"
  }
}
```
|
|
|
|
|
**Response:** |
|
|
```json
{
  "logprobs": [float],          // Log-probabilities of each label
  "localizationMask": {         // [Optional] binary mask
    "mediaType": "image/png",   // Always png
    "data": "<base64 string>"   // Image data
  }
}
```
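For clients that prefer Python over curl, a minimal sketch that builds the request body above (`build_predict_request` is a hypothetical helper, not part of this repo):

```python
import base64
import json


def build_predict_request(image_bytes: bytes, media_type: str = "image/jpeg") -> str:
    """Serialize raw image bytes into the JSON body expected by POST /predict."""
    payload = {
        "image": {
            "mediaType": media_type,
            "data": base64.b64encode(image_bytes).decode("ascii"),
        }
    }
    return json.dumps(payload)
```

Send the resulting string as the request body with a `Content-Type: application/json` header.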
|
|
|
|
|
**Docs:** |
|
|
- Swagger UI: `http://localhost:8000/docs` |
|
|
- ReDoc: `http://localhost:8000/redoc` |
|
|
- OpenAPI JSON: `http://localhost:8000/openapi.json` |
|
|
|
|
|
## PyArrow Test Datasets |
|
|
|
|
|
We've included a test dataset system for validating your model. It generates 100 standardized test datasets covering normal inputs, edge cases, performance benchmarks, and model comparisons.
|
|
|
|
|
### Generate Datasets |
|
|
|
|
|
```bash |
|
|
python scripts/generate_test_datasets.py |
|
|
``` |
|
|
|
|
|
This creates: |
|
|
- `scripts/test_datasets/*.parquet` - Test data (images, requests, expected responses) |
|
|
- `scripts/test_datasets/*_metadata.json` - Human-readable descriptions |
|
|
- `scripts/test_datasets/datasets_summary.json` - Overview of all datasets |
|
|
|
|
|
### Run Tests |
|
|
|
|
|
```bash |
|
|
# Start your service first |
|
|
make serve |
|
|
``` |
|
|
|
|
|
In another terminal: |
|
|
|
|
|
```bash |
|
|
# Quick test (5 samples per dataset) |
|
|
python scripts/test_datasets.py --quick |
|
|
|
|
|
# Full validation |
|
|
python scripts/test_datasets.py |
|
|
|
|
|
# Test specific category |
|
|
python scripts/test_datasets.py --category edge_case |
|
|
``` |
|
|
|
|
|
### Dataset Categories (25 datasets each) |
|
|
|
|
|
**1. Standard Tests** (`standard_test_*.parquet`) |
|
|
- Normal images: random patterns, shapes, gradients |
|
|
- Common sizes: 224x224, 256x256, 299x299, 384x384 |
|
|
- Formats: JPEG, PNG |
|
|
- Purpose: Baseline validation |
|
|
|
|
|
**2. Edge Cases** (`edge_case_*.parquet`) |
|
|
- Tiny images (32x32, 1x1) |
|
|
- Huge images (2048x2048) |
|
|
- Extreme aspect ratios (1000x50) |
|
|
- Corrupted data, malformed requests |
|
|
- Purpose: Test error handling |
|
|
|
|
|
**3. Performance Benchmarks** (`performance_test_*.parquet`) |
|
|
- Batch sizes: 1, 5, 10, 25, 50, 100 images |
|
|
- Latency and throughput tracking |
|
|
- Purpose: Performance profiling |
|
|
|
|
|
**4. Model Comparisons** (`model_comparison_*.parquet`) |
|
|
- Same inputs across different architectures |
|
|
- Models: ResNet-18/50, ViT, ConvNext, Swin |
|
|
- Purpose: Cross-model benchmarking |
|
|
|
|
|
### Test Output |
|
|
|
|
|
```
DATASET TESTING SUMMARY
============================================================
Datasets tested: 100
Successful datasets: 95
Failed datasets: 5
Total samples: 1,247
Overall success rate: 87.3%
Test duration: 45.2s

Performance:
  Avg latency: 123.4ms
  Median latency: 98.7ms
  p95 latency: 342.1ms
  Max latency: 2,341.0ms
  Requests/sec: 27.6

Category breakdown:
  standard: 25 datasets, 94.2% avg success
  edge_case: 25 datasets, 76.8% avg success
  performance: 25 datasets, 91.1% avg success
  model_comparison: 25 datasets, 89.3% avg success
```
|
|
|
|
|
## Common Issues |
|
|
|
|
|
**Port 8000 already in use:** |
|
|
```bash |
|
|
# Find what's using it |
|
|
lsof -i :8000 |
|
|
|
|
|
# Or just use a different port |
|
|
uvicorn main:app --port 8080 |
|
|
``` |
|
|
|
|
|
**Model not loading:** |
|
|
- Check the path: models should be in `models/<org>/<model-name>/` |
|
|
- If you're trying to run the example ResNet-based model, make sure you ran `make download` to fetch the model weights. |
|
|
- Check logs for the exact error |
|
|
|
|
|
**Slow inference:** |
|
|
- Inference runs on CPU by default |
|
|
- For GPU: install CUDA PyTorch and modify service to use GPU device |
|
|
- Consider using smaller models or quantization |
|
|
|
|
|
## License |
|
|
|
|
|
Apache 2.0 |
|
|
|