---
title: SAFE Challenge Example Submission
emoji: πŸ”’
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
license: apache-2.0
---

# ML Inference Service

A FastAPI service for serving ML models over HTTP. It comes with ResNet-18 for image classification out of the box, but you can swap in any model you want.

## Quick Start

**Local development:**

```bash
# Install dependencies
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# Download the example model
bash scripts/model_download.bash

# Run it
uvicorn main:app --reload
```

The server runs on `http://127.0.0.1:8000`. Check `/docs` for the interactive API documentation.

**Docker:**

```bash
# Build
docker build -t ml-inference-service:test .

# Run
docker run -d --name ml-inference-test -p 8000:8000 ml-inference-service:test

# Check logs
docker logs -f ml-inference-test

# Stop
docker stop ml-inference-test && docker rm ml-inference-test
```

## Testing the API

```bash
# Using curl
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{
    "image": {
      "mediaType": "image/jpeg",
      "data": "<base64-encoded image data>"
    }
  }'
```

Example response:

```json
{
  "prediction": "tiger cat",
  "confidence": 0.394,
  "predicted_label": 282,
  "model": "microsoft/resnet-18",
  "mediaType": "image/jpeg"
}
```
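The same request from Python, if you prefer scripting it. This is a minimal sketch: it assumes the `requests` package is installed and that `cat.jpg` is any JPEG on your machine; neither is part of this repo.

```python
# Minimal client sketch; `requests` and cat.jpg are illustrative assumptions.
import base64
import requests

# Base64-encode a local JPEG, as the /predict endpoint expects.
with open("cat.jpg", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("utf-8")

resp = requests.post(
    "http://localhost:8000/predict",
    json={"image": {"mediaType": "image/jpeg", "data": encoded}},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # e.g. {"prediction": "tiger cat", "confidence": 0.394, ...}
```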
## Project Structure

```
ml-inference-service/
β”œβ”€β”€ main.py                    # Entry point
β”œβ”€β”€ app/
β”‚   β”œβ”€β”€ core/
β”‚   β”‚   β”œβ”€β”€ app.py             # App factory, config, DI, lifecycle
β”‚   β”‚   └── logging.py         # Logging setup
β”‚   β”œβ”€β”€ api/
β”‚   β”‚   β”œβ”€β”€ models.py          # Request/response schemas
β”‚   β”‚   β”œβ”€β”€ controllers.py     # Business logic
β”‚   β”‚   └── routes/
β”‚   β”‚       └── prediction.py  # POST /predict
β”‚   └── services/
β”‚       β”œβ”€β”€ base.py            # Abstract InferenceService class
β”‚       └── inference.py       # ResNet implementation
β”œβ”€β”€ models/
β”‚   └── microsoft/
β”‚       └── resnet-18/         # Model weights and config
β”œβ”€β”€ scripts/
β”‚   β”œβ”€β”€ model_download.bash
β”‚   β”œβ”€β”€ generate_test_datasets.py
β”‚   └── test_datasets.py
β”œβ”€β”€ Dockerfile                 # Multi-stage build
β”œβ”€β”€ .env.example               # Environment config template
└── requirements.txt
```

The key design decision here is that `app/core/app.py` consolidates everything: config, dependency injection, lifecycle, and the app factory. This avoids the mess of managing global state across multiple files.

## How to Plug In Your Own Model

The whole service is built around one abstract base class: `InferenceService`. Implement it for your model, and everything else just works.

### Step 1: Create Your Service Class

```python
# app/services/your_model_service.py
from app.services.base import InferenceService
from app.api.models import ImageRequest, PredictionResponse
import asyncio


class YourModelService(InferenceService[ImageRequest, PredictionResponse]):
    def __init__(self, model_name: str):
        self.model_name = model_name
        self.model_path = f"models/{model_name}"
        self.model = None
        self._is_loaded = False

    async def load_model(self) -> None:
        """Load your model here. Called once at startup."""
        self.model = load_your_model(self.model_path)
        self._is_loaded = True

    async def predict(self, request: ImageRequest) -> PredictionResponse:
        """Run inference. Offload heavy work to a thread pool."""
        return await asyncio.to_thread(self._predict_sync, request)

    def _predict_sync(self, request: ImageRequest) -> PredictionResponse:
        """Actual inference happens here."""
        image = decode_base64_image(request.image.data)
        result = self.model(image)
        return PredictionResponse(
            prediction=result.label,
            confidence=result.confidence,
            predicted_label=result.class_id,
            model=self.model_name,
            mediaType=request.image.mediaType,
        )

    @property
    def is_loaded(self) -> bool:
        return self._is_loaded
```

**Important:** Use `asyncio.to_thread()` to run CPU-heavy inference in a background thread. This keeps the server responsive while your model is working.

### Step 2: Register Your Service

Open `app/core/app.py` and find the lifespan function:

```python
# Change this line:
service = ResNetInferenceService(model_name="microsoft/resnet-18")

# To this:
service = YourModelService(model_name="your-org/your-model")
```

That's it. The `/predict` endpoint now serves your model.
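For orientation, the wiring you just edited follows FastAPI's lifespan pattern and looks roughly like the sketch below. This is a simplified illustration, not the actual contents of `app/core/app.py`; the import path, `create_app`, and `app.state.inference_service` are assumptions.

```python
# Simplified sketch of the lifespan/app-factory pattern; names and paths are illustrative.
from contextlib import asynccontextmanager

from fastapi import FastAPI

from app.services.inference import ResNetInferenceService
# from app.services.your_model_service import YourModelService


@asynccontextmanager
async def lifespan(app: FastAPI):
    # Swap this line to serve your own model (Step 2 above).
    service = ResNetInferenceService(model_name="microsoft/resnet-18")
    # service = YourModelService(model_name="your-org/your-model")

    await service.load_model()              # runs once at startup
    app.state.inference_service = service   # routes read the service from app state
    yield
    # release resources on shutdown if your model needs it


def create_app() -> FastAPI:
    return FastAPI(title="ML Inference Service", lifespan=lifespan)
```

The point of the pattern is that the model loads once inside the lifespan; requests only ever see an already-loaded service.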
### Model Files

Put your model files under `models/` with the full org/model structure:

```
models/
└── your-org/
    └── your-model/
        β”œβ”€β”€ config.json
        β”œβ”€β”€ weights.bin
        └── (other files)
```

No renaming, no dropping the org prefix; it just mirrors the Hugging Face structure.

## Configuration

Settings are managed via environment variables or a `.env` file. See `.env.example` for all available options.

**Default values:**

- `APP_NAME`: "ML Inference Service"
- `APP_VERSION`: "0.1.0"
- `DEBUG`: false
- `HOST`: "0.0.0.0"
- `PORT`: 8000
- `MODEL_NAME`: "microsoft/resnet-18"

**To customize:**

```bash
# Copy the example
cp .env.example .env

# Edit values
vim .env
```

Or set environment variables directly:

```bash
export MODEL_NAME="google/vit-base-patch16-224"
uvicorn main:app --reload
```

## Deployment

**Development:**

```bash
uvicorn main:app --reload
```

**Production:**

```bash
gunicorn main:app -w 4 -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:8000
```

The service runs on CPU by default. For GPU inference, install CUDA-enabled PyTorch and modify your service to move tensors to the GPU device.

**Docker:**

- Multi-stage build keeps the image small
- Runs as non-root user (`appuser`)
- Python dependencies installed in user site-packages
- Model files baked into the image

## What Happens When You Start the Server

```
INFO: Starting ML Inference Service...
INFO: Initializing ResNet service: models/microsoft/resnet-18
INFO: Loading model from models/microsoft/resnet-18
INFO: Model loaded: 1000 classes
INFO: Startup completed successfully
INFO: Uvicorn running on http://0.0.0.0:8000
```

If you see "Model directory not found", check that your model files exist at the expected path with the full org/model structure.

## API Reference

**Endpoint:** `POST /predict`

**Request:**

```json
{
  "image": {
    "mediaType": "image/jpeg",           // or "image/png"
    "data": "<base64-encoded image data>"
  }
}
```

**Response:**

```json
{
  "prediction": "string",        // Human-readable label
  "confidence": 0.0,             // Softmax probability
  "predicted_label": 0,          // Numeric class index
  "model": "org/model-name",     // Model identifier
  "mediaType": "image/jpeg"      // Echoed from request
}
```

**Docs:**

- Swagger UI: `http://localhost:8000/docs`
- ReDoc: `http://localhost:8000/redoc`
- OpenAPI JSON: `http://localhost:8000/openapi.json`

## PyArrow Test Datasets

We've included a test dataset system for validating your model. It generates 100 standardized test cases covering normal inputs, edge cases, performance benchmarks, and model comparisons.

### Generate Datasets

```bash
python scripts/generate_test_datasets.py
```

This creates:

- `scripts/test_datasets/*.parquet` - Test data (images, requests, expected responses)
- `scripts/test_datasets/*_metadata.json` - Human-readable descriptions
- `scripts/test_datasets/datasets_summary.json` - Overview of all datasets

### Run Tests

```bash
# Start your service first
uvicorn main:app --reload

# Quick test (5 samples per dataset)
python scripts/test_datasets.py --quick

# Full validation
python scripts/test_datasets.py

# Test specific category
python scripts/test_datasets.py --category edge_case
```

### Dataset Categories (25 datasets each)

**1. Standard Tests** (`standard_test_*.parquet`)
- Normal images: random patterns, shapes, gradients
- Common sizes: 224x224, 256x256, 299x299, 384x384
- Formats: JPEG, PNG
- Purpose: Baseline validation

**2. Edge Cases** (`edge_case_*.parquet`)
- Tiny images (32x32, 1x1)
- Huge images (2048x2048)
- Extreme aspect ratios (1000x50)
- Corrupted data, malformed requests
- Purpose: Test error handling

**3. Performance Benchmarks** (`performance_test_*.parquet`)
- Batch sizes: 1, 5, 10, 25, 50, 100 images
- Latency and throughput tracking
- Purpose: Performance profiling

**4. Model Comparisons** (`model_comparison_*.parquet`)
- Same inputs across different architectures
- Models: ResNet-18/50, ViT, ConvNext, Swin
- Purpose: Cross-model benchmarking

### Test Output

```
DATASET TESTING SUMMARY
============================================================
Datasets tested: 100
Successful datasets: 95
Failed datasets: 5
Total samples: 1,247
Overall success rate: 87.3%
Test duration: 45.2s

Performance:
  Avg latency: 123.4ms
  Median latency: 98.7ms
  p95 latency: 342.1ms
  Max latency: 2,341.0ms
  Requests/sec: 27.6

Category breakdown:
  standard: 25 datasets, 94.2% avg success
  edge_case: 25 datasets, 76.8% avg success
  performance: 25 datasets, 91.1% avg success
  model_comparison: 25 datasets, 89.3% avg success
```

## Common Issues

**Port 8000 already in use:**

```bash
# Find what's using it
lsof -i :8000

# Or just use a different port
uvicorn main:app --port 8080
```

**Model not loading:**

- Check the path: models should be in `models/<org>/<model>/`
- Make sure you ran `bash scripts/model_download.bash`
- Check the logs for the exact error

**Slow inference:**

- Inference runs on CPU by default
- For GPU: install CUDA PyTorch and modify the service to use the GPU device
- Consider smaller models or quantization

## License

Apache 2.0