---
license: apache-2.0
---
# ML Inference Service
FastAPI service for serving ML models over HTTP. Comes with ResNet-18 for image classification out of the box, but you can swap in any model you want.
## Quick Start
**Install `uv`:**
https://docs.astral.sh/uv/getting-started/installation/
**Local development:**
```bash
# Install dependencies
make setup
source venv/bin/activate
# Download the example model
make download
# Run it
make serve
```
In a second terminal:
```bash
# Process an example input
./prompt.sh cat.json
```
Server runs on `http://127.0.0.1:8000`. Check `/docs` for the interactive API documentation.
**Docker:**
```bash
# Build
make docker-build
# Run
make docker-run
```
## Testing the API
```bash
# Using curl
curl -X POST http://localhost:8000/predict \
-H "Content-Type: application/json" \
-d '{
"image": {
"mediaType": "image/jpeg",
"data": "<base64-encoded-image>"
}
}'
```
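The same request can be made from Python. A minimal sketch using the `requests` library (assumed installed); `cat.jpg` is a placeholder for any JPEG or PNG on disk:
```python
# Sketch of a Python client for POST /predict, equivalent to the curl call above.
# "cat.jpg" is a placeholder; substitute any local JPEG or PNG.
import base64
import requests

with open("cat.jpg", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("ascii")

response = requests.post(
    "http://localhost:8000/predict",
    json={"image": {"mediaType": "image/jpeg", "data": encoded}},
    timeout=30,
)
response.raise_for_status()
print(response.json()["logprobs"])
```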
Example response:
```json
{
  "logprobs": [-0.859380304813385, -1.2701971530914307, -2.1918208599090576, -1.69235098361969],
  "localizationMask": {
    "mediaType": "image/png",
    "data": "iVBORw0KGgoAAAANSUhEUgAAA8AAAAKDAQAAAAD9Fl5AAAAAu0lEQVR4nO3NsREAMAgDMWD/nZMVKEwn1T5/FQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAMCl3g5f+HC24TRhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWEAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAj70gwKsTlmdBwAAAABJRU5ErkJggg=="
  }
}
```
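The `localizationMask` is a base64-encoded PNG. Continuing from the Python client above, one way to decode and inspect it (assuming Pillow is installed):
```python
# Decode the base64 localizationMask from a /predict response into a PIL image.
# Assumes Pillow is installed and `response` is the object from the client sketch above.
import base64
import io

from PIL import Image

mask_b64 = response.json()["localizationMask"]["data"]
mask = Image.open(io.BytesIO(base64.b64decode(mask_b64)))
mask.save("mask.png")           # write the mask out for viewing
print(mask.size, mask.mode)     # e.g. (960, 640) '1' for a binary mask
```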
## Project Structure
```
example-submission/
├── main.py                       # Entry point
├── app/
│   ├── core/
│   │   ├── app.py                # <= INSTANTIATE YOUR DETECTOR HERE
│   │   └── logging.py            # Logging setup
│   ├── api/
│   │   ├── models.py             # Request/response schemas
│   │   ├── controllers.py        # Business logic
│   │   └── routes/
│   │       └── prediction.py     # POST /predict
│   └── services/
│       ├── base.py               # <= YOUR DETECTOR IMPLEMENTS THIS INTERFACE
│       └── inference.py          # Example service based on ResNet-18
├── models/
│   └── microsoft/
│       └── resnet-18/            # Model weights and config
├── scripts/
│   ├── model_download.bash
│   ├── generate_test_datasets.py
│   └── test_datasets.py
├── Dockerfile
├── .env.example                  # Environment config template
├── cat.json                      # An example /predict request object
├── makefile
├── prompt.sh                     # Script that makes a /predict request
├── requirements.in
├── requirements.txt
└── response.json                 # An example /predict response object
```
## How to Plug In Your Own Model
To integrate your model, implement the `InferenceService` abstract class defined in `app/services/base.py`. You can follow the example implementation in `app/services/inference.py`, which is based on ResNet-18. After implementing the required interface, instantiate your model in the `lifespan()` function in `app/core/app.py`, replacing the `ResNetInferenceService` instance.
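For orientation, the interface in `app/services/base.py` looks roughly like the sketch below. This is a paraphrase to show the shape of the contract, not the exact file; check the source for the authoritative definition.
```python
# Rough sketch of the InferenceService interface; see app/services/base.py
# in this repository for the actual definition.
from abc import ABC, abstractmethod
from typing import Generic, TypeVar

RequestT = TypeVar("RequestT")
ResponseT = TypeVar("ResponseT")

class InferenceService(ABC, Generic[RequestT, ResponseT]):
    @abstractmethod
    def load_model(self) -> None:
        """Load model weights. Called once at startup."""

    @abstractmethod
    def predict(self, request: RequestT) -> ResponseT:
        """Run inference on a single request."""

    @property
    @abstractmethod
    def is_loaded(self) -> bool:
        """Whether the model is ready to serve."""
```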
### Step 1: Create Your Service Class
```python
# app/services/your_model_service.py
from app.services.base import InferenceService
from app.api.models import ImageRequest, PredictionResponse


class YourModelService(InferenceService[ImageRequest, PredictionResponse]):
    def __init__(self, model_name: str):
        self.model_name = model_name
        self.model_path = f"models/{model_name}"
        self.model = None
        self._is_loaded = False

    def load_model(self) -> None:
        """Load your model here. Called once at startup."""
        self.model = load_your_model(self.model_path)
        self._is_loaded = True

    def predict(self, request: ImageRequest) -> PredictionResponse:
        """Actual inference happens here."""
        image = decode_base64_image(request.image.data)
        result = self.model(image)
        logprobs = ...  # map `result` to per-label log-probabilities
        mask = ...      # optional localization mask, or None
        return PredictionResponse(
            logprobs=logprobs,
            localizationMask=mask,
        )

    @property
    def is_loaded(self) -> bool:
        return self._is_loaded
```
### Step 2: Register Your Service
Open `app/core/app.py` and find the lifespan function:
```python
# Change this line:
service = ResNetInferenceService(model_name="microsoft/resnet-18")
# To this:
service = YourModelService(...)
```
That's it. The `/predict` endpoint now serves your model.
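In context, that change sits inside the FastAPI lifespan handler. It might look roughly like the sketch below; the actual function in `app/core/app.py` may differ in detail:
```python
# Sketch of the lifespan wiring in app/core/app.py; details may differ
# from the actual file in this repository.
from contextlib import asynccontextmanager

from fastapi import FastAPI

from app.services.your_model_service import YourModelService

@asynccontextmanager
async def lifespan(app: FastAPI):
    service = YourModelService(model_name="your-org/your-model")
    service.load_model()            # fail fast if weights are missing
    app.state.service = service     # request handlers read the service from app state
    yield

app = FastAPI(lifespan=lifespan)
```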
### Model Files
Put your model files under the `models/` directory:
```
models/
└── your-org/
    └── your-model/
        ├── config.json
        ├── weights.bin
        └── (other files)
```
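If your model is a Hugging Face checkpoint stored locally, the body of `load_model()` might look like this (a sketch assuming the `transformers` package; adapt if your model uses another framework):
```python
# Sketch: a load_model() implementation for a locally stored Hugging Face
# checkpoint, meant to drop into the YourModelService class from Step 1.
# Assumes the `transformers` package is installed.
from transformers import AutoImageProcessor, AutoModelForImageClassification

def load_model(self) -> None:
    """Load weights and preprocessing config from self.model_path."""
    self.processor = AutoImageProcessor.from_pretrained(self.model_path)
    self.model = AutoModelForImageClassification.from_pretrained(self.model_path)
    self.model.eval()  # disable dropout etc. for inference
    self._is_loaded = True
```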
## Configuration
Settings are managed via environment variables or a `.env` file. See `.env.example` for all available options.
**Default values** (see the settings sketch after this list):
- `APP_NAME`: "ML Inference Service"
- `APP_VERSION`: "0.1.0"
- `DEBUG`: false
- `HOST`: "0.0.0.0"
- `PORT`: 8000
- `MODEL_NAME`: "microsoft/resnet-18"
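These defaults typically map onto a settings object. A minimal sketch using `pydantic-settings`, which is an assumption about how the service reads configuration; check the actual config module for the real definition:
```python
# Sketch of how the defaults above could be read from the environment or a
# .env file; assumes pydantic-settings, which may differ from the actual
# config module in this repository.
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    # protected_namespaces=() lets us use a field named "model_name"
    # without tripping pydantic's reserved "model_" prefix warning.
    model_config = SettingsConfigDict(env_file=".env", protected_namespaces=())

    app_name: str = "ML Inference Service"
    app_version: str = "0.1.0"
    debug: bool = False
    host: str = "0.0.0.0"
    port: int = 8000
    model_name: str = "microsoft/resnet-18"

settings = Settings()  # environment variables override the defaults above
```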
**To customize:**
```bash
# Copy the example
cp .env.example .env
# Edit values
vim .env
```
Or set environment variables directly:
```bash
export MODEL_NAME="google/vit-base-patch16-224"
uvicorn main:app --reload
```
## Deployment
**Development:**
```bash
uvicorn main:app --reload
```
**Production:**
```bash
gunicorn main:app -w 4 -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:8000
```
The service runs on CPU by default. For GPU inference, install CUDA-enabled PyTorch and modify your service to move tensors to the GPU device.
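For example, a GPU-aware service might pick a device at startup and move tensors onto it per request. A minimal sketch, assuming PyTorch:
```python
# Sketch of device handling for GPU inference, assuming PyTorch.
import torch

def pick_device() -> torch.device:
    # Prefer CUDA when available; fall back to CPU.
    return torch.device("cuda" if torch.cuda.is_available() else "cpu")

def run_on_device(model: torch.nn.Module, inputs: torch.Tensor) -> torch.Tensor:
    device = pick_device()
    model = model.to(device)       # in practice, do this once in load_model()
    with torch.no_grad():          # no gradients needed during inference
        return model(inputs.to(device))
```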
**Docker:**
- Multi-stage build keeps the image small
- Runs as non-root user (`appuser`)
- Python dependencies installed in user site-packages
- Model files baked into the image
## What Happens When You Start the Server
```
INFO: Starting ML Inference Service...
INFO: Initializing ResNet service: models/microsoft/resnet-18
INFO: Loading model from models/microsoft/resnet-18
INFO: Model loaded: 1000 classes
INFO: Startup completed successfully
INFO: Uvicorn running on http://0.0.0.0:8000
```
If you see "Model directory not found", check that your model files exist at the expected path with the full org/model structure.
## API Reference
**Endpoint:** `POST /predict`
**Request:**
```json
{
"image": {
"mediaType": "image/jpeg", // or "image/png"
"data": "<base64 string>"
}
}
```
**Response:**
```json
{
"logprobs": [float], // Log-probabilities of each label
"localizationMask": { // [Optional] binary mask
"mediaType": "image/png", // Always png
"data": "<base64 string>" // Image data
}
}
```
**Docs:**
- Swagger UI: `http://localhost:8000/docs`
- ReDoc: `http://localhost:8000/redoc`
- OpenAPI JSON: `http://localhost:8000/openapi.json`
## PyArrow Test Datasets
We've included a test dataset system for validating your model. It generates 100 standardized test datasets covering normal inputs, edge cases, performance benchmarks, and model comparisons.
### Generate Datasets
```bash
python scripts/generate_test_datasets.py
```
This creates:
- `scripts/test_datasets/*.parquet` - Test data (images, requests, expected responses)
- `scripts/test_datasets/*_metadata.json` - Human-readable descriptions
- `scripts/test_datasets/datasets_summary.json` - Overview of all datasets
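To peek at a generated dataset, you can read it back with PyArrow. The file name below is illustrative (one match of `standard_test_*.parquet`), and the columns depend on the generator; check the accompanying `*_metadata.json` for the actual schema:
```python
# Inspect a generated test dataset with PyArrow. The file name is
# illustrative; pick any *.parquet under scripts/test_datasets/.
import pyarrow.parquet as pq

table = pq.read_table("scripts/test_datasets/standard_test_0.parquet")
print(table.schema)                    # column names and types
print(table.num_rows, "rows")
print(table.slice(0, 1).to_pylist())   # first test case as a Python dict
```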
### Run Tests
```bash
# Start your service first
make serve
```
In another terminal:
```bash
# Quick test (5 samples per dataset)
python scripts/test_datasets.py --quick
# Full validation
python scripts/test_datasets.py
# Test specific category
python scripts/test_datasets.py --category edge_case
```
### Dataset Categories (25 datasets each)
**1. Standard Tests** (`standard_test_*.parquet`)
- Normal images: random patterns, shapes, gradients
- Common sizes: 224x224, 256x256, 299x299, 384x384
- Formats: JPEG, PNG
- Purpose: Baseline validation
**2. Edge Cases** (`edge_case_*.parquet`)
- Tiny images (32x32, 1x1)
- Huge images (2048x2048)
- Extreme aspect ratios (1000x50)
- Corrupted data, malformed requests
- Purpose: Test error handling
**3. Performance Benchmarks** (`performance_test_*.parquet`)
- Batch sizes: 1, 5, 10, 25, 50, 100 images
- Latency and throughput tracking
- Purpose: Performance profiling
**4. Model Comparisons** (`model_comparison_*.parquet`)
- Same inputs across different architectures
- Models: ResNet-18/50, ViT, ConvNext, Swin
- Purpose: Cross-model benchmarking
### Test Output
```
DATASET TESTING SUMMARY
============================================================
Datasets tested: 100
Successful datasets: 95
Failed datasets: 5
Total samples: 1,247
Overall success rate: 87.3%
Test duration: 45.2s
Performance:
Avg latency: 123.4ms
Median latency: 98.7ms
p95 latency: 342.1ms
Max latency: 2,341.0ms
Requests/sec: 27.6
Category breakdown:
standard: 25 datasets, 94.2% avg success
edge_case: 25 datasets, 76.8% avg success
performance: 25 datasets, 91.1% avg success
model_comparison: 25 datasets, 89.3% avg success
```
## Common Issues
**Port 8000 already in use:**
```bash
# Find what's using it
lsof -i :8000
# Or just use a different port
uvicorn main:app --port 8080
```
**Model not loading:**
- Check the path: models should be in `models/<org>/<model-name>/`
- If you're trying to run the example ResNet-based model, make sure you ran `make download` to fetch the model weights.
- Check logs for the exact error
**Slow inference:**
- Inference runs on CPU by default
- For GPU: install CUDA PyTorch and modify service to use GPU device
- Consider using smaller models or quantization (see the sketch below)
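As a starting point for quantization, PyTorch's dynamic quantization shrinks linear layers with a single call. A sketch; whether it helps depends on the architecture (convolution-heavy nets like ResNet benefit less than linear/attention-heavy ones):
```python
# Sketch: dynamic quantization of linear layers for CPU inference,
# assuming PyTorch. Measure accuracy and latency before adopting it.
import torch

def quantize_for_cpu(model: torch.nn.Module) -> torch.nn.Module:
    return torch.ao.quantization.quantize_dynamic(
        model, {torch.nn.Linear}, dtype=torch.qint8
    )
```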
## License
Apache 2.0