---
title: SAFE Challenge Example Submission
emoji: 🔒
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
license: apache-2.0
---
# ML Inference Service
FastAPI service for serving ML models over HTTP. Comes with ResNet-18 for image classification out of the box, but you can swap in any model you want.
## Quick Start
**Local development:**
```bash
# Install dependencies
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
# Download the example model
bash scripts/model_download.bash
# Run it
uvicorn main:app --reload
```
Server runs on `http://127.0.0.1:8000`. Check `/docs` for the interactive API documentation.
**Docker:**
```bash
# Build
docker build -t ml-inference-service:test .
# Run
docker run -d --name ml-inference-test -p 8000:8000 ml-inference-service:test
# Check logs
docker logs -f ml-inference-test
# Stop
docker stop ml-inference-test && docker rm ml-inference-test
```
## Testing the API
```bash
# Using curl
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{
    "image": {
      "mediaType": "image/jpeg",
      "data": "<base64-encoded-image>"
    }
  }'
```
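Prefer testing from Python? A minimal client looks like this (assuming the `requests` package is installed; `cat.jpg` is a stand-in filename, any local JPEG works):

```python
import base64

import requests

# Encode any local JPEG as base64 text.
with open("cat.jpg", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("utf-8")

resp = requests.post(
    "http://localhost:8000/predict",
    json={"image": {"mediaType": "image/jpeg", "data": encoded}},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```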
Example response:
```json
{
  "prediction": "tiger cat",
  "confidence": 0.394,
  "predicted_label": 282,
  "model": "microsoft/resnet-18",
  "mediaType": "image/jpeg"
}
```
## Project Structure
```
ml-inference-service/
├── main.py                      # Entry point
├── app/
│   ├── core/
│   │   ├── app.py               # App factory, config, DI, lifecycle
│   │   └── logging.py           # Logging setup
│   ├── api/
│   │   ├── models.py            # Request/response schemas
│   │   ├── controllers.py       # Business logic
│   │   └── routes/
│   │       └── prediction.py    # POST /predict
│   └── services/
│       ├── base.py              # Abstract InferenceService class
│       └── inference.py         # ResNet implementation
├── models/
│   └── microsoft/
│       └── resnet-18/           # Model weights and config
├── scripts/
│   ├── model_download.bash
│   ├── generate_test_datasets.py
│   └── test_datasets.py
├── Dockerfile                   # Multi-stage build
├── .env.example                 # Environment config template
└── requirements.txt
```
The key design decision here is that `app/core/app.py` consolidates everything—config, dependency injection, lifecycle, and the app factory. This avoids the mess of managing global state across multiple files.
## How to Plug In Your Own Model
The whole service is built around one abstract base class: `InferenceService`. Implement it for your model, and everything else just works.
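For orientation, the contract in `app/services/base.py` presumably looks something like this (a sketch inferred from how Step 1 uses it, not the exact source):

```python
# Sketch of app/services/base.py, inferred from usage in Step 1.
from abc import ABC, abstractmethod
from typing import Generic, TypeVar

RequestT = TypeVar("RequestT")
ResponseT = TypeVar("ResponseT")


class InferenceService(ABC, Generic[RequestT, ResponseT]):
    """Contract every model service implements."""

    @abstractmethod
    async def load_model(self) -> None:
        """Load weights into memory. Called once at startup."""

    @abstractmethod
    async def predict(self, request: RequestT) -> ResponseT:
        """Run inference on a single request."""

    @property
    @abstractmethod
    def is_loaded(self) -> bool:
        """Whether the model is ready to serve."""
```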
### Step 1: Create Your Service Class
```python
# app/services/your_model_service.py
from app.services.base import InferenceService
from app.api.models import ImageRequest, PredictionResponse
import asyncio


class YourModelService(InferenceService[ImageRequest, PredictionResponse]):
    def __init__(self, model_name: str):
        self.model_name = model_name
        self.model_path = f"models/{model_name}"
        self.model = None
        self._is_loaded = False

    async def load_model(self) -> None:
        """Load your model here. Called once at startup."""
        self.model = load_your_model(self.model_path)  # your loading code
        self._is_loaded = True

    async def predict(self, request: ImageRequest) -> PredictionResponse:
        """Run inference. Offload heavy work to thread pool."""
        return await asyncio.to_thread(self._predict_sync, request)

    def _predict_sync(self, request: ImageRequest) -> PredictionResponse:
        """Actual inference happens here."""
        image = decode_base64_image(request.image.data)  # your decoding helper
        result = self.model(image)
        return PredictionResponse(
            prediction=result.label,
            confidence=result.confidence,
            predicted_label=result.class_id,
            model=self.model_name,
            mediaType=request.image.mediaType,
        )

    @property
    def is_loaded(self) -> bool:
        return self._is_loaded
```
**Important:** Use `asyncio.to_thread()` to run CPU-heavy inference in a background thread. This keeps the server responsive while your model is working.
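If the pattern is new to you, this self-contained toy (not part of the service) shows the effect: ten concurrent "requests" around a one-second blocking call finish in roughly one second instead of ten, because the event loop never blocks:

```python
import asyncio
import time


def blocking_inference() -> str:
    time.sleep(1.0)  # stands in for CPU-bound model work
    return "done"


async def handler(i: int) -> None:
    result = await asyncio.to_thread(blocking_inference)
    print(f"request {i}: {result}")


async def main() -> None:
    # All ten handlers overlap; total wall time is ~1s, not ~10s.
    await asyncio.gather(*(handler(i) for i in range(10)))


asyncio.run(main())
```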
### Step 2: Register Your Service
Open `app/core/app.py` and find the lifespan function:
```python
# Change this line:
service = ResNetInferenceService(model_name="microsoft/resnet-18")
# To this:
service = YourModelService(model_name="your-org/your-model")
```
That's it. The `/predict` endpoint now serves your model.
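If you want to see the surrounding context before editing, the lifespan wiring in `app/core/app.py` plausibly resembles this (a sketch; the names and structure in the real file are the source of truth):

```python
# Sketch of the lifespan wiring; the actual app/core/app.py may differ.
from contextlib import asynccontextmanager

from fastapi import FastAPI

from app.services.your_model_service import YourModelService


@asynccontextmanager
async def lifespan(app: FastAPI):
    # This is the line Step 2 tells you to change.
    service = YourModelService(model_name="your-org/your-model")
    await service.load_model()  # fail fast if weights are missing
    app.state.inference_service = service  # what /predict resolves
    yield


app = FastAPI(lifespan=lifespan)
```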
### Model Files
Put your model files under `models/` with the full org/model structure:
```
models/
└── your-org/
    └── your-model/
        ├── config.json
        ├── weights.bin
        └── (other files)
```
No renaming, no dropping the org prefix—it just mirrors the Hugging Face structure.
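If your model lives on the Hugging Face Hub, `huggingface_hub.snapshot_download` produces exactly this layout (the repo id below is a placeholder):

```python
from huggingface_hub import snapshot_download

# Downloads config, weights, and any other repo files into the
# models/<org>/<model> layout the service expects.
snapshot_download(
    repo_id="your-org/your-model",  # placeholder repo id
    local_dir="models/your-org/your-model",
)
```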
## Configuration
Settings are managed via environment variables or a `.env` file. See `.env.example` for all available options.
**Default values:**
- `APP_NAME`: "ML Inference Service"
- `APP_VERSION`: "0.1.0"
- `DEBUG`: false
- `HOST`: "0.0.0.0"
- `PORT`: 8000
- `MODEL_NAME`: "microsoft/resnet-18"
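These names suggest a pydantic-settings class under the hood; a plausible sketch (assuming `pydantic-settings` v2, not the actual source) that yields exactly these defaults:

```python
# Hypothetical settings class consistent with the defaults above.
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    # protected_namespaces=() silences pydantic v2's warning about
    # field names starting with "model".
    model_config = SettingsConfigDict(env_file=".env", protected_namespaces=())

    app_name: str = "ML Inference Service"
    app_version: str = "0.1.0"
    debug: bool = False
    host: str = "0.0.0.0"
    port: int = 8000
    model_name: str = "microsoft/resnet-18"
```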
**To customize:**
```bash
# Copy the example
cp .env.example .env
# Edit values
vim .env
```
Or set environment variables directly:
```bash
export MODEL_NAME="google/vit-base-patch16-224"
uvicorn main:app --reload
```
## Deployment
**Development:**
```bash
uvicorn main:app --reload
```
**Production:**
```bash
gunicorn main:app -w 4 -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:8000
```
The service runs on CPU by default. For GPU inference, install CUDA-enabled PyTorch and modify your service to move tensors to the GPU device.
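The modification is small. Inside your service it might look like the fragment below (a sketch reusing the placeholder helpers from Step 1; `input_tensor` stands for whatever your preprocessing produces):

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Once, in load_model(): move the weights to the GPU.
model = load_your_model("models/your-org/your-model").to(device)  # placeholder loader

# Per request, in _predict_sync(): move inputs over, bring results back.
with torch.no_grad():
    logits = model(input_tensor.to(device))  # input_tensor: your preprocessed batch
probs = torch.softmax(logits, dim=-1).cpu()
```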
**Docker:**
- Multi-stage build keeps the image small
- Runs as non-root user (`appuser`)
- Python dependencies installed in user site-packages
- Model files baked into the image
## What Happens When You Start the Server
```
INFO: Starting ML Inference Service...
INFO: Initializing ResNet service: models/microsoft/resnet-18
INFO: Loading model from models/microsoft/resnet-18
INFO: Model loaded: 1000 classes
INFO: Startup completed successfully
INFO: Uvicorn running on http://0.0.0.0:8000
```
If you see "Model directory not found", check that your model files exist at the expected path with the full org/model structure.
## API Reference
**Endpoint:** `POST /predict`
**Request:**
```json
{
  "image": {
    "mediaType": "image/jpeg",  // or "image/png"
    "data": "<base64-encoded-image>"
  }
}
```
```
**Response:**
```json
{
  "prediction": "string",     // Human-readable label
  "confidence": 0.0,          // Softmax probability
  "predicted_label": 0,       // Numeric class index
  "model": "org/model-name",  // Model identifier
  "mediaType": "image/jpeg"   // Echoed from request
}
```
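These schemas presumably map to Pydantic models in `app/api/models.py`; a sketch consistent with the JSON above (class names other than `ImageRequest` and `PredictionResponse` are guesses):

```python
# Sketch consistent with the request/response schemas above.
from pydantic import BaseModel, ConfigDict


class ImagePayload(BaseModel):  # hypothetical name for the nested object
    mediaType: str  # "image/jpeg" or "image/png"
    data: str  # base64-encoded image bytes


class ImageRequest(BaseModel):
    image: ImagePayload


class PredictionResponse(BaseModel):
    # Allow a field literally named "model" under pydantic v2.
    model_config = ConfigDict(protected_namespaces=())

    prediction: str  # human-readable label
    confidence: float  # softmax probability
    predicted_label: int  # numeric class index
    model: str  # model identifier
    mediaType: str  # echoed from the request
```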
**Docs:**
- Swagger UI: `http://localhost:8000/docs`
- ReDoc: `http://localhost:8000/redoc`
- OpenAPI JSON: `http://localhost:8000/openapi.json`
## PyArrow Test Datasets
We've included a test dataset system for validating your model. It generates 100 standardized test datasets (25 per category below) covering normal inputs, edge cases, performance benchmarks, and model comparisons.
### Generate Datasets
```bash
python scripts/generate_test_datasets.py
```
This creates:
- `scripts/test_datasets/*.parquet` - Test data (images, requests, expected responses)
- `scripts/test_datasets/*_metadata.json` - Human-readable descriptions
- `scripts/test_datasets/datasets_summary.json` - Overview of all datasets
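To inspect a dataset yourself, PyArrow reads the files directly (the filename below is illustrative; list `scripts/test_datasets/` for the real ones):

```python
import pyarrow.parquet as pq

# Illustrative filename; check scripts/test_datasets/ for actual names.
table = pq.read_table("scripts/test_datasets/standard_test_0001.parquet")
print(table.schema)    # column names and types
print(table.num_rows)  # samples in this dataset
df = table.to_pandas() # convenient for spot checks
```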
### Run Tests
```bash
# Start your service first
uvicorn main:app --reload
# Quick test (5 samples per dataset)
python scripts/test_datasets.py --quick
# Full validation
python scripts/test_datasets.py
# Test specific category
python scripts/test_datasets.py --category edge_case
```
### Dataset Categories (25 datasets each)
**1. Standard Tests** (`standard_test_*.parquet`)
- Normal images: random patterns, shapes, gradients
- Common sizes: 224x224, 256x256, 299x299, 384x384
- Formats: JPEG, PNG
- Purpose: Baseline validation
**2. Edge Cases** (`edge_case_*.parquet`)
- Tiny images (32x32, 1x1)
- Huge images (2048x2048)
- Extreme aspect ratios (1000x50)
- Corrupted data, malformed requests
- Purpose: Test error handling
**3. Performance Benchmarks** (`performance_test_*.parquet`)
- Batch sizes: 1, 5, 10, 25, 50, 100 images
- Latency and throughput tracking
- Purpose: Performance profiling
**4. Model Comparisons** (`model_comparison_*.parquet`)
- Same inputs across different architectures
- Models: ResNet-18/50, ViT, ConvNext, Swin
- Purpose: Cross-model benchmarking
### Test Output
```
DATASET TESTING SUMMARY
============================================================
Datasets tested: 100
Successful datasets: 95
Failed datasets: 5
Total samples: 1,247
Overall success rate: 87.3%
Test duration: 45.2s

Performance:
  Avg latency: 123.4ms
  Median latency: 98.7ms
  p95 latency: 342.1ms
  Max latency: 2,341.0ms
  Requests/sec: 27.6

Category breakdown:
  standard: 25 datasets, 94.2% avg success
  edge_case: 25 datasets, 76.8% avg success
  performance: 25 datasets, 91.1% avg success
  model_comparison: 25 datasets, 89.3% avg success
```
## Common Issues
**Port 8000 already in use:**
```bash
# Find what's using it
lsof -i :8000
# Or just use a different port
uvicorn main:app --port 8080
```
**Model not loading:**
- Check the path: models should be in `models/<org>/<model-name>/`
- Make sure you ran `bash scripts/model_download.bash`
- Check logs for the exact error
**Slow inference:**
- Inference runs on CPU by default
- For GPU: install CUDA PyTorch and modify service to use GPU device
- Consider using smaller models or quantization (a sketch follows this list)
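Dynamic int8 quantization is the lowest-effort option to try. A sketch, assuming the model loads via `transformers` (gains are largest on linear-heavy architectures, so a conv-heavy net like ResNet may see only a modest speedup):

```python
import torch
from transformers import AutoModelForImageClassification

# Quantize the Linear layers to int8 at load time; inference stays on CPU.
model = AutoModelForImageClassification.from_pretrained("models/microsoft/resnet-18")
quantized = torch.ao.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
```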
## License
Apache 2.0