LocalAI / docs /content /features /object-detection.md

Upload folder using huggingface_hub

0f07ba7 verified 20 days ago

5.75 kB

	+++
	disableToc = false
	title = "🔍 Object detection"
	weight = 13
	url = "/features/object-detection/"
	+++

	LocalAI supports object detection through various backends. This feature allows you to identify and locate objects within images with high accuracy and real-time performance. Currently, [RF-DETR](https://github.com/roboflow/rf-detr) is available as an implementation.

	## Overview

	Object detection in LocalAI is implemented through dedicated backends that can identify and locate objects within images. Each backend provides different capabilities and model architectures.

	Key Features:
	- Real-time object detection
	- High accuracy detection with bounding boxes
	- Support for multiple hardware accelerators (CPU, NVIDIA GPU, Intel GPU, AMD GPU)
	- Structured detection results with confidence scores
	- Easy integration through the `/v1/detection` endpoint

	## Usage

	### Detection Endpoint

	LocalAI provides a dedicated `/v1/detection` endpoint for object detection tasks. This endpoint is specifically designed for object detection and returns structured detection results with bounding boxes and confidence scores.

	### API Reference

	To perform object detection, send a POST request to the `/v1/detection` endpoint:

	```bash
	curl -X POST http://localhost:8080/v1/detection \
	-H "Content-Type: application/json" \
	-d '{
	"model": "rfdetr-base",
	"image": "https://media.roboflow.com/dog.jpeg"
	}'
	```

	### Request Format

	The request body should contain:

	- `model`: The name of the object detection model (e.g., "rfdetr-base")
	- `image`: The image to analyze, which can be:
	- A URL to an image
	- A base64-encoded image

	### Response Format

	The API returns a JSON response with detected objects:

	```json
	{
	"detections": [
	{
	"x": 100.5,
	"y": 150.2,
	"width": 200.0,
	"height": 300.0,
	"confidence": 0.95,
	"class_name": "dog"
	},
	{
	"x": 400.0,
	"y": 200.0,
	"width": 150.0,
	"height": 250.0,
	"confidence": 0.87,
	"class_name": "person"
	}
	]
	}
	```

	Each detection includes:
	- `x`, `y`: Coordinates of the bounding box top-left corner
	- `width`, `height`: Dimensions of the bounding box
	- `confidence`: Detection confidence score (0.0 to 1.0)
	- `class_name`: The detected object class

	## Backends

	### RF-DETR Backend

	The RF-DETR backend is implemented as a Python-based gRPC service that integrates seamlessly with LocalAI. It provides object detection capabilities using the RF-DETR model architecture and supports multiple hardware configurations:

	- CPU: Optimized for CPU inference
	- NVIDIA GPU: CUDA acceleration for NVIDIA GPUs
	- Intel GPU: Intel oneAPI optimization
	- AMD GPU: ROCm acceleration for AMD GPUs
	- NVIDIA Jetson: Optimized for ARM64 NVIDIA Jetson devices

	#### Setup

	1. Using the Model Gallery (Recommended)

	The easiest way to get started is using the model gallery. The `rfdetr-base` model is available in the official LocalAI gallery:

	```bash
	# Install and run the rfdetr-base model
	local-ai run rfdetr-base
	```

	You can also install it through the web interface by navigating to the Models section and searching for "rfdetr-base".

	2. Manual Configuration

	Create a model configuration file in your `models` directory:

	```yaml
	name: rfdetr
	backend: rfdetr
	parameters:
	model: rfdetr-base
	```

	#### Available Models

	Currently, the following model is available in the [Model Gallery]({{%relref "features/model-gallery" %}}):

	- rfdetr-base: Base model with balanced performance and accuracy

	You can browse and install this model through the LocalAI web interface or using the command line.

	## Examples

	### Basic Object Detection

	```bash
	curl -X POST http://localhost:8080/v1/detection \
	-H "Content-Type: application/json" \
	-d '{
	"model": "rfdetr-base",
	"image": "https://example.com/image.jpg"
	}'
	```

	### Base64 Image Detection

	```bash
	base64_image=$(base64 -w 0 image.jpg)
	curl -X POST http://localhost:8080/v1/detection \
	-H "Content-Type: application/json" \
	-d "{
	\"model\": \"rfdetr-base\",
	\"image\": \"data:image/jpeg;base64,$base64_image\"
	}"
	```

	## Troubleshooting

	### Common Issues

	1. Model Loading Errors
	- Ensure the model file is properly downloaded
	- Check available disk space
	- Verify model compatibility with your backend version

	2. Low Detection Accuracy
	- Ensure good image quality and lighting
	- Check if objects are clearly visible
	- Consider using a larger model for better accuracy

	3. Slow Performance
	- Enable GPU acceleration if available
	- Use a smaller model for faster inference
	- Optimize image resolution

	### Debug Mode

	Enable debug logging for troubleshooting:

	```bash
	local-ai run --debug rfdetr-base
	```

	## Object Detection Category

	LocalAI includes a dedicated object-detection category for models and backends that specialize in identifying and locating objects within images. This category currently includes:

	- RF-DETR: Real-time transformer-based object detection

	Additional object detection models and backends will be added to this category in the future. You can filter models by the `object-detection` tag in the model gallery to find all available object detection models.

	## Related Features

	- [🎨 Image generation]({{%relref "features/image-generation" %}}): Generate images with AI
	- [📖 Text generation]({{%relref "features/text-generation" %}}): Generate text with language models
	- [🔍 GPT Vision]({{%relref "features/gpt-vision" %}}): Analyze images with language models
	- [🚀 GPU acceleration]({{%relref "features/GPU-acceleration" %}}): Optimize performance with GPU acceleration