---
license: mit
---

# EdgeSAM - Efficient Segment Anything Model

EdgeSAM is an accelerated variant of the Segment Anything Model (SAM) optimized for edge devices using ONNX Runtime.

## Model Files

- `edge_sam_3x_encoder.onnx` - Image encoder (1024x1024 input)
- `edge_sam_3x_decoder.onnx` - Mask decoder with prompt support

## Usage

### API Request Format

```python
import base64

import requests

# Encode your image
with open("image.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

# Make request
response = requests.post(
    "https://YOUR-ENDPOINT-URL",
    json={
        "inputs": image_b64,
        "parameters": {
            "point_coords": [[512, 512]],  # Click point in 1024x1024 space
            "point_labels": [1],           # 1 = foreground, 0 = background
            "return_mask_image": True,
        },
    },
)
result = response.json()
```

### Response Format

```json
[
  {
    "mask_shape": [1024, 1024],
    "has_object": true,
    "mask": ""
  }
]
```

### Parameters

- **point_coords**: Array of `[x, y]` coordinates in 1024x1024 space (optional)
- **point_labels**: Array of labels (1=foreground, 0=background) corresponding to points (optional)
- **box_coords**: Bounding box `[x1, y1, x2, y2]` (optional, not yet implemented)
- **return_mask_image**: Return base64-encoded PNG mask (default: `true`)

### Coordinate System

All coordinates must be in **1024x1024** space, regardless of the original image size. The handler automatically resizes input images to 1024x1024 before processing.

Example: For a click at the center of any image, use `[512, 512]`.
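Because prompts are always interpreted in 1024x1024 space, clicks captured on the original image need to be rescaled before being sent. A minimal helper sketch (the function name `to_model_space` is illustrative, not part of the API):

```python
def to_model_space(x, y, orig_w, orig_h, target=1024):
    """Map a click (x, y) on the original image into 1024x1024 model space."""
    return [round(x * target / orig_w), round(y * target / orig_h)]


# A click at the center of an 800x600 image maps to the center of model space
point = to_model_space(400, 300, orig_w=800, orig_h=600)  # [512, 512]
```

The mapped point can then be passed directly as `point_coords` in the request payload above.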
## Local Testing

```bash
# Install dependencies
pip install -r requirements.txt

# Run test script
python test_handler.py
```

This will create:

- `test_input.png` - Test image with red circle
- `test_output_mask.png` - Generated segmentation mask
- `test_output_overlay.png` - Overlay visualization

## Technical Details

- **Input**: RGB images (auto-resized to 1024x1024)
- **Preprocessing**: Normalized to [0, 1] range (`/ 255.0`)
- **Hardware**: Supports CUDA GPU with automatic CPU fallback
- **Framework**: ONNX Runtime Web compatible
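The preprocessing described above (resize to 1024x1024, scale to [0, 1]) can be sketched with Pillow and NumPy. This is an assumption about the handler's internals based on the stated details, not its actual code; the exact tensor layout expected by `edge_sam_3x_encoder.onnx` may differ:

```python
import numpy as np
from PIL import Image


def preprocess(img, size=1024):
    """Resize an image to size x size RGB and scale pixels to [0, 1].

    Returns a float32 array in NCHW layout (1, 3, size, size), a common
    input shape for ONNX image encoders (assumed here, not confirmed).
    """
    img = img.convert("RGB").resize((size, size))
    arr = np.asarray(img, dtype=np.float32) / 255.0  # HWC, values in [0, 1]
    return arr.transpose(2, 0, 1)[None]              # -> 1 x 3 x size x size


# Example: a solid-red 640x480 test image
batch = preprocess(Image.new("RGB", (640, 480), (255, 0, 0)))
print(batch.shape)  # (1, 3, 1024, 1024)
```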