---
license: mit
---

# EdgeSAM - Efficient Segment Anything Model

EdgeSAM is an accelerated variant of the Segment Anything Model (SAM) optimized for edge devices. The models in this repository are exported to ONNX and run with ONNX Runtime.

## Model Files

- `edge_sam_3x_encoder.onnx` - Image encoder (1024x1024 input)
- `edge_sam_3x_decoder.onnx` - Mask decoder with prompt support

## Usage

### API Request Format

```python
import base64

import requests

# Encode your image
with open("image.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

# Make request
response = requests.post(
    "https://YOUR-ENDPOINT-URL",
    json={
        "inputs": image_b64,
        "parameters": {
            "point_coords": [[512, 512]],  # Click point in 1024x1024 space
            "point_labels": [1],  # 1 = foreground, 0 = background
            "return_mask_image": True,
        },
    },
)
response.raise_for_status()  # Fail early on HTTP errors

result = response.json()
```

### Response Format

```json
[
  {
    "mask_shape": [1024, 1024],
    "has_object": true,
    "mask": "<base64_encoded_png>"
  }
]
```

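To use the returned mask, the base64 string has to be decoded back into PNG bytes. The following is a minimal, standard-library-only sketch. The helper name `decode_mask` is hypothetical, and it assumes the `mask` field is simply absent when no mask image is returned, which this document does not confirm.

```python
import base64
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_mask(result):
    """Return (png_bytes, width, height) for the first item in a response.

    Returns None when the item has no "mask" field (an assumption about
    how the handler behaves when no mask image is requested).
    """
    item = result[0]  # the response is a JSON list of result objects
    mask_b64 = item.get("mask")
    if mask_b64 is None:
        return None
    png_bytes = base64.b64decode(mask_b64)
    if png_bytes[:8] != PNG_SIGNATURE:
        raise ValueError("mask is not a PNG image")
    # After the 8-byte signature comes the IHDR chunk: 4-byte length,
    # 4-byte type, then big-endian width and height.
    width, height = struct.unpack(">II", png_bytes[16:24])
    return png_bytes, width, height
```

The returned bytes can be written straight to disk, for example with `open("mask.png", "wb").write(png_bytes)`, or handed to an image library of your choice.
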
### Parameters

- **point_coords**: Array of `[x, y]` coordinates in 1024x1024 space (optional)
- **point_labels**: Array of labels, one per entry in `point_coords` (`1` = foreground, `0` = background) (optional)
- **box_coords**: Bounding box `[x1, y1, x2, y2]` (optional, not yet implemented)
- **return_mask_image**: Whether to return the mask as a base64-encoded PNG (default: `true`)

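Since each label corresponds to one point, it can help to validate the two arrays before sending a request. Below is a hedged sketch of a payload builder. `build_payload` is a hypothetical helper, not part of the handler, and the client-side checks are an assumption about what a well-formed request looks like.

```python
import base64


def build_payload(image_bytes, points, labels, return_mask_image=True):
    """Build the JSON body for a point-prompt request (hypothetical helper)."""
    if len(points) != len(labels):
        raise ValueError("point_coords and point_labels must have equal length")
    if any(label not in (0, 1) for label in labels):
        raise ValueError("labels must be 1 (foreground) or 0 (background)")
    return {
        "inputs": base64.b64encode(image_bytes).decode(),
        "parameters": {
            "point_coords": [list(p) for p in points],
            "point_labels": list(labels),
            "return_mask_image": return_mask_image,
        },
    }
```

For example, `build_payload(image_bytes, [(512, 512), (100, 100)], [1, 0])` marks the image center as foreground and a point near the top-left corner as background.
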
### Coordinate System

All coordinates must be given in **1024x1024** space, regardless of the original image size, because the handler automatically resizes input images to 1024x1024 before processing.

Example: For a click at the center of any image, use `[512, 512]`.

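If your click positions are in original-image pixels, they need to be rescaled first. The sketch below assumes the handler stretches the image to 1024x1024 without preserving aspect ratio, which is consistent with the center example above but not confirmed here. `to_model_space` is a hypothetical helper name.

```python
MODEL_SIZE = 1024  # side length of the model's input space


def to_model_space(x, y, width, height, size=MODEL_SIZE):
    """Map a pixel (x, y) in a width x height image into size x size space.

    Assumes a plain stretch resize; each axis is scaled independently.
    """
    return [round(x * size / width), round(y * size / height)]
```

With this mapping, the center of a 1920x1080 image, `to_model_space(960, 540, 1920, 1080)`, becomes `[512, 512]`.
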
## Local Testing

```bash
# Install dependencies
pip install -r requirements.txt

# Run test script
python test_handler.py
```

The test script creates:
- `test_input.png` - Test image with red circle
- `test_output_mask.png` - Generated segmentation mask
- `test_output_overlay.png` - Overlay visualization

## Technical Details

- **Input**: RGB images (auto-resized to 1024x1024)
- **Preprocessing**: Pixel values normalized to the [0, 1] range (`/ 255.0`)
- **Hardware**: Supports CUDA GPU with automatic CPU fallback
- **Framework**: Compatible with ONNX Runtime Web

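The CUDA-with-CPU-fallback behavior can be approximated in ONNX Runtime by passing an ordered list of execution providers. The following is a sketch of one way to build that list, not the handler's actual code. `pick_providers` is a hypothetical helper, and the list it filters would typically come from `onnxruntime.get_available_providers()`.

```python
# Preferred order: try the GPU first, then fall back to the CPU.
PREFERRED_PROVIDERS = ["CUDAExecutionProvider", "CPUExecutionProvider"]


def pick_providers(available):
    """Return the preferred providers that are actually available, in order.

    Falls back to the CPU provider alone if none of the preferred ones
    appear in `available`.
    """
    chosen = [p for p in PREFERRED_PROVIDERS if p in available]
    return chosen or ["CPUExecutionProvider"]
```

The result can then be passed as the `providers` argument when creating a session, for example `onnxruntime.InferenceSession("edge_sam_3x_encoder.onnx", providers=pick_providers(onnxruntime.get_available_providers()))`.
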