---
license: mit
---
# EdgeSAM - Efficient Segment Anything Model
EdgeSAM is an accelerated variant of the Segment Anything Model (SAM) optimized for edge devices using ONNX Runtime.
## Model Files
- `edge_sam_3x_encoder.onnx` - Image encoder (1024x1024 input)
- `edge_sam_3x_decoder.onnx` - Mask decoder with prompt support
## Usage
### API Request Format
```python
import requests
import base64

# Encode your image
with open("image.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

# Make request
response = requests.post(
    "https://YOUR-ENDPOINT-URL",
    json={
        "inputs": image_b64,
        "parameters": {
            "point_coords": [[512, 512]],  # Click point in 1024x1024 space
            "point_labels": [1],  # 1 = foreground, 0 = background
            "return_mask_image": True
        }
    }
)
result = response.json()
```
### Response Format
```json
[
  {
    "mask_shape": [1024, 1024],
    "has_object": true,
    "mask": "<base64_encoded_png>"
  }
]
```
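
To use the returned mask, the `mask` field has to be base64-decoded back into PNG bytes. The following is a minimal sketch using only the standard library; `decode_mask` is a hypothetical helper name, not part of the handler, and it assumes the field holds a plain base64 PNG with no data-URL prefix.

```python
import base64
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_mask(entry):
    """Return (png_bytes, width, height) for one response entry.

    Returns None when the entry carries no "mask" field, for example
    when the request set return_mask_image to False.
    """
    mask_b64 = entry.get("mask")
    if not mask_b64:
        return None
    png_bytes = base64.b64decode(mask_b64)
    if png_bytes[:8] != PNG_SIGNATURE:
        raise ValueError("mask is not a PNG image")
    # IHDR is always the first PNG chunk: 4-byte length, 4-byte type,
    # then big-endian width and height at byte offsets 16 and 20.
    width, height = struct.unpack(">II", png_bytes[16:24])
    return png_bytes, width, height
```

The returned bytes can be written straight to a `.png` file or opened with any image library; the width and height should match `mask_shape`.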
### Parameters
- **point_coords**: Array of `[x, y]` coordinates in 1024x1024 space (optional)
- **point_labels**: Array of labels (1=foreground, 0=background) corresponding to points (optional)
- **box_coords**: Bounding box `[x1, y1, x2, y2]` (optional, not yet implemented)
- **return_mask_image**: Return base64-encoded PNG mask (default: `true`)
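
Because `point_coords` and `point_labels` must line up one-to-one, it can help to build both from a single list of points. The sketch below shows one way to do that; `build_payload` is a hypothetical client-side helper, and it assumes the endpoint accepts several points in one request, as the array-valued parameters suggest.

```python
def build_payload(image_b64, points, return_mask_image=True):
    """Build the request body from (x, y, label) tuples.

    Coordinates are expected in 1024x1024 space; label is 1 for
    foreground or 0 for background.
    """
    coords, labels = [], []
    for x, y, label in points:
        if label not in (0, 1):
            raise ValueError(f"label must be 0 or 1, got {label!r}")
        coords.append([x, y])
        labels.append(label)

    parameters = {"return_mask_image": return_mask_image}
    if coords:  # both fields are optional, so omit them when no points are given
        parameters["point_coords"] = coords
        parameters["point_labels"] = labels
    return {"inputs": image_b64, "parameters": parameters}
```

For example, `build_payload(image_b64, [(512, 512, 1), (100, 100, 0)])` marks the image center as foreground and a point near the top-left corner as background.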
### Coordinate System
All coordinates should be in **1024x1024** space, regardless of original image size. The handler automatically resizes input images to 1024x1024 before processing.
Example: For a click at the center of any image, use `[512, 512]`.
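
Clicks captured on the original image therefore need to be rescaled before they are sent. A minimal sketch follows, assuming the handler stretches the image directly to 1024x1024 without preserving aspect ratio or padding (the text above suggests this but does not state it outright); `to_model_space` is a hypothetical helper name.

```python
MODEL_SIZE = 1024


def to_model_space(x, y, width, height, size=MODEL_SIZE):
    """Map a pixel (x, y) in a width x height image to size x size space."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    # Each axis is scaled independently because the resize is assumed
    # to be a plain stretch.
    mx = round(x * size / width)
    my = round(y * size / height)
    # Clamping to the last valid pixel index is an assumption of this sketch.
    return [min(max(mx, 0), size - 1), min(max(my, 0), size - 1)]
```

With a 640x480 image, `to_model_space(320, 240, 640, 480)` gives `[512, 512]`, matching the center example above.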
## Local Testing
```bash
# Install dependencies
pip install -r requirements.txt
# Run test script
python test_handler.py
```
This will create:
- `test_input.png` - Test image with red circle
- `test_output_mask.png` - Generated segmentation mask
- `test_output_overlay.png` - Overlay visualization
## Technical Details
- **Input**: RGB images (auto-resized to 1024x1024)
- **Preprocessing**: Normalized to [0, 1] range (`/ 255.0`)
- **Hardware**: Supports CUDA GPU with automatic CPU fallback
- **Framework**: ONNX Runtime Web compatible
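
The CUDA-with-CPU-fallback behavior can be approximated by passing ONNX Runtime an ordered provider list. The sketch below is not the handler's actual code; `pick_providers` is a hypothetical helper that filters a preferred order against whatever providers are available on the machine.

```python
PREFERRED_PROVIDERS = ["CUDAExecutionProvider", "CPUExecutionProvider"]


def pick_providers(available, preferred=PREFERRED_PROVIDERS):
    """Return the preferred providers that are available, in priority order."""
    chosen = [name for name in preferred if name in available]
    if not chosen:
        raise RuntimeError("no supported execution provider is available")
    return chosen
```

With `onnxruntime` installed, the list from `onnxruntime.get_available_providers()` can be passed in, and the result handed to `onnxruntime.InferenceSession("edge_sam_3x_encoder.onnx", providers=...)`.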