---
license: mit
---


# EdgeSAM - Efficient Segment Anything Model

EdgeSAM is an accelerated variant of the Segment Anything Model (SAM) optimized for edge devices using ONNX Runtime.

## Model Files

- `edge_sam_3x_encoder.onnx` - Image encoder (1024x1024 input)
- `edge_sam_3x_decoder.onnx` - Mask decoder with prompt support

## Usage

### API Request Format

```python
import base64

import requests

# Encode your image
with open("image.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

# Make request
response = requests.post(
    "https://YOUR-ENDPOINT-URL",
    json={
        "inputs": image_b64,
        "parameters": {
            "point_coords": [[512, 512]],  # Click point in 1024x1024 space
            "point_labels": [1],           # 1 = foreground, 0 = background
            "return_mask_image": True
        }
    }
)

result = response.json()
```

### Response Format

```json
[
  {
    "mask_shape": [1024, 1024],
    "has_object": true,
    "mask": "<base64_encoded_png>"
  }
]
```
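The `mask` field is a base64-encoded PNG. A minimal sketch of decoding it back to raw bytes (the `decode_mask` helper below is illustrative, not part of the handler):

```python
import base64

def decode_mask(entry):
    """Return the raw PNG bytes of the mask, or None if no object was found."""
    if not entry.get("has_object", False):
        return None
    return base64.b64decode(entry["mask"])

# Stubbed response entry (the "mask" value here is not a real mask):
entry = {"mask_shape": [1024, 1024], "has_object": True,
         "mask": base64.b64encode(b"\x89PNG\r\n").decode()}
png_bytes = decode_mask(entry)
```

The decoded bytes are a complete PNG file, so they can be written to disk or loaded with Pillow via `Image.open(io.BytesIO(png_bytes))`.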

### Parameters

- **point_coords**: Array of `[x, y]` coordinates in 1024x1024 space (optional)
- **point_labels**: Array of labels (`1` = foreground, `0` = background), one per point (optional)
- **box_coords**: Bounding box `[x1, y1, x2, y2]` (optional; not yet implemented)
- **return_mask_image**: Return a base64-encoded PNG mask (default: `true`)
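Multiple points can be combined, for instance one foreground click plus one background click to exclude a region (coordinate values here are illustrative):

```json
{
  "point_coords": [[512, 400], [200, 900]],
  "point_labels": [1, 0]
}
```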



### Coordinate System

All coordinates should be in **1024x1024** space, regardless of the original image size. The handler automatically resizes input images to 1024x1024 before processing.

Example: for a click at the center of any image, use `[512, 512]`.
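Clicks captured on the original image therefore need to be rescaled before being sent. A small sketch (`to_model_coords` is a hypothetical helper, not part of the API):

```python
def to_model_coords(x, y, orig_w, orig_h, target=1024):
    """Scale a pixel coordinate from the original image into 1024x1024 model space."""
    return [round(x * target / orig_w), round(y * target / orig_h)]

# The center of a 1920x1080 image maps to the model-space center:
to_model_coords(960, 540, 1920, 1080)  # → [512, 512]
```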



## Local Testing

```bash
# Install dependencies
pip install -r requirements.txt

# Run test script
python test_handler.py
```

This will create:

- `test_input.png` - Test image with a red circle
- `test_output_mask.png` - Generated segmentation mask
- `test_output_overlay.png` - Overlay visualization



## Technical Details

- **Input**: RGB images (auto-resized to 1024x1024)
- **Preprocessing**: Pixel values normalized to the [0, 1] range (`/ 255.0`)
- **Hardware**: Supports CUDA GPUs with automatic CPU fallback
- **Framework**: ONNX Runtime Web compatible
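The input pipeline described above can be sketched as follows. This is an illustration of the resize-and-normalize step, assuming the common NCHW input layout; the handler's actual code may differ:

```python
import numpy as np
from PIL import Image

def preprocess(img, size=1024):
    """Resize an RGB image to size x size and scale pixels to [0, 1]."""
    img = img.convert("RGB").resize((size, size))
    arr = np.asarray(img, dtype=np.float32) / 255.0  # (size, size, 3), values in [0, 1]
    return np.transpose(arr, (2, 0, 1))[None]        # (1, 3, size, size) batch for the encoder

x = preprocess(Image.new("RGB", (640, 480), (128, 64, 32)))
print(x.shape)  # (1, 3, 1024, 1024)
```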