tonyyang2000/EdgeSAM

JingShiang Yang committed on Nov 14, 2025

Commit 594e70f (1 parent: 4194e02)

add handler.py

Files changed (3)

README.md +84 -0
handler.py +71 -0
requirements.txt +3 -0

README.md CHANGED

@@ -1,3 +1,87 @@
 ---
 license: mit
 ---
+# EdgeSAM - Efficient Segment Anything Model
+EdgeSAM is an accelerated variant of the Segment Anything Model (SAM) optimized for edge devices. This repository runs it with ONNX Runtime.
+## Model Files
+- `edge_sam_3x_encoder.onnx` - Image encoder (1024x1024 input)
+- `edge_sam_3x_decoder.onnx` - Mask decoder with prompt support
+## Usage
+### API Request Format
+```python
+import requests
+import base64
+# Encode your image
+with open("image.jpg", "rb") as f:
+    image_b64 = base64.b64encode(f.read()).decode()
+# Make request
+response = requests.post(
+    "https://YOUR-ENDPOINT-URL",
+    json={
+        "inputs": image_b64,
+        "parameters": {
+            "point_coords": [[512, 512]],  # Click point in 1024x1024 space
+            "point_labels": [1],            # 1 = foreground, 0 = background
+            "return_mask_image": True
+        }
+    }
+)
+result = response.json()
+```
+### Response Format
+```json
+[
+  {
+    "mask_shape": [1024, 1024],
+    "has_object": true,
+    "mask": "<base64_encoded_png>"
+  }
+]
+```
+### Parameters
+- **point_coords**: Array of `[x, y]` coordinates in 1024x1024 space (optional; defaults to `[[512, 512]]`, the image center)
+- **point_labels**: Array of labels, one per point in `point_coords` (1 = foreground, 0 = background) (optional; defaults to foreground)
+- **box_coords**: Bounding box `[x1, y1, x2, y2]` (optional, not yet implemented)
+- **return_mask_image**: Return base64-encoded PNG mask (default: `true`)
+### Coordinate System
+All coordinates must be given in **1024x1024** space, regardless of the original image size, because the handler resizes every input image to 1024x1024 (without preserving aspect ratio) before processing.
+Example: For a click at the center of any image, use `[512, 512]`.
+## Local Testing
+```bash
+# Install dependencies
+pip install -r requirements.txt
+# Run test script
+python test_handler.py
+```
+This will create:
+- `test_input.png` - Test image with red circle
+- `test_output_mask.png` - Generated segmentation mask
+- `test_output_overlay.png` - Overlay visualization
+## Technical Details
+- **Input**: RGB images (auto-resized to 1024x1024)
+- **Preprocessing**: Normalized to [0, 1] range (`/ 255.0`)
+- **Hardware**: Supports CUDA GPU with automatic CPU fallback
+- **Framework**: ONNX Runtime (the model files are also ONNX Runtime Web compatible)
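
The README's coordinate convention means a click made on the original image has to be rescaled before it is sent. The following is a minimal sketch of that step, assuming the handler stretches each axis independently (it calls `image.resize((1024, 1024), ...)` with no padding). The helper names `to_model_space` and `build_parameters` are hypothetical and not part of this commit.

```python
from typing import Dict, List, Sequence, Tuple

MODEL_SIZE = 1024  # side length the handler resizes every image to


def to_model_space(x: float, y: float, width: int, height: int) -> List[float]:
    """Scale a pixel coordinate from the original image into 1024x1024 space."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # The resize does not preserve aspect ratio, so each axis has its own factor.
    return [x * MODEL_SIZE / width, y * MODEL_SIZE / height]


def build_parameters(
    points: Sequence[Tuple[float, float]],
    labels: Sequence[int],
    width: int,
    height: int,
) -> Dict[str, object]:
    """Build the "parameters" object of a request from original-image clicks."""
    if len(points) != len(labels):
        raise ValueError("point_coords and point_labels must have the same length")
    if any(label not in (0, 1) for label in labels):
        raise ValueError("labels must be 1 (foreground) or 0 (background)")
    return {
        "point_coords": [to_model_space(x, y, width, height) for x, y in points],
        "point_labels": list(labels),
        "return_mask_image": True,
    }
```

For a 1920x1080 image, `build_parameters([(960, 540)], [1], 1920, 1080)` maps the center click to `[512.0, 512.0]`, matching the README's example.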

handler.py ADDED

@@ -0,0 +1,71 @@

+from typing import Dict, List, Any
+import onnxruntime as ort
+import numpy as np
+from PIL import Image
+import io
+import base64
+import os
+class EndpointHandler:
+    def __init__(self, path=""):
+        model_path = path if path else "."
+        providers = ['CUDAExecutionProvider', 'CPUExecutionProvider']
+        self.encoder = ort.InferenceSession(
+            os.path.join(model_path, "edge_sam_3x_encoder.onnx"),
+            providers=providers
+        )
+        self.decoder = ort.InferenceSession(
+            os.path.join(model_path, "edge_sam_3x_decoder.onnx"),
+            providers=providers
+        )
+    def __call__(self, data: Dict[str, Any]) -> List[Dict[str, Any]]:
+        try:
+            # Parse input
+            inputs = data.get("inputs", data)
+            params = data.get("parameters", {})
+            # Load image
+            if isinstance(inputs, str):
+                image = Image.open(io.BytesIO(base64.b64decode(inputs)))
+            else:
+                image = inputs
+            # Preprocess
+            if image.mode != 'RGB':
+                image = image.convert('RGB')
+            image = image.resize((1024, 1024), Image.BILINEAR)
+            img_array = np.array(image).astype(np.float32) / 255.0
+            img_array = img_array.transpose(2, 0, 1)[np.newaxis, :]
+            # Encode
+            embeddings = self.encoder.run(None, {'image': img_array})[0]
+            # Prepare prompts
+            coords = np.array(params.get("point_coords", [[512, 512]]), dtype=np.float32)
+            labels = np.array(params.get("point_labels", [1] * len(coords)), dtype=np.float32)
+            # Decode
+            masks = self.decoder.run(None, {
+                'image_embeddings': embeddings,
+                'point_coords': coords.reshape(1, -1, 2),
+                'point_labels': labels.reshape(1, -1)
+            })[0]
+            # Postprocess
+            mask = (masks[0, 0] > 0.0).astype(np.uint8) * 255
+            # Return result
+            result = {"mask_shape": list(mask.shape), "has_object": bool(mask.max() > 0)}
+            if params.get("return_mask_image", True):
+                buffer = io.BytesIO()
+                Image.fromarray(mask, mode='L').save(buffer, format='PNG')
+                result["mask"] = base64.b64encode(buffer.getvalue()).decode()
+            return [result]
+        except Exception as e:
+            return [{"error": str(e)}]
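
Because `__call__` catches every exception and returns `[{"error": ...}]` as an ordinary result, a client cannot rely on the HTTP status alone to detect failures. The following is a hedged sketch of client-side handling, assuming the response body has already been parsed from JSON. The names `extract_result` and `mask_png_bytes` are hypothetical helpers, not part of this commit.

```python
import base64
from typing import Any, Dict, List, Optional


def extract_result(payload: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Return the first result entry, raising if the handler reported an error."""
    if not isinstance(payload, list) or not payload:
        raise ValueError("expected a non-empty list from the endpoint")
    first = payload[0]
    if "error" in first:
        # The handler packs exception text into an "error" key.
        raise RuntimeError(f"handler error: {first['error']}")
    return first


def mask_png_bytes(result: Dict[str, Any]) -> Optional[bytes]:
    """Decode the base64 PNG mask, or return None if no mask was requested."""
    encoded = result.get("mask")
    if encoded is None:
        return None
    return base64.b64decode(encoded)
```

The returned bytes are a PNG file and can be written to disk or opened with any image library.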

requirements.txt ADDED

@@ -0,0 +1,3 @@

+onnxruntime>=1.16.0
+numpy>=1.24.0
+Pillow>=10.0.0