---
license: mit
---

# EdgeSAM - Efficient Segment Anything Model

EdgeSAM is an accelerated variant of the Segment Anything Model (SAM) designed for edge devices. This repository provides it as ONNX models for inference with ONNX Runtime.

## Model Files

- `edge_sam_3x_encoder.onnx` - Image encoder (1024x1024 input)
- `edge_sam_3x_decoder.onnx` - Mask decoder with prompt support
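The tensor names and shapes of the two ONNX files are not listed here. One way to inspect them is sketched below. It assumes `onnxruntime` is installed and that the files are in the working directory. `describe_io` and `inspect_model` are hypothetical helper names, not part of this repository.

```python
from typing import List


def describe_io(session) -> List[str]:
    """Summarize the inputs and outputs of an ONNX Runtime session.

    Works with any object whose get_inputs()/get_outputs() return nodes
    exposing .name, .shape and .type, as onnxruntime.InferenceSession does.
    """
    lines = []
    for kind, nodes in (("input", session.get_inputs()), ("output", session.get_outputs())):
        for node in nodes:
            lines.append(f"{kind} {node.name}: shape={node.shape} type={node.type}")
    return lines


def inspect_model(path: str) -> List[str]:
    """Open an ONNX file on CPU and describe its tensors (hypothetical helper)."""
    import onnxruntime as ort  # imported lazily so describe_io works without it

    session = ort.InferenceSession(path, providers=["CPUExecutionProvider"])
    return describe_io(session)
```

For example, `for line in inspect_model("edge_sam_3x_decoder.onnx"): print(line)` lists the decoder's inputs. The prompt inputs (points and labels) should appear there, but their exact names are not documented in this card.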

## Usage

### API Request Format

```python
import base64

import requests

# Read the image and encode it as base64 text
with open("image.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

# Send the image with a single foreground click
response = requests.post(
    "https://YOUR-ENDPOINT-URL",
    json={
        "inputs": image_b64,
        "parameters": {
            "point_coords": [[512, 512]],  # Click point in 1024x1024 space
            "point_labels": [1],  # 1 = foreground, 0 = background
            "return_mask_image": True,
        },
    },
    timeout=60,
)
response.raise_for_status()  # Raise an exception on HTTP errors

result = response.json()
```

### Response Format

```json
[
  {
    "mask_shape": [1024, 1024],
    "has_object": true,
    "mask": "<base64_encoded_png>"
  }
]
```
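To get a usable mask file, decode the `mask` field from base64 back into PNG bytes. The following standard-library sketch assumes the response matches the format above and that `return_mask_image` was `true`. `extract_mask_png` is a hypothetical helper name.

```python
import base64
from typing import Optional

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file


def extract_mask_png(result: list) -> Optional[bytes]:
    """Return the PNG bytes of the first result's mask, or None if no mask was sent."""
    first = result[0]
    if "mask" not in first:
        return None
    png = base64.b64decode(first["mask"])
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("decoded mask is not a PNG file")
    return png
```

Usage: `png = extract_mask_png(response.json())`, then write `png` to a `.png` file in binary mode. Check `has_object` separately if you need to know whether anything was segmented.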

### Parameters

- `point_coords`: Array of `[x, y]` coordinates in 1024x1024 space (optional)
- `point_labels`: Array of labels (`1` = foreground, `0` = background), one per point in `point_coords` (optional)
- `box_coords`: Bounding box `[x1, y1, x2, y2]` (optional; not yet implemented)
- `return_mask_image`: Whether to return the mask as a base64-encoded PNG (default: `true`)
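If you build requests in code, it can help to check the parameters before sending them. This sketch makes its own assumptions: each point has exactly one label, every label is 0 or 1, and coordinates lie within [0, 1024]. The handler may validate differently. `build_payload` is a hypothetical helper, and it leaves out `box_coords` because that parameter is not yet implemented.

```python
import base64
from typing import Optional, Sequence


def build_payload(
    image_bytes: bytes,
    point_coords: Optional[Sequence[Sequence[float]]] = None,
    point_labels: Optional[Sequence[int]] = None,
    return_mask_image: bool = True,
) -> dict:
    """Assemble the request body in the format shown above, checking point prompts."""
    parameters: dict = {"return_mask_image": return_mask_image}
    if point_coords is not None:
        labels = list(point_labels) if point_labels is not None else []
        if len(labels) != len(point_coords):
            raise ValueError("point_labels must have one entry per point in point_coords")
        for (x, y), label in zip(point_coords, labels):
            # Assumed valid range: the 1024x1024 model space, edges included
            if not (0 <= x <= 1024 and 0 <= y <= 1024):
                raise ValueError(f"point ({x}, {y}) is outside the 1024x1024 space")
            if label not in (0, 1):
                raise ValueError(f"label {label} must be 0 (background) or 1 (foreground)")
        parameters["point_coords"] = [[x, y] for x, y in point_coords]
        parameters["point_labels"] = labels
    return {
        "inputs": base64.b64encode(image_bytes).decode("utf-8"),
        "parameters": parameters,
    }
```

The returned dict can be passed directly as `json=` to `requests.post`.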

### Coordinate System

All coordinates should be given in 1024x1024 space, regardless of the original image size, because the handler resizes every input image to 1024x1024 before processing.

Example: for a click at the center of any image, use `[512, 512]`.
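Converting a click on the original image into this space is a per-axis scale: x' = x · 1024 / W and y' = y · 1024 / H for an image of width W and height H. The sketch assumes the handler stretches the image to 1024x1024 without preserving aspect ratio. That matches the center example above, but it is not stated explicitly. `to_model_space` and `to_image_space` are hypothetical names.

```python
from typing import List

MODEL_SIZE = 1024  # side length of the square model input


def to_model_space(x: float, y: float, width: int, height: int) -> List[float]:
    """Map a pixel (x, y) in a width x height image to 1024x1024 space."""
    return [x * MODEL_SIZE / width, y * MODEL_SIZE / height]


def to_image_space(x: float, y: float, width: int, height: int) -> List[float]:
    """Inverse mapping: from 1024x1024 space back to original pixel coordinates."""
    return [x * width / MODEL_SIZE, y * height / MODEL_SIZE]
```

For a 1920x1080 photo, a click at (960, 540) maps to `to_model_space(960, 540, 1920, 1080)`, which gives `[512.0, 512.0]`. The returned mask has `mask_shape` `[1024, 1024]`, so it must be resized to the original width and height before it lines up with the input image.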

## Local Testing

```bash
# Install dependencies
pip install -r requirements.txt

# Run the test script
python test_handler.py
```

Running the test script creates:
- `test_input.png` - Test image with a red circle
- `test_output_mask.png` - Generated segmentation mask
- `test_output_overlay.png` - Overlay visualization

## Technical Details

- Input: RGB images (automatically resized to 1024x1024)
- Preprocessing: pixel values scaled to the [0, 1] range (`/ 255.0`)
- Hardware: CUDA GPU supported, with automatic fallback to CPU
- Framework: compatible with ONNX Runtime Web
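The preprocessing and CUDA/CPU fallback described above can be approximated outside the handler as shown below. This is a sketch, not the handler's code. It assumes NumPy and Pillow are installed, that the encoder takes a single float32 tensor in NCHW layout (batch, channels, height, width), and that resizing is a plain stretch to 1024x1024. Confirm the actual input name and shape with `session.get_inputs()`. `preprocess`, `pick_providers`, `load_image` and `run_encoder` are hypothetical names.

```python
import numpy as np


def preprocess(rgb: np.ndarray) -> np.ndarray:
    """Turn a 1024x1024 RGB uint8 array (H, W, 3) into a float32 NCHW tensor in [0, 1]."""
    if rgb.shape != (1024, 1024, 3):
        raise ValueError(f"expected shape (1024, 1024, 3), got {rgb.shape}")
    scaled = rgb.astype(np.float32) / 255.0  # same scaling as `/ 255.0` above
    return np.transpose(scaled, (2, 0, 1))[np.newaxis, ...]  # HWC -> 1 x C x H x W (assumed layout)


def pick_providers(available: list) -> list:
    """Prefer CUDA when ONNX Runtime reports it, always keeping CPU as the fallback."""
    preferred = ["CUDAExecutionProvider", "CPUExecutionProvider"]
    chosen = [p for p in preferred if p in available]
    if "CPUExecutionProvider" not in chosen:
        chosen.append("CPUExecutionProvider")
    return chosen


def load_image(path: str) -> np.ndarray:
    """Load an image with Pillow and stretch it to 1024x1024 RGB (assumed resize)."""
    from PIL import Image

    with Image.open(path) as img:
        return np.asarray(img.convert("RGB").resize((1024, 1024)), dtype=np.uint8)


def run_encoder(image_path: str, model: str = "edge_sam_3x_encoder.onnx") -> list:
    """Run the encoder on one image and return all of its outputs."""
    import onnxruntime as ort

    session = ort.InferenceSession(model, providers=pick_providers(ort.get_available_providers()))
    input_name = session.get_inputs()[0].name  # assumes a single image input
    return session.run(None, {input_name: preprocess(load_image(image_path))})
```

ONNX Runtime treats the `providers` list as a priority order, so listing CUDA before CPU uses the GPU when it is available. It also lets CPU handle anything the CUDA provider does not.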