	---
	license: afl-3.0
	base_model:
	- facebook/EdgeTAM
	pipeline_tag: image-segmentation
	---
	# EdgeTAM
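	An image segmentation pipeline based on EdgeTAM. It accepts several prompt types (boxes, points, masks) and runs model inference on AX650N-series platforms.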

	Supported chips:
	- AX650N


	Supported hardware:

	- [M4N-Dock (AXera-Pi Pro)](https://wiki.sipeed.com/hardware/zh/maixIV/m4ndock/m4ndock.html)
	- [M.2 Accelerator card](https://docs.m5stack.com/zh_CN/ai_hardware/LLM-8850_Card)

	For the original model, see:
	- [EdgeTAM Github](https://github.com/facebookresearch/EdgeTAM)
	- [EdgeTAM Huggingface](https://huggingface.co/facebook/EdgeTAM)

	## Performance comparison

	- Input image size: 1024x1024

	| Models | Latency (ms) | CMM Usage (MB) |
	| --------------------- | ---------------------- | -------------- |
	| edgetam_image_encoder | 22.348 | 29.124 |
	| edgetam_prompt_encoder | 0.055 | 0.023 |
	| edgetam_prompt_mask_encoder | 0.457 | 0.037 |
	| edgetam_mask_decoder | 4.729 | 16.730 |
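
As a rough guide, the per-model latencies in the table can be added up to estimate the time for one prompted prediction. The sketch below assumes the models run one after another, that a box or point prompt skips `edgetam_prompt_mask_encoder`, and that pre- and post-processing time is ignored. None of these assumptions is confirmed by this README, and the helper name `total_latency_ms` is hypothetical.

```python
# Per-model latencies in milliseconds, copied from the table above.
LATENCY_MS = {
    "edgetam_image_encoder": 22.348,
    "edgetam_prompt_encoder": 0.055,
    "edgetam_prompt_mask_encoder": 0.457,
    "edgetam_mask_decoder": 4.729,
}


def total_latency_ms(stages):
    """Sum the table latencies for the given model names (hypothetical helper)."""
    return sum(LATENCY_MS[name] for name in stages)


# Assumption: a box/point prompt does not need the mask encoder.
no_mask = total_latency_ms(
    ["edgetam_image_encoder", "edgetam_prompt_encoder", "edgetam_mask_decoder"]
)
# Assumption: a mask prompt runs all four models sequentially.
with_mask = total_latency_ms(LATENCY_MS)

print(f"without mask prompt: {no_mask:.3f} ms")   # 27.132 ms
print(f"with mask prompt:    {with_mask:.3f} ms")  # 27.589 ms
```

The image encoder accounts for most of the total, so reusing its output across several prompts on the same image would likely be the main saving, if the script supports it.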

	## Model conversion
	- Model conversion toolchain: [Pulsar2](https://huggingface.co/AXERA-TECH/Pulsar2)
	- Conversion guide: [Model Convert](https://github.com/AXERA-TECH/EdgeTAM.Axera/tree/main/model_convert)

	## Environment setup
	- NPU Python API: [pyaxengine](https://github.com/AXERA-TECH/pyaxengine)

	Install the required Python libraries:

	```bash
	pip install -r requirements.txt
	```
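
Before running the scripts, it can help to confirm that the NPU Python API is importable in the active environment. This is a minimal sketch, assuming pyaxengine installs a module named `axengine` (an assumption; check the pyaxengine README for the actual import name). The helper `module_available` is hypothetical and uses only the standard library.

```python
import importlib.util


def module_available(module_name="axengine"):
    """Return True if `module_name` can be located by this interpreter.

    The default name "axengine" is an assumption about how pyaxengine
    is exposed; adjust it if the package uses a different module name.
    """
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if module_available():
        print("axengine found: NPU inference should be importable")
    else:
        print("axengine not found: install pyaxengine first")
```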

	## Run

	```bash
	(myenv) root@ax650:~/EdgeTAM# python3 image_prediction_ax.py --input_box 75,275,1725,850
	[INFO] Available providers: ['AxEngineExecutionProvider']
	Loading EdgeTAM Onnx models...
	[INFO] Using provider: AxEngineExecutionProvider
	[INFO] Chip type: ChipType.MC50
	[INFO] VNPU type: VNPUType.DISABLED
	[INFO] Engine version: 2.12.0s
	[INFO] Model type: 2 (triple core)
	[INFO] Compiler version: 5.0-patch1-dirty a512c95e-dirty
	[INFO] Using provider: AxEngineExecutionProvider
	[INFO] Model type: 2 (triple core)
	[INFO] Compiler version: 5.0-patch1-dirty a512c95e-dirty
	[INFO] Using provider: AxEngineExecutionProvider
	[INFO] Model type: 2 (triple core)
	[INFO] Compiler version: 5.0-patch1-dirty a512c95e-dirty
	[INFO] Using provider: AxEngineExecutionProvider
	[INFO] Model type: 2 (triple core)
	[INFO] Compiler version: 5.0-patch1-dirty a512c95e-dirty
	Get prompts:
	input_box: [ 75 275 1725 850]
	input_point_coords: None
	input_point_labels: None
	Only box input provided
	Get dense_embeddings_no_mask
	[0.9777304]
	✅ Saved: ./results/mask_1.png
	```

	Results are saved in the `./results` directory:
	![image](./results/mask_1.png)
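
The box in the command above is given in pixel coordinates of the source image, while the models take a 1024x1024 input. The prompt therefore has to be mapped into model space somewhere in the pipeline. The sketch below shows one way to do that, assuming a SAM 2-style direct resize to 1024x1024 without aspect-ratio padding and a source image of 1800x1200 pixels. Both are assumptions, since this README does not show the script's preprocessing, and `scale_box` is a hypothetical helper.

```python
MODEL_SIZE = 1024  # model input resolution listed in this README


def scale_box(box, orig_w, orig_h, size=MODEL_SIZE):
    """Map an [x0, y0, x1, y1] box from image pixels to model-input pixels.

    Assumes the image is stretched directly to size x size, so x and y
    use independent scale factors.
    """
    x0, y0, x1, y1 = box
    sx = size / orig_w
    sy = size / orig_h
    return [x0 * sx, y0 * sy, x1 * sx, y1 * sy]


# Box from the example command; the 1800x1200 image size is an assumption.
scaled = scale_box([75, 275, 1725, 850], orig_w=1800, orig_h=1200)
print([round(v, 1) for v in scaled])  # [42.7, 234.7, 981.3, 725.3]
```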

	```bash
	(myenv) root@ax650:~/EdgeTAM# python3 image_prediction_ax.py --image_path ./examples/images/truck.jpg --input_box 425,600,700,875 --input_point_coords 575,750 --input_point_labels 0
	[INFO] Available providers: ['AxEngineExecutionProvider']
	Loading EdgeTAM Onnx models...
	[INFO] Using provider: AxEngineExecutionProvider
	[INFO] Chip type: ChipType.MC50
	[INFO] VNPU type: VNPUType.DISABLED
	[INFO] Engine version: 2.12.0s
	[INFO] Model type: 2 (triple core)
	[INFO] Compiler version: 5.0-patch1-dirty a512c95e-dirty
	[INFO] Using provider: AxEngineExecutionProvider
	[INFO] Model type: 2 (triple core)
	[INFO] Compiler version: 5.0-patch1-dirty a512c95e-dirty
	[INFO] Using provider: AxEngineExecutionProvider
	[INFO] Model type: 2 (triple core)
	[INFO] Compiler version: 5.0-patch1-dirty a512c95e-dirty
	[INFO] Using provider: AxEngineExecutionProvider
	[INFO] Model type: 2 (triple core)
	[INFO] Compiler version: 5.0-patch1-dirty a512c95e-dirty
	['575,750']
	575,750
	Get prompts:
	input_box: [425 600 700 875]
	input_point_coords: [[575 750]]
	input_point_labels: [0]
	Get dense_embeddings_no_mask
	[0.90291053]
	✅ Saved: ./results/mask_1.png

	```
	![image](./results/mask_5.png)
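
The two commands pass prompts as comma-separated strings, and the logs print them back as integer arrays. The sketch below shows how such arguments could be parsed and checked with `argparse`. It is not the code of `image_prediction_ax.py`: only the flag names are taken from the commands above, and the helpers `parse_ints`, `build_parser`, and `validate` are hypothetical. It also assumes the SAM-style label convention, where 1 marks a foreground point and 0 a background point.

```python
import argparse


def parse_ints(text):
    """Turn a string like '425,600,700,875' into a list of ints."""
    return [int(v) for v in text.split(",")]


def build_parser():
    """Build a parser for the prompt flags used in the commands above."""
    p = argparse.ArgumentParser(description="Prompt parsing sketch")
    p.add_argument("--input_box", type=parse_ints, default=None)
    p.add_argument("--input_point_coords", type=parse_ints, nargs="*", default=None)
    p.add_argument("--input_point_labels", type=int, nargs="*", default=None)
    return p


def validate(args):
    """Check prompt shapes; raise ValueError on inconsistent input."""
    if args.input_box is not None and len(args.input_box) != 4:
        raise ValueError("input_box needs 4 values: x0,y0,x1,y1")
    coords = args.input_point_coords or []
    labels = args.input_point_labels or []
    if any(len(c) != 2 for c in coords):
        raise ValueError("each point needs 2 values: x,y")
    if len(coords) != len(labels):
        raise ValueError("need exactly one label per point")
    if any(label not in (0, 1) for label in labels):
        raise ValueError("labels must be 0 (background) or 1 (foreground)")
    return args


# Same prompts as the second command above.
args = validate(build_parser().parse_args([
    "--input_box", "425,600,700,875",
    "--input_point_coords", "575,750",
    "--input_point_labels", "0",
]))
print(args.input_box, args.input_point_coords, args.input_point_labels)
# [425, 600, 700, 875] [[575, 750]] [0]
```

Under the assumed label convention, the label `0` in the second command would mark the point at (575, 750) as background, telling the model to exclude that area from the mask inside the box.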