Upload 12 files

- .gitattributes +3 -0
- README.md +158 -3
- convert_rknn.py +98 -0
- dog.jpg +0 -0
- export_onnx.py +278 -0
- sam2.1_hiera_large_decoder.onnx +3 -0
- sam2.1_hiera_large_encoder.rknn +3 -0
- sam2.1_hiera_small_decoder.onnx +3 -0
- sam2.1_hiera_small_encoder.rknn +3 -0
- sam2.1_hiera_tiny_decoder.onnx +3 -0
- sam2.1_hiera_tiny_encoder.rknn +3 -0
- test_onnx.py +195 -0
- test_rknn.py +178 -0
.gitattributes
CHANGED

@@ -33,3 +33,6 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+sam2.1_hiera_large_encoder.rknn filter=lfs diff=lfs merge=lfs -text
+sam2.1_hiera_small_encoder.rknn filter=lfs diff=lfs merge=lfs -text
+sam2.1_hiera_tiny_encoder.rknn filter=lfs diff=lfs merge=lfs -text
README.md
CHANGED

@@ -1,3 +1,158 @@

# Segment Anything 2.1 RKNN2

Run the powerful Segment Anything 2.1 image segmentation model on RK3588!

- Inference speed (RK3588):
  - Encoder (Tiny, single NPU core): 3 s
  - Encoder (Small, single NPU core): 3.5 s
  - Encoder (Large, single NPU core): 12 s
  - Decoder (CPU): 0.1 s

- Memory usage (RK3588):
  - Encoder (Tiny): 0.95 GB
  - Encoder (Small): 1.1 GB
  - Encoder (Large): 4.1 GB
  - Decoder: negligible

## Usage

1. Clone or download this repository. The models are large, so make sure you have enough disk space.

2. Install the dependencies:

```bash
pip install "numpy<2" pillow matplotlib opencv-python onnxruntime rknn-toolkit-lite2
```

3. Run:

```bash
python test_rknn.py
```

To test different models and images, edit this part of `test_rknn.py`:

```python
def main():
    # 1. Load the original image
    path = "dog.jpg"
    orig_image, input_image, (scale, offset_x, offset_y) = load_image(path)
    decoder_path = "sam2.1_hiera_small_decoder.onnx"
    encoder_path = "sam2.1_hiera_small_encoder.rknn"
    ...
```

Note that, unlike SAM1, the encoder and decoder here must come from the same model variant.

## Model Conversion

1. Install the dependencies:

```bash
pip install "numpy<2" onnxslim onnxruntime rknn-toolkit2 sam2
```

2. Download the SAM2.1 .pt checkpoints. They are available [here](https://github.com/facebookresearch/sam2?tab=readme-ov-file#model-description).

3. Convert the .pt checkpoint to ONNX. Taking the Tiny model as an example:

```bash
python ./export_onnx.py --model_type sam2.1_hiera_tiny --checkpoint ./sam2.1_hiera_tiny.pt --output_encoder ./sam2.1_hiera_tiny_encoder.onnx --output_decoder sam2.1_hiera_tiny_decoder.onnx
```

4. Convert the ONNX models to RKNN. Taking the Tiny model as an example:

```bash
python ./convert_rknn.py sam2.1_hiera_tiny
```

If constant folding fails during conversion, try upgrading onnxruntime to the latest version.

## Known Issues

- Only image segmentation is implemented; video segmentation is not.
- Due to a bug in RKNN-Toolkit2, converting the decoder model fails. For now the decoder runs on the CPU via onnxruntime, which slightly increases CPU usage.

## References

- [samexporter/export_sam21_cvat.py](https://github.com/hashJoe/samexporter/blob/cvat/samexporter/export_sam21_cvat.py)
- [SAM 2](https://github.com/facebookresearch/sam2)
convert_rknn.py
ADDED

@@ -0,0 +1,98 @@
```python
#!/usr/bin/env python
# coding: utf-8

import datetime
import argparse
from rknn.api import RKNN
from sys import exit
import os
import onnxslim

num_pointss = [1]
num_labelss = [1]

def convert_to_rknn(onnx_model, model_part, dataset="/home/zt/rk3588-nn/rknn_model_zoo/datasets/COCO/coco_subset_20.txt", quantize=False):
    """Convert a single ONNX model to RKNN format."""
    rknn_model = onnx_model.replace(".onnx", ".rknn")
    timedate_iso = datetime.datetime.now().isoformat()

    print(f"\nConverting {onnx_model} to {rknn_model}")

    input_shapes = None

    if model_part == "encoder":
        input_shapes = None
    elif model_part == "decoder":
        input_shapes = [
            [
                [1, 256, 64, 64],  # image_embedding
                [1, 32, 256, 256],  # high_res_feats_0
                [1, 64, 128, 128],  # high_res_feats_1
                [num_labels, num_points, 2],  # point_coords
                [num_labels, num_points],  # point_labels
                [num_labels, 1, 256, 256],  # mask_input
                [num_labels],  # has_mask_input
            ]
            for num_labels in num_labelss
            for num_points in num_pointss
        ]

    rknn = RKNN(verbose=True)
    rknn.config(
        dynamic_input=input_shapes,
        std_values=[[255, 255, 255]] if model_part == "encoder" else None,
        quantized_dtype='w8a8',
        quantized_algorithm='normal',
        quantized_method='channel',
        quantized_hybrid_level=0,
        target_platform='rk3588',
        quant_img_RGB2BGR=False,
        float_dtype='float16',
        optimization_level=3,
        custom_string=f"converted at {timedate_iso}",
        remove_weight=False,
        compress_weight=False,
        inputs_yuv_fmt=None,
        single_core_mode=False,
        model_pruning=False,
        op_target=None,
        quantize_weight=False,
        remove_reshape=False,
        sparse_infer=False,
        enable_flash_attention=False,
    )

    ret = rknn.load_onnx(model=onnx_model)
    ret = rknn.build(do_quantization=quantize, dataset=dataset, rknn_batch_size=None)
    ret = rknn.export_rknn(rknn_model)
    print(f"Finished converting {rknn_model}\n")

def main():
    parser = argparse.ArgumentParser(description='Convert SAM models from ONNX to RKNN format')
    parser.add_argument('model_name', type=str, help='Model name, e.g. sam2.1_hiera_tiny')
    args = parser.parse_args()

    # Build the encoder and decoder file names
    encoder_onnx = f"{args.model_name}_encoder.onnx"
    decoder_onnx = f"{args.model_name}_decoder.onnx"

    # Check that the files exist
    for model in [encoder_onnx, decoder_onnx]:
        if not os.path.exists(model):
            print(f"Error: file {model} not found")
            exit(1)

    # Convert the encoder and decoder
    # The encoder must be slimmed with onnxslim first
    print("Converting encoder...")
    onnxslim.slim(encoder_onnx, output_model="encoder_slim.onnx", skip_fusion_patterns=["EliminationSlice"])
    convert_to_rknn("encoder_slim.onnx", model_part="encoder")
    os.rename("encoder_slim.rknn", encoder_onnx.replace(".onnx", ".rknn"))
    os.remove("encoder_slim.onnx")

    # convert_to_rknn(decoder_onnx, model_part="decoder")  # broken

    print("All model conversions finished!")

if __name__ == "__main__":
    main()
```
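
A note on the decoder shape list: `dynamic_input` takes a list of complete input-shape sets, and the comprehension in `convert_to_rknn` enumerates one set per (num_labels, num_points) pair, mirroring the `dynamic_axes` declared in `export_onnx.py`. If the decoder conversion bug is ever fixed, richer prompts should presumably just be a matter of widening these lists — a sketch, not validated against the current toolkit:

```python
# Hypothetical: pre-register more prompt configurations for the decoder.
# Each (num_labels, num_points) pair becomes one full shape set in
# dynamic_input, so this yields 2 x 3 = 6 registered shape combinations.
num_pointss = [1, 2, 4]   # points per prompt
num_labelss = [1, 2]      # concurrent prompts (batch of label sets)
```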
dog.jpg
ADDED
export_onnx.py
ADDED

@@ -0,0 +1,278 @@
```python
from typing import Any
import argparse
import pathlib

import torch
from torch import nn
from sam2.build_sam import build_sam2
from sam2.modeling.sam2_base import SAM2Base


class SAM2ImageEncoder(nn.Module):
    def __init__(self, sam_model: SAM2Base) -> None:
        super().__init__()
        self.model = sam_model
        self.image_encoder = sam_model.image_encoder
        self.no_mem_embed = sam_model.no_mem_embed

    def forward(self, x: torch.Tensor) -> tuple[Any, Any, Any]:
        backbone_out = self.image_encoder(x)
        backbone_out["backbone_fpn"][0] = self.model.sam_mask_decoder.conv_s0(
            backbone_out["backbone_fpn"][0]
        )
        backbone_out["backbone_fpn"][1] = self.model.sam_mask_decoder.conv_s1(
            backbone_out["backbone_fpn"][1]
        )

        feature_maps = backbone_out["backbone_fpn"][
            -self.model.num_feature_levels :
        ]
        vision_pos_embeds = backbone_out["vision_pos_enc"][
            -self.model.num_feature_levels :
        ]

        feat_sizes = [(x.shape[-2], x.shape[-1]) for x in vision_pos_embeds]

        # flatten NxCxHxW to HWxNxC
        vision_feats = [x.flatten(2).permute(2, 0, 1) for x in feature_maps]
        vision_feats[-1] = vision_feats[-1] + self.no_mem_embed

        feats = [
            feat.permute(1, 2, 0).reshape(1, -1, *feat_size)
            for feat, feat_size in zip(vision_feats[::-1], feat_sizes[::-1])
        ][::-1]

        return feats[0], feats[1], feats[2]


class SAM2ImageDecoder(nn.Module):
    def __init__(self, sam_model: SAM2Base, multimask_output: bool) -> None:
        super().__init__()
        self.mask_decoder = sam_model.sam_mask_decoder
        self.prompt_encoder = sam_model.sam_prompt_encoder
        self.model = sam_model
        self.img_size = sam_model.image_size
        self.multimask_output = multimask_output

    @torch.no_grad()
    def forward(
        self,
        image_embed: torch.Tensor,
        high_res_feats_0: torch.Tensor,
        high_res_feats_1: torch.Tensor,
        point_coords: torch.Tensor,
        point_labels: torch.Tensor,
        orig_im_size: torch.Tensor,
        mask_input: torch.Tensor,
        has_mask_input: torch.Tensor,
    ):
        sparse_embedding = self._embed_points(point_coords, point_labels)
        self.sparse_embedding = sparse_embedding
        dense_embedding = self._embed_masks(mask_input, has_mask_input)

        high_res_feats = [high_res_feats_0, high_res_feats_1]

        masks, iou_predictions, _, _ = self.mask_decoder.predict_masks(
            image_embeddings=image_embed,
            image_pe=self.prompt_encoder.get_dense_pe(),
            sparse_prompt_embeddings=sparse_embedding,
            dense_prompt_embeddings=dense_embedding,
            repeat_image=False,
            high_res_features=high_res_feats,
        )

        if self.multimask_output:
            masks = masks[:, 1:, :, :]
            iou_predictions = iou_predictions[:, 1:]
        else:
            masks, iou_predictions = (
                self.mask_decoder._dynamic_multimask_via_stability(
                    masks, iou_predictions
                )
            )

        masks = torch.clamp(masks, -32.0, 32.0)

        return masks, iou_predictions

    def _embed_points(
        self, point_coords: torch.Tensor, point_labels: torch.Tensor
    ) -> torch.Tensor:

        point_coords = point_coords + 0.5

        padding_point = torch.zeros(
            (point_coords.shape[0], 1, 2), device=point_coords.device
        )
        padding_label = -torch.ones(
            (point_labels.shape[0], 1), device=point_labels.device
        )
        point_coords = torch.cat([point_coords, padding_point], dim=1)
        point_labels = torch.cat([point_labels, padding_label], dim=1)

        point_coords[:, :, 0] = point_coords[:, :, 0] / self.model.image_size
        point_coords[:, :, 1] = point_coords[:, :, 1] / self.model.image_size

        point_embedding = self.prompt_encoder.pe_layer._pe_encoding(
            point_coords
        )
        point_labels = point_labels.unsqueeze(-1).expand_as(point_embedding)

        point_embedding = point_embedding * (point_labels != -1)
        point_embedding = (
            point_embedding
            + self.prompt_encoder.not_a_point_embed.weight
            * (point_labels == -1)
        )

        for i in range(self.prompt_encoder.num_point_embeddings):
            point_embedding = (
                point_embedding
                + self.prompt_encoder.point_embeddings[i].weight
                * (point_labels == i)
            )

        return point_embedding

    def _embed_masks(
        self, input_mask: torch.Tensor, has_mask_input: torch.Tensor
    ) -> torch.Tensor:
        mask_embedding = has_mask_input * self.prompt_encoder.mask_downscaling(
            input_mask
        )
        mask_embedding = mask_embedding + (
            1 - has_mask_input
        ) * self.prompt_encoder.no_mask_embed.weight.reshape(1, -1, 1, 1)
        return mask_embedding


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="Export the SAM2 prompt encoder and mask decoder to an ONNX model."
    )
    parser.add_argument(
        "--checkpoint",
        type=str,
        required=True,
        help="The path to the SAM model checkpoint.",
    )

    parser.add_argument(
        "--output_encoder",
        type=str,
        required=True,
        help="The filename to save the encoder ONNX model to.",
    )

    parser.add_argument(
        "--output_decoder",
        type=str,
        required=True,
        help="The filename to save the decoder ONNX model to.",
    )

    parser.add_argument(
        "--model_type",
        type=str,
        required=True,
        help="In the form of sam2.1_hiera_{tiny, small, base_plus, large}.",
    )

    parser.add_argument(
        "--opset",
        type=int,
        default=17,
        help="The ONNX opset version to use. Must be >=11",
    )

    args = parser.parse_args()

    input_size = (1024, 1024)
    multimask_output = False
    model_type = args.model_type
    if model_type == "sam2.1_hiera_tiny":
        model_cfg = "configs/sam2.1/sam2.1_hiera_t.yaml"
    elif model_type == "sam2.1_hiera_small":
        model_cfg = "configs/sam2.1/sam2.1_hiera_s.yaml"
    elif model_type == "sam2.1_hiera_base_plus":
        model_cfg = "configs/sam2.1/sam2.1_hiera_b+.yaml"
    elif model_type == "sam2.1_hiera_large":
        model_cfg = "configs/sam2.1/sam2.1_hiera_l.yaml"
    else:
        model_cfg = "configs/sam2.1/sam2.1_hiera_l.yaml"

    sam2_model = build_sam2(model_cfg, args.checkpoint, device="cpu")
    img = torch.randn(1, 3, input_size[0], input_size[1]).cpu()
    sam2_encoder = SAM2ImageEncoder(sam2_model).cpu()
    high_res_feats_0, high_res_feats_1, image_embed = sam2_encoder(img)

    pathlib.Path(args.output_encoder).parent.mkdir(parents=True, exist_ok=True)
    torch.onnx.export(
        sam2_encoder,
        img,
        args.output_encoder,
        export_params=True,
        opset_version=args.opset,
        do_constant_folding=True,
        input_names=["image"],
        output_names=["high_res_feats_0", "high_res_feats_1", "image_embed"],
    )
    print("Saved encoder to", args.output_encoder)

    sam2_decoder = SAM2ImageDecoder(
        sam2_model, multimask_output=multimask_output
    ).cpu()

    embed_dim = sam2_model.sam_prompt_encoder.embed_dim
    embed_size = (
        sam2_model.image_size // sam2_model.backbone_stride,
        sam2_model.image_size // sam2_model.backbone_stride,
    )
    mask_input_size = [4 * x for x in embed_size]
    print(embed_dim, embed_size, mask_input_size)

    point_coords = torch.randint(
        low=0, high=input_size[1], size=(1, 5, 2), dtype=torch.float
    )
    point_labels = torch.randint(low=0, high=1, size=(1, 5), dtype=torch.float)
    mask_input = torch.randn(1, 1, *mask_input_size, dtype=torch.float)
    has_mask_input = torch.tensor([1], dtype=torch.float)
    orig_im_size = torch.tensor([input_size[0], input_size[1]], dtype=torch.int)

    pathlib.Path(args.output_decoder).parent.mkdir(parents=True, exist_ok=True)
    torch.onnx.export(
        sam2_decoder,
        (
            image_embed,
            high_res_feats_0,
            high_res_feats_1,
            point_coords,
            point_labels,
            orig_im_size,
            mask_input,
            has_mask_input,
        ),
        args.output_decoder,
        export_params=True,
        opset_version=args.opset,
        do_constant_folding=True,
        input_names=[
            "image_embed",
            "high_res_feats_0",
            "high_res_feats_1",
            "point_coords",
            "point_labels",
            "orig_im_size",
            "mask_input",
            "has_mask_input",
        ],
        output_names=["masks", "iou_predictions"],
        dynamic_axes={
            "point_coords": {0: "num_labels", 1: "num_points"},
            "point_labels": {0: "num_labels", 1: "num_points"},
            "mask_input": {0: "num_labels"},
            "has_mask_input": {0: "num_labels"},
        },
    )
    print("Saved decoder to", args.output_decoder)
```
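
After exporting, a quick shape check with onnxruntime can catch wiring mistakes before the RKNN conversion step. A minimal sketch, assuming the Tiny encoder was exported with the file name from the README; the expected shapes are the ones `convert_rknn.py` registers for the decoder:

```python
# Sanity-check the exported encoder: feed a random 1024x1024 image and
# confirm the three outputs match the shapes convert_rknn.py expects.
import numpy as np
import onnxruntime

sess = onnxruntime.InferenceSession("sam2.1_hiera_tiny_encoder.onnx")
img = np.random.rand(1, 3, 1024, 1024).astype(np.float32)
high_res_feats_0, high_res_feats_1, image_embed = sess.run(None, {"image": img})

assert high_res_feats_0.shape == (1, 32, 256, 256)
assert high_res_feats_1.shape == (1, 64, 128, 128)
assert image_embed.shape == (1, 256, 64, 64)
```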
sam2.1_hiera_large_decoder.onnx
ADDED

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c039b2455b4e92dfeb8cb8e4d10a98a92a79ec1550a7119c997bad4352811554
size 16526061

sam2.1_hiera_large_encoder.rknn
ADDED

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0ce5ae036eb273f4e017481c8cb744e50c84a93e81e2f6a84ff4b89a118e756a
size 1419024037

sam2.1_hiera_small_decoder.onnx
ADDED

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4e7ba7a80bfae89c1a660d3b64291fa4f5a2de15022a4e8eab933218d4f34582
size 16526003

sam2.1_hiera_small_encoder.rknn
ADDED

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d8b9efce9e5d12900a508dc1b79dfbd389057136a6d2ab4cb66654961f3106ef
size 374531749

sam2.1_hiera_tiny_decoder.onnx
ADDED

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f594db10b3c7b4d9de7f8854693ea6f7a880e5e228ad08d7823393233e65f4fa
size 16525993

sam2.1_hiera_tiny_encoder.rknn
ADDED

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c3750eef90b87ab63cfefbf4f89858072a4891818c315d96dddeea172119cba1
size 339018597
test_onnx.py
ADDED

@@ -0,0 +1,195 @@
```python
import os
os.chdir(os.path.dirname(os.path.abspath(__file__)))

import numpy as np
import torch
import onnxruntime
from PIL import Image
import requests
from io import BytesIO
import matplotlib.pyplot as plt
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor


def load_image(url):
    """Load and preprocess an image."""
    response = requests.get(url)
    image = Image.open(BytesIO(response.content)).convert("RGB")
    print(f"Original image size: {image.size}")

    # Compute the resized dimensions, preserving the aspect ratio
    target_size = (1024, 1024)
    w, h = image.size
    scale = min(target_size[0] / w, target_size[1] / h)
    new_w = int(w * scale)
    new_h = int(h * scale)
    print(f"Scale factor: {scale}")
    print(f"Resized dimensions: {new_w}x{new_h}")

    # Resize the image
    resized_image = image.resize((new_w, new_h), Image.Resampling.LANCZOS)

    # Create a 1024x1024 black background
    processed_image = Image.new("RGB", target_size, (0, 0, 0))
    # Paste the resized image at the center
    paste_x = (target_size[0] - new_w) // 2
    paste_y = (target_size[1] - new_h) // 2
    print(f"Paste position: ({paste_x}, {paste_y})")
    processed_image.paste(resized_image, (paste_x, paste_y))

    # Save the processed image for inspection
    processed_image.save("debug_processed_image.png")

    # Convert to a numpy array and normalize to [0, 1]
    img_np = np.array(processed_image).astype(np.float32) / 255.0
    # Reorder dimensions from HWC to CHW
    img_np = img_np.transpose(2, 0, 1)
    # Add a batch dimension
    img_np = np.expand_dims(img_np, axis=0)

    print(f"Final input tensor shape: {img_np.shape}")

    return image, img_np, (scale, paste_x, paste_y)

def prepare_point_input(point_coords, point_labels, image_size=(1024, 1024)):
    """Prepare the point-prompt inputs."""
    point_coords = np.array(point_coords, dtype=np.float32)
    point_labels = np.array(point_labels, dtype=np.float32)

    # Add a batch dimension
    point_coords = np.expand_dims(point_coords, axis=0)
    point_labels = np.expand_dims(point_labels, axis=0)

    # Prepare the mask inputs
    mask_input = np.zeros((1, 1, 256, 256), dtype=np.float32)
    has_mask_input = np.zeros(1, dtype=np.float32)
    orig_im_size = np.array(image_size, dtype=np.int32)

    return point_coords, point_labels, mask_input, has_mask_input, orig_im_size

def main():
    # 1. Load the original image
    url = "https://raw.githubusercontent.com/facebookresearch/segment-anything/main/notebooks/images/dog.jpg"
    orig_image, input_image, (scale, offset_x, offset_y) = load_image(url)

    # 2. Prepare the input point - click coordinates must be adjusted by scale and offset
    input_point_orig = [[750, 400]]
    input_point = [[
        int(x * scale + offset_x),
        int(y * scale + offset_y)
    ] for x, y in input_point_orig]
    print(f"Original point: {input_point_orig}")
    print(f"Transformed point: {input_point}")
    input_label = [1]

    # 3. Run the PyTorch model
    print("Running PyTorch model...")
    checkpoint = "sam2.1_hiera_large.pt"
    model_cfg = "configs/sam2.1/sam2.1_hiera_l.yaml"
    predictor = SAM2ImagePredictor(build_sam2(model_cfg, checkpoint))

    with torch.inference_mode():
        predictor.set_image(orig_image)
        masks_pt, iou_scores_pt, low_res_masks_pt = predictor.predict(
            point_coords=np.array(input_point),
            point_labels=np.array(input_label),
            multimask_output=True
        )

    # 4. Run the ONNX model
    print("Running ONNX model...")
    encoder_path = "sam2.1_hiera_tiny_encoder.s.onnx"
    decoder_path = "sam2.1_hiera_tiny_decoder.onnx"

    # Create the ONNX Runtime sessions
    encoder_session = onnxruntime.InferenceSession(encoder_path)
    decoder_session = onnxruntime.InferenceSession(decoder_path)

    # Run the encoder
    encoder_inputs = {'image': input_image}
    high_res_feats_0, high_res_feats_1, image_embed = encoder_session.run(None, encoder_inputs)

    # Prepare the decoder inputs
    point_coords, point_labels, mask_input, has_mask_input, orig_im_size = prepare_point_input(
        input_point, input_label, orig_image.size[::-1]
    )

    # Run the decoder
    decoder_inputs = {
        'image_embed': image_embed,
        'high_res_feats_0': high_res_feats_0,
        'high_res_feats_1': high_res_feats_1,
        'point_coords': point_coords,
        'point_labels': point_labels,
        # 'orig_im_size': orig_im_size,
        'mask_input': mask_input,
        'has_mask_input': has_mask_input,
    }

    low_res_masks, iou_predictions = decoder_session.run(None, decoder_inputs)

    # Post-processing: scale low_res_masks back to the original image size
    w, h = orig_image.size

    # 1. First scale the masks to 1024x1024
    masks_1024 = torch.nn.functional.interpolate(
        torch.from_numpy(low_res_masks),
        size=(1024, 1024),
        mode="bilinear",
        align_corners=False
    )

    # 2. Remove the padding
    new_h = int(h * scale)
    new_w = int(w * scale)
    start_h = (1024 - new_h) // 2
    start_w = (1024 - new_w) // 2
    masks_no_pad = masks_1024[..., start_h:start_h+new_h, start_w:start_w+new_w]

    # 3. Scale to the original image size
    masks_onnx = torch.nn.functional.interpolate(
        masks_no_pad,
        size=(h, w),
        mode="bilinear",
        align_corners=False
    )

    # 4. Binarize
    masks_onnx = masks_onnx > 0.0
    masks_onnx = masks_onnx.numpy()

    # After running the ONNX model, print the output shapes
    print("\nOutput shapes:")
    print(f"PyTorch masks shape: {masks_pt.shape}")
    print(f"ONNX masks shape: {masks_onnx.shape}")

    # Visualization (the difference plot is commented out for now)
    plt.figure(figsize=(10, 5))

    # PyTorch result
    plt.subplot(121)
    plt.imshow(orig_image)
    plt.imshow(masks_pt[0], alpha=0.5)
    plt.plot(input_point_orig[0][0], input_point_orig[0][1], 'rx')
    plt.title('PyTorch Output')
    plt.axis('off')

    # ONNX result
    plt.subplot(122)
    plt.imshow(orig_image)
    plt.imshow(masks_onnx[0,0], alpha=0.5)
    plt.plot(input_point_orig[0][0], input_point_orig[0][1], 'rx')
    plt.title('ONNX Output')
    plt.axis('off')

    plt.tight_layout()
    plt.show()

    # 6. Print some statistics
    print("\nStatistics:")
    print(f"PyTorch IoU scores: {iou_scores_pt}")
    print(f"ONNX IoU predictions: {iou_predictions}")

if __name__ == "__main__":
    main()
```
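
The point transform in `main` is just the letterbox mapping applied to prompt coordinates. A small worked example, using a hypothetical 2000x1000 source image (not the bundled dog.jpg, whose dimensions are not stated here):

```python
# Worked example of the letterbox point mapping used in test_onnx.py/test_rknn.py.
w, h = 2000, 1000                              # hypothetical source image
scale = min(1024 / w, 1024 / h)                # 0.512
new_w, new_h = int(w * scale), int(h * scale)  # 1024 x 512
paste_x = (1024 - new_w) // 2                  # 0
paste_y = (1024 - new_h) // 2                  # 256

x, y = 750, 400                                # click in source-image coordinates
mx, my = int(x * scale + paste_x), int(y * scale + paste_y)
print(mx, my)                                  # 384 460 -> model-space coordinates
```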
test_rknn.py
ADDED

@@ -0,0 +1,178 @@
```python
import os
import time
os.chdir(os.path.dirname(os.path.abspath(__file__)))

import numpy as np
import onnxruntime
from rknnlite.api import RKNNLite
from PIL import Image
import matplotlib.pyplot as plt
import cv2


def load_image(path):
    """Load and preprocess an image."""
    image = Image.open(path).convert("RGB")
    print(f"Original image size: {image.size}")

    # Compute the resized dimensions, preserving the aspect ratio
    target_size = (1024, 1024)
    w, h = image.size
    scale = min(target_size[0] / w, target_size[1] / h)
    new_w = int(w * scale)
    new_h = int(h * scale)
    print(f"Scale factor: {scale}")
    print(f"Resized dimensions: {new_w}x{new_h}")

    # Resize the image
    resized_image = image.resize((new_w, new_h), Image.Resampling.LANCZOS)

    # Create a 1024x1024 black background
    processed_image = Image.new("RGB", target_size, (0, 0, 0))
    # Paste the resized image at the center
    paste_x = (target_size[0] - new_w) // 2
    paste_y = (target_size[1] - new_h) // 2
    print(f"Paste position: ({paste_x}, {paste_y})")
    processed_image.paste(resized_image, (paste_x, paste_y))

    # Save the processed image for inspection
    processed_image.save("debug_processed_image.png")

    # Convert to a numpy array; normalization to [0, 1] is folded into the model
    img_np = np.array(processed_image).astype(np.float32)  # / 255.0
    # Reorder dimensions from HWC to CHW
    img_np = img_np.transpose(2, 0, 1)
    # Add a batch dimension
    img_np = np.expand_dims(img_np, axis=0)

    print(f"Final input tensor shape: {img_np.shape}")

    return image, img_np, (scale, paste_x, paste_y)

def prepare_point_input(point_coords, point_labels, image_size=(1024, 1024)):
    """Prepare the point-prompt inputs."""
    point_coords = np.array(point_coords, dtype=np.float32)
    point_labels = np.array(point_labels, dtype=np.float32)

    # Add a batch dimension
    point_coords = np.expand_dims(point_coords, axis=0)
    point_labels = np.expand_dims(point_labels, axis=0)

    # Prepare the mask inputs
    mask_input = np.zeros((1, 1, 256, 256), dtype=np.float32)
    has_mask_input = np.zeros(1, dtype=np.float32)
    orig_im_size = np.array(image_size, dtype=np.int32)

    return point_coords, point_labels, mask_input, has_mask_input, orig_im_size

def main():
    # 1. Load the original image
    path = "dog.jpg"
    orig_image, input_image, (scale, offset_x, offset_y) = load_image(path)
    decoder_path = "sam2.1_hiera_small_decoder.onnx"
    encoder_path = "sam2.1_hiera_small_encoder.rknn"

    # 2. Prepare the input point
    # input_point_orig = [[750, 400]]
    input_point_orig = [[189, 394]]
    input_point = [[
        int(x * scale + offset_x),
        int(y * scale + offset_y)
    ] for x, y in input_point_orig]
    input_label = [1]

    # 3. Run the RKNN encoder
    print("Running RKNN encoder...")
    rknn_lite = RKNNLite(verbose=False)

    ret = rknn_lite.load_rknn(encoder_path)
    if ret != 0:
        print('Load RKNN model failed')
        exit(ret)

    ret = rknn_lite.init_runtime()
    if ret != 0:
        print('Init runtime environment failed')
        exit(ret)
    start_time = time.time()
    encoder_outputs = rknn_lite.inference(inputs=[input_image], data_format="nchw")
    end_time = time.time()
    print(f"RKNN encoder time: {end_time - start_time} seconds")
    high_res_feats_0, high_res_feats_1, image_embed = encoder_outputs
    rknn_lite.release()

    # 4. Run the ONNX decoder
    print("Running ONNX decoder...")
    decoder_session = onnxruntime.InferenceSession(decoder_path)

    point_coords, point_labels, mask_input, has_mask_input, orig_im_size = prepare_point_input(
        input_point, input_label, orig_image.size[::-1]
    )

    decoder_inputs = {
        'image_embed': image_embed,
        'high_res_feats_0': high_res_feats_0,
        'high_res_feats_1': high_res_feats_1,
        'point_coords': point_coords,
        'point_labels': point_labels,
        'mask_input': mask_input,
        'has_mask_input': has_mask_input,
    }
    start_time = time.time()
    low_res_masks, iou_predictions = decoder_session.run(None, decoder_inputs)
    end_time = time.time()
    print(f"ONNX decoder time: {end_time - start_time} seconds")
    print(low_res_masks.shape)
    # 5. Post-processing
    w, h = orig_image.size
    masks_rknn = []

    # Process all 3 masks
    for i in range(low_res_masks.shape[1]):
        # Scale the mask to 1024x1024
        masks_1024 = cv2.resize(
            low_res_masks[0,i],
            (1024, 1024),
            interpolation=cv2.INTER_LINEAR
        )

        # Remove the padding
        new_h = int(h * scale)
        new_w = int(w * scale)
        start_h = (1024 - new_h) // 2
        start_w = (1024 - new_w) // 2
        masks_no_pad = masks_1024[start_h:start_h+new_h, start_w:start_w+new_w]

        # Scale to the original image size
        mask = cv2.resize(
            masks_no_pad,
            (w, h),
            interpolation=cv2.INTER_LINEAR
        )

        # Binarize
        mask = mask > 0.0
        masks_rknn.append(mask)

    # 6. Visualize the results
    plt.figure(figsize=(15, 5))

    # Indices sorted by IoU score, descending
    sorted_indices = np.argsort(iou_predictions[0])[::-1]

    for idx, mask_idx in enumerate(sorted_indices):
        plt.subplot(1, 3, idx + 1)
        plt.imshow(orig_image)
        plt.imshow(masks_rknn[mask_idx], alpha=0.5)
        plt.plot(input_point_orig[0][0], input_point_orig[0][1], 'rx')
        plt.title(f'Mask {mask_idx+1}\nIoU: {iou_predictions[0][mask_idx]:.3f}')
        plt.axis('off')

    plt.tight_layout()
    # plt.show()
    plt.savefig("result.png")

    print(f"\nIoU predictions: {iou_predictions}")

if __name__ == "__main__":
    main()
```
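
The README's timings are for a single NPU core; rknn-toolkit-lite2 also accepts a core mask at runtime initialization, so multi-core inference may be worth experimenting with. A sketch, assuming the installed rknn-toolkit-lite2 version supports `core_mask`:

```python
# Sketch: request all three RK3588 NPU cores instead of the default single core.
# core_mask support depends on the rknn-toolkit-lite2 version in use.
from rknnlite.api import RKNNLite

rknn_lite = RKNNLite()
rknn_lite.load_rknn("sam2.1_hiera_tiny_encoder.rknn")
ret = rknn_lite.init_runtime(core_mask=RKNNLite.NPU_CORE_0_1_2)
if ret != 0:
    raise RuntimeError("init_runtime failed")
```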