SharpAI committed (verified) · Commit 5192773 · Parent: ef57c59

Upload sam2-hiera-small ONNX models

Files changed (6)
  1. .gitattributes +1 -34
  2. README.md +175 -0
  3. config.json +13 -0
  4. decoder.onnx +3 -0
  5. encoder.onnx +3 -0
  6. encoder.with_runtime_opt.ort +3 -0
.gitattributes CHANGED
@@ -1,35 +1,2 @@
- *.7z filter=lfs diff=lfs merge=lfs -text
- *.arrow filter=lfs diff=lfs merge=lfs -text
- *.bin filter=lfs diff=lfs merge=lfs -text
- *.bz2 filter=lfs diff=lfs merge=lfs -text
- *.ckpt filter=lfs diff=lfs merge=lfs -text
- *.ftz filter=lfs diff=lfs merge=lfs -text
- *.gz filter=lfs diff=lfs merge=lfs -text
- *.h5 filter=lfs diff=lfs merge=lfs -text
- *.joblib filter=lfs diff=lfs merge=lfs -text
- *.lfs.* filter=lfs diff=lfs merge=lfs -text
- *.mlmodel filter=lfs diff=lfs merge=lfs -text
- *.model filter=lfs diff=lfs merge=lfs -text
- *.msgpack filter=lfs diff=lfs merge=lfs -text
- *.npy filter=lfs diff=lfs merge=lfs -text
- *.npz filter=lfs diff=lfs merge=lfs -text
  *.onnx filter=lfs diff=lfs merge=lfs -text
- *.ot filter=lfs diff=lfs merge=lfs -text
- *.parquet filter=lfs diff=lfs merge=lfs -text
- *.pb filter=lfs diff=lfs merge=lfs -text
- *.pickle filter=lfs diff=lfs merge=lfs -text
- *.pkl filter=lfs diff=lfs merge=lfs -text
- *.pt filter=lfs diff=lfs merge=lfs -text
- *.pth filter=lfs diff=lfs merge=lfs -text
- *.rar filter=lfs diff=lfs merge=lfs -text
- *.safetensors filter=lfs diff=lfs merge=lfs -text
- saved_model/**/* filter=lfs diff=lfs merge=lfs -text
- *.tar.* filter=lfs diff=lfs merge=lfs -text
- *.tar filter=lfs diff=lfs merge=lfs -text
- *.tflite filter=lfs diff=lfs merge=lfs -text
- *.tgz filter=lfs diff=lfs merge=lfs -text
- *.wasm filter=lfs diff=lfs merge=lfs -text
- *.xz filter=lfs diff=lfs merge=lfs -text
- *.zip filter=lfs diff=lfs merge=lfs -text
- *.zst filter=lfs diff=lfs merge=lfs -text
- *tfevents* filter=lfs diff=lfs merge=lfs -text
+ *.ort filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,175 @@
---
license: apache-2.0
tags:
- sam2
- segment-anything
- onnx
- webgpu
- computer-vision
- image-segmentation
library_name: onnxruntime
---

# SAM2-Hiera-Small - ONNX Format for WebGPU

**Powered by [Segment Anything 2 (SAM2)](https://github.com/facebookresearch/segment-anything-2) from Meta Research**

This repository contains ONNX conversions of [facebook/sam2-hiera-small](https://huggingface.co/facebook/sam2-hiera-small), optimized for WebGPU deployment in the browser.

## Model Information

- **Original Model**: [facebook/sam2-hiera-small](https://huggingface.co/facebook/sam2-hiera-small)
- **Version**: SAM 2.0
- **Size**: 46M parameters
- **Description**: Small variant - balanced speed and quality
- **Format**: ONNX (encoder + decoder)
- **Optimization**: Encoder additionally provided in .ort format, optimized for WebGPU

## Files

- `encoder.onnx` - Image encoder (ONNX format)
- `encoder.with_runtime_opt.ort` - Image encoder (optimized for WebGPU)
- `decoder.onnx` - Mask decoder (ONNX format)
- `config.json` - Model configuration

## Usage

### In Browser with ONNX Runtime Web

```javascript
import * as ort from 'onnxruntime-web/webgpu';

// Load encoder (use the optimized .ort version for WebGPU)
const encoderURL = 'https://huggingface.co/SharpAI/sam2-hiera-small-onnx/resolve/main/encoder.with_runtime_opt.ort';
const encoderSession = await ort.InferenceSession.create(encoderURL, {
  executionProviders: ['webgpu'],
  graphOptimizationLevel: 'disabled' // .ort files are already optimized
});

// Load decoder
const decoderURL = 'https://huggingface.co/SharpAI/sam2-hiera-small-onnx/resolve/main/decoder.onnx';
const decoderSession = await ort.InferenceSession.create(decoderURL, {
  executionProviders: ['webgpu']
});

// Run encoder
const imageData = preprocessImage(image); // your preprocessing: resize to 1024x1024, normalize
const encoderOutputs = await encoderSession.run({ image: imageData });

// Run decoder with a single point (x, y are click coordinates in the 1024x1024 input space)
const point_coords = new ort.Tensor('float32', [x, y, 0, 0], [1, 2, 2]);
const point_labels = new ort.Tensor('float32', [1, -1], [1, 2]); // 1 = foreground, -1 = padding
const mask_input = new ort.Tensor('float32', new Float32Array(256 * 256).fill(0), [1, 1, 256, 256]);
const has_mask_input = new ort.Tensor('float32', [0], [1]);

const decoderOutputs = await decoderSession.run({
  image_embed: encoderOutputs.image_embed,
  high_res_feats_0: encoderOutputs.high_res_feats_0,
  high_res_feats_1: encoderOutputs.high_res_feats_1,
  point_coords: point_coords,
  point_labels: point_labels,
  mask_input: mask_input,
  has_mask_input: has_mask_input
});

// Get masks
const masks = decoderOutputs.masks; // shape: [1, num_masks, 256, 256]
```

### In Python with ONNX Runtime

```python
import onnxruntime as ort
import numpy as np

# Load models
encoder_session = ort.InferenceSession("encoder.onnx")
decoder_session = ort.InferenceSession("decoder.onnx")

# Run encoder
encoder_outputs = encoder_session.run(None, {"image": image_tensor})

# Run decoder
decoder_outputs = decoder_session.run(None, {
    "image_embed": encoder_outputs[0],
    "high_res_feats_0": encoder_outputs[1],
    "high_res_feats_1": encoder_outputs[2],
    "point_coords": point_coords,
    "point_labels": point_labels,
    "mask_input": mask_input,
    "has_mask_input": has_mask_input
})

masks = decoder_outputs[0]
```
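The examples above assume an already preprocessed `image_tensor`. A minimal sketch of that preprocessing, assuming ImageNet mean/std normalization (which SAM2's reference transforms use; verify against your own export):

```python
import numpy as np
from PIL import Image

def preprocess_image(img: Image.Image, size: int = 1024) -> np.ndarray:
    """Resize to size x size RGB and normalize into the NCHW float32
    tensor the encoder expects. Mean/std are the ImageNet statistics
    (an assumption; check your conversion pipeline)."""
    img = img.convert("RGB").resize((size, size))
    x = np.asarray(img, dtype=np.float32) / 255.0            # HWC in [0, 1]
    mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
    std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
    x = (x - mean) / std                                     # per-channel normalize
    return x.transpose(2, 0, 1)[None].astype(np.float32)     # -> [1, 3, size, size]
```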

## Input/Output Specifications

### Encoder

**Input:**
- `image`: Float32[1, 3, 1024, 1024] - Normalized RGB image

**Outputs:**
- `image_embed`: Float32[1, 256, 64, 64] - Image embeddings
- `high_res_feats_0`: Float32[1, 32, 256, 256] - High-res features (level 0)
- `high_res_feats_1`: Float32[1, 64, 128, 128] - High-res features (level 1)

### Decoder

**Inputs:**
- `image_embed`: Float32[1, 256, 64, 64] - From encoder
- `high_res_feats_0`: Float32[1, 32, 256, 256] - From encoder
- `high_res_feats_1`: Float32[1, 64, 128, 128] - From encoder
- `point_coords`: Float32[1, 2, 2] - Point coordinates [[x, y], [0, 0]]
- `point_labels`: Float32[1, 2] - Point labels [1, -1] (1 = foreground, -1 = padding)
- `mask_input`: Float32[1, 1, 256, 256] - Previous mask (zeros if none)
- `has_mask_input`: Float32[1] - Flag [0] or [1]

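The decoder's point inputs can be assembled in NumPy like this (a sketch; `click_x`/`click_y` are hypothetical names for a click in the 1024x1024 input space):

```python
import numpy as np

def make_point_inputs(click_x: float, click_y: float):
    """Build single-click decoder inputs per the spec above:
    one foreground point plus one padding point, no prior mask."""
    point_coords = np.array([[[click_x, click_y], [0.0, 0.0]]], dtype=np.float32)  # [1, 2, 2]
    point_labels = np.array([[1.0, -1.0]], dtype=np.float32)                       # [1, 2]
    mask_input = np.zeros((1, 1, 256, 256), dtype=np.float32)                      # zeros = no prior mask
    has_mask_input = np.zeros((1,), dtype=np.float32)                              # flag [0]
    return point_coords, point_labels, mask_input, has_mask_input
```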
**Outputs:**
- `masks`: Float32[1, 3, 256, 256] - Generated masks (3 candidates)
- `iou_predictions`: Float32[1, 3] - IoU scores for each mask
- `low_res_masks`: Float32[1, 3, 256, 256] - Low-resolution masks

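Since the decoder returns three candidate masks with predicted IoU scores, a typical post-processing step picks the highest-scoring candidate and thresholds its logits at 0. A sketch, assuming NumPy arrays shaped as above:

```python
import numpy as np

def select_best_mask(masks: np.ndarray, iou_predictions: np.ndarray,
                     threshold: float = 0.0) -> np.ndarray:
    """masks: [1, 3, 256, 256] logits; iou_predictions: [1, 3].
    Returns a boolean [256, 256] mask from the top-scoring candidate."""
    best = int(np.argmax(iou_predictions[0]))  # index of highest predicted IoU
    return masks[0, best] > threshold          # binarize the logits
```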
## Browser Requirements

- Chrome 113+ with WebGPU enabled (`chrome://flags/#enable-unsafe-webgpu`)
- Firefox Nightly with WebGPU enabled
- Safari Technology Preview with WebGPU enabled

## Performance

Typical inference times on Chrome with WebGPU:
- **Encoder**: ~3-5s
- **Decoder**: 0.1-0.5s per point

## License

This model is released under the Apache 2.0 license, following the original SAM2 model.

## Citation

```bibtex
@article{ravi2024sam2,
  title={SAM 2: Segment Anything in Images and Videos},
  author={Ravi, Nikhila and Gabeur, Valentin and Hu, Yuan-Ting and Hu, Ronghang and Ryali, Chaitanya and Ma, Tengyu and Khedr, Haitham and R{\"a}dle, Roman and Rolland, Chloe and Gustafson, Laura and Mintun, Eric and Pan, Junting and Alwala, Kalyan Vasudev and Carion, Nicolas and Wu, Chao-Yuan and Girshick, Ross and Doll{\'a}r, Piotr and Feichtenhofer, Christoph},
  journal={arXiv preprint arXiv:2408.00714},
  year={2024}
}
```

## Related Resources

- **Original SAM2**: [facebookresearch/segment-anything-2](https://github.com/facebookresearch/segment-anything-2)
- **WebGPU Demo**: [Aegis AI SAM2 WebGPU Demo](https://github.com/yourusername/Aegis-AI/tree/main/tools/sam2-webgpu)
- **Conversion Tool**: [SAM2 ONNX Converter](https://github.com/yourusername/Aegis-AI/tree/main/tools/sam2-converter)

## Acknowledgments

- **Meta Research** for the original SAM2 model
- **Microsoft** for ONNX Runtime
- **SamExporter** for conversion tools

---

*Converted and optimized by [Aegis AI](https://github.com/yourusername/Aegis-AI)*
config.json ADDED
@@ -0,0 +1,13 @@
{
  "model_name": "sam2-hiera-small",
  "checkpoint_id": "facebook/sam2-hiera-small",
  "checkpoint_path": "/Users/simba/.cache/huggingface/hub/models--facebook--sam2-hiera-small/snapshots/e080ada8afd19df5e165abe71b006edc7f4c3d4e/sam2_hiera_small.pt",
  "version": "2.0",
  "size": "46M",
  "encoder_path": "encoder.onnx",
  "encoder_optimized_path": "encoder.with_runtime_opt.ort",
  "decoder_path": "decoder.onnx",
  "image_size": 1024,
  "mask_size": 256,
  "conversion_date": "1763352682.598863"
}
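A consumer can read this config to locate the model files rather than hard-coding paths. A minimal sketch, assuming the JSON above is saved locally as `config.json` (`load_model_paths` is a hypothetical helper, not part of this repo):

```python
import json

def load_model_paths(config_file: str = "config.json"):
    """Read the repo config and return the encoder/decoder file names
    and the expected input image size."""
    with open(config_file) as f:
        cfg = json.load(f)
    return cfg["encoder_path"], cfg["decoder_path"], cfg["image_size"]
```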
decoder.onnx ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6cb43867303b46933fd85e4434239cdb60e3e60a7774aa725bed4331b5e38d75
size 20639854
encoder.onnx ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9c45d727441ee2e8256d405c296a428178ae514358f85d061e67a68eb820b19c
size 162703493
encoder.with_runtime_opt.ort ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ae1e63a5b67b59a7e244b94439087885ee8ce59699cd7200b0eacffa1dd1856b
size 162929760