Commit ef8a773 (verified) by egordm · 1 Parent(s): debc84e

Add model card

  1. README.md +92 -0
README.md ADDED
@@ -0,0 +1,92 @@
+ ---
+ license: apache-2.0
+ library_name: onnxruntime
+ pipeline_tag: mask-generation
+ tags:
+ - tracking
+ - single-object-tracking
+ - video-object-segmentation
+ - sam2
+ - efficienttam
+ - onnx
+ - kubrick
+ ---
+
+ # EfficientTAM-Ti @ 512 (ONNX Bundle)
+
+ ONNX export of [EfficientTAM](https://github.com/yformer/EfficientTAM) (Tiny variant, 512x512 input) for use with [kubrick-tracking](https://github.com/egordm/kubrick-tracking).
+
+ EfficientTAM is a distilled variant of SAM 2 optimized for efficient video object segmentation. This bundle splits the model into five independently runnable ONNX sessions, so each module can be deployed separately on CPU, CoreML, CUDA, or TensorRT backends.
+
+ ## Variants
+
+ | Variant | Precision | Total Size | Notes |
+ |---------|-----------|------------|-------|
+ | `fp32/` | float32 | ~77 MB | Reference quality, works everywhere |
+ | `fp16/` | float16 | ~40 MB | About half the size, intended for GPU-accelerated backends |
+
+ ## Architecture
+
+ | Module | File | Input Shape | Purpose |
+ |--------|------|-------------|---------|
+ | image_encoder | `image_encoder.onnx` | [1, 3, 512, 512] | Frame feature extraction |
+ | prompt_encoder | `prompt_encoder.onnx` | [1, 2, 2] | Bbox/click/mask prompt encoding |
+ | mask_decoder | `mask_decoder.onnx` | [1, 256, 32, 32] | Mask prediction from features + prompt |
+ | memory_encoder | `memory_encoder.onnx` | [1, 256, 32, 32] | Encode frame into memory bank |
+ | memory_attention | `memory_attention.onnx` | dynamic | Cross-attention with memory bank |
+
+ Additional assets:
+ - `maskmem_tpos_enc.npy` -- temporal positional encoding for memory frames
+ - `no_obj_ptr.npy` -- no-object pointer embedding
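
The module layout above suggests one ONNX Runtime session per file. The following is a minimal sketch, assuming the five `.onnx` files and the two `.npy` assets sit together in one local variant directory (where the `.npy` files actually live is not stated in this card). `module_paths` and `load_bundle` are hypothetical helper names and are not part of kubrick-tracking.

```python
from pathlib import Path

# Module names as listed in the Architecture table.
MODULES = (
    "image_encoder",
    "prompt_encoder",
    "mask_decoder",
    "memory_encoder",
    "memory_attention",
)


def module_paths(variant_dir):
    """Map each module name to its expected .onnx path (hypothetical helper)."""
    root = Path(variant_dir)
    return {name: root / f"{name}.onnx" for name in MODULES}


def load_bundle(variant_dir, providers=("CPUExecutionProvider",)):
    """Create one InferenceSession per module and load the .npy assets.

    Assumption: the .npy files live next to the .onnx files; adjust if not.
    """
    # Imported lazily so module_paths works without these packages installed.
    import numpy as np
    import onnxruntime as ort

    sessions = {
        name: ort.InferenceSession(str(path), providers=list(providers))
        for name, path in module_paths(variant_dir).items()
    }
    root = Path(variant_dir)
    assets = {
        "maskmem_tpos_enc": np.load(root / "maskmem_tpos_enc.npy"),
        "no_obj_ptr": np.load(root / "no_obj_ptr.npy"),
    }
    return sessions, assets
```

Calling `get_inputs()` on each returned session reports the input names and shapes the exported graphs actually expect, which is a safer source than hard-coding the values from the table.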
+
+ ## Usage with kubrick-tracking
+
+ ```python
+ from kubrick.tracking import Tracker, MachineConfig, BBoxPrompt, BBox
+
+ # Automatically downloads and caches this bundle
+ config = MachineConfig.mac_m_series()  # uses fp16 by default
+ tracker = Tracker.from_config(config)
+
+ tracker.init(frame, prompt=BBoxPrompt(bbox=BBox(x=100, y=50, w=80, h=120)))
+ result = tracker.step(next_frame)
+ ```
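
The `prompt_encoder` input shape `[1, 2, 2]` in the Architecture table is consistent with a box passed as two corner points. Below is a minimal sketch of that conversion, assuming (the card does not confirm this) that the encoder expects top-left and bottom-right corners in 512x512 model-input pixel coordinates. `bbox_to_corners` is a hypothetical helper; kubrick-tracking handles prompts internally.

```python
def bbox_to_corners(x, y, w, h, frame_w, frame_h, model_size=512):
    """Convert an (x, y, w, h) box in frame pixels to a [1, 2, 2] nested list.

    Assumption: corners are (x, y) pairs scaled to a model_size x model_size input.
    """
    sx = model_size / frame_w
    sy = model_size / frame_h
    top_left = [x * sx, y * sy]
    bottom_right = [(x + w) * sx, (y + h) * sy]
    return [[top_left, bottom_right]]


# The box from the usage example, on a hypothetical 1024x1024 frame.
corners = bbox_to_corners(100, 50, 80, 120, frame_w=1024, frame_h=1024)
```

When calling the ONNX session directly, the nested list could be wrapped with `numpy.asarray(corners, dtype=numpy.float32)` for the fp32 variant.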
+
+ ## Manual download
+
+ ```python
+ from huggingface_hub import snapshot_download
+
+ # Download fp16 variant
+ path = snapshot_download(
+     repo_id="egordm/efficienttam-ti-512",
+     allow_patterns=["fp16/**"],
+ )
+ ```
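
After a manual download it can help to confirm that the variant folder is complete before creating any sessions. This is a minimal sketch, assuming the five `.onnx` files sit directly inside the `fp16/` (or `fp32/`) folder of the snapshot; `missing_modules` is a hypothetical helper name.

```python
from pathlib import Path

# File names taken from the Architecture table.
EXPECTED_FILES = [
    "image_encoder.onnx",
    "prompt_encoder.onnx",
    "mask_decoder.onnx",
    "memory_encoder.onnx",
    "memory_attention.onnx",
]


def missing_modules(snapshot_path, variant="fp16"):
    """Return the expected .onnx file names that are absent from the variant folder."""
    variant_dir = Path(snapshot_path) / variant
    return [name for name in EXPECTED_FILES if not (variant_dir / name).is_file()]
```

With `path` from `snapshot_download`, an empty list from `missing_modules(path)` means all five modules were found under the assumed layout.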
+
+ ## Export reproduction
+
+ The bundle was exported using the script in the [kubrick-tracking](https://github.com/egordm/kubrick-tracking) repository:
+
+ ```bash
+ git clone https://github.com/egordm/kubrick-tracking.git
+ cd kubrick-tracking
+ uv run python models/efficienttam-ti-512/export.py --dtype fp16
+ ```
+
+ The export requires the EfficientTAM checkpoint from the upstream repository.
+
+ ## Citation
+
+ ```bibtex
+ @article{xiong2024efficienttam,
+   title={Efficient Track Anything},
+   author={Xiong, Yunyang and Zhou, Chong and Xiang, Xiaoyu and Wu, Lemeng and Zhu, Chenchen and Liu, Zechun and Suri, Saksham and Varadarajan, Balakrishnan and Akula, Ramya and Iandola, Forrest and Krishnamoorthi, Raghuraman and Soran, Bilge and Chandra, Vikas},
+   journal={arXiv preprint arXiv:2411.18933},
+   year={2024}
+ }
+ ```
+
+ ## License
+
+ Apache-2.0