egordm/efficienttam-ti-512

Mask Generation

single-object-tracking

video-object-segmentation


egordm committed 16 days ago

Commit ef8a773 · verified · 1 Parent(s): debc84e

Add model card

Files changed (1)

README.md +92 -0

README.md ADDED

	@@ -0,0 +1,92 @@

+---
+license: apache-2.0
+library_name: onnxruntime
+pipeline_tag: mask-generation
+tags:
+  - tracking
+  - single-object-tracking
+  - video-object-segmentation
+  - sam2
+  - efficienttam
+  - onnx
+  - kubrick
+---
+# EfficientTAM-Ti @ 512 (ONNX Bundle)
+ONNX export of [EfficientTAM](https://github.com/yformer/EfficientTAM) (Tiny variant, 512x512 input) for use with [kubrick-tracking](https://github.com/egordm/kubrick-tracking).
+EfficientTAM is a lightweight variant of SAM 2 built for efficient video object segmentation. This bundle splits the model into five ONNX sessions that can each be run independently, which allows flexible deployment across CPU, CoreML, CUDA, and TensorRT backends.
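Note on the backend list above: the following is a minimal sketch, not part of this commit, of how a caller might choose ONNX Runtime execution providers for the four backends named. The preference order and the helper name `pick_providers` are assumptions made for illustration; only the provider name strings come from ONNX Runtime itself.

```python
# Preference order is an assumption for illustration (accelerators first).
PREFERRED = [
    "TensorrtExecutionProvider",
    "CUDAExecutionProvider",
    "CoreMLExecutionProvider",
    "CPUExecutionProvider",
]


def pick_providers(available):
    """Return preferred providers that are available, ending with CPU."""
    chosen = [name for name in PREFERRED if name in available]
    # Keep CPU as the final fallback even if the caller's list omitted it.
    if "CPUExecutionProvider" not in chosen:
        chosen.append("CPUExecutionProvider")
    return chosen
```

With ONNX Runtime installed, the result of `pick_providers(onnxruntime.get_available_providers())` can be passed as the `providers` argument of `onnxruntime.InferenceSession`.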
+## Variants
+| Variant | Precision | Total Size | Notes |
+|---------|-----------|------------|-------|
+| `fp32/` | float32 | ~77 MB | Reference quality, works everywhere |
+| `fp16/` | float16 | ~40 MB | Roughly half the size; intended for GPU-accelerated backends |
+## Architecture
+| Module | File | Input Shape | Purpose |
+|--------|------|-------------|---------|
+| image_encoder | `image_encoder.onnx` | [1, 3, 512, 512] | Frame feature extraction |
+| prompt_encoder | `prompt_encoder.onnx` | [1, 2, 2] | Bbox/click/mask prompt encoding |
+| mask_decoder | `mask_decoder.onnx` | [1, 256, 32, 32] | Mask prediction from features + prompt |
+| memory_encoder | `memory_encoder.onnx` | [1, 256, 32, 32] | Encode frame into memory bank |
+| memory_attention | `memory_attention.onnx` | dynamic | Cross-attention with memory bank |
+
+Additional assets:
+- `maskmem_tpos_enc.npy` -- temporal positional encoding for memory frames
+- `no_obj_ptr.npy` -- no-object pointer embedding
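Note on the shapes in the table above: a 512x512 input and a 32x32 feature map are consistent with a backbone stride of 16. The stride is inferred from those two numbers, not stated in the README. The sketch below, not part of this commit, checks that arithmetic and estimates the size of one `[1, 256, 32, 32]` feature tensor at each precision; the helper names are hypothetical.

```python
def feature_grid(input_size: int, stride: int) -> tuple[int, int]:
    """Spatial size of the feature map for a square input and a given stride."""
    if input_size % stride != 0:
        raise ValueError("input_size must be divisible by stride")
    side = input_size // stride
    return side, side


def tensor_bytes(shape, bytes_per_element: int) -> int:
    """Memory footprint of a dense tensor with the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element


# Stride 16 is an assumption inferred from 512 / 32.
grid = feature_grid(512, 16)
fp32_size = tensor_bytes((1, 256, *grid), 4)  # float32: 4 bytes per element
fp16_size = tensor_bytes((1, 256, *grid), 2)  # float16: 2 bytes per element
```

Each stored feature map is therefore 1 MiB at float32 and 512 KiB at float16, which gives a rough sense of how a memory bank grows with the number of retained frames.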
+## Usage with kubrick-tracking
+```python
+from kubrick.tracking import Tracker, MachineConfig, BBoxPrompt, BBox
+# Automatically downloads and caches this bundle
+config = MachineConfig.mac_m_series()  # uses fp16 by default
+tracker = Tracker.from_config(config)
+# `frame` and `next_frame` are video frames supplied by the caller
+tracker.init(frame, prompt=BBoxPrompt(bbox=BBox(x=100, y=50, w=80, h=120)))
+result = tracker.step(next_frame)
+```
+## Manual download
+```python
+from huggingface_hub import snapshot_download
+# Download fp16 variant
+path = snapshot_download(
+    repo_id="egordm/efficienttam-ti-512",
+    allow_patterns=["fp16/**"],
+)
+```
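Note on the download snippet above: a minimal sketch, not part of this commit, for checking that a downloaded directory contains the five modules listed under Architecture. It searches recursively, so it makes no assumption about the exact folder layout. The helper names `find_modules` and `missing_modules` are hypothetical.

```python
from pathlib import Path

# Module names taken from the Architecture table.
EXPECTED_MODULES = (
    "image_encoder",
    "prompt_encoder",
    "mask_decoder",
    "memory_encoder",
    "memory_attention",
)


def find_modules(root):
    """Map each expected module name to a matching .onnx file under root."""
    found = {p.stem: p for p in Path(root).rglob("*.onnx")}
    return {name: found[name] for name in EXPECTED_MODULES if name in found}


def missing_modules(root):
    """List the expected modules that have no .onnx file under root."""
    present = find_modules(root)
    return [name for name in EXPECTED_MODULES if name not in present]
```

If several variants are present in the same snapshot, pass the directory of a single variant as `root` so that files with the same name do not shadow each other. Each returned path can then be given to `onnxruntime.InferenceSession`.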
+## Export reproduction
+The bundle was exported using the script in the [kubrick-tracking](https://github.com/egordm/kubrick-tracking) repository:
+```bash
+git clone https://github.com/egordm/kubrick-tracking.git
+cd kubrick-tracking
+uv run python models/efficienttam-ti-512/export.py --dtype fp16
+```
+Exporting requires the EfficientTAM checkpoint from the [upstream repository](https://github.com/yformer/EfficientTAM).
+## Citation
+```bibtex
+@article{xiong2024efficienttam,
+  title={Efficient Track Anything},
+  author={Xiong, Yunyang and Zhou, Chong and Xiang, Xiaoyu and Wu, Lemeng and Zhu, Chenchen and Liu, Zechun and Suri, Saksham and Varadarajan, Balakrishnan and Akula, Ramya and Iandola, Forrest and Krishnamoorthi, Raghuraman and Soran, Bilge and Chandra, Vikas},
+  journal={arXiv preprint arXiv:2411.18933},
+  year={2024}
+}
+```
+## License
+Apache-2.0