EfficientTAM-Ti @ 512 (ONNX Bundle)

ONNX export of EfficientTAM (Tiny variant, 512x512 input) for use with kubrick-tracking.

EfficientTAM is a distilled variant of SAM 2 optimized for efficient video object segmentation. This bundle splits the model into five independently runnable ONNX sessions, so each module can be deployed separately on CPU, CoreML, CUDA, or TensorRT backends.

Variants

Variant	Precision	Total Size	Notes
`fp32/`	float32	~77 MB	Reference quality, works everywhere
`fp16/`	float16	~40 MB	About half the size of fp32; intended for GPU-accelerated backends
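
The choice between the two variants usually follows from the execution providers available at runtime. The following is a minimal sketch of that decision, not part of kubrick-tracking: `pick_variant` is a hypothetical helper, and the only external facts it relies on are the standard ONNX Runtime provider names.

```python
# Hypothetical helper, not part of kubrick-tracking.
# The strings below are the standard ONNX Runtime provider identifiers.
ACCELERATED_PROVIDERS = (
    "TensorrtExecutionProvider",
    "CUDAExecutionProvider",
    "CoreMLExecutionProvider",
)


def pick_variant(available_providers):
    """Return "fp16" if an accelerated provider is available, else "fp32"."""
    if any(p in available_providers for p in ACCELERATED_PROVIDERS):
        return "fp16"
    return "fp32"
```

In practice the argument could come from `onnxruntime.get_available_providers()`, which returns the providers compiled into the installed ONNX Runtime build.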

Architecture

Module	File	Input Shape	Purpose
image_encoder	`image_encoder.onnx`	[1, 3, 512, 512]	Frame feature extraction
prompt_encoder	`prompt_encoder.onnx`	[1, 2, 2]	Bbox/click/mask prompt encoding
mask_decoder	`mask_decoder.onnx`	[1, 256, 32, 32]	Mask prediction from features + prompt
memory_encoder	`memory_encoder.onnx`	[1, 256, 32, 32]	Encode frame into memory bank
memory_attention	`memory_attention.onnx`	dynamic	Cross-attention with memory bank

Additional assets:

maskmem_tpos_enc.npy -- temporal positional encoding for memory frames
no_obj_ptr.npy -- no-object pointer embedding
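
The `[1, 2, 2]` prompt shape in the table above is consistent with a bounding box passed as two corner points of two coordinates each. The sketch below shows how a box given in frame pixels could be mapped to that layout. It rests on two assumptions that this card does not confirm: that the exported `prompt_encoder` expects top-left and bottom-right corners in the coordinate space of the 512x512 input, and that frames are resized directly to 512x512 without letterboxing. `bbox_to_prompt_points` is a hypothetical name.

```python
MODEL_SIZE = 512  # side length of the image_encoder input


def bbox_to_prompt_points(x, y, w, h, frame_w, frame_h, size=MODEL_SIZE):
    """Map an (x, y, w, h) box in frame pixels to corner points in model space.

    Returns a nested list shaped [1, 2, 2]: one box, two corners, (x, y) each.
    Assumption: the frame is stretched to size x size, so each axis is
    scaled independently.
    """
    sx = size / frame_w
    sy = size / frame_h
    top_left = [x * sx, y * sy]
    bottom_right = [(x + w) * sx, (y + h) * sy]
    return [[top_left, bottom_right]]
```

For a 1024x1024 frame, the box `x=100, y=50, w=80, h=120` maps to the corners `(50, 25)` and `(90, 85)`.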

Usage with kubrick-tracking

from kubrick.tracking import Tracker, MachineConfig, BBoxPrompt, BBox

# Automatically downloads and caches this bundle
config = MachineConfig.mac_m_series()  # uses fp16 by default
tracker = Tracker.from_config(config)

# `frame` and `next_frame` are consecutive video frames supplied by the caller
tracker.init(frame, prompt=BBoxPrompt(bbox=BBox(x=100, y=50, w=80, h=120)))
result = tracker.step(next_frame)

Manual download

from huggingface_hub import snapshot_download

# Download fp16 variant
path = snapshot_download(
    repo_id="egordm/efficienttam-ti-512",
    allow_patterns=["fp16/**"],
)
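
Once the snapshot is on disk, the five modules can be opened as separate ONNX Runtime sessions. The sketch below is one way to do that. It assumes that the `.onnx` files and the two `.npy` assets all sit directly inside the variant directory; if the assets live at the repository root instead, `allow_patterns` above would need to include them. `bundle_files` and `load_sessions` are hypothetical helpers, not part of kubrick-tracking.

```python
from pathlib import Path

MODULES = (
    "image_encoder",
    "prompt_encoder",
    "mask_decoder",
    "memory_encoder",
    "memory_attention",
)
ASSETS = ("maskmem_tpos_enc.npy", "no_obj_ptr.npy")


def bundle_files(snapshot_path, variant="fp16"):
    """Return {name: Path} for the five ONNX files and the two .npy assets.

    Assumption: every file sits directly inside the variant directory.
    """
    root = Path(snapshot_path) / variant
    files = {name: root / f"{name}.onnx" for name in MODULES}
    files.update({Path(asset).stem: root / asset for asset in ASSETS})
    return files


def load_sessions(snapshot_path, variant="fp16", providers=None):
    """Create one onnxruntime.InferenceSession per module."""
    # Imported lazily so the path logic above works without onnxruntime.
    import onnxruntime as ort

    providers = providers or ["CPUExecutionProvider"]
    files = bundle_files(snapshot_path, variant)
    return {
        name: ort.InferenceSession(str(files[name]), providers=providers)
        for name in MODULES
    }
```

The tensor names of each module are not listed on this card; `session.get_inputs()` and `session.get_outputs()` report them for any loaded session.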

Export reproduction

The bundle was exported using the script in the kubrick-tracking repository:

git clone https://github.com/egordm/kubrick-tracking.git
cd kubrick-tracking
uv run python models/efficienttam-ti-512/export.py --dtype fp16

The export script requires the EfficientTAM checkpoint from the upstream repository.

Citation

@article{xiong2024efficienttam,
  title={Efficient Track Anything},
  author={Xiong, Yunyang and Zhou, Chong and Xiang, Xiaoyu and Wu, Lemeng and Zhu, Chenchen and Liu, Zechun and Suri, Saksham and Varadarajan, Balakrishnan and Akula, Ramya and Iandola, Forrest and Krishnamoorthi, Raghuraman and Soran, Bilge and Chandra, Vikas},
  journal={arXiv preprint arXiv:2411.18933},
  year={2024}
}

License

Apache-2.0
