How to use egordm/efficienttam-ti-512 with sam2:

```python
# Use SAM2 with images
import torch
from sam2.sam2_image_predictor import SAM2ImagePredictor

predictor = SAM2ImagePredictor.from_pretrained("egordm/efficienttam-ti-512")

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    predictor.set_image(<your_image>)
    masks, _, _ = predictor.predict(<input_prompts>)
```

```python
# Use SAM2 with videos
import torch
from sam2.sam2_video_predictor import SAM2VideoPredictor

predictor = SAM2VideoPredictor.from_pretrained("egordm/efficienttam-ti-512")

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    state = predictor.init_state(<your_video>)

    # add new prompts and instantly get the output on the same frame
    frame_idx, object_ids, masks = predictor.add_new_points(state, <your_prompts>)

    # propagate the prompts to get masklets throughout the video
    for frame_idx, object_ids, masks in predictor.propagate_in_video(state):
        ...
```

ONNX export of EfficientTAM (Tiny variant, 512x512 input) for use with kubrick-tracking.

EfficientTAM is a distilled variant of SAM 2 optimized for efficient video object segmentation. This bundle splits the model into 5 independently-runnable ONNX sessions for flexible deployment across CPU, CoreML, CUDA, and TensorRT backends.
| Variant | Precision | Total Size | Notes |
|---|---|---|---|
| `fp32/` | float32 | ~77 MB | Reference quality, works everywhere |
| `fp16/` | float16 | ~40 MB | 2x smaller, GPU-accelerated backends |
| Module | File | Input Shape | Purpose |
|---|---|---|---|
| image_encoder | `image_encoder.onnx` | `[1, 3, 512, 512]` | Frame feature extraction |
| prompt_encoder | `prompt_encoder.onnx` | `[1, 2, 2]` | Bbox/click/mask prompt encoding |
| mask_decoder | `mask_decoder.onnx` | `[1, 256, 32, 32]` | Mask prediction from features + prompt |
| memory_encoder | `memory_encoder.onnx` | `[1, 256, 32, 32]` | Encode frame into memory bank |
| memory_attention | `memory_attention.onnx` | dynamic | Cross-attention with memory bank |
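
For readers who want to run the modules without kubrick-tracking, the following is a minimal sketch of opening one ONNX Runtime session per module. It assumes the files sit at `<variant>/<module>.onnx` inside the bundle directory, and the provider preference order is an illustrative choice rather than something the bundle prescribes. The helper names (`pick_providers`, `module_paths`, `load_sessions`) are hypothetical.

```python
from pathlib import Path

# Module names as listed in the table above.
MODULES = [
    "image_encoder",
    "prompt_encoder",
    "mask_decoder",
    "memory_encoder",
    "memory_attention",
]

# Assumed preference order: accelerated backends first, CPU as the fallback.
PREFERRED_PROVIDERS = [
    "TensorrtExecutionProvider",
    "CUDAExecutionProvider",
    "CoreMLExecutionProvider",
    "CPUExecutionProvider",
]


def pick_providers(available):
    """Keep the preferred providers that the installed onnxruntime build offers."""
    chosen = [p for p in PREFERRED_PROVIDERS if p in available]
    return chosen or ["CPUExecutionProvider"]


def module_paths(bundle_dir, variant="fp32"):
    """Map each module name to its .onnx file (assumed layout: <variant>/<module>.onnx)."""
    root = Path(bundle_dir) / variant
    return {name: root / f"{name}.onnx" for name in MODULES}


def load_sessions(bundle_dir, variant="fp32"):
    """Create one InferenceSession per module. Requires onnxruntime to be installed."""
    import onnxruntime as ort  # imported lazily so the helpers above work without it

    providers = pick_providers(ort.get_available_providers())
    return {
        name: ort.InferenceSession(str(path), providers=providers)
        for name, path in module_paths(bundle_dir, variant).items()
    }
```

Because each module is its own session, individual modules could also be given different provider lists, for example keeping the small `prompt_encoder` on CPU while the `image_encoder` runs on an accelerator.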
Additional assets:
- `maskmem_tpos_enc.npy`: temporal positional encoding for memory frames
- `no_obj_ptr.npy`: no-object pointer embedding

```python
from kubrick.tracking import Tracker, MachineConfig, BBoxPrompt, BBox

# Automatically downloads and caches this bundle
config = MachineConfig.mac_m_series()  # uses fp16 by default
tracker = Tracker.from_config(config)
tracker.init(frame, prompt=BBoxPrompt(bbox=BBox(x=100, y=50, w=80, h=120)))
result = tracker.step(next_frame)
```

```python
from huggingface_hub import snapshot_download

# Download fp16 variant
path = snapshot_download(
    repo_id="egordm/efficienttam-ti-512",
    allow_patterns=["fp16/**"],
)
```
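
After a manual download it can be useful to confirm that the variant directory is complete before building sessions. The sketch below lists expected files that are missing. It assumes the five `.onnx` modules and the two `.npy` assets all live directly under the variant directory (for example `fp16/`), which the model card does not state explicitly. `missing_files` is a hypothetical helper name.

```python
from pathlib import Path

# File names taken from the module table and the additional assets list.
EXPECTED_FILES = [
    "image_encoder.onnx",
    "prompt_encoder.onnx",
    "mask_decoder.onnx",
    "memory_encoder.onnx",
    "memory_attention.onnx",
    "maskmem_tpos_enc.npy",
    "no_obj_ptr.npy",
]


def missing_files(snapshot_path, variant="fp16"):
    """Return the expected file names not found under <snapshot_path>/<variant>."""
    variant_dir = Path(snapshot_path) / variant  # assumed layout, see note above
    return [name for name in EXPECTED_FILES if not (variant_dir / name).is_file()]
```

Calling `missing_files(path)` with the path returned by `snapshot_download` yields an empty list when every expected file is present under that assumed layout.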
The bundle was exported using the script in the kubrick-tracking repository:
```bash
git clone https://github.com/egordm/kubrick-tracking.git
cd kubrick-tracking
uv run python models/efficienttam-ti-512/export.py --dtype fp16
```

Requires the EfficientTAM checkpoint from the upstream repository.
```bibtex
@article{xiong2024efficienttam,
  title={EfficientTAM: Efficient Track Anything Model for Video Object Segmentation},
  author={Xiong, Yunyang and Varadarajan, Siddharth and Wu, Zechun and Wang, Yong and Wang, Xiaolong},
  journal={arXiv preprint arXiv:2403.08243},
  year={2024}
}
```

Apache-2.0