EdgeTAM-fp16 / README.md
xocialize's picture
Upload README.md with huggingface_hub
fd5dabf verified
|
Raw
History Blame Contribute Delete
2.06 kB
---
library_name: mlx
license: apache-2.0
license_link: https://github.com/facebookresearch/EdgeTAM/blob/main/LICENSE
base_model: facebookresearch/EdgeTAM
pipeline_tag: image-segmentation
tags:
- mlx
- segmentation
- promptable-segmentation
- sam2
- edgetam
---
# mlx-community/EdgeTAM-fp16
[EdgeTAM](https://github.com/facebookresearch/EdgeTAM) β€” on-device SAM 2 for promptable segmentation +
video tracking β€” converted to **Apple MLX** (`-fp16`) for the
[`mlx-edgetam-swift`](https://github.com/xocialize/mlx-edgetam-swift) Swift package (MLXEngine
`promptSegment` + `trackObject` ModelPackage). 22Γ— faster than SAM 2, 16 FPS on iPhone 15 Pro Max.
From-scratch MLX-Swift architecture port. **Image-mode** (point/box β†’ mask): RepViT-M1 encoder + FPN +
SAM prompt encoder + two-way mask decoder β€” parity-locked vs the PyTorch oracle on the CPU stream
(image_embed 9.7e-6, mask logits 8.9e-5; end-to-end mask **IoU 0.99** vs PyTorch). **Video-mode**
(`trackObject`, click on one frame β†’ per-frame masklet): adds the video memory stack β€” PerceiverResampler
+ MemoryEncoder + MemoryAttention (RoPE-2D) + the SAM2 memory-bank state machine β€” every op parity-locked
vs the oracle; full masklet propagation min-IoU 0.92. This single `-fp16` file carries both (874 tensors).
## Use
```swift
// Package.swift β†’ .package(url: "https://github.com/xocialize/mlx-edgetam-swift", from: "0.1.0")
import EdgeTAM
// Image: click β†’ object mask
let p = try EdgeTAMPredictor.fromPretrained(weightsPath, dtype: .float16)
p.setImage(sourceCGImage)
let (mask, score, _, _) = p.predict(point: (500, 375))
// Video: click on a frame β†’ per-frame masklet
let vp = try EdgeTAMVideoPredictor.fromPretrained(weightsPath, dtype: .float16)
let track = vp.track(frames: cgImages, clickFrame: 0, points: [[210, 350]], labels: [1])
```
Or as an MLXEngine ModelPackage (`MLXEdgeTAM.EdgeTAMPackage`) β€” `promptSegment` (image) + `trackObject`
(video) surfaces β€” resolving this repo via the Hub.
Weights: Apache-2.0 (facebookresearch/EdgeTAM). Port code: MIT.