Instructions to use mlx-community/EdgeTAM-fp16 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- MLX
How to use mlx-community/EdgeTAM-fp16 with MLX:
# Download the model from the Hub pip install huggingface_hub[hf_xet] huggingface-cli download --local-dir EdgeTAM-fp16 mlx-community/EdgeTAM-fp16
- sam2
How to use mlx-community/EdgeTAM-fp16 with sam2:
# Use SAM2 with images import torch from sam2.sam2_image_predictor import SAM2ImagePredictor predictor = SAM2ImagePredictor.from_pretrained(mlx-community/EdgeTAM-fp16) with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16): predictor.set_image(<your_image>) masks, _, _ = predictor.predict(<input_prompts>)# Use SAM2 with videos import torch from sam2.sam2_video_predictor import SAM2VideoPredictor predictor = SAM2VideoPredictor.from_pretrained(mlx-community/EdgeTAM-fp16) with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16): state = predictor.init_state(<your_video>) # add new prompts and instantly get the output on the same frame frame_idx, object_ids, masks = predictor.add_new_points(state, <your_prompts>): # propagate the prompts to get masklets throughout the video for frame_idx, object_ids, masks in predictor.propagate_in_video(state): ... - Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- LM Studio
| library_name: mlx | |
| license: apache-2.0 | |
| license_link: https://github.com/facebookresearch/EdgeTAM/blob/main/LICENSE | |
| base_model: facebookresearch/EdgeTAM | |
| pipeline_tag: image-segmentation | |
| tags: | |
| - mlx | |
| - segmentation | |
| - promptable-segmentation | |
| - sam2 | |
| - edgetam | |
| # mlx-community/EdgeTAM-fp16 | |
| [EdgeTAM](https://github.com/facebookresearch/EdgeTAM) β on-device SAM 2 for promptable segmentation + | |
| video tracking β converted to **Apple MLX** (`-fp16`) for the | |
| [`mlx-edgetam-swift`](https://github.com/xocialize/mlx-edgetam-swift) Swift package (MLXEngine | |
| `promptSegment` + `trackObject` ModelPackage). 22Γ faster than SAM 2, 16 FPS on iPhone 15 Pro Max. | |
| From-scratch MLX-Swift architecture port. **Image-mode** (point/box β mask): RepViT-M1 encoder + FPN + | |
| SAM prompt encoder + two-way mask decoder β parity-locked vs the PyTorch oracle on the CPU stream | |
| (image_embed 9.7e-6, mask logits 8.9e-5; end-to-end mask **IoU 0.99** vs PyTorch). **Video-mode** | |
| (`trackObject`, click on one frame β per-frame masklet): adds the video memory stack β PerceiverResampler | |
| + MemoryEncoder + MemoryAttention (RoPE-2D) + the SAM2 memory-bank state machine β every op parity-locked | |
| vs the oracle; full masklet propagation min-IoU 0.92. This single `-fp16` file carries both (874 tensors). | |
| ## Use | |
| ```swift | |
| // Package.swift β .package(url: "https://github.com/xocialize/mlx-edgetam-swift", from: "0.1.0") | |
| import EdgeTAM | |
| // Image: click β object mask | |
| let p = try EdgeTAMPredictor.fromPretrained(weightsPath, dtype: .float16) | |
| p.setImage(sourceCGImage) | |
| let (mask, score, _, _) = p.predict(point: (500, 375)) | |
| // Video: click on a frame β per-frame masklet | |
| let vp = try EdgeTAMVideoPredictor.fromPretrained(weightsPath, dtype: .float16) | |
| let track = vp.track(frames: cgImages, clickFrame: 0, points: [[210, 350]], labels: [1]) | |
| ``` | |
| Or as an MLXEngine ModelPackage (`MLXEdgeTAM.EdgeTAMPackage`) β `promptSegment` (image) + `trackObject` | |
| (video) surfaces β resolving this repo via the Hub. | |
| Weights: Apache-2.0 (facebookresearch/EdgeTAM). Port code: MIT. | |