---
title: SAM2 Video Background Remover
emoji: 🎥
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: apache-2.0
tags:
  - computer-vision
  - video
  - segmentation
  - sam2
  - background-removal
  - object-tracking
---

# 🎥 SAM2 Video Background Remover
Remove backgrounds from videos by tracking objects with Meta's **Segment Anything Model 2 (SAM2)**.
## Features

- ✨ **Background Removal**: Automatically remove the background and keep only the tracked objects
- 🎯 **Object Tracking**: Track multiple objects across video frames
- 🖥️ **Interactive UI**: Easy-to-use Gradio interface
- 🔌 **REST API**: Programmatic access via API endpoints
- ⚡ **GPU Accelerated**: Fast processing with CUDA support
## How It Works

SAM2 is a foundation model for video segmentation that can:

1. **Segment objects** from point or box annotations
2. **Track objects** automatically across all video frames
3. **Handle occlusions** and object reappearance
4. **Process multiple objects** simultaneously
## Usage

### 🖱️ Simple Mode (Web UI)

1. Upload your video
2. Enter the X,Y pixel coordinates of the object you want to track, measured on the first frame
3. Click "Process Video"
4. Download the result with the background removed!

**Example**: For a 640x480 video with a person in the center, use X=320, Y=240.
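Pixel coordinates use the top-left corner of the frame as the origin, so the center of the example above is just half the resolution. A tiny illustration (the `center_point` helper is hypothetical, not part of the Space's code):

```python
def center_point(width: int, height: int) -> tuple[int, int]:
    """Return the (x, y) pixel at the center of a frame (origin: top-left)."""
    return width // 2, height // 2

print(center_point(640, 480))  # (320, 240), matching the example above
```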
### 🔧 Advanced Mode (JSON Annotations)

For more control, supply JSON annotations:

```json
[
  {
    "frame_idx": 0,
    "object_id": 1,
    "points": [[320, 240]],
    "labels": [1]
  }
]
```
|
**Parameters**:

- `frame_idx`: Frame number to annotate (0 = first frame)
- `object_id`: Unique ID for each object (1, 2, 3, ...)
- `points`: List of [x, y] coordinates on the object
- `labels`: `1` for a foreground point, `0` for a background point
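Because the annotation format is plain JSON, it is easy to build and sanity-check programmatically. A minimal sketch (the `make_annotation` helper is hypothetical, not part of the Space's API):

```python
import json

def make_annotation(frame_idx, object_id, points, labels):
    """Build one annotation entry with the fields described above."""
    if len(points) != len(labels):
        raise ValueError("each point needs exactly one label")
    if any(label not in (0, 1) for label in labels):
        raise ValueError("labels must be 1 (foreground) or 0 (background)")
    return {
        "frame_idx": frame_idx,
        "object_id": object_id,
        "points": points,
        "labels": labels,
    }

annotations = [make_annotation(0, 1, [[320, 240]], [1])]
print(json.dumps(annotations))
```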
### 💡 API Usage

You can call this Space programmatically using the Gradio Client:

#### Python Example
```python
from gradio_client import Client, handle_file
import json

# Connect to the Space
client = Client("YOUR_USERNAME/sam2-video-bg-remover")

# Define what to track
annotations = [
    {
        "frame_idx": 0,
        "object_id": 1,
        "points": [[320, 240]],  # x, y coordinates
        "labels": [1]            # 1 = foreground
    }
]

# Process the video (recent gradio_client versions expect file
# inputs to be wrapped in handle_file)
result = client.predict(
    video_file=handle_file("./input_video.mp4"),
    annotations_json=json.dumps(annotations),
    remove_background=True,
    max_frames=300,  # limit frames for faster processing
    api_name="/segment_video_api"
)

print(f"Output video saved to: {result}")
```
#### Track Multiple Objects

```python
annotations = [
    # First object (person)
    {
        "frame_idx": 0,
        "object_id": 1,
        "points": [[320, 240]],
        "labels": [1]
    },
    # Second object (ball)
    {
        "frame_idx": 0,
        "object_id": 2,
        "points": [[500, 300]],
        "labels": [1]
    }
]
```
#### Refine Segmentation with Background Points

```python
annotations = [
    {
        "frame_idx": 0,
        "object_id": 1,
        "points": [
            [320, 240],  # point ON the object
            [100, 100]   # point on the background to exclude
        ],
        "labels": [1, 0]  # 1 = foreground, 0 = background
    }
]
```
### 🌐 HTTP API

You can also call the API directly over HTTP. Gradio 4.x Spaces expose a two-step REST API: a POST that queues the job and returns an event ID, then a GET that streams the result. A sketch (the exact payload order must match the app's input signature, and file inputs are passed by URL):

```bash
# Step 1: queue the job; the response contains an event_id
curl -X POST https://YOUR_USERNAME-sam2-video-bg-remover.hf.space/call/segment_video_api \
  -H "Content-Type: application/json" \
  -d '{"data": [
        {"path": "https://example.com/input_video.mp4"},
        "[{\"frame_idx\":0,\"object_id\":1,\"points\":[[320,240]],\"labels\":[1]}]",
        true,
        300
      ]}'

# Step 2: stream the result using the returned event_id
curl -N https://YOUR_USERNAME-sam2-video-bg-remover.hf.space/call/segment_video_api/EVENT_ID
```
## Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `video_file` | File | - | Input video file (required) |
| `annotations_json` | String | - | JSON array of annotations (required) |
| `remove_background` | Boolean | `true` | Remove the background, or just highlight the objects |
| `max_frames` | Integer | `null` | Limit the number of frames for faster processing |
|
## Tips & Best Practices

### 🎯 Getting Good Results

1. **Choose Clear Points**: Click the center or most distinctive part of your object
2. **Add Multiple Points**: For complex objects, add 2-3 points on different parts
3. **Use Background Points**: Add points with `label: 0` on areas you DON'T want
4. **Annotate Key Frames**: If the object changes significantly, add annotations on multiple frames
|
### ⚡ Performance Tips

1. **Limit Frames**: Use the `max_frames` parameter for long videos
2. **Use a Smaller Model**: The default is `sam2.1-hiera-tiny` for speed
3. **Process Shorter Clips**: Split long videos into segments
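To pick a sensible `max_frames` value, convert a target clip length into a frame budget using the video's frame rate. A small illustration (pure arithmetic, not part of the Space's code):

```python
def frames_for_duration(fps: float, seconds: float, total_frames: int) -> int:
    """Frame budget for processing at most `seconds` of video."""
    return min(total_frames, int(fps * seconds))

# A 30 fps video capped at 10 seconds needs at most 300 frames,
# matching the max_frames=300 used in the examples above.
print(frames_for_duration(30, 10, 1800))  # 300
```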
|
### 🔍 Troubleshooting

| Issue | Solution |
|-------|----------|
| Object not tracked | Add more points on different parts of the object |
| Background leakage | Add background points with `label: 0` |
| Slow processing | Reduce `max_frames` or use a shorter video |
| Wrong object tracked | Be more precise with the point coordinates |
|
## Model Information

This Space uses **facebook/sam2.1-hiera-tiny** for efficient processing. Other available models:

- `facebook/sam2.1-hiera-tiny` - Fastest, good quality ⚡
- `facebook/sam2.1-hiera-small` - Balanced
- `facebook/sam2.1-hiera-base-plus` - Higher quality
- `facebook/sam2.1-hiera-large` - Best quality, slower 🎯
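If you fork this Space, the checkpoint is just a model ID string, so swapping it is a one-line change. A hypothetical helper mapping a speed/quality preference to the checkpoints listed above:

```python
# Checkpoint IDs are taken from the list above; the helper is hypothetical.
SAM2_CHECKPOINTS = {
    "fastest": "facebook/sam2.1-hiera-tiny",
    "balanced": "facebook/sam2.1-hiera-small",
    "quality": "facebook/sam2.1-hiera-base-plus",
    "best": "facebook/sam2.1-hiera-large",
}

def pick_checkpoint(preference: str = "fastest") -> str:
    """Return a SAM2 checkpoint ID for the given speed/quality preference."""
    return SAM2_CHECKPOINTS[preference]

print(pick_checkpoint("best"))  # facebook/sam2.1-hiera-large
```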
|
## Use Cases

- 🎬 **Video Production**: Remove backgrounds for green-screen effects
- 🏃 **Sports Analysis**: Isolate athletes for motion analysis
- 🎮 **Content Creation**: Extract game characters or objects
- 🔬 **Research**: Track objects in scientific videos
- 📱 **Social Media**: Create engaging content with background removal
|
## Limitations

- Video length affects processing time (longer = slower)
- A GPU is recommended for videos longer than 10 seconds
- Very fast-moving objects may require multiple annotations
- Extreme lighting changes can affect tracking quality
|
## Citation

If you use this Space, please cite the SAM2 paper:

```bibtex
@article{ravi2024sam2,
  title={SAM 2: Segment Anything in Images and Videos},
  author={Ravi, Nikhila and Gabeur, Valentin and Hu, Yuan-Ting and Hu, Ronghang and Ryali, Chaitanya and Ma, Tengyu and Khedr, Haitham and R{\"a}dle, Roman and Rolland, Chloe and Gustafson, Laura and others},
  journal={arXiv preprint arXiv:2408.00714},
  year={2024}
}
```
|
## License

Apache 2.0

## Links

- 📚 [SAM2 Documentation](https://huggingface.co/docs/transformers/model_doc/sam2_video)
- 🤗 [Model on Hugging Face](https://huggingface.co/facebook/sam2.1-hiera-tiny)
- 📄 [Research Paper](https://arxiv.org/abs/2408.00714)
- 💻 [Original Repository](https://github.com/facebookresearch/segment-anything-2)
|
---

Built with ❤️ using [Transformers](https://github.com/huggingface/transformers) and [Gradio](https://gradio.app)