---
title: VideoMaMa - Video Matting with Mask Guidance
emoji: 🎬
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.0.0
app_file: app.py
pinned: false
license: apache-2.0
---
# 🎬 VideoMaMa: Video Matting with Mask Guidance

An interactive demo for high-quality video matting using sparse mask guidance. This demo combines SAM2 for automatic object tracking with our VideoMaMa model for generating alpha mattes.
## Features

- **Single-Click Object Selection**: Simply click on the object you want to extract in the first frame
- **Automatic Tracking**: SAM2 automatically tracks your selected object through all frames
- **High-Quality Matting**: VideoMaMa generates smooth, temporally consistent alpha mattes
- **Flexible Input**: Upload your own video or try the provided samples
- **Customizable**: Adjust augmentation settings for different scenarios
## How to Use

1. **Upload a video** or **select from samples**
2. **Click on the object** you want to extract in the first frame (displayed in the interface)
3. Optionally adjust **augmentation settings** in the advanced options
4. Click **"Generate Matting"** and wait for processing
5. View your results: output video, comparison images, and mask track
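The same workflow can also be driven programmatically through the Space's API. The sketch below uses `gradio_client`; the Space ID, click coordinates, and endpoint name are assumptions, so check the Space's "Use via API" page for the actual signature:

```python
from gradio_client import Client

# Space ID is a placeholder; replace with this Space's actual ID.
client = Client("your-username/VideoMaMa")

result = client.predict(
    "samples/example.mp4",          # path to the input video
    240, 135,                       # hypothetical (x, y) click coordinates
    api_name="/generate_matting",   # hypothetical endpoint name
)
print(result)
```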
## Installation (Local Setup)

If you want to run this demo locally:
```bash
# Install dependencies
pip install -r requirements.txt

# Add sample videos to the samples/ directory (optional)

# Run the demo
python app.py
```
## Tips for Best Results

- **Click Precisely**: Click on the center of the object you want to extract
- **Clear Objects**: Works best with distinct foreground objects
- **Video Length**: For faster processing, use shorter videos (under 5 seconds)
- **Augmentations**:
  - Use "polygon" for cleaner geometric masks
  - Enable temporal augmentation for challenging videos
  - Try "bounding box" for very simple selections
## Technical Details

### Model Architecture

- **Base Model**: Stable Video Diffusion (SVD-XT)
- **Conditioning**: RGB frames + VAE-encoded masks
- **UNet**: Fine-tuned with additional mask conditioning channels
- **Processing**: Chunked inference (16 frames per chunk), as sketched below
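The chunking scheme amounts to a simple fixed-stride loop. A minimal sketch, where `pipeline` stands in for the VideoMaMa model and its call signature is hypothetical:

```python
import torch

CHUNK_SIZE = 16  # frames per chunk, as noted above

def run_chunked(pipeline, frames: torch.Tensor, masks: torch.Tensor) -> torch.Tensor:
    """Run matting over a long clip in fixed-size chunks.

    frames, masks: (T, C, H, W) tensors. `pipeline` is a stand-in for the
    VideoMaMa model, so its call signature here is illustrative only.
    """
    alphas = []
    for start in range(0, frames.shape[0], CHUNK_SIZE):
        with torch.inference_mode():
            alphas.append(pipeline(
                frames[start : start + CHUNK_SIZE],
                masks[start : start + CHUNK_SIZE],
            ))
    return torch.cat(alphas, dim=0)  # (T, 1, H, W) alpha mattes
```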
### SAM2 Integration

- Uses the SAM2 video predictor for mask tracking
- Propagates the mask from a single click point through the entire video (see the sketch below)
- Generates temporally consistent segmentation masks
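A minimal sketch of this flow, following the public `sam2` repository's video predictor API. The config and checkpoint paths, the frames directory, and the click coordinate are placeholders; use whatever the demo ships with:

```python
import numpy as np
import torch
from sam2.build_sam import build_sam2_video_predictor

predictor = build_sam2_video_predictor(
    "configs/sam2.1/sam2.1_hiera_l.yaml", "checkpoints/sam2.1_hiera_large.pt"
)

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    # The sam2 example loads a directory of JPEG frames.
    state = predictor.init_state(video_path="samples/example_frames/")

    # Seed tracking with a single positive click on the first frame.
    predictor.add_new_points_or_box(
        inference_state=state,
        frame_idx=0,
        obj_id=1,
        points=np.array([[480, 270]], dtype=np.float32),  # (x, y) of the click
        labels=np.array([1], dtype=np.int32),             # 1 = positive click
    )

    # Propagate the mask through the rest of the video.
    masks = {}
    for frame_idx, obj_ids, mask_logits in predictor.propagate_in_video(state):
        masks[frame_idx] = (mask_logits[0] > 0.0).cpu().numpy()
```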
## Contributing

If you encounter issues or have suggestions:

1. Check that all model checkpoints are correctly placed
2. Ensure your GPU has sufficient VRAM (a quick check is shown below)
3. Try reducing video length or resolution for testing
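For step 2, plain PyTorch can report what your GPU offers before you launch the demo:

```python
import torch

# Report GPU availability and total VRAM.
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"{props.name}: {props.total_memory / 1024**3:.1f} GB VRAM")
else:
    print("No CUDA device detected; the demo needs a GPU.")
```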
## Acknowledgments

- **SAM2**: Meta AI's Segment Anything 2
- **Stable Video Diffusion**: Stability AI's video generation model
- **Gradio**: For the amazing UI framework
## Contact

For questions or issues, please open an issue on our GitHub repository.
---

**Note**: This demo is for research purposes. Processing times may vary based on video length and available compute resources.