# VINE Model - Quick Start Guide

Get started with VINE video understanding in two steps: download the setup script, then run it.

## One-Command Setup

```bash
# Download and run the complete setup script
wget https://huggingface.co/video-fm/vine/resolve/main/setup_vine_complete.sh
bash setup_vine_complete.sh
```

**That's it!** This single script:

- Creates a conda environment with Python 3.10
- Installs PyTorch with CUDA support
- Clones all required repositories (laser, sam2, groundingdino, vine_hf)
- Downloads the SAM2 checkpoint (~149 MB)
- Downloads the GroundingDINO checkpoint (~662 MB)
- Downloads all config files
- Tests the installation

**Total setup time**: ~10-15 minutes (depending on download speed)

## What Gets Installed

```
your-directory/
├── checkpoints/
│   ├── sam2_hiera_tiny.pt              (~149 MB)
│   ├── sam2_hiera_t.yaml
│   ├── groundingdino_swint_ogc.pth     (~662 MB)
│   └── GroundingDINO_SwinT_OGC.py
├── src/
│   ├── LASER/                          (video processing utilities)
│   ├── video-sam2/                     (SAM2 segmentation)
│   ├── GroundingDINO/                  (object detection)
│   └── vine_hf/                        (VINE HuggingFace interface)
└── test_vine.py                        (test script)
```
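
If you want to confirm that the layout above actually exists after the script finishes, a minimal sketch like the following may help. The paths are copied from the tree; the helper name `find_missing` is hypothetical and not part of the setup script or of vine_hf.

```python
from pathlib import Path

# Relative paths taken from the directory tree in this guide.
EXPECTED_FILES = [
    "checkpoints/sam2_hiera_tiny.pt",
    "checkpoints/sam2_hiera_t.yaml",
    "checkpoints/groundingdino_swint_ogc.pth",
    "checkpoints/GroundingDINO_SwinT_OGC.py",
    "test_vine.py",
]
EXPECTED_DIRS = ["src/LASER", "src/video-sam2", "src/GroundingDINO", "src/vine_hf"]


def find_missing(root):
    """Return the expected files and directories that are absent under root."""
    root = Path(root)
    missing = [p for p in EXPECTED_FILES if not (root / p).is_file()]
    missing += [p for p in EXPECTED_DIRS if not (root / p).is_dir()]
    return missing


if __name__ == "__main__":
    missing = find_missing(".")
    if missing:
        print("Missing after setup:")
        for path in missing:
            print(f"  {path}")
    else:
        print("All expected files and directories are present.")
```

Run it from the directory where you executed the setup script; an empty result means every entry in the tree was found.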

## Usage After Setup

### Activate Environment

```bash
conda activate vine_demo
```

### Test Installation

```bash
python test_vine.py
```

### Use in Your Code

```python
from transformers import AutoModel
from vine_hf import VinePipeline
from pathlib import Path

# Load VINE model from HuggingFace
model = AutoModel.from_pretrained('video-fm/vine', trust_remote_code=True)

# Set up checkpoint paths
checkpoint_dir = Path("checkpoints")

# Create pipeline
pipeline = VinePipeline(
    model=model,
    tokenizer=None,
    sam_config_path=str(checkpoint_dir / "sam2_hiera_t.yaml"),
    sam_checkpoint_path=str(checkpoint_dir / "sam2_hiera_tiny.pt"),
    gd_config_path=str(checkpoint_dir / "GroundingDINO_SwinT_OGC.py"),
    gd_checkpoint_path=str(checkpoint_dir / "groundingdino_swint_ogc.pth"),
    device="cuda",
    trust_remote_code=True
)

# Process a video
results = pipeline(
    "path/to/video.mp4",
    categorical_keywords=['person', 'dog', 'ball'],
    unary_keywords=['running', 'jumping', 'sitting'],
    binary_keywords=['chasing', 'next to', 'holding'],
    object_pairs=[(0, 1), (0, 2)],  # person-dog, person-ball
    return_top_k=5
)

# Print results
print(f"Detected {results['summary']['num_objects_detected']} objects")
print(f"Top categories: {results['summary']['top_categories']}")
print(f"Top actions: {results['summary']['top_actions']}")
print(f"Top relations: {results['summary']['top_relations']}")
```
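
The `# person-dog, person-ball` comment suggests that each tuple in `object_pairs` holds positions in `categorical_keywords`. Assuming that reading is correct, writing pairs by name and converting them to indices can be less error-prone than counting positions by hand. The helper `pairs_from_names` below is hypothetical and not part of vine_hf.

```python
def pairs_from_names(categorical_keywords, named_pairs):
    """Translate (subject, object) name pairs into index pairs.

    Assumption: object_pairs indexes into categorical_keywords, as the
    "person-dog, person-ball" comment in the usage example suggests.
    """
    index = {name: i for i, name in enumerate(categorical_keywords)}
    unknown = sorted({n for pair in named_pairs for n in pair if n not in index})
    if unknown:
        raise ValueError(f"Not in categorical_keywords: {unknown}")
    return [(index[a], index[b]) for a, b in named_pairs]


keywords = ["person", "dog", "ball"]
object_pairs = pairs_from_names(keywords, [("person", "dog"), ("person", "ball")])
print(object_pairs)  # [(0, 1), (0, 2)]
```

The resulting list can be passed as `object_pairs=object_pairs` alongside `categorical_keywords=keywords`.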

## System Requirements

- **OS**: Linux (tested on Ubuntu)
- **Python**: 3.10+
- **CUDA**: 11.8+ (for GPU acceleration)
- **GPU**: 8GB+ VRAM recommended (T4, V100, A100, etc.)
- **RAM**: 16GB+ recommended
- **Disk Space**: ~5GB total
  - Conda environment: ~3GB
  - Checkpoints: ~811MB
  - Code repositories: ~1GB

## Troubleshooting

### CUDA Not Available

```bash
# Check that the NVIDIA driver can see your GPU
nvidia-smi

# If that fails, install the CPU-only PyTorch build
pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
```
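
From Python, `torch.cuda.is_available()` reports whether PyTorch itself can use a GPU, which is a separate question from whether `nvidia-smi` works. The sketch below picks a device string on that basis. Note that this guide only shows `device="cuda"`; whether `VinePipeline` accepts or performs acceptably with `"cpu"` is an assumption.

```python
def pick_device():
    """Return "cuda" when PyTorch can use a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment at all.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Using device: {device}")
```

If this prints `cpu` on a machine with a working GPU, the installed PyTorch build is probably CPU-only or built for a different CUDA version.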

### Download Failed

```bash
# Manually download checkpoints
mkdir -p checkpoints
cd checkpoints

# SAM2
wget https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_tiny.pt
wget https://raw.githubusercontent.com/facebookresearch/sam2/main/sam2/configs/sam2.1/sam2.1_hiera_t.yaml -O sam2_hiera_t.yaml

# GroundingDINO
wget https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
wget https://raw.githubusercontent.com/IDEA-Research/GroundingDINO/main/groundingdino/config/GroundingDINO_SwinT_OGC.py
```
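
An interrupted download usually leaves a file that is much smaller than expected. As a rough sanity check, the sketch below compares each checkpoint's size on disk with the approximate figures quoted in this guide. The helper `check_sizes` is hypothetical, and the 10% tolerance is an arbitrary choice that also absorbs the difference between MB and MiB.

```python
from pathlib import Path

# Approximate sizes quoted in this guide, in megabytes.
EXPECTED_MB = {
    "sam2_hiera_tiny.pt": 149,
    "groundingdino_swint_ogc.pth": 662,
}


def check_sizes(checkpoint_dir="checkpoints", tolerance=0.1):
    """Map each checkpoint name to "ok", "missing", or a size-mismatch note."""
    report = {}
    for name, expected_mb in EXPECTED_MB.items():
        path = Path(checkpoint_dir) / name
        if not path.is_file():
            report[name] = "missing"
            continue
        actual_mb = path.stat().st_size / 1e6
        if abs(actual_mb - expected_mb) > tolerance * expected_mb:
            report[name] = (
                f"size mismatch: {actual_mb:.0f} MB, expected ~{expected_mb} MB"
            )
        else:
            report[name] = "ok"
    return report


if __name__ == "__main__":
    for name, status in check_sizes().items():
        print(f"{name}: {status}")
```

A size match does not prove the file is intact; it only catches obviously truncated downloads.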

### Import Errors

```bash
# Reinstall the packages in editable mode
conda activate vine_demo
cd src
pip install -e ./LASER
pip install -e ./video-sam2
pip install -e ./GroundingDINO
pip install -e ./vine_hf
```
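
Before reinstalling everything, it can help to see which packages Python fails to locate. The sketch below uses `importlib.util.find_spec`, which looks a module up without importing it. Only `vine_hf` is confirmed by the usage example in this guide; the other import names (`sam2`, `groundingdino`, `laser`) are assumptions and may differ from what the repositories actually install.

```python
import importlib.util

# "vine_hf" appears in the usage example; the other names are assumptions.
MODULES = ("vine_hf", "sam2", "groundingdino", "laser")


def missing_modules(names=MODULES):
    """Return the top-level module names that Python cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Not importable:", ", ".join(missing))
    else:
        print("All listed modules can be located.")
```

Run it with the `vine_demo` environment active, then reinstall only the packages it reports.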

## Alternative: Manual Setup

If you prefer to set up manually, or if the script fails, see [README.md](https://huggingface.co/video-fm/vine) for step-by-step instructions.

## Next Steps

- **Process your videos**: Run the pipeline on your own videos
- **Customize keywords**: Adjust the categorical, unary, and binary keywords
- **Visualize results**: Enable `visualize=True` in config
- **Deploy**: Use in HuggingFace Spaces, FastAPI, or your own app

## Links

- **Model**: https://huggingface.co/video-fm/vine
- **Setup Script**: https://huggingface.co/video-fm/vine/blob/main/setup_vine_complete.sh
- **Documentation**: https://huggingface.co/video-fm/vine#readme
- **Code**: https://github.com/kevinxuez/LASER
- **Issues**: https://github.com/kevinxuez/LASER/issues

## Support

- **Setup Issues**: Check the script output for errors
- **Model Issues**: https://huggingface.co/video-fm/vine/discussions
- **Code Issues**: https://github.com/kevinxuez/LASER/issues

---

**Ready to start?**

```bash
wget https://huggingface.co/video-fm/vine/resolve/main/setup_vine_complete.sh
bash setup_vine_complete.sh
```

Happy video understanding with VINE!