# VINE Model - Quick Start Guide

Get started with VINE video understanding in 2 simple steps!

## One-Command Setup

```bash
# Download and run the complete setup script
wget https://huggingface.co/video-fm/vine/resolve/main/setup_vine_complete.sh
bash setup_vine_complete.sh
```

**That's it!** This single script:
- ✅ Creates a conda environment with Python 3.10
- ✅ Installs PyTorch with CUDA support
- ✅ Clones all required repositories (laser, sam2, groundingdino, vine_hf)
- ✅ Downloads the SAM2 checkpoint (~149 MB)
- ✅ Downloads the GroundingDINO checkpoint (~662 MB)
- ✅ Downloads all config files
- ✅ Tests the installation

**Total setup time**: ~10-15 minutes (depending on download speed)

## What Gets Installed

```
your-directory/
├── checkpoints/
│   ├── sam2_hiera_tiny.pt              (~149 MB)
│   ├── sam2_hiera_t.yaml
│   ├── groundingdino_swint_ogc.pth     (~662 MB)
│   └── GroundingDINO_SwinT_OGC.py
├── src/
│   ├── LASER/                          (video processing utilities)
│   ├── video-sam2/                     (SAM2 segmentation)
│   ├── GroundingDINO/                  (object detection)
│   └── vine_hf/                        (VINE HuggingFace interface)
└── test_vine.py                        (test script)
```

## Usage After Setup

### Activate Environment

```bash
conda activate vine_demo
```

### Test Installation

```bash
python test_vine.py
```

### Use in Your Code

```python
from pathlib import Path

from transformers import AutoModel
from vine_hf import VinePipeline

# Load VINE model from HuggingFace
model = AutoModel.from_pretrained('video-fm/vine', trust_remote_code=True)

# Set up checkpoint paths
checkpoint_dir = Path("checkpoints")

# Create pipeline
pipeline = VinePipeline(
    model=model,
    tokenizer=None,
    sam_config_path=str(checkpoint_dir / "sam2_hiera_t.yaml"),
    sam_checkpoint_path=str(checkpoint_dir / "sam2_hiera_tiny.pt"),
    gd_config_path=str(checkpoint_dir / "GroundingDINO_SwinT_OGC.py"),
    gd_checkpoint_path=str(checkpoint_dir / "groundingdino_swint_ogc.pth"),
    device="cuda",
    trust_remote_code=True
)

# Process a video
results = pipeline(
    "path/to/video.mp4",
    categorical_keywords=['person', 'dog', 'ball'],
    unary_keywords=['running', 'jumping', 'sitting'],
    binary_keywords=['chasing', 'next to', 'holding'],
    object_pairs=[(0, 1), (0, 2)],  # person-dog, person-ball
    return_top_k=5
)

# Print results
print(f"Detected {results['summary']['num_objects_detected']} objects")
print(f"Top categories: {results['summary']['top_categories']}")
print(f"Top actions: {results['summary']['top_actions']}")
print(f"Top relations: {results['summary']['top_relations']}")
```

## System Requirements

- **OS**: Linux (tested on Ubuntu)
- **Python**: 3.10+
- **CUDA**: 11.8+ (for GPU acceleration)
- **GPU**: 8 GB+ VRAM recommended (T4, V100, A100, etc.)
- **RAM**: 16 GB+ recommended
- **Disk Space**: ~5 GB total
  - Conda environment: ~3 GB
  - Checkpoints: ~811 MB
  - Code repositories: ~1 GB

## Troubleshooting

### CUDA Not Available

```bash
# Check CUDA
nvidia-smi

# If not working, install CPU-only version
pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
```

### Download Failed

```bash
# Manually download checkpoints
cd checkpoints

# SAM2
wget https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_tiny.pt
wget https://raw.githubusercontent.com/facebookresearch/sam2/main/sam2/configs/sam2.1/sam2.1_hiera_t.yaml -O sam2_hiera_t.yaml

# GroundingDINO
wget https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
wget https://raw.githubusercontent.com/IDEA-Research/GroundingDINO/main/groundingdino/config/GroundingDINO_SwinT_OGC.py
```

### Import Errors

```bash
# Reinstall packages
conda activate vine_demo
cd src
pip install -e ./LASER
pip install -e ./video-sam2
pip install -e ./GroundingDINO
pip install -e ./vine_hf
```

## Alternative: Manual Setup

If you prefer to set up manually, or if the script fails, see [README.md](https://huggingface.co/video-fm/vine) for step-by-step instructions.
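
Before redoing the setup by hand, it may be worth checking which checkpoint files are actually missing. The snippet below is a minimal sketch, not part of the VINE tooling: the helper name `find_missing_checkpoints` is hypothetical, and it assumes the file names and the `checkpoints/` layout listed under "What Gets Installed".

```python
from pathlib import Path

# File names taken from the "What Gets Installed" tree (assumed layout).
EXPECTED_CHECKPOINTS = (
    "sam2_hiera_tiny.pt",
    "sam2_hiera_t.yaml",
    "groundingdino_swint_ogc.pth",
    "GroundingDINO_SwinT_OGC.py",
)


def find_missing_checkpoints(checkpoint_dir="checkpoints"):
    """Return the expected files that are absent or empty in checkpoint_dir.

    Hypothetical helper for illustration; it only inspects the filesystem.
    """
    root = Path(checkpoint_dir)
    missing = []
    for name in EXPECTED_CHECKPOINTS:
        path = root / name
        # A zero-byte file usually indicates an interrupted download.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing_checkpoints()
    if missing:
        print("Missing or empty files:", ", ".join(missing))
    else:
        print("All expected checkpoint files are present.")
```

If only one or two files are reported, re-running the matching `wget` commands from the "Download Failed" section is likely quicker than a full manual setup.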

## Next Steps

- **Process your videos**: Use the pipeline with your own videos
- **Customize keywords**: Adjust categorical, unary, and binary keywords
- **Visualize results**: Enable `visualize=True` in config
- **Deploy**: Use in HuggingFace Spaces, FastAPI, or your own app

## Links

- **Model**: https://huggingface.co/video-fm/vine
- **Setup Script**: https://huggingface.co/video-fm/vine/blob/main/setup_vine_complete.sh
- **Documentation**: https://huggingface.co/video-fm/vine#readme
- **Code**: https://github.com/kevinxuez/LASER
- **Issues**: https://github.com/kevinxuez/LASER/issues

## Support

- **Setup Issues**: Check the script output for errors
- **Model Issues**: https://huggingface.co/video-fm/vine/discussions
- **Code Issues**: https://github.com/kevinxuez/LASER/issues

---

**Ready to start?**

```bash
wget https://huggingface.co/video-fm/vine/resolve/main/setup_vine_complete.sh
bash setup_vine_complete.sh
```

🎉 Happy video understanding with VINE!