# VINE Model - Quick Start Guide
Get started with VINE video understanding in 2 simple steps!
## One-Command Setup
```bash
# Download and run the complete setup script
wget https://huggingface.co/video-fm/vine/resolve/main/setup_vine_complete.sh
bash setup_vine_complete.sh
```
**That's it!** This single script:
- ✅ Creates conda environment with Python 3.10
- ✅ Installs PyTorch with CUDA support
- ✅ Clones all required repositories (laser, sam2, groundingdino, vine_hf)
- ✅ Downloads SAM2 checkpoint (~149 MB)
- ✅ Downloads GroundingDINO checkpoint (~662 MB)
- ✅ Downloads all config files
- ✅ Tests the installation
**Total setup time**: ~10-15 minutes (depending on download speed)
## What Gets Installed
```
your-directory/
├── checkpoints/
│   ├── sam2_hiera_tiny.pt (~149 MB)
│   ├── sam2_hiera_t.yaml
│   ├── groundingdino_swint_ogc.pth (~662 MB)
│   └── GroundingDINO_SwinT_OGC.py
├── src/
│   ├── LASER/ (video processing utilities)
│   ├── video-sam2/ (SAM2 segmentation)
│   ├── GroundingDINO/ (object detection)
│   └── vine_hf/ (VINE HuggingFace interface)
└── test_vine.py (test script)
```
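To confirm that the script produced the layout above, a small check like the following may help. It is a minimal sketch, not part of the VINE tooling: the function name `check_layout` and the 80% size threshold are assumptions made for illustration, and the sizes come from the approximate figures in the listing.

```python
from pathlib import Path

# Expected files with approximate sizes in MB, copied from the layout above.
# None means "only check that the file exists".
EXPECTED_FILES = {
    "checkpoints/sam2_hiera_tiny.pt": 149,
    "checkpoints/sam2_hiera_t.yaml": None,
    "checkpoints/groundingdino_swint_ogc.pth": 662,
    "checkpoints/GroundingDINO_SwinT_OGC.py": None,
}
EXPECTED_DIRS = ["src/LASER", "src/video-sam2", "src/GroundingDINO", "src/vine_hf"]


def check_layout(root=".", min_fraction=0.8):
    """Return a list of problems found under root; an empty list means none."""
    root = Path(root)
    problems = []
    for rel_path, size_mb in EXPECTED_FILES.items():
        path = root / rel_path
        if not path.is_file():
            problems.append(f"missing file: {rel_path}")
            continue
        if size_mb is not None:
            actual_mb = path.stat().st_size / 1e6
            # A much smaller file usually indicates an interrupted download.
            if actual_mb < min_fraction * size_mb:
                problems.append(
                    f"{rel_path} is smaller than expected: "
                    f"{actual_mb:.0f} MB (expected ~{size_mb} MB)"
                )
    for rel_dir in EXPECTED_DIRS:
        if not (root / rel_dir).is_dir():
            problems.append(f"missing directory: {rel_dir}")
    return problems
```

Run `check_layout()` from the directory where you ran the setup script. An empty list suggests the files and directories are all in place.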
## Usage After Setup
### Activate Environment
```bash
conda activate vine_demo
```
### Test Installation
```bash
python test_vine.py
```
### Use in Your Code
```python
from transformers import AutoModel
from vine_hf import VinePipeline
from pathlib import Path
# Load VINE model from HuggingFace
model = AutoModel.from_pretrained('video-fm/vine', trust_remote_code=True)
# Set up checkpoint paths
checkpoint_dir = Path("checkpoints")
# Create pipeline
pipeline = VinePipeline(
    model=model,
    tokenizer=None,
    sam_config_path=str(checkpoint_dir / "sam2_hiera_t.yaml"),
    sam_checkpoint_path=str(checkpoint_dir / "sam2_hiera_tiny.pt"),
    gd_config_path=str(checkpoint_dir / "GroundingDINO_SwinT_OGC.py"),
    gd_checkpoint_path=str(checkpoint_dir / "groundingdino_swint_ogc.pth"),
    device="cuda",
    trust_remote_code=True
)
# Process a video
results = pipeline(
    "path/to/video.mp4",
    categorical_keywords=['person', 'dog', 'ball'],
    unary_keywords=['running', 'jumping', 'sitting'],
    binary_keywords=['chasing', 'next to', 'holding'],
    object_pairs=[(0, 1), (0, 2)],  # person-dog, person-ball
    return_top_k=5
)
# Print results
print(f"Detected {results['summary']['num_objects_detected']} objects")
print(f"Top categories: {results['summary']['top_categories']}")
print(f"Top actions: {results['summary']['top_actions']}")
print(f"Top relations: {results['summary']['top_relations']}")
```
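Writing `object_pairs` as raw indices is easy to get wrong once the keyword list grows. The sketch below builds the index pairs from names instead. It assumes, as the inline comment in the example suggests, that each index refers to a position in `categorical_keywords`; the helper `pairs_from_names` is hypothetical and not part of `vine_hf`.

```python
def pairs_from_names(categorical_keywords, named_pairs):
    """Convert (subject, object) name pairs into index pairs.

    Assumption: indices in object_pairs refer to positions in
    categorical_keywords, as the example's inline comment suggests.
    """
    index_of = {name: i for i, name in enumerate(categorical_keywords)}
    pairs = []
    for subject, obj in named_pairs:
        for name in (subject, obj):
            if name not in index_of:
                raise ValueError(f"{name!r} is not in categorical_keywords")
        pairs.append((index_of[subject], index_of[obj]))
    return pairs


keywords = ["person", "dog", "ball"]
object_pairs = pairs_from_names(keywords, [("person", "dog"), ("person", "ball")])
```

The resulting list can be passed as the `object_pairs` argument in place of the hand-written tuples.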
## System Requirements
- **OS**: Linux (tested on Ubuntu)
- **Python**: 3.10+
- **CUDA**: 11.8+ (for GPU acceleration)
- **GPU**: 8GB+ VRAM recommended (T4, V100, A100, etc.)
- **RAM**: 16GB+ recommended
- **Disk Space**: ~5GB total
  - Conda environment: ~3GB
  - Checkpoints: ~811MB
  - Code repositories: ~1GB
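
Some of these requirements can be checked before running the setup script. The following is a rough sketch using only the Python standard library; the function name `preflight` is an assumption, and the thresholds mirror the list above. It does not check GPU memory or RAM.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)  # from the requirements list above
MIN_FREE_GB = 5       # approximate total disk space needed


def preflight(path="."):
    """Check Python version, free disk space, and whether nvidia-smi is on PATH."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return {
        "python_ok": sys.version_info >= MIN_PYTHON,
        "disk_ok": free_gb >= MIN_FREE_GB,
        "free_gb": round(free_gb, 1),
        # Only shows that the NVIDIA driver tools are installed,
        # not that a suitable GPU is present.
        "nvidia_smi_found": shutil.which("nvidia-smi") is not None,
    }


print(preflight())
```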
## Troubleshooting
### CUDA Not Available
```bash
# Check CUDA
nvidia-smi
# If not working, install CPU-only version
pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
```
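If you fall back to the CPU-only build, the hard-coded `device="cuda"` in the usage example will no longer match your environment. One way to choose the device at runtime is sketched below. The helper `pick_device` is hypothetical, and whether the full pipeline runs acceptably on CPU is not confirmed here.

```python
def pick_device():
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Using device: {device}")
```

You could then pass `device=device` when constructing `VinePipeline`.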
### Download Failed
```bash
# Manually download checkpoints
mkdir -p checkpoints && cd checkpoints
# SAM2
wget https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_tiny.pt
wget https://raw.githubusercontent.com/facebookresearch/sam2/main/sam2/configs/sam2.1/sam2.1_hiera_t.yaml -O sam2_hiera_t.yaml
# GroundingDINO
wget https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
wget https://raw.githubusercontent.com/IDEA-Research/GroundingDINO/main/groundingdino/config/GroundingDINO_SwinT_OGC.py
```
### Import Errors
```bash
# Reinstall packages
conda activate vine_demo
cd src
pip install -e ./LASER
pip install -e ./video-sam2
pip install -e ./GroundingDINO
pip install -e ./vine_hf
```
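To see which package is actually failing before you reinstall everything, you can probe each import individually. This is a sketch under assumptions: `vine_hf` is the module name used in the usage example, while `sam2` and `groundingdino` are assumed top-level module names for the other repositories. LASER is omitted because its module name is not stated in this guide.

```python
import importlib.util

# vine_hf comes from the usage example; sam2 and groundingdino are assumed names.
MODULES = ("torch", "transformers", "vine_hf", "sam2", "groundingdino")


def missing_modules(names=MODULES):
    """Return the module names that cannot be located in the active environment."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            found = False
        if not found:
            missing.append(name)
    return missing


print("Missing modules:", missing_modules() or "none")
```

Run it inside the activated `vine_demo` environment so that it inspects the right interpreter.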
## Alternative: Manual Setup
If you prefer to set up manually or the script fails, see [README.md](https://huggingface.co/video-fm/vine) for step-by-step instructions.
## Next Steps
- **Process your videos**: Use the pipeline with your own videos
- **Customize keywords**: Adjust categorical, unary, and binary keywords
- **Visualize results**: Enable `visualize=True` in config
- **Deploy**: Use in HuggingFace Spaces, FastAPI, or your own app
## Links
- **Model**: https://huggingface.co/video-fm/vine
- **Setup Script**: https://huggingface.co/video-fm/vine/blob/main/setup_vine_complete.sh
- **Documentation**: https://huggingface.co/video-fm/vine#readme
- **Code**: https://github.com/kevinxuez/LASER
- **Issues**: https://github.com/kevinxuez/LASER/issues
## Support
- **Setup Issues**: Check the script output for errors
- **Model Issues**: https://huggingface.co/video-fm/vine/discussions
- **Code Issues**: https://github.com/kevinxuez/LASER/issues
---
**Ready to start?**
```bash
wget https://huggingface.co/video-fm/vine/resolve/main/setup_vine_complete.sh
bash setup_vine_complete.sh
```
🎉 Happy video understanding with VINE!