	# VINE Model - Quick Start Guide

	Get started with VINE video understanding in 2 simple steps!

	## One-Command Setup

	```bash
	# Download and run the complete setup script
	wget https://huggingface.co/video-fm/vine/resolve/main/setup_vine_complete.sh
	bash setup_vine_complete.sh
	```

	That's it! This single script:
	- ✅ Creates a conda environment (`vine_demo`) with Python 3.10
	- ✅ Installs PyTorch with CUDA support
	- ✅ Clones all required repositories (laser, sam2, groundingdino, vine_hf)
	- ✅ Downloads SAM2 checkpoint (~149 MB)
	- ✅ Downloads GroundingDINO checkpoint (~662 MB)
	- ✅ Downloads all config files
	- ✅ Tests the installation

	Total setup time: ~10-15 minutes (depending on download speed)

	## What Gets Installed

	```
	your-directory/
	├── checkpoints/
	│ ├── sam2_hiera_tiny.pt (~149 MB)
	│ ├── sam2_hiera_t.yaml
	│ ├── groundingdino_swint_ogc.pth (~662 MB)
	│ └── GroundingDINO_SwinT_OGC.py
	├── src/
	│ ├── LASER/ (video processing utilities)
	│ ├── video-sam2/ (SAM2 segmentation)
	│ ├── GroundingDINO/ (object detection)
	│ └── vine_hf/ (VINE HuggingFace interface)
	└── test_vine.py (test script)
	```
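
To confirm that the setup script produced the layout above, a short check like the following can help. This is a minimal sketch: the paths are copied from the tree above, while the `find_missing` helper is a hypothetical name introduced here and is not part of VINE. Run it from the directory where you ran the setup script.

```python
from pathlib import Path

# Expected files and directories, copied from the layout above.
EXPECTED_PATHS = [
    "checkpoints/sam2_hiera_tiny.pt",
    "checkpoints/sam2_hiera_t.yaml",
    "checkpoints/groundingdino_swint_ogc.pth",
    "checkpoints/GroundingDINO_SwinT_OGC.py",
    "src/LASER",
    "src/video-sam2",
    "src/GroundingDINO",
    "src/vine_hf",
    "test_vine.py",
]


def find_missing(root="."):
    """Return the expected paths that do not exist under root (hypothetical helper)."""
    base = Path(root)
    return [p for p in EXPECTED_PATHS if not (base / p).exists()]


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Missing after setup:")
        for p in missing:
            print(f"  {p}")
    else:
        print("All expected files and directories are present.")
```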

	## Usage After Setup

	### Activate Environment
	```bash
	conda activate vine_demo
	```

	### Test Installation
	```bash
	python test_vine.py
	```

	### Use in Your Code
	```python
	from transformers import AutoModel
	from vine_hf import VinePipeline
	from pathlib import Path

	# Load VINE model from HuggingFace
	model = AutoModel.from_pretrained('video-fm/vine', trust_remote_code=True)

	# Set up checkpoint paths
	checkpoint_dir = Path("checkpoints")

	# Create pipeline
	pipeline = VinePipeline(
	    model=model,
	    tokenizer=None,
	    sam_config_path=str(checkpoint_dir / "sam2_hiera_t.yaml"),
	    sam_checkpoint_path=str(checkpoint_dir / "sam2_hiera_tiny.pt"),
	    gd_config_path=str(checkpoint_dir / "GroundingDINO_SwinT_OGC.py"),
	    gd_checkpoint_path=str(checkpoint_dir / "groundingdino_swint_ogc.pth"),
	    device="cuda",
	    trust_remote_code=True,
	)

	# Process a video
	results = pipeline(
	    "path/to/video.mp4",
	    categorical_keywords=['person', 'dog', 'ball'],
	    unary_keywords=['running', 'jumping', 'sitting'],
	    binary_keywords=['chasing', 'next to', 'holding'],
	    object_pairs=[(0, 1), (0, 2)],  # person-dog, person-ball
	    return_top_k=5,
	)

	# Print results
	print(f"Detected {results['summary']['num_objects_detected']} objects")
	print(f"Top categories: {results['summary']['top_categories']}")
	print(f"Top actions: {results['summary']['top_actions']}")
	print(f"Top relations: {results['summary']['top_relations']}")
	```
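
Writing `object_pairs` by hand gets error-prone as the keyword list grows. The sketch below builds the index pairs from names instead. It assumes, based on the `# person-dog, person-ball` comment in the example above, that each index refers to a position in `categorical_keywords`; `pairs_from_names` is a hypothetical helper, not part of the `vine_hf` API.

```python
def pairs_from_names(categorical_keywords, name_pairs):
    """Translate (subject, object) name pairs into index pairs.

    Assumption: indices refer to positions in categorical_keywords.
    """
    index_of = {name: i for i, name in enumerate(categorical_keywords)}
    pairs = []
    for subject, obj in name_pairs:
        if subject not in index_of or obj not in index_of:
            raise ValueError(f"unknown keyword in pair: ({subject!r}, {obj!r})")
        pairs.append((index_of[subject], index_of[obj]))
    return pairs


categorical_keywords = ["person", "dog", "ball"]
object_pairs = pairs_from_names(
    categorical_keywords,
    [("person", "dog"), ("person", "ball")],
)
print(object_pairs)  # [(0, 1), (0, 2)]
```

The resulting list can be passed as the `object_pairs` argument in place of the hand-written tuples.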

	## System Requirements

	- OS: Linux (tested on Ubuntu)
	- Python: 3.10+
	- CUDA: 11.8+ (for GPU acceleration)
	- GPU: 8GB+ VRAM recommended (T4, V100, A100, etc.)
	- RAM: 16GB+ recommended
	- Disk Space: ~5GB total
	  - Conda environment: ~3GB
	  - Checkpoints: ~811MB
	  - Code repositories: ~1GB
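
Two of these requirements can be checked from Python's standard library before starting the download. The following is a rough sketch under stated assumptions: the thresholds are the figures listed above, and `preflight` is a hypothetical helper name. It does not check CUDA, VRAM, or RAM.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)  # from the requirements list above
MIN_FREE_GB = 5       # approximate total disk footprint listed above


def preflight(path=".", min_python=MIN_PYTHON, min_free_gb=MIN_FREE_GB):
    """Return a list of problems found; an empty list means both checks passed."""
    problems = []
    if sys.version_info[:2] < min_python:
        found = ".".join(str(n) for n in sys.version_info[:2])
        wanted = ".".join(str(n) for n in min_python)
        problems.append(f"Python {found} found, {wanted}+ required")
    free_gb = shutil.disk_usage(path).free / 1e9
    if free_gb < min_free_gb:
        problems.append(f"only {free_gb:.1f} GB free, about {min_free_gb} GB needed")
    return problems


for problem in preflight():
    print("WARNING:", problem)
```

Note that the setup script creates its own conda environment, so the Python version check matters mostly when installing manually.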

	## Troubleshooting

	### CUDA Not Available
	```bash
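	# Check that the NVIDIA driver can see a GPU
	nvidia-smi

	# If no GPU is available, install the CPU-only PyTorch build
	# (and change device="cuda" to device="cpu" in your code)
	pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu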
	```
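
`nvidia-smi` reports what the driver sees, which is not always what PyTorch sees. A small check from inside the `vine_demo` environment can separate the two cases. This is a sketch: `pick_device` is a hypothetical helper, and whether the VINE pipeline runs acceptably with `device="cpu"` is an assumption not confirmed by this guide.

```python
def pick_device():
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch itself is missing; report "cpu" so callers still get a valid string.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Using device: {device}")
```

The returned string can be passed as the `device` argument when constructing `VinePipeline`.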

	### Download Failed
	```bash
	# Manually download checkpoints (create the directory if setup did not get that far)
	mkdir -p checkpoints
	cd checkpoints

	# SAM2
	wget https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_tiny.pt
	wget https://raw.githubusercontent.com/facebookresearch/sam2/main/sam2/configs/sam2.1/sam2.1_hiera_t.yaml -O sam2_hiera_t.yaml

	# GroundingDINO
	wget https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
	wget https://raw.githubusercontent.com/IDEA-Research/GroundingDINO/main/groundingdino/config/GroundingDINO_SwinT_OGC.py
	```
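
An interrupted download can leave a truncated checkpoint that only fails later, at model load time. The sketch below flags files that are missing or clearly smaller than the approximate sizes quoted in this guide. The 10% tolerance and the `undersized` helper are assumptions made for illustration; this is a size check only, not a checksum.

```python
from pathlib import Path

# Approximate sizes in MB, taken from this guide.
EXPECTED_MB = {
    "sam2_hiera_tiny.pt": 149,
    "groundingdino_swint_ogc.pth": 662,
}


def undersized(checkpoint_dir="checkpoints", tolerance=0.10):
    """Map each suspect file to its size in MB, or to None if it is missing."""
    bad = {}
    for name, expected_mb in EXPECTED_MB.items():
        path = Path(checkpoint_dir) / name
        if not path.is_file():
            bad[name] = None
            continue
        size_mb = path.stat().st_size / 1e6
        if size_mb < expected_mb * (1 - tolerance):
            bad[name] = round(size_mb, 1)
    return bad


for name, size in undersized().items():
    status = "missing" if size is None else f"only {size} MB"
    print(f"{name}: {status}, consider downloading it again")
```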

	### Import Errors
	```bash
	# Reinstall packages
	conda activate vine_demo
	cd src
	pip install -e ./LASER
	pip install -e ./video-sam2
	pip install -e ./GroundingDINO
	pip install -e ./vine_hf
	```
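
Before reinstalling everything, it can be quicker to find out which package fails to resolve. The following sketch uses `importlib.util.find_spec`, which looks a module up without importing it. `torch`, `transformers`, and `vine_hf` appear in this guide; `sam2` and `groundingdino` are assumed import names for the corresponding repositories and may differ in your checkout. `missing_modules` is a hypothetical helper.

```python
import importlib.util

# Top-level module names to look for. "sam2" and "groundingdino" are assumptions.
MODULES = ["torch", "transformers", "vine_hf", "sam2", "groundingdino"]


def missing_modules(names):
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(MODULES)
if missing:
    print("Not importable:", ", ".join(missing))
else:
    print("All listed modules were found.")
```

Run it with the `vine_demo` environment activated, then reinstall only the packages it reports.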

	## Alternative: Manual Setup

	If you prefer to set up manually or the script fails, see [README.md](https://huggingface.co/video-fm/vine) for step-by-step instructions.

	## Next Steps

	- Process your videos: Use the pipeline with your own videos
	- Customize keywords: Adjust categorical, unary, and binary keywords
	- Visualize results: Enable `visualize=True` in config
	- Deploy: Use in HuggingFace Spaces, FastAPI, or your own app

	## Links

	- Model: https://huggingface.co/video-fm/vine
	- Setup Script: https://huggingface.co/video-fm/vine/blob/main/setup_vine_complete.sh
	- Documentation: https://huggingface.co/video-fm/vine#readme
	- Code: https://github.com/kevinxuez/LASER
	- Issues: https://github.com/kevinxuez/LASER/issues

	## Support

	- Setup Issues: Check the script output for errors
	- Model Issues: https://huggingface.co/video-fm/vine/discussions
	- Code Issues: https://github.com/kevinxuez/LASER/issues

	---

	Ready to start?

	```bash
	wget https://huggingface.co/video-fm/vine/resolve/main/setup_vine_complete.sh
	bash setup_vine_complete.sh
	```

	🎉 Happy video understanding with VINE!