ASethi04 committed on
Commit ab7c3b2 · verified · 1 Parent(s): 2f770ac

Update README with one-command setup instructions

Files changed (1)
  1. README.md +242 -209
README.md CHANGED
@@ -5,185 +5,155 @@
 
 VINE is a video understanding model that processes videos along with categorical, unary, and binary keywords to return probability distributions over those keywords for detected objects and their relationships.
 
- ## Quick Start
 
 ```python
 from transformers import AutoModel
- from vine_hf import VineConfig, VineModel, VinePipeline
 
- # Load VINE model from HuggingFace
 model = AutoModel.from_pretrained('video-fm/vine', trust_remote_code=True)
 
- # Create pipeline with your checkpoint paths
- vine_pipeline = VinePipeline(
     model=model,
     tokenizer=None,
-     sam_config_path="/path/to/sam2_config.yaml",
-     sam_checkpoint_path="/path/to/sam2_checkpoint.pt",
-     gd_config_path="/path/to/grounding_dino_config.py",
-     gd_checkpoint_path="/path/to/grounding_dino_checkpoint.pth",
     device="cuda",
     trust_remote_code=True
 )
 
- # Process a video
- results = vine_pipeline(
-     'path/to/video.mp4',
-     categorical_keywords=['human', 'dog', 'frisbee'],
     unary_keywords=['running', 'jumping'],
-     binary_keywords=['chasing', 'behind'],
-     return_top_k=3
 )
 ```
 
- ## Installation
 
- ### Option 1: Automated Setup (Recommended)
 
- ```bash
- # Download the setup script
- wget https://raw.githubusercontent.com/kevinxuez/vine_hf/main/setup_vine_demo.sh
 
- # Run the setup
- bash setup_vine_demo.sh
 
- # Activate environment
- conda activate vine_demo
 ```
 
- ### Option 2: Manual Installation
 
 ```bash
- # 1. Create conda environment
 conda create -n vine_demo python=3.10 -y
 conda activate vine_demo
- 
- # 2. Install PyTorch with CUDA support
 pip install torch==2.7.1 torchvision==0.22.1 --index-url https://download.pytorch.org/whl/cu126
 
- # 3. Install core dependencies
- pip install transformers huggingface-hub safetensors
 
- # 4. Clone and install required repositories
 git clone https://github.com/video-fm/video-sam2.git
 git clone https://github.com/video-fm/GroundingDINO.git
 git clone https://github.com/kevinxuez/LASER.git
 git clone https://github.com/kevinxuez/vine_hf.git
 
- # Install in editable mode
 pip install -e ./video-sam2
 pip install -e ./GroundingDINO
 pip install -e ./LASER
 pip install -e ./vine_hf
 
- # Build GroundingDINO extensions
- cd GroundingDINO && python setup.py build_ext --force --inplace && cd ..
 ```
 
- ## Required Checkpoints
- 
- VINE requires SAM2 and GroundingDINO checkpoints for segmentation. Download these separately:
 
- ### SAM2 Checkpoint
 ```bash
 wget https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_tiny.pt
- wget https://raw.githubusercontent.com/facebookresearch/sam2/main/sam2/configs/sam2.1/sam2.1_hiera_t.yaml
- ```
 
- ### GroundingDINO Checkpoint
- ```bash
 wget https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
 wget https://raw.githubusercontent.com/IDEA-Research/GroundingDINO/main/groundingdino/config/GroundingDINO_SwinT_OGC.py
 ```
 
- ## Architecture
- 
- ```
- video-fm/vine (HuggingFace Hub)
- ├── VINE Model Weights (~1.8GB)
- │   ├── Categorical CLIP model (fine-tuned)
- │   ├── Unary CLIP model (fine-tuned)
- │   └── Binary CLIP model (fine-tuned)
- └── Architecture Files
-     ├── vine_config.py
-     ├── vine_model.py
-     ├── vine_pipeline.py
-     └── utilities
- 
- User Provides:
- ├── Dependencies (via pip/conda)
- │   ├── laser (video processing utilities)
- │   ├── sam2 (segmentation)
- │   └── groundingdino (object detection)
- └── Checkpoints (downloaded separately)
-     ├── SAM2 model files
-     └── GroundingDINO model files
- ```
- 
- ## Why This Architecture?
- 
- This separation of concerns provides several benefits:
- 
- 1. **Lightweight Distribution**: Only VINE-specific weights (~1.8GB) are on HuggingFace
- 2. **Version Control**: Users can choose their preferred SAM2/GroundingDINO versions
- 3. **Licensing**: Keeps different model licenses separate
- 4. **Flexibility**: Easy to swap segmentation backends
- 5. **Standard Practice**: Similar to models like LLaVA, BLIP-2, etc.
- 
- ## Full Usage Example
- 
- ```python
- import os
- from pathlib import Path
- from transformers import AutoModel
- from vine_hf import VinePipeline
- 
- # Set up paths
- checkpoint_dir = Path("/path/to/checkpoints")
- sam_config = checkpoint_dir / "sam2_hiera_t.yaml"
- sam_checkpoint = checkpoint_dir / "sam2_hiera_tiny.pt"
- gd_config = checkpoint_dir / "GroundingDINO_SwinT_OGC.py"
- gd_checkpoint = checkpoint_dir / "groundingdino_swint_ogc.pth"
- 
- # Load VINE from HuggingFace
- model = AutoModel.from_pretrained('video-fm/vine', trust_remote_code=True)
- 
- # Create pipeline
- vine_pipeline = VinePipeline(
-     model=model,
-     tokenizer=None,
-     sam_config_path=str(sam_config),
-     sam_checkpoint_path=str(sam_checkpoint),
-     gd_config_path=str(gd_config),
-     gd_checkpoint_path=str(gd_checkpoint),
-     device="cuda:0",
-     trust_remote_code=True
- )
- 
- # Process video
- results = vine_pipeline(
-     "path/to/video.mp4",
-     categorical_keywords=['person', 'dog', 'ball'],
-     unary_keywords=['running', 'jumping', 'sitting'],
-     binary_keywords=['chasing', 'next to', 'holding'],
-     object_pairs=[(0, 1), (0, 2)],  # person-dog, person-ball
-     return_top_k=5,
-     include_visualizations=True
- )
- 
- # Access results
- print(f"Detected {results['summary']['num_objects_detected']} objects")
- print(f"Top categories: {results['summary']['top_categories']}")
- print(f"Top actions: {results['summary']['top_actions']}")
- print(f"Top relations: {results['summary']['top_relations']}")
- 
- # Access detailed predictions
- for obj_id, predictions in results['categorical_predictions'].items():
-     print(f"\nObject {obj_id}:")
-     for prob, category in predictions:
-         print(f"  {category}: {prob:.3f}")
- ```
- 
 ## Output Format
 
 ```python
@@ -197,27 +167,55 @@ for obj_id, predictions in results['categorical_predictions'].items():
     "binary_predictions": {
         (frame_id, (obj1_id, obj2_id)): [(probability, relation), ...]
     },
-     "confidence_scores": {
-         "categorical": float,
-         "unary": float,
-         "binary": float
-     },
     "summary": {
         "num_objects_detected": int,
         "top_categories": [(category, probability), ...],
         "top_actions": [(action, probability), ...],
         "top_relations": [(relation, probability), ...]
-     },
-     "visualizations": {  # if include_visualizations=True
-         "vine": {
-             "all": {"frames": [...], "video_path": "..."},
-             ...
-         }
     }
 }
 ```
 
- ## Configuration Options
 
 ```python
 from vine_hf import VineConfig
@@ -225,94 +223,127 @@ from vine_hf import VineConfig
 config = VineConfig(
     model_name="openai/clip-vit-base-patch32",  # CLIP backbone
     segmentation_method="grounding_dino_sam2",  # or "sam2"
-     box_threshold=0.35,  # GroundingDINO threshold
-     text_threshold=0.25,  # GroundingDINO threshold
     target_fps=5,  # Video sampling rate
     visualize=True,  # Enable visualizations
     visualization_dir="outputs/",  # Output directory
-     debug_visualizations=False,  # Debug mode
     device="cuda:0"  # Device
 )
 ```
 
- ## Deployment Examples
 
- ### Local Script
 ```python
- # test_vine.py
- from transformers import AutoModel
- from vine_hf import VinePipeline
 
- model = AutoModel.from_pretrained('video-fm/vine', trust_remote_code=True)
- pipeline = VinePipeline(model=model, ...)
- results = pipeline("video.mp4", ...)
 ```
 
- ### HuggingFace Spaces
- ```python
- # app.py for Gradio Space
- import gradio as gr
- from transformers import AutoModel
- from vine_hf import VinePipeline
 
- model = AutoModel.from_pretrained('video-fm/vine', trust_remote_code=True)
- # ... set up pipeline and Gradio interface
 ```
 
- ### API Server
- ```python
- # FastAPI server
- from fastapi import FastAPI
- from transformers import AutoModel
- from vine_hf import VinePipeline
 
- app = FastAPI()
- model = AutoModel.from_pretrained('video-fm/vine', trust_remote_code=True)
- pipeline = VinePipeline(model=model, ...)
 
- @app.post("/process")
- async def process_video(video_path: str):
-     return pipeline(video_path, ...)
 ```
 
- ## Troubleshooting
 
- ### Import Errors
- ```bash
- # Make sure all dependencies are installed
- pip list | grep -E "laser|sam2|groundingdino"
 
- # Reinstall if needed
- pip install -e ./LASER
- pip install -e ./video-sam2
- pip install -e ./GroundingDINO
 ```
 
- ### CUDA Errors
 ```python
- # Check CUDA availability
- import torch
- print(torch.cuda.is_available())
- print(torch.version.cuda)
 
- # Use CPU if needed
- pipeline = VinePipeline(model=model, device="cpu", ...)
 ```
 
- ### Checkpoint Not Found
- ```bash
- # Verify checkpoint paths
- ls -lh /path/to/sam2_hiera_tiny.pt
- ls -lh /path/to/groundingdino_swint_ogc.pth
 ```
 
- ## System Requirements
 
- - **Python**: 3.10+
- - **CUDA**: 11.8+ (for GPU)
- - **GPU**: 8GB+ VRAM recommended (T4, V100, A100, etc.)
- - **RAM**: 16GB+ recommended
- - **Storage**: ~3GB for checkpoints
 
 ## Citation
 
@@ -327,19 +358,21 @@ ls -lh /path/to/groundingdino_swint_ogc.pth
 
 ## License
 
- This model and code are released under the MIT License. Note that SAM2 and GroundingDINO have their own respective licenses.
 
 ## Links
 
 - **Model**: https://huggingface.co/video-fm/vine
- - **Code**: https://github.com/kevinxuez/LASER
- - **vine_hf Package**: https://github.com/kevinxuez/vine_hf
- - **SAM2**: https://github.com/facebookresearch/sam2
- - **GroundingDINO**: https://github.com/IDEA-Research/GroundingDINO
 
 ## Support
 
- For issues or questions:
- - **Model/Architecture**: [HuggingFace Discussions](https://huggingface.co/video-fm/vine/discussions)
- - **LASER Framework**: [GitHub Issues](https://github.com/kevinxuez/LASER/issues)
- - **vine_hf Package**: [GitHub Issues](https://github.com/kevinxuez/vine_hf/issues)
 
 VINE is a video understanding model that processes videos along with categorical, unary, and binary keywords to return probability distributions over those keywords for detected objects and their relationships.
 
+ ## 🚀 One-Command Setup
+ 
+ ```bash
+ wget https://huggingface.co/video-fm/vine/resolve/main/setup_vine_complete.sh
+ bash setup_vine_complete.sh
+ ```
+ 
+ **That's it!** This single script installs everything you need:
+ - ✅ Python environment with all dependencies
+ - ✅ SAM2 and GroundingDINO packages
+ - ✅ All model checkpoints (~800 MB)
+ - ✅ VINE model from HuggingFace (~1.8 GB)
+ 
+ **Total time**: 10-15 minutes | **Total size**: ~2.6 GB
+ 
+ See [QUICKSTART.md](QUICKSTART.md) for detailed instructions.
+ 
+ ## Quick Example
 
 ```python
 from transformers import AutoModel
+ from vine_hf import VinePipeline
+ from pathlib import Path
 
+ # Load VINE from HuggingFace
 model = AutoModel.from_pretrained('video-fm/vine', trust_remote_code=True)
 
+ # Create pipeline (checkpoints downloaded by the setup script)
+ checkpoint_dir = Path("checkpoints")
+ pipeline = VinePipeline(
     model=model,
     tokenizer=None,
+     sam_config_path=str(checkpoint_dir / "sam2_hiera_t.yaml"),
+     sam_checkpoint_path=str(checkpoint_dir / "sam2_hiera_tiny.pt"),
+     gd_config_path=str(checkpoint_dir / "GroundingDINO_SwinT_OGC.py"),
+     gd_checkpoint_path=str(checkpoint_dir / "groundingdino_swint_ogc.pth"),
     device="cuda",
     trust_remote_code=True
 )
 
+ # Process video
+ results = pipeline(
+     'video.mp4',
+     categorical_keywords=['person', 'dog', 'ball'],
     unary_keywords=['running', 'jumping'],
+     binary_keywords=['chasing', 'next to'],
+     return_top_k=5
 )
+ 
+ print(results['summary'])
 ```
 
+ ## Features
 
+ - **Categorical Classification**: Classify objects in videos (e.g., "human", "dog", "frisbee")
+ - **Unary Predicates**: Detect actions on single objects (e.g., "running", "jumping", "sitting")
+ - **Binary Relations**: Detect relationships between object pairs (e.g., "behind", "chasing")
+ - **Multi-Modal**: Combines vision (CLIP) with text-based segmentation (GroundingDINO + SAM2)
+ - **Visualizations**: Optional annotated video outputs
 
+ ## Architecture
 
+ VINE uses a modular architecture:
 
 ```
+ HuggingFace Hub (video-fm/vine)
+ ├── VINE model weights (~1.8 GB)
+ │   ├── Categorical CLIP (object classification)
+ │   ├── Unary CLIP (single-object actions)
+ │   └── Binary CLIP (object relationships)
+ └── Architecture files
+ 
+ User Environment (via setup script)
+ ├── Dependencies: laser, sam2, groundingdino
+ └── Checkpoints: SAM2 (~149 MB), GroundingDINO (~662 MB)
+ ```
+ 
+ This separation allows:
+ - ✅ Lightweight model distribution
+ - ✅ User control over checkpoint versions
+ - ✅ Flexible deployment options
+ - ✅ Standard HuggingFace practices
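[Editor's note] Because the weights and the segmentation stack ship separately, a missing package or checkpoint only surfaces when the pipeline is constructed. A small preflight sketch can catch that earlier; the package and file names follow the layout above, but the helper itself is hypothetical, not part of `vine_hf`:

```python
import importlib.util
from pathlib import Path

def preflight(checkpoint_dir="checkpoints"):
    """Report which user-provided pieces from the layout above are missing."""
    missing = []
    # Dependencies the setup script installs into the environment
    for pkg in ("laser", "sam2", "groundingdino"):
        if importlib.util.find_spec(pkg) is None:
            missing.append(f"package: {pkg}")
    # Checkpoints downloaded separately from the model weights
    ckpt_dir = Path(checkpoint_dir)
    for name in ("sam2_hiera_tiny.pt", "groundingdino_swint_ogc.pth"):
        if not (ckpt_dir / name).exists():
            missing.append(f"checkpoint: {name}")
    return missing

if __name__ == "__main__":
    problems = preflight()
    print("environment OK" if not problems else "\n".join(problems))
```

Running this before building the `VinePipeline` turns a late import or file-not-found error into an explicit checklist.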
+ 
+ ## What the Setup Script Does
+ 
+ ```bash
+ # 1. Creates conda environment (vine_demo)
+ # 2. Installs PyTorch with CUDA
+ # 3. Clones repositories:
+ #    - video-sam2 (SAM2 package)
+ #    - GroundingDINO (object detection)
+ #    - LASER (video utilities)
+ #    - vine_hf (VINE interface)
+ # 4. Installs packages in editable mode
+ # 5. Downloads model checkpoints:
+ #    - sam2_hiera_tiny.pt (~149 MB)
+ #    - groundingdino_swint_ogc.pth (~662 MB)
+ #    - Config files
+ # 6. Tests the installation
+ ```
+ 
+ ## Manual Installation
+ 
+ If you prefer manual installation or need to customize:
 
+ ### 1. Create Environment
 
 ```bash
 conda create -n vine_demo python=3.10 -y
 conda activate vine_demo
 pip install torch==2.7.1 torchvision==0.22.1 --index-url https://download.pytorch.org/whl/cu126
+ ```
+ 
+ ### 2. Install Dependencies
 
+ ```bash
+ pip install transformers huggingface-hub safetensors opencv-python pillow
+ ```
 
+ ### 3. Clone and Install Packages
+ 
+ ```bash
 git clone https://github.com/video-fm/video-sam2.git
 git clone https://github.com/video-fm/GroundingDINO.git
 git clone https://github.com/kevinxuez/LASER.git
 git clone https://github.com/kevinxuez/vine_hf.git
 
 pip install -e ./video-sam2
 pip install -e ./GroundingDINO
 pip install -e ./LASER
 pip install -e ./vine_hf
 
+ cd GroundingDINO && python setup.py build_ext --inplace && cd ..
 ```
 
+ ### 4. Download Checkpoints
 
 ```bash
+ mkdir checkpoints && cd checkpoints
+ 
+ # SAM2
 wget https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_tiny.pt
+ wget https://raw.githubusercontent.com/facebookresearch/sam2/main/sam2/configs/sam2.1/sam2.1_hiera_t.yaml -O sam2_hiera_t.yaml
 
+ # GroundingDINO
 wget https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
 wget https://raw.githubusercontent.com/IDEA-Research/GroundingDINO/main/groundingdino/config/GroundingDINO_SwinT_OGC.py
 ```
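[Editor's note] A truncated `wget` download tends to fail later with a cryptic checkpoint-load error. A sketch that validates the downloads against the approximate sizes quoted in this README (~149 MB and ~662 MB); the minimum-size thresholds and the helper name are my own, not part of the project:

```python
from pathlib import Path

# Conservative lower bounds derived from the sizes quoted above
EXPECTED_MIN_BYTES = {
    "sam2_hiera_tiny.pt": 140 * 1024 * 1024,           # ~149 MB checkpoint
    "groundingdino_swint_ogc.pth": 600 * 1024 * 1024,  # ~662 MB checkpoint
}

def check_downloads(directory="checkpoints"):
    """Return checkpoint files that are absent or suspiciously small."""
    bad = []
    for name, min_bytes in EXPECTED_MIN_BYTES.items():
        path = Path(directory) / name
        if not path.exists() or path.stat().st_size < min_bytes:
            bad.append(name)
    return bad
```

If `check_downloads()` returns anything, re-run the corresponding `wget` before building the pipeline.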
 
 ## Output Format
 
 ```python
     "binary_predictions": {
         (frame_id, (obj1_id, obj2_id)): [(probability, relation), ...]
     },
     "summary": {
         "num_objects_detected": int,
         "top_categories": [(category, probability), ...],
         "top_actions": [(action, probability), ...],
         "top_relations": [(relation, probability), ...]
     }
 }
 ```
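[Editor's note] The nested layout above can be walked like any Python dict. In the sketch below only the key structure comes from the documented format; the values are invented for illustration, and it assumes (as `return_top_k` suggests) that each prediction list is sorted by descending probability:

```python
# Sample results following the documented layout (values are illustrative only)
results = {
    "binary_predictions": {
        (0, (0, 1)): [(0.81, "chasing"), (0.12, "next to")],
        (1, (0, 1)): [(0.64, "next to"), (0.30, "chasing")],
    },
    "summary": {
        "num_objects_detected": 2,
        "top_categories": [("person", 0.93), ("dog", 0.88)],
        "top_actions": [("running", 0.77)],
        "top_relations": [("chasing", 0.81)],
    },
}

# Best relation per (frame_id, object pair): take the head of each ranked list
best = {key: preds[0] for key, preds in results["binary_predictions"].items()}
for (frame_id, pair), (prob, relation) in best.items():
    print(f"frame {frame_id}, objects {pair}: {relation} ({prob:.2f})")
```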
 
+ ## Advanced Usage
+ 
+ ### Custom Segmentation
+ 
+ ```python
+ # Use your own masks and bounding boxes
+ results = model.predict(
+     video_frames=frames,
+     masks=your_masks,
+     bboxes=your_bboxes,
+     categorical_keywords=['person', 'dog'],
+     unary_keywords=['running'],
+     binary_keywords=['chasing']
+ )
+ ```
+ 
+ ### SAM2 Only (No GroundingDINO)
+ 
+ ```python
+ config = VineConfig(
+     segmentation_method="sam2",  # Uses SAM2 automatic mask generation
+     ...
+ )
+ ```
+ 
+ ### Enable Visualizations
+ 
+ ```python
+ results = pipeline(
+     'video.mp4',
+     categorical_keywords=['person', 'dog'],
+     include_visualizations=True,  # Creates annotated video
+     return_top_k=5
+ )
+ 
+ # Access annotated video
+ video_path = results['visualizations']['vine']['all']['video_path']
+ ```
+ 
+ ## Configuration
 
 ```python
 from vine_hf import VineConfig
 
 config = VineConfig(
     model_name="openai/clip-vit-base-patch32",  # CLIP backbone
     segmentation_method="grounding_dino_sam2",  # or "sam2"
+     box_threshold=0.35,  # Detection threshold
+     text_threshold=0.25,  # Text matching threshold
     target_fps=5,  # Video sampling rate
     visualize=True,  # Enable visualizations
     visualization_dir="outputs/",  # Output directory
     device="cuda:0"  # Device
 )
 ```
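[Editor's note] `target_fps` controls how densely frames are sampled from the input video. The arithmetic can be sketched as below; this only illustrates the rate, it is not the pipeline's actual sampling code, and the uniform-stride assumption is mine:

```python
def sampled_frame_indices(total_frames, source_fps, target_fps=5):
    """Frame indices kept when downsampling a stream to roughly target_fps."""
    step = max(1, round(source_fps / target_fps))  # never skip below 1
    return list(range(0, total_frames, step))

# A 2-second clip at 30 fps with target_fps=5 keeps every 6th frame
indices = sampled_frame_indices(total_frames=60, source_fps=30, target_fps=5)
print(indices)  # [0, 6, 12, 18, 24, 30, 36, 42, 48, 54]
```

Lowering `target_fps` trades temporal resolution for less segmentation and CLIP work per video.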
 
+ ## System Requirements
+ 
+ - **OS**: Linux (Ubuntu 20.04+)
+ - **Python**: 3.10+
+ - **CUDA**: 11.8+ (for GPU)
+ - **GPU**: 8GB+ VRAM (T4, V100, A100)
+ - **RAM**: 16GB+
+ - **Disk**: ~5GB free
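[Editor's note] The Python and disk lines above can be checked from the standard library alone; the GPU/CUDA check needs `torch` and is shown under Troubleshooting. The helper below is an illustrative sketch, not part of the project:

```python
import shutil
import sys

def meets_basic_requirements(path=".", min_free_gb=5, min_python=(3, 10)):
    """Check the Python-version and free-disk requirements listed above."""
    return {
        "python": sys.version_info[:2] >= min_python,       # Python 3.10+
        "disk": shutil.disk_usage(path).free >= min_free_gb * 1024**3,  # ~5GB free
    }

print(meets_basic_requirements())
```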
+ 
+ ## Troubleshooting
+ 
+ ### CUDA Not Available
 
 ```python
+ import torch
+ print(torch.cuda.is_available())  # Should be True
+ ```
 
+ ### Import Errors
+ 
+ ```bash
+ conda activate vine_demo
+ pip list | grep -E "laser|sam2|groundingdino"
 ```
 
+ ### Checkpoint Not Found
 
+ ```bash
+ ls -lh checkpoints/
+ # Should show: sam2_hiera_tiny.pt, groundingdino_swint_ogc.pth
 ```
 
+ See [QUICKSTART.md](QUICKSTART.md) for detailed troubleshooting.
 
+ ## Example Applications
+ 
+ ### Sports Analysis
 
+ ```python
+ results = pipeline(
+     'soccer_game.mp4',
+     categorical_keywords=['player', 'ball', 'referee'],
+     unary_keywords=['running', 'kicking', 'jumping'],
+     binary_keywords=['passing', 'tackling', 'defending']
+ )
 ```
 
+ ### Surveillance
 
+ ```python
+ results = pipeline(
+     'security_feed.mp4',
+     categorical_keywords=['person', 'vehicle', 'bag'],
+     unary_keywords=['walking', 'running', 'standing'],
+     binary_keywords=['approaching', 'following', 'carrying']
+ )
+ ```
 
+ ### Animal Behavior
+ 
+ ```python
+ results = pipeline(
+     'wildlife.mp4',
+     categorical_keywords=['lion', 'zebra', 'elephant'],
+     unary_keywords=['eating', 'walking', 'resting'],
+     binary_keywords=['hunting', 'fleeing', 'protecting']
+ )
 ```
 
+ ## Deployment
+ 
+ ### Gradio Demo
+ 
 ```python
+ import gradio as gr
 
+ # Assumes `pipeline` was constructed as in the Quick Example above
+ def analyze_video(video, categories, actions, relations):
+     results = pipeline(
+         video,
+         categorical_keywords=categories.split(','),
+         unary_keywords=actions.split(','),
+         binary_keywords=relations.split(',')
+     )
+     return results['summary']
+ 
+ gr.Interface(analyze_video, ...).launch()
 ```
 
+ ### FastAPI Server
+ 
+ ```python
+ from fastapi import FastAPI
+ from transformers import AutoModel
+ from vine_hf import VinePipeline
+ 
+ app = FastAPI()
+ model = AutoModel.from_pretrained('video-fm/vine', trust_remote_code=True)
+ pipeline = VinePipeline(model=model, ...)
+ 
+ @app.post("/analyze")
+ async def analyze(video_path: str, keywords: dict):
+     return pipeline(video_path, **keywords)
 ```
 
+ ## Files in This Repository
 
+ - `setup_vine_complete.sh` - One-command setup script
+ - `QUICKSTART.md` - Quick start guide
+ - `README.md` - This file (complete documentation)
+ - `vine_config.py` - VineConfig class
+ - `vine_model.py` - VineModel class
+ - `vine_pipeline.py` - VinePipeline class
+ - `flattening.py` - Segment processing utilities
+ - `vis_utils.py` - Visualization utilities
 
 ## Citation
 
 
 ## License
 
+ This model is released under the MIT License. Note that SAM2 and GroundingDINO have their own respective licenses.
 
 ## Links
 
 - **Model**: https://huggingface.co/video-fm/vine
+ - **Quick Start**: [QUICKSTART.md](QUICKSTART.md)
+ - **Setup Script**: [setup_vine_complete.sh](setup_vine_complete.sh)
+ - **LASER GitHub**: https://github.com/kevinxuez/LASER
+ - **Issues**: https://github.com/kevinxuez/LASER/issues
 
 ## Support
 
+ - **Questions**: [HuggingFace Discussions](https://huggingface.co/video-fm/vine/discussions)
+ - **Bugs**: [GitHub Issues](https://github.com/kevinxuez/LASER/issues)
+ 
+ ---
+ 
+ **Made with ❤️ by the LASER team**