VEFX-Reward committed on
Commit d0216b0 · verified · 1 Parent(s): fee27b1

Initialize Space README with frontmatter

Files changed (1)
  1. README.md +242 -5
README.md CHANGED
@@ -1,10 +1,247 @@
  ---
- title: VEFX Code
- emoji: 🚀
- colorFrom: gray
- colorTo: red
  sdk: static
  pinned: false
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

  ---
+ title: VEFX-Code
+ emoji: 🎬
+ colorFrom: indigo
+ colorTo: pink
  sdk: static
  pinned: false
+ license: apache-2.0
+ short_description: VEFX-Bench reference code & inference utils
  ---

+ <div align="center">
+
+ # VEFX-Bench
+
+ ### Benchmarking Generic Video Editing and Visual Effects
+
+
+ **VEFX-Bench** is a comprehensive benchmark for evaluating text-driven video editing and visual effects. It includes **5,049 annotated examples** spanning **9 categories** and **32 subcategories**, evaluated by **VEFX-Reward**, a VLM-based reward model that scores edits across three dimensions on a 1–4 scale:
+
+ | Dimension | What it measures |
+ |---|---|
+ | **Instructional Following (IF)** | Does the edit accurately reflect the editing instruction? |
+ | **Render Quality (RQ)** | Visual clarity, temporal consistency, and physical plausibility |
+ | **Edit Exclusivity (EE)** | Were only the intended regions modified, without side effects? |
+
+ ---
+
+ ## 🏆 Model Leaderboard
+
+ VEFX-Reward scores on a 1–4 scale. Ranked by **GeoAgg** (α=2 for IF, β=1 for RQ, γ=1 for EE). Higher is better.
+
+ > **📅 Updated: May 2, 2026.** For the latest results & submissions, visit the **[live leaderboard →](https://vefx-leaderboard.com/)**
+
+ | Rank | Model | Type | IF ↑ | RQ ↑ | EE ↑ | GeoAgg ↑ |
+ |:---:|---|---|:---:|:---:|:---:|:---:|
+ | 🥇 | **Kling o3 Omni** | Commercial | 3.033 | **3.588** | 3.043 | **3.057** |
+ | 🥈 | **Kling o1** | Commercial | **3.040** | 3.534 | 2.976 | 2.985 |
+ | 🥉 | **Runway Gen-4.5** | Commercial | 2.817 | 3.319 | 2.923 | 2.912 |
+ | 4 | Seedance 2.0 | Commercial | 2.811 | 3.421 | 3.088 | 2.766 |
+ | 5 | Grok Imagine | Commercial | 2.606 | 3.346 | **3.376** | 2.723 |
+ | 6 | Luma Ray 3 | Commercial | 2.702 | 3.403 | 2.705 | 2.717 |
+ | 7 | UniVideo | Open-source | 2.294 | 3.266 | 3.091 | 2.516 |
+ | 8 | Wan 2.6 | Commercial | 2.012 | 3.317 | 2.446 | 2.146 |
+ | 9 | Luma Ray 2 | Commercial | 2.038 | 2.532 | 1.363 | 1.804 |
+ | 10 | VACE | Open-source | 2.027 | 3.172 | 1.180 | 1.775 |
+
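The leaderboard text names the GeoAgg weights but does not spell out the formula. The following is a minimal sketch, assuming GeoAgg is a weighted geometric mean of the three scores with exponents α, β, γ normalized by their sum. The function name `geo_agg` and this exact form are assumptions for illustration, not taken from the VEFX-Bench code.

```python
import math


def geo_agg(if_score: float, rq_score: float, ee_score: float,
            alpha: float = 2.0, beta: float = 1.0, gamma: float = 1.0) -> float:
    """Weighted geometric mean of IF, RQ and EE (assumed form of GeoAgg)."""
    for score in (if_score, rq_score, ee_score):
        if score <= 0:
            raise ValueError("scores must be positive")
    total_weight = alpha + beta + gamma
    # Average the logs with the given weights, then exponentiate.
    log_mean = (alpha * math.log(if_score)
                + beta * math.log(rq_score)
                + gamma * math.log(ee_score)) / total_weight
    return math.exp(log_mean)


# Hypothetical per-sample scores on the 1-4 scale.
print(round(geo_agg(3.0, 3.5, 3.0), 3))
```

Applied to the column means in the table, this form does not reproduce the GeoAgg column (the first row comes out near 3.17 rather than 3.057). That suggests the published value is computed per sample before averaging, or uses a different form, so treat the sketch as a reading aid only.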
+ ---
+
+ ## 🎬 Demo Videos
+
+ Each demo shows the **original video** (left) alongside the **edited video** (right).
+
+ <table>
+ <tr>
+ <td align="center"><b>Attribute Change</b><br><sub>"Change the color of the red industrial trailer to a bright yellow while maintaining the texture and appearance of the metal surface."</sub></td>
+ <td align="center"><b>Object Removal</b><br><sub>"Remove the woman with the grey backpack walking on the right side of the frame."</sub></td>
+ </tr>
+ <tr>
+ <td align="center"><img src="assets/demo_attribute_change.gif" width="400"></td>
+ <td align="center"><img src="assets/demo_object_removal.gif" width="400"></td>
+ </tr>
+ <tr>
+ <td align="center"><b>Style Transfer</b><br><sub>"Restore the natural, realistic colors to the entire scene, replacing the current black and white style with a full-color rendition."</sub></td>
+ <td align="center"><b>Camera Motion</b><br><sub>"Perform a smooth zoom in on the distant snowy mountain peaks to create a more immersive view."</sub></td>
+ </tr>
+ <tr>
+ <td align="center"><img src="assets/demo_style_transfer.gif" width="400"></td>
+ <td align="center"><img src="assets/demo_camera_zoom.gif" width="400"></td>
+ </tr>
+ </table>
+
+ ---
+
+ ## 📊 Benchmark at a Glance
+
+ | | |
+ |---|---|
+ | 📝 **5,049** Annotated Examples | 🎬 **1,419** Source Videos |
+ | 📂 **9 / 32** Categories / Subcategories | 🤖 **10** Editing Systems |
+ | 📐 **3** Quality Dimensions (IF, RQ, EE) | 🧪 **300** Benchmark Test Pairs |
+
+ ---
+
+ ## 🤗 VEFX-Reward Models
+
+ | Model | Backbone | Params | HuggingFace | Status |
+ |---|---|---|---|---|
+ | **VEFX-Reward-4B** | Qwen3-VL-4B-Instruct | 4B | [VEFX-Reward/VEFX-Reward-4B](https://huggingface.co/VEFX-Reward/VEFX-Reward-4B) | ✅ Available |
+
+ ---
+
+ ## 📦 VEFX-Bench Dataset
+
+ The benchmark dataset is hosted on HuggingFace at **[VEFX-Reward/VEFX-Bench](https://huggingface.co/datasets/VEFX-Reward/VEFX-Bench)**.
+
+ | | |
+ |---|---|
+ | 🎬 **300** Source Videos (720p) | 📝 `prompts.json` with editing instructions |
+ | 📂 **9** Task Categories | 🗂️ `benchmark_meta.json` with category labels |
+
+ **Task Categories:** Style Transfer · Object Manipulation · Background Change · Color/Lighting · Motion/Animation · Text/Overlay · Composition · Removal/Inpainting · Complex/Multi-step
+
+ ### Download and Evaluate
+
+ ```python
+ from huggingface_hub import snapshot_download
+
+ # Download the benchmark dataset
+ snapshot_download(repo_id="VEFX-Reward/VEFX-Bench", repo_type="dataset", local_dir="./vefx_bench")
+ ```
+
+ **Evaluation workflow:**
+ 1. Download the 300 source videos and `prompts.json`
+ 2. Apply your video editing model to each source video following its prompt
+ 3. Save edited videos as `0000.mp4` through `0299.mp4` (matching source index)
+ 4. Score with VEFX-Reward:
+
+ ```python
+ import json
+ from vefx_reward import VEFXReward
+
+ model = VEFXReward("VEFX-Reward/VEFX-Reward-4B", device="cuda")
+
+ with open("vefx_bench/prompts.json") as f:
+     prompts = json.load(f)  # one entry per source video, each with an "instruction" field
+
+ for idx, item in enumerate(prompts):
+     scores = model.score(
+         original_video=f"vefx_bench/{idx:04d}.mp4",
+         edited_video=f"your_edits/{idx:04d}.mp4",  # edited file shares the source index
+         instruction=item["instruction"],
+     )
+     print(f"[{idx:04d}] IF={scores['IF']:.2f} RQ={scores['RQ']:.2f} EE={scores['EE']:.2f}")
+ ```
+
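Before running the scoring loop, it can help to confirm that step 3 of the workflow above was completed for every index. The following is a small sketch using only the standard library. The helper names `expected_names` and `missing_edits` are hypothetical, and `your_edits` is the output folder used in the example above.

```python
from pathlib import Path


def expected_names(count: int = 300) -> list[str]:
    """File names 0000.mp4, 0001.mp4, ... for `count` benchmark items."""
    return [f"{idx:04d}.mp4" for idx in range(count)]


def missing_edits(edit_dir: str, count: int = 300) -> list[str]:
    """Return the expected file names that are absent from edit_dir."""
    root = Path(edit_dir)
    return [name for name in expected_names(count) if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_edits("your_edits")
    print(f"{len(missing)} of 300 edited videos missing")
```

Running this first avoids loading the reward model only to fail partway through on a missing file.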
+ ---
+
+ ## 🚀 Quick Start
+
+ ### Installation
+
+ ```bash
+ conda create -n vefx-bench python=3.10 -y
+ conda activate vefx-bench
+
+ # Install PyTorch first (match your CUDA version)
+ # See https://pytorch.org/get-started/locally/ for the right command
+ pip install torch torchvision --index-url https://download.pytorch.org/whl/cu124
+
+ # Install remaining dependencies
+ pip install -r requirements.txt
+
+ # Install the package
+ pip install -e .
+ ```
+
+ > **Requirements:** Python ≥ 3.10, CUDA GPU, ~10 GB VRAM (bfloat16). Make sure your PyTorch CUDA version matches your driver.
+
+ ### Score a Video Edit (Python API)
+
+ ```python
+ from vefx_reward import VEFXReward
+
+ model = VEFXReward("VEFX-Reward/VEFX-Reward-4B", device="cuda")
+
+ scores = model.score(
+     original_video="examples/sample_videos/object_removal_original.mp4",
+     edited_video="examples/sample_videos/object_removal_edited.mp4",
+     instruction="Remove the woman with the grey backpack walking on the right side of the frame.",
+ )
+ print(scores)
+ # {'IF': 2.34, 'RQ': 1.93, 'EE': 1.82, 'Overall': 6.09}
+ ```
+
+ ### CLI Usage
+
+ ```bash
+ python examples/quick_start.py \
+     --original examples/sample_videos/object_removal_original.mp4 \
+     --edited examples/sample_videos/object_removal_edited.mp4 \
+     --instruction "Remove the woman with the grey backpack walking on the right side of the frame."
+ ```
+
+ ### Score All Included Samples
+
+ The repo includes 4 sample video pairs with prompts. Score them all:
+
+ ```python
+ import json
+ from vefx_reward import VEFXReward
+
+ model = VEFXReward("VEFX-Reward/VEFX-Reward-4B", device="cuda")
+
+ with open("examples/sample_videos/prompts.json") as f:
+     samples = json.load(f)  # each sample has "original", "edited", "instruction", "category"
+
+ for sample in samples:
+     scores = model.score(
+         original_video=f"examples/sample_videos/{sample['original']}",
+         edited_video=f"examples/sample_videos/{sample['edited']}",
+         instruction=sample["instruction"],
+     )
+     print(f"[{sample['category']}] IF={scores['IF']:.2f} RQ={scores['RQ']:.2f} EE={scores['EE']:.2f}")
+ ```
+
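Once the loop above has produced a score dictionary per sample, a per-category summary is often more useful than the raw printout. The sketch below assumes you have collected `(category, scores)` pairs yourself. The helper `mean_by_category` and the numbers in the demo are hypothetical, not output from the model.

```python
from collections import defaultdict
from statistics import mean


def mean_by_category(results):
    """Average IF, RQ and EE per category.

    `results` is an iterable of (category, scores) pairs, where `scores`
    is a dict with at least the keys "IF", "RQ" and "EE".
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for category, scores in results:
        for key in ("IF", "RQ", "EE"):
            grouped[category][key].append(scores[key])
    return {
        category: {key: mean(values) for key, values in dims.items()}
        for category, dims in grouped.items()
    }


# Made-up scores purely to show the call shape.
demo = [
    ("Object Removal", {"IF": 2.0, "RQ": 3.0, "EE": 4.0}),
    ("Object Removal", {"IF": 4.0, "RQ": 3.0, "EE": 2.0}),
]
print(mean_by_category(demo))
```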
+ ### Batch Scoring
+
+ Prepare a CSV with columns `original_video`, `edited_video`, `instruction`:
+
+ ```bash
+ python examples/batch_scoring.py --csv edits.csv --output results.csv
+ ```
+
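One way to produce that CSV is with Python's standard `csv` module, as sketched below. The three column names come from the text above. The assumption that the script expects a header row with exactly those names, and the helper name `write_edits_csv`, are mine.

```python
import csv

FIELDS = ["original_video", "edited_video", "instruction"]


def write_edits_csv(path, rows):
    """Write rows (dicts keyed by FIELDS) to a CSV file with a header row."""
    # newline="" lets the csv module control line endings on every platform.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


if __name__ == "__main__":
    write_edits_csv("edits.csv", [
        {
            "original_video": "examples/sample_videos/object_removal_original.mp4",
            "edited_video": "examples/sample_videos/object_removal_edited.mp4",
            "instruction": "Remove the woman with the grey backpack walking on the right side of the frame.",
        },
    ])
```

Using `csv.DictWriter` rather than string formatting keeps instructions that contain commas or quotes correctly escaped.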
+ ### Multi-GPU Scoring
+
+ For large-scale evaluation across multiple GPUs:
+
+ ```bash
+ python examples/multi_gpu_scoring.py --csv edits.csv --num_gpus 4 --output results.csv
+ ```
+
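The internals of `multi_gpu_scoring.py` are not shown here. As a rough illustration of how work can be divided across devices, the sketch below splits a list of rows round-robin into one shard per GPU. The helper `shard_rows` is hypothetical and may not match what the script actually does.

```python
def shard_rows(rows, num_gpus):
    """Split rows into num_gpus shards, assigning row i to shard i % num_gpus."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return [rows[i::num_gpus] for i in range(num_gpus)]


# Ten placeholder row indices split across 4 workers.
for gpu_id, shard in enumerate(shard_rows(list(range(10)), 4)):
    print(f"cuda:{gpu_id} -> rows {shard}")
```

Round-robin assignment keeps shard sizes within one row of each other, which helps when videos have similar lengths.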
+ ---
+
+ ## 📖 API Reference
+
+ ### `VEFXReward`
+
+ ```python
+ VEFXReward(
+     model_path="VEFX-Reward/VEFX-Reward-4B",  # HuggingFace ID or local path
+     device="cuda",                             # "cuda", "cuda:0", "cpu"
+     dtype=torch.bfloat16,                      # torch.bfloat16 or torch.float16
+     fps=4.0,                                   # Video sampling rate
+     max_frame_pixels=399360,                   # Max pixels per frame
+ )
+ ```
+
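This README does not describe how `fps` and `max_frame_pixels` are applied internally. To give a feel for what such settings typically control, here is a sketch that assumes uniform timestamp sampling and an aspect-preserving downscale to fit a pixel budget. Both helpers are hypothetical and independent of the `vefx_reward` package.

```python
import math


def sample_times(duration_s: float, fps: float = 4.0) -> list[float]:
    """Timestamps in seconds at a fixed sampling rate, starting at 0."""
    count = max(1, math.floor(duration_s * fps))
    return [i / fps for i in range(count)]


def fit_to_pixel_budget(width: int, height: int, max_pixels: int = 399360):
    """Shrink (width, height), keeping aspect ratio, so width * height <= max_pixels."""
    if width * height <= max_pixels:
        return width, height
    scale = math.sqrt(max_pixels / (width * height))
    # Flooring both sides guarantees the product stays within the budget.
    return max(1, math.floor(width * scale)), max(1, math.floor(height * scale))


print(sample_times(1.0))               # four timestamps for a one-second clip
print(fit_to_pixel_budget(1280, 720))  # a 720p frame reduced to fit the budget
```

A real preprocessing pipeline may also round dimensions to a multiple of the vision encoder's patch size, which this sketch leaves out.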
+ #### `model.score(original_video, edited_video, instruction) → dict`
+
+ Score a single video edit. Returns `{'IF': float, 'RQ': float, 'EE': float, 'Overall': float}`.
+
+ #### `model.score_batch(original_videos, edited_videos, instructions) → list[dict]`
+
+ Score multiple edits sequentially. Each sample is processed independently to avoid out-of-memory (OOM) errors.
+
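The sequential behaviour described for `score_batch` can be pictured as a plain loop over `score`. The sketch below is not the library's implementation. It takes any callable with the same keyword arguments as `model.score`, so it can be tried with a stub and without a GPU. The names `score_sequentially` and `fake_score` are hypothetical.

```python
def score_sequentially(score_fn, original_videos, edited_videos, instructions):
    """Call score_fn once per sample and collect the results in order."""
    if not (len(original_videos) == len(edited_videos) == len(instructions)):
        raise ValueError("all three input lists must have the same length")
    results = []
    for original, edited, instruction in zip(original_videos, edited_videos, instructions):
        # One sample at a time, so only a single pair of videos is in flight.
        results.append(score_fn(original_video=original,
                                edited_video=edited,
                                instruction=instruction))
    return results


def fake_score(original_video, edited_video, instruction):
    """Stub scorer returning fixed placeholder values."""
    return {"IF": 1.0, "RQ": 1.0, "EE": 1.0, "Overall": 3.0}


print(score_sequentially(fake_score, ["a.mp4"], ["b.mp4"], ["Remove the sign."]))
```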
+ ---