---
title: VEFX-Code
emoji: 🎬
colorFrom: indigo
colorTo: pink
sdk: static
pinned: false
license: apache-2.0
short_description: VEFX-Bench reference code & inference utils
---
# VEFX-Bench
### Benchmarking Generic Video Editing and Visual Effects
**VEFX-Bench** is a comprehensive benchmark for evaluating text-driven video editing and visual effects. It includes **5,049 annotated examples** spanning **9 categories** and **32 subcategories**, evaluated by **VEFX-Reward**, a VLM-based reward model that scores edits across three dimensions on a 1–4 scale:
| Dimension | What it measures |
|---|---|
| **Instruction Following (IF)** | Does the edit accurately reflect the editing instruction? |
| **Render Quality (RQ)** | Visual clarity, temporal consistency, and physical plausibility |
| **Edit Exclusivity (EE)** | Were only the intended regions modified, without side-effects? |
---
## Model Leaderboard
VEFX-Reward scores on a 1–4 scale. Models are ranked by **GeoAgg** (α=2 for IF, β=1 for RQ, γ=1 for EE). Higher is better.
> **📅 Updated: May 2, 2026.** For the latest results and submissions, visit the **[live leaderboard →](https://vefx-leaderboard.com/)**
| Rank | Model | Type | IF ↑ | RQ ↑ | EE ↑ | GeoAgg ↑ |
|:---:|---|---|:---:|:---:|:---:|:---:|
| 🥇 | **Kling o3 Omni** | Commercial | 3.033 | **3.588** | 3.043 | **3.057** |
| 🥈 | **Kling o1** | Commercial | **3.040** | 3.534 | 2.976 | 2.985 |
| 🥉 | **Runway Gen-4.5** | Commercial | 2.817 | 3.319 | 2.923 | 2.912 |
| 4 | Seedance 2.0 | Commercial | 2.811 | 3.421 | 3.088 | 2.766 |
| 5 | Grok Imagine | Commercial | 2.606 | 3.346 | **3.376** | 2.723 |
| 6 | Luma Ray 3 | Commercial | 2.702 | 3.403 | 2.705 | 2.717 |
| 7 | UniVideo | Open-source | 2.294 | 3.266 | 3.091 | 2.516 |
| 8 | Wan 2.6 | Commercial | 2.012 | 3.317 | 2.446 | 2.146 |
| 9 | Luma Ray 2 | Commercial | 2.038 | 2.532 | 1.363 | 1.804 |
| 10 | VACE | Open-source | 2.027 | 3.172 | 1.180 | 1.775 |
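
This README does not spell out the GeoAgg formula. The sketch below assumes it is a weighted geometric mean of the three scores, using the exponents listed above and normalizing by their sum; the function name `geo_agg` is hypothetical. If the leaderboard aggregates per sample before averaging, applying this function to the column means in the table will not necessarily reproduce the GeoAgg column.

```python
import math


def geo_agg(if_score, rq_score, ee_score, alpha=2.0, beta=1.0, gamma=1.0):
    """Weighted geometric mean of IF, RQ and EE (assumed form of GeoAgg)."""
    total = alpha + beta + gamma
    log_mean = (
        alpha * math.log(if_score)
        + beta * math.log(rq_score)
        + gamma * math.log(ee_score)
    ) / total
    return math.exp(log_mean)


# IF carries twice the weight, so a weak IF lowers the aggregate
# more than an equally weak RQ or EE.
print(round(geo_agg(2.0, 4.0, 4.0), 3))  # 2.828
print(round(geo_agg(4.0, 2.0, 4.0), 3))  # 3.364
```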
---
## 🎬 Demo Videos
Each demo shows the **original video** (left) alongside the **edited video** (right).

| Category | Editing instruction |
|---|---|
| Attribute Change | "Change the color of the red industrial trailer to a bright yellow while maintaining the texture and appearance of the metal surface." |
| Object Removal | "Remove the woman with the grey backpack walking on the right side of the frame." |
| Style Transfer | "Restore the natural, realistic colors to the entire scene, replacing the current black and white style with a full-color rendition." |
| Camera Motion | "Perform a smooth zoom in on the distant snowy mountain peaks to create a more immersive view." |

---
## Benchmark at a Glance
| | |
|---|---|
| **5,049** Annotated Examples | 🎬 **1,419** Source Videos |
| **9 / 32** Categories / Subcategories | 🤖 **10** Editing Systems |
| **3** Quality Dimensions (IF, RQ, EE) | 🧪 **300** Benchmark Test Pairs |
---
## 🤖 VEFX-Reward Models
| Model | Backbone | Params | HuggingFace | Status |
|---|---|---|---|---|
| **VEFX-Reward-4B** | Qwen3-VL-4B-Instruct | 4B | [VEFX-Reward/VEFX-Reward-4B](https://huggingface.co/VEFX-Reward/VEFX-Reward-4B) | ✅ Available |
---
## 📦 VEFX-Bench Dataset
The benchmark dataset is hosted on HuggingFace at **[VEFX-Reward/VEFX-Bench](https://huggingface.co/datasets/VEFX-Reward/VEFX-Bench)**.
| | |
|---|---|
| 🎬 **300** Source Videos (720p) | `prompts.json` with editing instructions |
| **9** Task Categories | `benchmark_meta.json` with category labels |

**Task Categories:** Style Transfer · Object Manipulation · Background Change · Color/Lighting · Motion/Animation · Text/Overlay · Composition · Removal/Inpainting · Complex/Multi-step
### Download and Evaluate
```python
from huggingface_hub import snapshot_download
# Download the benchmark dataset
snapshot_download(repo_id="VEFX-Reward/VEFX-Bench", repo_type="dataset", local_dir="./vefx_bench")
```
**Evaluation workflow:**
1. Download the 300 source videos and `prompts.json`
2. Apply your video editing model to each source video following its prompt
3. Save edited videos as `0000.mp4` through `0299.mp4` (matching source index)
4. Score with VEFX-Reward:
```python
import json

from vefx_reward import VEFXReward

model = VEFXReward("VEFX-Reward/VEFX-Reward-4B", device="cuda")

with open("vefx_bench/prompts.json") as f:
    prompts = json.load(f)

for idx, item in enumerate(prompts):
    scores = model.score(
        original_video=f"vefx_bench/{idx:04d}.mp4",
        edited_video=f"your_edits/{idx:04d}.mp4",
        instruction=item["instruction"],
    )
    print(f"[{idx:04d}] IF={scores['IF']:.2f} RQ={scores['RQ']:.2f} EE={scores['EE']:.2f}")
```
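
Before scoring, it can help to confirm that every edited video follows the naming scheme from step 3. The following is a minimal sketch of such a check; the helper `missing_edits` and the `your_edits` directory name are illustrative assumptions, not part of the `vefx_reward` package.

```python
from pathlib import Path


def missing_edits(edit_dir, num_videos=300):
    """Return expected file names (0000.mp4, 0001.mp4, ...) absent from edit_dir."""
    edit_dir = Path(edit_dir)
    expected = [f"{idx:04d}.mp4" for idx in range(num_videos)]
    return [name for name in expected if not (edit_dir / name).is_file()]


missing = missing_edits("your_edits")
if missing:
    print(f"{len(missing)} edited videos missing, e.g. {missing[:5]}")
else:
    print("All edited videos found.")
```

Running this first avoids loading the reward model only to fail partway through on a missing file.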
---
## Quick Start
### Installation
```bash
conda create -n vefx-bench python=3.10 -y
conda activate vefx-bench
# Install PyTorch first (match your CUDA version)
# See https://pytorch.org/get-started/locally/ for the right command
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu124
# Install remaining dependencies
pip install -r requirements.txt
# Install the package
pip install -e .
```
> **Requirements:** Python ≥ 3.10, CUDA GPU, ~10 GB VRAM (bfloat16). Make sure your PyTorch CUDA version matches your driver.
### Score a Video Edit (Python API)
```python
from vefx_reward import VEFXReward

model = VEFXReward("VEFX-Reward/VEFX-Reward-4B", device="cuda")

scores = model.score(
    original_video="examples/sample_videos/object_removal_original.mp4",
    edited_video="examples/sample_videos/object_removal_edited.mp4",
    instruction="Remove the woman with the grey backpack walking on the right side of the frame.",
)
print(scores)
# {'IF': 2.34, 'RQ': 1.93, 'EE': 1.82, 'Overall': 6.09}
```
### CLI Usage
```bash
python examples/quick_start.py \
    --original examples/sample_videos/object_removal_original.mp4 \
    --edited examples/sample_videos/object_removal_edited.mp4 \
    --instruction "Remove the woman with the grey backpack walking on the right side of the frame."
```
### Score All Included Samples
The repo includes 4 sample video pairs with prompts. Score them all:
```python
import json

from vefx_reward import VEFXReward

model = VEFXReward("VEFX-Reward/VEFX-Reward-4B", device="cuda")

with open("examples/sample_videos/prompts.json") as f:
    samples = json.load(f)

for sample in samples:
    scores = model.score(
        original_video=f"examples/sample_videos/{sample['original']}",
        edited_video=f"examples/sample_videos/{sample['edited']}",
        instruction=sample["instruction"],
    )
    print(f"[{sample['category']}] IF={scores['IF']:.2f} RQ={scores['RQ']:.2f} EE={scores['EE']:.2f}")
```
### Batch Scoring
Prepare a CSV with columns `original_video`, `edited_video`, `instruction`:
```bash
python examples/batch_scoring.py --csv edits.csv --output results.csv
```
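
The input CSV described above can be written with Python's standard `csv` module. This is a minimal sketch: the helper `write_edits_csv` and the single example row are illustrative, and only the three column names come from this README.

```python
import csv

# Hypothetical list of edits to score; replace with your own paths and prompts.
edits = [
    {
        "original_video": "examples/sample_videos/object_removal_original.mp4",
        "edited_video": "examples/sample_videos/object_removal_edited.mp4",
        "instruction": "Remove the woman with the grey backpack walking on the right side of the frame.",
    },
]


def write_edits_csv(rows, path="edits.csv"):
    """Write rows to a CSV with the three columns batch scoring expects."""
    fieldnames = ["original_video", "edited_video", "instruction"]
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


write_edits_csv(edits)
```

Using `csv.DictWriter` rather than string concatenation means instructions that contain commas or quotes are escaped correctly.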
### Multi-GPU Scoring
For large-scale evaluation across multiple GPUs:
```bash
python examples/multi_gpu_scoring.py --csv edits.csv --num_gpus 4 --output results.csv
```
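
How `multi_gpu_scoring.py` distributes work is not described here. One common approach, sketched below as an assumption rather than the script's actual logic, is to split the CSV rows round-robin into one shard per GPU and keep each row's original index so results can be merged back in order. `shard_rows` is a hypothetical helper.

```python
def shard_rows(rows, num_gpus):
    """Split rows round-robin into num_gpus shards, keeping each row's index."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    shards = [[] for _ in range(num_gpus)]
    for idx, row in enumerate(rows):
        shards[idx % num_gpus].append((idx, row))
    return shards


shards = shard_rows([f"edit_{i}" for i in range(10)], num_gpus=4)
for gpu_id, shard in enumerate(shards):
    print(f"cuda:{gpu_id} -> {[idx for idx, _ in shard]}")
# cuda:0 -> [0, 4, 8]
# cuda:1 -> [1, 5, 9]
# cuda:2 -> [2, 6]
# cuda:3 -> [3, 7]
```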
---
## API Reference
### `VEFXReward`
```python
VEFXReward(
    model_path="VEFX-Reward/VEFX-Reward-4B",  # HuggingFace ID or local path
    device="cuda",                            # "cuda", "cuda:0", "cpu"
    dtype=torch.bfloat16,                     # torch.bfloat16 or torch.float16
    fps=4.0,                                  # Video sampling rate
    max_frame_pixels=399360,                  # Max pixels per frame
)
```
#### `model.score(original_video, edited_video, instruction) → dict`
Score a single video edit. Returns `{'IF': float, 'RQ': float, 'EE': float, 'Overall': float}`.
#### `model.score_batch(original_videos, edited_videos, instructions) → list[dict]`
Score multiple edits sequentially. Each sample is processed independently to avoid out-of-memory (OOM) errors.
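
Since `score_batch` returns one dict per sample, per-dimension averages can be computed with plain Python. The sketch below uses hand-written stand-in results instead of real model output, and `mean_scores` is a hypothetical helper rather than part of the package.

```python
from statistics import mean


def mean_scores(results, keys=("IF", "RQ", "EE", "Overall")):
    """Average each score key over a list of per-sample score dicts."""
    if not results:
        raise ValueError("results is empty")
    return {key: mean(r[key] for r in results) for key in keys}


# Stand-in for the list[dict] returned by model.score_batch(...).
results = [
    {"IF": 2.0, "RQ": 3.0, "EE": 4.0, "Overall": 9.0},
    {"IF": 3.0, "RQ": 3.0, "EE": 2.0, "Overall": 8.0},
]
print(mean_scores(results))
# {'IF': 2.5, 'RQ': 3.0, 'EE': 3.0, 'Overall': 8.5}
```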
---