AdversaLLC/netflix:tayingc-netflix-com committed
Commit f124b93 · 0 parents

Duplicate from netflix/void-model

Co-authored-by: Ta-Ying Cheng <netflix:tayingc-netflix-com@users.noreply.huggingface.co>

Files changed (5):
  1. .gitattributes +35 -0
  2. README.md +117 -0
  3. gitattributes +35 -0
  4. void_pass1.safetensors +3 -0
  5. void_pass2.safetensors +3 -0

.gitattributes ADDED
@@ -0,0 +1,35 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,117 @@
---
license: apache-2.0
tags:
- video-inpainting
- video-editing
- object-removal
- cogvideox
- diffusion
- video-generation
pipeline_tag: video-to-video
---

# VOID: Video Object and Interaction Deletion

<video src="https://github.com/user-attachments/assets/ad174ca0-2feb-45f9-9405-83167037d9be" width="100%" controls autoplay loop muted></video>

VOID removes objects from videos along with all interactions they induce on the scene: not just secondary effects like shadows and reflections, but **physical interactions** such as objects falling when a person is removed.

**[Project Page](https://void-model.github.io/)** | **[Paper](https://arxiv.org/pdf/2604.02296)** | **[GitHub](https://github.com/netflix/void-model)** | **[Demo](https://huggingface.co/spaces/sam-motamed/VOID)**

## Quick Start

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/netflix/void-model/blob/main/notebook.ipynb)

The included notebook handles setup, downloads the models, runs inference on a sample video, and displays the result. It requires a GPU with **40GB+ VRAM** (e.g., an A100).

## Model Details

VOID is built on [CogVideoX-Fun-V1.5-5b-InP](https://huggingface.co/alibaba-pai/CogVideoX-Fun-V1.5-5b-InP) and fine-tuned for video inpainting with interaction-aware **quadmask** conditioning: a 4-value mask that encodes the primary object (remove), overlap regions, affected regions (falling objects, displaced items), and background (keep).
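
The quadmask's four pixel codes match those documented in the Input Format section (0 = remove, 63 = overlap, 127 = affected, 255 = keep). As a minimal illustrative sketch, assuming per-category binary masks are available (the helper name and the precedence order here are assumptions, not the repo's API), one mask frame could be composed like this:

```python
# Illustrative only: compose one quadmask frame from three boolean masks,
# using the pixel codes from the Input Format section.
import numpy as np

def compose_quadmask(remove, overlap, affected):
    """Each input is a boolean HxW array; later assignments take precedence."""
    frame = np.full(remove.shape, 255, dtype=np.uint8)  # default: keep background
    frame[affected] = 127  # regions the removal perturbs (falling/displaced items)
    frame[overlap] = 63    # where the object occludes an affected region
    frame[remove] = 0      # the primary object to delete
    return frame

h, w = 4, 4
remove = np.zeros((h, w), bool);   remove[0, 0] = True
overlap = np.zeros((h, w), bool);  overlap[1, 1] = True
affected = np.zeros((h, w), bool); affected[2, 2] = True
qm = compose_quadmask(remove, overlap, affected)
print(sorted(np.unique(qm).tolist()))  # [0, 63, 127, 255]
```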

### Checkpoints

| File | Description | Required? |
|------|-------------|-----------|
| `void_pass1.safetensors` | Base inpainting model | Yes |
| `void_pass2.safetensors` | Warped-noise refinement for temporal consistency | Optional |

Pass 1 is sufficient for most videos. Pass 2 adds optical-flow-warped latent initialization for improved temporal consistency on longer clips.
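
The warped-noise idea behind Pass 2 can be sketched in simplified form: advect a noise field along an optical-flow displacement so consecutive frames start from correlated noise. This is a toy illustration under stated assumptions (2D arrays instead of latents, bilinear resampling via `scipy.ndimage.map_coordinates`), not the repo's implementation:

```python
# Toy sketch of flow-warped noise initialization (not the repo's code):
# sample the previous frame's noise at flow-displaced coordinates.
import numpy as np
from scipy.ndimage import map_coordinates

def warp_noise(noise, flow):
    """noise: HxW array; flow: HxWx2 (dy, dx) displacement into the source frame."""
    h, w = noise.shape
    yy, xx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    coords = np.stack([yy + flow[..., 0], xx + flow[..., 1]])  # shape (2, H, W)
    return map_coordinates(noise, coords, order=1, mode="nearest")

rng = np.random.default_rng(0)
noise0 = rng.standard_normal((8, 8))
zero_flow = np.zeros((8, 8, 2))
# With zero flow the warp is the identity, so the noise is carried over unchanged.
assert np.allclose(warp_noise(noise0, zero_flow), noise0)
```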

### Architecture

- **Base:** CogVideoX 3D Transformer (5B parameters)
- **Input:** Video + quadmask + text prompt describing the scene after removal
- **Resolution:** 384x672 (default)
- **Max frames:** 197
- **Scheduler:** DDIM
- **Precision:** BF16, with FP8 quantization for memory efficiency

## Usage

### From the Notebook

The easiest route: clone the repo and run [`notebook.ipynb`](https://github.com/netflix/void-model/blob/main/notebook.ipynb):

```bash
git clone https://github.com/netflix/void-model.git
cd void-model
```

### From the CLI

```bash
# Install dependencies
pip install -r requirements.txt

# Download the base model
hf download alibaba-pai/CogVideoX-Fun-V1.5-5b-InP \
  --local-dir ./CogVideoX-Fun-V1.5-5b-InP

# Download the VOID checkpoints
hf download netflix/void-model \
  --local-dir .

# Run Pass 1 inference on a sample
python inference/cogvideox_fun/predict_v2v.py \
  --config config/quadmask_cogvideox.py \
  --config.data.data_rootdir="./sample" \
  --config.experiment.run_seqs="lime" \
  --config.experiment.save_path="./outputs" \
  --config.video_model.transformer_path="./void_pass1.safetensors"
```

### Input Format

Each video needs three files in a folder:

```
my-video/
  input_video.mp4   # source video
  quadmask_0.mp4    # 4-value mask (0=remove, 63=overlap, 127=affected, 255=keep)
  prompt.json       # {"bg": "description of scene after removal"}
```

The repo includes a mask-generation pipeline (`VLM-MASK-REASONER/`) that creates quadmasks from raw videos using SAM2 + Gemini.
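
The folder layout above can be scaffolded with a few lines of standard-library Python. This helper is a sketch, not part of the repo; the file names and the `{"bg": ...}` prompt schema come from the Input Format section, while the videos themselves must be produced separately (e.g., by the mask pipeline):

```python
# Sketch: scaffold one input folder per the Input Format section.
import json
from pathlib import Path

def prepare_input_dir(root, bg_prompt):
    """Create the folder and write prompt.json; the two .mp4 files are added separately."""
    d = Path(root)
    d.mkdir(parents=True, exist_ok=True)
    (d / "prompt.json").write_text(json.dumps({"bg": bg_prompt}))
    # input_video.mp4 and quadmask_0.mp4 go here next,
    # e.g. generated by the VLM-MASK-REASONER pipeline.
    return d

d = prepare_input_dir("my-video", "an empty kitchen counter")
print(json.loads((d / "prompt.json").read_text()))  # {'bg': 'an empty kitchen counter'}
```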

## Training

Trained on paired counterfactual videos generated from two sources:

- **HUMOTO**: human-object interactions rendered in Blender with physics simulation
- **Kubric**: object-only interactions using Google Scanned Objects

Training ran on **8x A100 80GB GPUs** using DeepSpeed ZeRO Stage 2. See the [GitHub repo](https://github.com/netflix/void-model#%EF%B8%8F-training) for full training instructions and data-generation code.

## Citation

```bibtex
@misc{motamed2026void,
  title={VOID: Video Object and Interaction Deletion},
  author={Saman Motamed and William Harvey and Benjamin Klein and Luc Van Gool and Zhuoning Yuan and Ta-Ying Cheng},
  year={2026},
  eprint={2604.02296},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2604.02296}
}
```
gitattributes ADDED
@@ -0,0 +1,35 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
void_pass1.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:78cec135c557afe063364602e1038170a49bd5bd3eb909f6b7e30a070173e935
size 11143042384
void_pass2.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4445d063c3044d0c827ca728387ecd2aecc5344274a4b73d7091e27ec290ec41
size 11143042384