Improve model card and add metadata

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +65 -3
README.md CHANGED
@@ -1,3 +1,65 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ pipeline_tag: any-to-any
+ ---
+
+ # AVI-Edit: Audio-sync Video Instance Editing with Granularity-Aware Mask Refiner
+
+ [**Project Page**](https://hjzheng.net/projects/AVI-Edit/) | [**arXiv**](https://arxiv.org/abs/2512.10571) | [**Code**](https://github.com/suimuc/AVI-Edit-Framework)
+
+ **AVI-Edit** is a framework for audio-sync video instance editing. It introduces a granularity-aware mask refiner, which iteratively refines coarse user-provided masks into precise instance-level regions, and a self-feedback audio agent, which curates high-quality audio guidance for fine-grained temporal control.
+
+ ## Installation
+
+ To set up the environment, follow these steps from the official repository:
+
+ ```bash
+ git clone https://github.com/suimuc/AVI-Edit-Framework.git
+ cd AVI-Edit-Framework
+ conda create -n avi_edit python=3.10
+ conda activate avi_edit
+ pip install -r requirements.txt
+ pip install -e .
+ ```
+
+ ## Usage
+
+ The framework supports inference using either a pre-edited audio track or an automated audio agent.
+
+ ### 1. Inference with an Edited Audio Track
+
+ Use this script when you already have the edited audio:
+
+ ```bash
+ python scripts/inference_with_edited_audio.py \
+   --video-path /path/to/input_video.mp4 \
+   --audio-path /path/to/edited_audio.wav \
+   --mask-path /path/to/mask.mp4 \
+   --prompt "Describe the edited scene here." \
+   --output-dir /path/to/output_dir
+ ```
+
+ ### 2. Inference with the Audio Agent
+
+ Use this script to generate replacement audio automatically from the video, mask, and edit prompt:
+
+ ```bash
+ python scripts/inference.py \
+   --video-path /path/to/input_video.mp4 \
+   --mask-path /path/to/mask.mp4 \
+   --prompt "Describe the edited scene here." \
+   --output-dir /path/to/output_dir \
+   --dashscope-api-key "<YOUR_QWEN_OR_OPENAI_COMPATIBLE_API_KEY>" \
+   --eleven-api-key "<YOUR_ELEVENLABS_API_KEY>"
+ ```
+
+ ## Citation
+
+ ```bibtex
+ @article{avi-edit,
+   title={Audio-sync Video Instance Editing with Granularity-Aware Mask Refiner},
+   author={Zheng, Haojie and Weng, Shuchen and Liu, Jingqi and Yang, Siqi and Shi, Boxin and Wang, Xinlong},
+   journal={arXiv preprint arXiv:2512.10571},
+   year={2025}
+ }
+ ```
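
The two invocations in the README's Usage section share most of their flags. They differ only in the script name and in whether an edited audio track or a pair of API keys is supplied. As an illustration that is not part of this PR or of the AVI-Edit repository, the following Python sketch assembles either command line from the flags shown in the diff. The helper name `build_command` is hypothetical, and the sketch assumes, without having checked the scripts, that `scripts/inference.py` needs both API keys, since the README example passes both.

```python
import shlex
from typing import List, Optional


def build_command(
    video_path: str,
    mask_path: str,
    prompt: str,
    output_dir: str,
    audio_path: Optional[str] = None,
    dashscope_api_key: Optional[str] = None,
    eleven_api_key: Optional[str] = None,
) -> List[str]:
    """Return an argv list for one of the two inference scripts in the README.

    Hypothetical helper: only the script paths and flag names come from the
    README; everything else is illustrative.
    """
    if audio_path is not None:
        # An edited audio track already exists, so use the script that
        # accepts --audio-path (flag order mirrors the README example).
        return [
            "python", "scripts/inference_with_edited_audio.py",
            "--video-path", video_path,
            "--audio-path", audio_path,
            "--mask-path", mask_path,
            "--prompt", prompt,
            "--output-dir", output_dir,
        ]

    # No edited audio: fall back to the audio-agent script.
    # Assumption: both keys are required, as in the README example.
    if not dashscope_api_key or not eleven_api_key:
        raise ValueError("the audio-agent path needs both API keys")
    return [
        "python", "scripts/inference.py",
        "--video-path", video_path,
        "--mask-path", mask_path,
        "--prompt", prompt,
        "--output-dir", output_dir,
        "--dashscope-api-key", dashscope_api_key,
        "--eleven-api-key", eleven_api_key,
    ]


if __name__ == "__main__":
    # Placeholder paths, matching the style of the README examples.
    cmd = build_command(
        video_path="/path/to/input_video.mp4",
        mask_path="/path/to/mask.mp4",
        prompt="Describe the edited scene here.",
        output_dir="/path/to/output_dir",
        audio_path="/path/to/edited_audio.wav",
    )
    # shlex.join quotes the prompt so the line can be pasted into a shell.
    print(shlex.join(cmd))
```

The returned list can be handed to `subprocess.run(cmd, check=True)` from the repository root. Building the command as a list rather than a single string avoids shell-quoting problems when the prompt contains spaces or quotes.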