suimu/AVI-Edit

Improve model card and add metadata

by nielsr HF Staff - opened 17 days ago

base: refs/heads/main ← from: refs/pr/1

README.md +65 -3

@@ -1,3 +1,65 @@
----
-license: mit
----

+---
+license: mit
+pipeline_tag: any-to-any
+---
+
+# AVI-Edit: Audio-sync Video Instance Editing with Granularity-Aware Mask Refiner
+
+[**Project Page**](https://hjzheng.net/projects/AVI-Edit/) | [**arXiv**](https://arxiv.org/abs/2512.10571) | [**Code**](https://github.com/suimuc/AVI-Edit-Framework)
+
+**AVI-Edit** is a framework for audio-sync video instance editing. It introduces two components: a granularity-aware mask refiner, which iteratively refines coarse user-provided masks into precise instance-level regions, and a self-feedback audio agent, which curates high-quality audio guidance for fine-grained temporal control.
+
+## Installation
+
+To set up the environment, follow these steps from the official repository:
+
+```bash
+git clone https://github.com/suimuc/AVI-Edit-Framework.git
+cd AVI-Edit-Framework
+conda create -n avi_edit python=3.10
+conda activate avi_edit
+pip install -r requirements.txt
+pip install -e .
+```
+
+## Usage
+
+The framework supports inference using either a pre-edited audio track or an automated audio agent.
+
+### 1. Inference with an Edited Audio Track
+
+Use this script when you already have the edited audio:
+
+```bash
+python scripts/inference_with_edited_audio.py \
+  --video-path /path/to/input_video.mp4 \
+  --audio-path /path/to/edited_audio.wav \
+  --mask-path /path/to/mask.mp4 \
+  --prompt "Describe the edited scene here." \
+  --output-dir /path/to/output_dir
+```
+
+### 2. Inference with the Audio Agent
+
+Use this script to generate replacement audio automatically from the video, mask, and edit prompt:
+
+```bash
+python scripts/inference.py \
+  --video-path /path/to/input_video.mp4 \
+  --mask-path /path/to/mask.mp4 \
+  --prompt "Describe the edited scene here." \
+  --output-dir /path/to/output_dir \
+  --dashscope-api-key "<YOUR_QWEN_OR_OPENAI_COMPATIBLE_API_KEY>" \
+  --eleven-api-key "<YOUR_ELEVENLABS_API_KEY>"
+```
+
+## Citation
+
+```bibtex
+@article{avi-edit,
+  title={Audio-sync Video Instance Editing with Granularity-Aware Mask Refiner},
+  author={Zheng, Haojie and Weng, Shuchen and Liu, Jingqi and Yang, Siqi and Shi, Boxin and Wang, Xinlong},
+  journal={arXiv preprint arXiv:2512.10571},
+  year={2025}
+}
+```
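
Both usage commands in this diff take several file paths, so a reader may want to confirm that the inputs exist before launching a run. The following is a minimal pre-flight sketch in bash. It assumes nothing about the scripts beyond the flags shown in the README, and `check_inputs` is a hypothetical helper, not part of the AVI-Edit repository.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the AVI-Edit repository):
# returns 0 if every argument is an existing regular file, 1 otherwise,
# and reports each missing path on stderr.
check_inputs() {
  local missing=0
  local path
  for path in "$@"; do
    if [ ! -f "$path" ]; then
      echo "missing input: $path" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example use with the flags documented in the README (paths are placeholders):
# check_inputs /path/to/input_video.mp4 /path/to/edited_audio.wav /path/to/mask.mp4 \
#   && python scripts/inference_with_edited_audio.py \
#        --video-path /path/to/input_video.mp4 \
#        --audio-path /path/to/edited_audio.wav \
#        --mask-path /path/to/mask.mp4 \
#        --prompt "Describe the edited scene here." \
#        --output-dir /path/to/output_dir
```

The helper only checks that the files exist. It does not validate video, audio, or mask formats, and it does not check API keys for the audio-agent command.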