---
license: cc-by-nc-4.0
pipeline_tag: other
tags:
- video-to-audio
- audio-generation
---

# AC-Foley: Reference-Audio-Guided Video-to-Audio Synthesis with Acoustic Transfer

This repository contains the model described in the paper [AC-Foley: Reference-Audio-Guided Video-to-Audio Synthesis with Acoustic Transfer](https://huggingface.co/papers/2603.15597).

[**Code**](https://github.com/ff2416/AC-Foley) | [**Paper**](https://huggingface.co/papers/2603.15597)

## Model Description

AC-Foley is an audio-conditioned video-to-audio (V2A) model designed for precise, fine-grained control over generated sounds. Unlike traditional V2A methods that rely heavily on text prompts, AC-Foley conditions directly on reference audio, which avoids the semantic ambiguity of text descriptions. This approach enables several key features:

- **Fine-grained sound synthesis**: Precise manipulation of acoustic attributes.
- **Timbre transfer**: Applying the acoustic characteristics of a reference audio clip to the sound generated for the video.
- **Zero-shot generation**: Synthesizing high-quality audio for unseen categories.
- **Improved audio quality**: Achieving state-of-the-art performance for Foley generation, as reported in the paper.

## Citation

If you find this work useful for your research, please cite:

```bibtex
@article{fang2026acfoley,
  title={AC-Foley: Reference-Audio-Guided Video-to-Audio Synthesis with Acoustic Transfer},
  author={Fang, Pengjun and He, Yingqing and Xing, Yazhou and Chen, Qifeng and Lim, Ser-Nam and Yang, Harry},
  journal={arXiv preprint arXiv:2603.15597},
  year={2026}
}
```