---
license: cc-by-nc-4.0
pipeline_tag: other
tags:
- video-to-audio
- audio-generation
---

# AC-Foley: Reference-Audio-Guided Video-to-Audio Synthesis with Acoustic Transfer

This repository contains the model described in the paper [AC-Foley: Reference-Audio-Guided Video-to-Audio Synthesis with Acoustic Transfer](https://huggingface.co/papers/2603.15597).

[**Code**](https://github.com/ff2416/AC-Foley) | [**Paper**](https://huggingface.co/papers/2603.15597)

## Model Description

AC-Foley is an audio-conditioned video-to-audio (V2A) model designed for precise, fine-grained control over generated sounds. Traditional V2A methods rely heavily on text prompts. AC-Foley instead conditions directly on reference audio, which avoids the semantic ambiguity of text descriptions. This approach enables several key features:

- **Fine-grained sound synthesis**: Precise manipulation of acoustic attributes.
- **Timbre transfer**: Applying the characteristics of a reference audio clip to the sound generated for the video.
- **Zero-shot generation**: Synthesizing high-quality audio for unseen sound categories.
- **Improved audio quality**: State-of-the-art performance on Foley generation.

## Citation

If you find this work useful for your research, please cite:

```bibtex
@article{fang2026acfoley,
  title={AC-Foley: Reference-Audio-Guided Video-to-Audio Synthesis with Acoustic Transfer},
  author={Fang, Pengjun and He, Yingqing and Xing, Yazhou and Chen, Qifeng and Lim, Ser-Nam and Yang, Harry},
  journal={arXiv preprint arXiv:2603.15597},
  year={2026}
}
```