🚀 HunyuanVideo-Foley

Portable Version

Multimodal Diffusion with Representation Alignment for High-Fidelity Foley Audio Generation

Professional-grade AI sound effect generation for video content creators

✨ Key Highlights

🎭 Multi-scenario Sync
High-quality audio synchronized with complex video scenes

🧠 Multi-modal Balance
Perfect harmony between visual and textual information

🎵 48kHz Hi-Fi Output
Professional-grade audio generation with crystal clarity

🎯 Core Highlights

🎬 Multi-scenario Audio-Visual Synchronization
Supports generating high-quality audio that is synchronized and semantically aligned with complex video scenes, enhancing realism and immersive experience for film/TV and gaming applications.

βš–οΈ Multi-modal Semantic Balance
Intelligently balances visual and textual information analysis, comprehensively orchestrates sound effect elements, avoids one-sided generation, and meets personalized dubbing requirements.

🎵 High-fidelity Audio Output
Self-developed 48kHz audio VAE perfectly reconstructs sound effects, music, and vocals, achieving professional-grade audio generation quality.

βš™οΈ Installation

πŸ”§ Portable Version Specs:

  • CUDA: 12.8
  • Python: 3.12
  • OS: Windows 10/11
  • VRAM: 20GB for XXL model (or 8GB with offload mode), 16GB for XL model (or 6GB+ with offload mode)

🖥️ Windows Installation

This project is provided only as a *.bat installer/starter file, which downloads and installs all components and builds a fully portable HunyuanVideo-Foley.

➤ Please note:

  • Only NVIDIA GTX 16xx and RTX 20xx-50xx GPUs are supported. GTX 10xx cards are too old, so operation on them is not guaranteed.
  • This installer is intended for Windows 10 or higher.
  • Application functionality on Windows 7 or lower is not guaranteed.

  • Download the F5-TTSx .bat installer for Windows from Releases.
  • Place the BAT file in a folder in the root of any partition. The folder must have a short Latin name with no spaces or special characters. Then run the file.
  • Select the INSTALL (5) entry. The .bat file will download, unpack, and configure the required environment.
  • The batch file downloads portable Git and Microconda, creates a portable venv, installs the latest official stable Torch build with CUDA 12.8, downloads the models, and then deletes part of the downloaded cache. After installation, the batch file automatically launches the browser and begins downloading the google--siglip2-base-patch16-512 and laion--larger_clap_general models to the cache folder. Please be patient and wait for the shell to start (monitor progress in the console).
  • After installation, use one of the four launch modes (1-4) in the *.BAT menu: the XXL or XL model, each with or without offload. Offload mode uses both VRAM and RAM, so use it if you have 32GB+ of RAM.
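The folder requirement in the steps above can be checked before running the installer. The sketch below is only an illustration and is not part of the .bat file: the helper name `is_safe_install_dir`, the 16-character limit for "short", and the reading of the rule as "one folder directly under a drive root" are all assumptions.

```python
import re
from pathlib import PureWindowsPath

# Assumption: "Latin name without spaces or special characters" is read here
# as ASCII letters, digits, underscore, and hyphen only.
_SAFE_NAME = re.compile(r"[A-Za-z0-9_-]+")


def is_safe_install_dir(path: str, max_name_len: int = 16) -> bool:
    """Hypothetical check: True for a folder like C:\\Foley in a drive root.

    max_name_len is an arbitrary stand-in for "short"; the README gives no number.
    """
    p = PureWindowsPath(path)
    # Require an absolute path made of a drive root plus exactly one folder.
    if not p.drive or p.root != "\\" or len(p.parts) != 2:
        return False
    name = p.parts[1]
    return len(name) <= max_name_len and _SAFE_NAME.fullmatch(name) is not None
```

For example, `is_safe_install_dir(r"C:\Foley")` passes, while `r"C:\My Folder"` (space) and `r"D:\AI\Foley"` (nested) are rejected under these assumptions.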

💻 Usage

📊 Model Specifications

ℹ️ The model is downloaded when the first generation starts. Which model is downloaded depends on the launch mode you selected.

| Model | Checkpoint | VRAM (Normal) | VRAM (Offload) |
|---|---|---|---|
| XXL (Default) | hunyuanvideo_foley.pth | 20GB | 12GB |
| XL | hunyuanvideo_foley_xl.pth | 16GB | 8GB |
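
As a rough guide to choosing among the four launch modes, the VRAM figures in the Model Specifications table can be turned into a small lookup. This is a minimal sketch, not code from the project: the function name `pick_launch_mode` and the preference order (try both models without offload first, then with offload) are assumptions, and it does not model the 32GB+ RAM recommendation for offload.

```python
# VRAM requirements in GB, copied from the Model Specifications table.
# Entries are ordered largest model first.
MODELS = [
    # (model, checkpoint, vram_normal_gb, vram_offload_gb)
    ("XXL", "hunyuanvideo_foley.pth", 20, 12),
    ("XL", "hunyuanvideo_foley_xl.pth", 16, 8),
]


def pick_launch_mode(vram_gb: float):
    """Return (model, use_offload) for the given VRAM, or None if nothing fits.

    Assumed preference: any model without offload beats any model with offload.
    """
    # First pass: largest model that fits entirely in VRAM.
    for model, _checkpoint, normal_gb, _offload_gb in MODELS:
        if vram_gb >= normal_gb:
            return model, False
    # Second pass: largest model that fits with offload enabled.
    for model, _checkpoint, _normal_gb, offload_gb in MODELS:
        if vram_gb >= offload_gb:
            return model, True
    return None


if __name__ == "__main__":
    for vram in (24, 16, 12, 8, 6):
        print(vram, pick_launch_mode(vram))
```

Under these assumptions a 24GB card maps to XXL without offload, 16GB to XL without offload, 12GB to XXL with offload, and 8GB to XL with offload. Anything below 8GB returns `None`.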

📺 Credits

© 2025 Tencent Hunyuan. All rights reserved. | Made with ❤️ for the AI community

© 2026 LeeAeron, Portable version.