# 🚀 HunyuanVideo-Foley

<div align="center">

<img src="assets/logo.png" alt="HunyuanVideo-Foley Logo" width="400">

<h4>Portable Version</h4>

<h4>Multimodal Diffusion with Representation Alignment for High-Fidelity Foley Audio Generation</h4>

<p align="center">
<strong>Professional-grade AI sound effect generation for video content creators</strong>
</p>
</div>

[Latest release](https://github.com/LeeAeron/HunyuanVideo-Foley/releases/latest)

## ✨ **Key Highlights**

<table align="center" style="border: none; margin: 20px 0;">
<tr>
<td align="center" width="33%">

🎭 **Multi-scenario Sync**
High-quality audio synchronized with complex video scenes

</td>
<td align="center" width="33%">

🧠 **Multi-modal Balance**
Perfect harmony between visual and textual information

</td>
<td align="center" width="33%">

🎵 **48kHz Hi-Fi Output**
Professional-grade audio generation with crystal clarity

</td>
</tr>
</table>

</div>

## 🎯 **Core Highlights**

<div style="display: grid; grid-template-columns: 1fr; gap: 15px; margin: 20px 0;">

<div style="border-left: 4px solid #4CAF50; padding: 15px; background: #f8f9fa; border-radius: 8px; color: #333;">

**🎬 Multi-scenario Audio-Visual Synchronization**
Supports generating high-quality audio that is synchronized and semantically aligned with complex video scenes, enhancing realism and immersive experience for film/TV and gaming applications.

</div>

<div style="border-left: 4px solid #2196F3; padding: 15px; background: #f8f9fa; border-radius: 8px; color: #333;">

**⚖️ Multi-modal Semantic Balance**
Intelligently balances visual and textual information analysis, comprehensively orchestrates sound effect elements, avoids one-sided generation, and meets personalized dubbing requirements.

</div>

<div style="border-left: 4px solid #FF9800; padding: 15px; background: #f8f9fa; border-radius: 8px; color: #333;">

**🎵 High-fidelity Audio Output**
Self-developed 48kHz audio VAE perfectly reconstructs sound effects, music, and vocals, achieving professional-grade audio generation quality.

</div>
</div>

## ⚙️ Installation

**🔧 Portable Version Specs:**
- **CUDA**: 12.8
- **Python**: 3.12
- **OS**: Windows 10/11
- **VRAM**: 20GB for the XXL model (or 8GB in offload mode), 16GB for the XL model (or 6GB+ in offload mode)

### 🖥️ Windows Installation

This project is provided only as a `*.bat` installer/launcher file, which downloads and installs all components and builds a fully portable HunyuanVideo-Foley.

➤ Please Note:
- Only NVIDIA GTX 16xx and RTX 20xx-50xx GPUs are supported. GTX 10xx GPUs are not guaranteed to work; they are too old, sorry.
- This installer is intended for Windows 10 or later.
- Functionality on Windows 7 or earlier is not guaranteed.

- Download the HunyuanVideo-Foley `.bat` installer for Windows from [Releases](https://github.com/LeeAeron/HunyuanVideo-Foley/releases).
- Place the `.bat` file in a folder at the root of any drive. The folder name should be short and use only Latin characters, with no spaces or special characters. Then run the file.
- Select the INSTALL (5) menu entry; the `.bat` file will download, unpack, and configure the whole required environment.
- The batch file downloads portable Git and Microconda, creates a portable venv, installs the latest official stable PyTorch build for CUDA 12.8, downloads the models, and then deletes part of the downloaded cache.
  After installation, the batch file automatically launches the browser and starts downloading the `google--siglip2-base-patch16-512` and `laion--larger_clap_general` models to the cache folder.
  Please be patient and wait for the interface to start (monitor progress in the console).
- After installation, use one of the four launch modes (1-4) in the `.bat` menu: the XXL or XL model, each with or without offload. Offload mode uses both VRAM and system RAM, so if you have 32GB+ of RAM, use offload mode.
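
The folder-placement rule in the steps above can also be checked before running the installer. The following is a minimal Python sketch, not part of this project: `is_valid_install_dir`, `MAX_FOLDER_NAME_LEN` and `SAFE_NAME` are hypothetical names, and both the 16-character limit and allowing `_` and `-` are assumptions, since the instructions only say the name should be "short" and free of special characters.

```python
import re
from pathlib import PureWindowsPath

# Assumption: the README only says the name should be "short"; 16 is an arbitrary cap.
MAX_FOLDER_NAME_LEN = 16
# Latin letters and digits, plus "_" and "-" (allowing these two is also an assumption).
SAFE_NAME = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_install_dir(path: str) -> bool:
    """Hypothetical pre-install check: is `path` a simply named folder at a drive root?"""
    p = PureWindowsPath(path)
    # Require a lettered drive (C:, D:, ...) and an absolute path; rejects UNC and relative paths.
    if not re.fullmatch(r"[A-Za-z]:", p.drive) or not p.root:
        return False
    # parts[0] is the anchor such as "D:\\"; exactly one folder must follow it.
    folders = p.parts[1:]
    if len(folders) != 1:
        return False
    name = folders[0]
    return len(name) <= MAX_FOLDER_NAME_LEN and SAFE_NAME.fullmatch(name) is not None


print(is_valid_install_dir(r"D:\Foley"))     # True
print(is_valid_install_dir(r"D:\My Foley"))  # False: contains a space
print(is_valid_install_dir(r"D:\AI\Foley"))  # False: not directly under the drive root
```

`PureWindowsPath` is used so the check behaves the same on any OS; it only parses the path and never touches the file system.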

## 💻 **Usage**

### 📊 **Model Specifications**

ℹ️ The model is downloaded when the first generation starts; which model is downloaded depends on the mode in which you launched HunyuanVideo-Foley.

| Model               | Checkpoint                  | VRAM (Normal) | VRAM (Offload) |
|---------------------|-----------------------------|---------------|----------------|
| **XXL** *(Default)* | `hunyuanvideo_foley.pth`    | 20GB          | 12GB           |
| **XL**              | `hunyuanvideo_foley_xl.pth` | 16GB          | 8GB            |
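
To choose a launch mode from the table above, a minimal Python sketch might look like the following. It is an illustration only: `pick_launch_mode` and `OFFLOAD_RAM_GB` are hypothetical names, the VRAM thresholds are copied from the table, and treating 32GB of system RAM as a hard requirement for offload (as well as preferring the larger model even when it needs offload) are assumptions drawn from the installation notes.

```python
from typing import Optional, Tuple

# (normal VRAM GB, offload VRAM GB) per model, copied from the table above.
VRAM_REQUIREMENTS_GB = {
    "XXL": (20, 12),
    "XL": (16, 8),
}
# Assumption: offload also uses system RAM, and the notes recommend 32GB+ for it.
OFFLOAD_RAM_GB = 32


def pick_launch_mode(vram_gb: float, ram_gb: float) -> Optional[Tuple[str, bool]]:
    """Return (model, use_offload) for the largest model that fits, or None if none fits."""
    for model in ("XXL", "XL"):  # try the larger, default model first
        normal, offload = VRAM_REQUIREMENTS_GB[model]
        if vram_gb >= normal:
            return model, False
        if vram_gb >= offload and ram_gb >= OFFLOAD_RAM_GB:
            return model, True
    return None


print(pick_launch_mode(24, 16))  # ('XXL', False)
print(pick_launch_mode(8, 32))   # ('XL', True)
```

A card with 16GB of VRAM and 64GB of RAM would get the XXL model with offload under this policy; a user who prefers speed over quality could instead pick XL without offload.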

## 📺 Credits

<div align="center" style="margin: 30px 0;">
<p style="color: #666; margin-top: 15px; font-size: 14px;">

© 2025 Tencent Hunyuan. All rights reserved. | Made with ❤️ for the AI community

© 2026 LeeAeron, Portable version.

</p>
</div>