LeeAeron committed on
Commit
0aabbe4
·
verified ·
1 Parent(s): 401c2a5

Upload README.md

  1. README.md +126 -0
README.md ADDED
@@ -0,0 +1,126 @@
1
+ # 🚀 HunyuanVideo-Foley
2
+
3
+ <div align="center">
4
+
5
+ <img src="assets/logo.png" alt="HunyuanVideo-Foley Logo" width="400">
6
+
7
+ <h4>Portable Version</h4>
8
+
9
+ <h4>Multimodal Diffusion with Representation Alignment for High-Fidelity Foley Audio Generation</h4>
10
+
11
+ <p align="center">
12
+ <strong>Professional-grade AI sound effect generation for video content creators</strong>
13
+ </p>
14
+ </div>
15
+
16
+
17
+ ![HunyuanVideo-Foley](assets/HunyuanVideoFoley.jpg)
18
+
19
+ [![Release](https://img.shields.io/github/release/LeeAeron/HunyuanVideo-Foley.svg)](https://github.com/LeeAeron/HunyuanVideo-Foley/releases/latest)
20
+
21
+
22
+ ## ✨ **Key Highlights**
23
+
24
+ <table align="center" style="border: none; margin: 20px 0;">
25
+ <tr>
26
+ <td align="center" width="33%">
27
+
28
+ 🎭 **Multi-scenario Sync**
29
+ High-quality audio synchronized with complex video scenes
30
+
31
+ </td>
32
+ <td align="center" width="33%">
33
+
34
+ 🧠 **Multi-modal Balance**
35
+ Perfect harmony between visual and textual information
36
+
37
+ </td>
38
+ <td align="center" width="33%">
39
+
40
+ 🎵 **48kHz Hi-Fi Output**
41
+ Professional-grade audio generation with crystal clarity
42
+
43
+ </td>
44
+ </tr>
45
+ </table>
46
+
47
+
48
+
49
+
50
+ ## 🎯 **Core Highlights**
51
+
52
+ <div style="display: grid; grid-template-columns: 1fr; gap: 15px; margin: 20px 0;">
53
+
54
+ <div style="border-left: 4px solid #4CAF50; padding: 15px; background: #f8f9fa; border-radius: 8px; color: #333;">
55
+
56
+ **🎬 Multi-scenario Audio-Visual Synchronization**
57
+ Supports generating high-quality audio that is synchronized and semantically aligned with complex video scenes, enhancing realism and immersive experience for film/TV and gaming applications.
58
+
59
+ </div>
60
+
61
+ <div style="border-left: 4px solid #2196F3; padding: 15px; background: #f8f9fa; border-radius: 8px; color: #333;">
62
+
63
+ **⚖️ Multi-modal Semantic Balance**
64
+ Intelligently balances visual and textual information analysis, comprehensively orchestrates sound effect elements, avoids one-sided generation, and meets personalized dubbing requirements.
65
+
66
+ </div>
67
+
68
+ <div style="border-left: 4px solid #FF9800; padding: 15px; background: #f8f9fa; border-radius: 8px; color: #333;">
69
+
70
+ **🎵 High-fidelity Audio Output**
71
+ Self-developed 48kHz audio VAE perfectly reconstructs sound effects, music, and vocals, achieving professional-grade audio generation quality.
72
+
73
+ </div>
74
+ </div>
75
+
76
+
77
+ ## ⚙️ Installation
78
+
79
+ **🔧 Portable Version Specs:**
80
+ - **CUDA**: 12.8
81
+ - **Python**: 3.12
82
+ - **OS**: Windows 10/11
83
+ - **VRAM**: 20GB for the XXL model, 16GB for the XL model; offload mode lowers both figures (see Model Specifications)
84
+
85
+
86
+ ### 🖥️ Windows Installation
87
+
88
+ This project ships as a single `*.bat` installer/launcher file that downloads and installs all components and builds a fully portable HunyuanVideo-Foley.
89
+
90
+ ➤ Please note:
91
+ - Only NVIDIA GTX 16xx and RTX 20xx-50xx GPUs are supported. GTX 10xx cards are too old, so operation on them is not guaranteed.
92
+ - This installer is intended for those running Windows 10 or higher.
93
+ - Application functionality on systems running Windows 7 or earlier is not guaranteed.
94
+
95
+ - Download the HunyuanVideo-Foley .bat installer for Windows from [Releases](https://github.com/LeeAeron/HunyuanVideo-Foley/releases).
96
+ - Place the .bat file in a folder at the root of any partition. The folder name should be short and use only Latin characters, with no spaces or special characters. Then run the file.
97
+ - Select the INSTALL (5) entry; the .bat file will download, unpack, and configure the required environment.
98
+ - The batch file downloads portable Git and Microconda, creates a portable venv, installs the latest official stable Torch build with CUDA 12.8, downloads the models, and then deletes part of the downloaded cache.
99
+ After installation, the batch file automatically launches the browser and begins downloading the google--siglip2-base-patch16-512 and laion--larger_clap_general models to the cache folder.
100
+ Please be patient and wait for the shell to start (watch the console for progress).
101
+ - After installation, use one of the four launch modes (1-4) in the .bat menu: the XXL or XL model, each with or without offload. Offload mode uses both VRAM and system RAM, so use it if you have 32GB+ of RAM.
102
+
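The folder-placement rule in the steps above (one folder directly under a partition root, short Latin name, no spaces or special characters) can be checked before running the installer. The following is only a minimal sketch and is not part of the portable package: the function name `is_safe_install_dir`, the allowed character set, and the 16-character limit for "short" are all assumptions made for illustration.

```python
import re
from pathlib import PureWindowsPath

# Assumed character set: ASCII letters, digits, underscore, hyphen.
_SAFE_NAME = re.compile(r"[A-Za-z0-9_-]+")
_DRIVE = re.compile(r"[A-Za-z]:")


def is_safe_install_dir(path: str, max_name_len: int = 16) -> bool:
    """Return True if `path` is a single folder directly under a drive root
    whose name is short and free of spaces or special characters.

    `max_name_len` is a hypothetical threshold; the README only says "short".
    """
    p = PureWindowsPath(path)
    # Require an absolute path on a lettered drive, e.g. "D:\\Foley".
    if not _DRIVE.fullmatch(p.drive) or not p.root:
        return False
    folders = p.parts[1:]  # drop the anchor such as "D:\\"
    # Exactly one folder level below the partition root.
    if len(folders) != 1:
        return False
    name = folders[0]
    return len(name) <= max_name_len and _SAFE_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("D:\\Foley", "C:\\Program Files\\Foley", "D:\\My Foley"):
        print(candidate, is_safe_install_dir(candidate))
```

`PureWindowsPath` is used so the check parses Windows paths the same way on any operating system, without touching the file system.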
103
+
104
+ ## 💻 **Usage**
105
+
106
+ ### 📊 **Model Specifications**
107
+
108
+ ℹ️ The model is downloaded when the first generation starts; which checkpoint is fetched depends on the launch mode you selected.
109
+
110
+ | Model | Checkpoint | VRAM (Normal) | VRAM (Offload) |
111
+ |---------------------|-----------------------------|---------------|----------------|
112
+ | **XXL** *(Default)* | `hunyuanvideo_foley.pth` | 20GB | 12GB |
113
+ | **XL** | `hunyuanvideo_foley_xl.pth` | 16GB | 8GB |
114
+
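To make the table above easier to apply, here is a small sketch that maps available VRAM to one of the four launch modes. It is not part of the installer. The function name `pick_mode` and the preference order (avoid offload first, then prefer the larger model) are assumptions; only the VRAM thresholds come from the table.

```python
from typing import Optional, Tuple

# (model, use_offload, minimum VRAM in GB), thresholds taken from the table.
# The ordering encodes an assumed preference: non-offload modes first,
# larger model first within each group.
CANDIDATES = [
    ("XXL", False, 20),
    ("XL", False, 16),
    ("XXL", True, 12),
    ("XL", True, 8),
]


def pick_mode(vram_gb: float) -> Optional[Tuple[str, bool]]:
    """Return (model, use_offload) for the first mode that fits,
    or None if VRAM is below every threshold in the table."""
    for model, use_offload, required_gb in CANDIDATES:
        if vram_gb >= required_gb:
            return model, use_offload
    return None


if __name__ == "__main__":
    for vram in (24, 16, 12, 8, 6):
        print(vram, pick_mode(vram))
```

Reordering `CANDIDATES` changes the policy. For example, placing both XXL entries first would favor the larger model even when it requires offload.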
115
+
116
+ ## 📺 Credits
117
+
118
+ <div align="center" style="margin: 30px 0;">
119
+ <p style="color: #666; margin-top: 15px; font-size: 14px;">
120
+
121
+ © 2025 Tencent Hunyuan. All rights reserved. | Made with ❤️ for the AI community
122
+
123
+ © 2026 LeeAeron, Portable version.
124
+
125
+ </p>
126
+ </div>