---
title: VEO3 Free
emoji: ๐
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.35.0
app_file: app.py
pinned: false
short_description: Wan2.1-T2V-14B + Fast 4-step with NAG + Automatic Audio
models:
- VIDraft/Gemma-3-R1984-4B
- google/gemma-3-4b-it
- Wan-AI/Wan2.1-T2V-14B-Diffusers
- vrgamedevgirl84/Wan14BT2VFusioniX
- Kijai/WanVideo_comfy
---
## English Explanation

### Overview
**VEO3 Free** is an AI video generation application that pairs the Wan2.1-T2V-14B text-to-video model with automatic audio generation. It creates a video from a text description and then uses MMAudio to generate matching audio.

### Key Features

1. **Text-to-Video Generation**
   - Uses the Wan2.1-T2V-14B diffusion model (14 billion parameters)
   - Fast 4-step generation with NAG (Normalized Attention Guidance)
   - Supports resolutions from 128x128 to 896x896
   - Duration: 1-8 seconds at 16 FPS
   - Cinema-quality output with professional camera movements

2. **Automatic Audio Generation**
   - MMAudio integration for synchronized sound effects
   - Uses the same text prompt for both video and audio
   - Configurable audio quality and guidance strength
   - Optional: audio generation can be disabled

3. **Advanced Controls**
   - **NAG Scale**: Controls guidance strength (1.0-20.0)
   - **Inference Steps**: Trades quality against speed (1-8 steps)
   - **Seed Control**: Fixes the random seed for reproducible results
   - **Negative Prompts**: Specify what the generation should avoid

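The ranges listed above can be checked before any GPU time is spent. The following is a minimal sketch, not the app's actual code: `GenerationSettings`, its field names, and its default values are hypothetical, and only the numeric limits (128-896 pixels, 1-8 seconds, 16 FPS, 1-8 steps, NAG scale 1.0-20.0) come from the feature list.

```python
from dataclasses import dataclass

FPS = 16  # frame rate stated in the feature list


def _check_range(name: str, value: float, low: float, high: float) -> None:
    """Raise ValueError when value falls outside the closed range [low, high]."""
    if not low <= value <= high:
        raise ValueError(f"{name}={value} is outside [{low}, {high}]")


@dataclass(frozen=True)
class GenerationSettings:
    """Hypothetical settings object; field names and defaults are illustrative."""

    prompt: str
    negative_prompt: str = ""
    width: int = 512          # arbitrary in-range default, not the app's
    height: int = 512         # arbitrary in-range default, not the app's
    duration_s: float = 4.0
    steps: int = 4
    nag_scale: float = 10.0
    seed: int = 0
    enable_audio: bool = True

    def __post_init__(self) -> None:
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")
        _check_range("width", self.width, 128, 896)
        _check_range("height", self.height, 128, 896)
        _check_range("duration_s", self.duration_s, 1, 8)
        _check_range("steps", self.steps, 1, 8)
        _check_range("nag_scale", self.nag_scale, 1.0, 20.0)

    @property
    def num_frames(self) -> int:
        """Frame count implied by the duration at the fixed frame rate."""
        return round(self.duration_s * FPS)
```

Under these assumptions, an 8-second request maps to 128 frames, and a NAG scale of 25.0 is rejected with a `ValueError` before generation starts. The model itself may impose further constraints on frame counts or dimensions that this sketch does not cover.
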
### How It Works
1. **Input**: Enter a detailed scene description
2. **Video Generation**: The model generates video frames from your prompt
3. **Audio Synthesis**: MMAudio generates matching sound effects from the same prompt
4. **Output**: The video and audio are combined into a single synchronized clip

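The four steps above can be sketched as a small orchestration function. This is an illustration under stated assumptions rather than the app's implementation: `run_pipeline` and the three stage callables are hypothetical names, and passing the video path to the audio stage is an assumption about how the stages connect.

```python
from typing import Callable, Optional

# Hypothetical stage signatures; the real app's functions may look different.
VideoFn = Callable[[str], str]       # prompt -> path of a silent video file
AudioFn = Callable[[str, str], str]  # (prompt, video path) -> path of an audio file
MuxFn = Callable[[str, str], str]    # (video path, audio path) -> path of the final file


def run_pipeline(
    prompt: str,
    make_video: VideoFn,
    make_audio: Optional[AudioFn] = None,
    mux: Optional[MuxFn] = None,
) -> str:
    """Run the four steps in order and return the path of the resulting file."""
    # Step 1: input. Reject empty prompts early.
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    # Step 2: video generation.
    video_path = make_video(prompt)
    # Audio is optional: without both audio stages, return the silent video.
    if make_audio is None or mux is None:
        return video_path
    # Step 3: audio synthesis from the same prompt.
    audio_path = make_audio(prompt, video_path)
    # Step 4: combine video and audio into one file.
    return mux(video_path, audio_path)
```

Injecting the stages as callables keeps the control flow testable without a GPU: stub functions that return file names are enough to confirm that audio is skipped when it is disabled.
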
### Example Use Cases
- Film previews and concept visualization
- Music video creation
- Advertising content
- Creative storytelling
- Game cinematics

### Technical Details
- **GPU Acceleration**: Runs on CUDA for fast processing
- **Model Architecture**: Transformer-based diffusion model
- **Audio Model**: Flow-matching-based audio synthesis
- **Processing Time**: Roughly 30-70 seconds per generation, depending on settings

### Tips for Best Results
- Use detailed, cinematic descriptions
- Include camera movements and visual style
- Specify lighting, colors, and atmosphere
- Add sound descriptions for better audio matching
- A higher NAG scale gives stronger adherence to the prompt

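One way to apply these tips consistently is to assemble the prompt from named parts. The helper below is a hypothetical convenience, not something the app provides; `build_prompt` and its parameter names are made up for illustration.

```python
def build_prompt(scene: str, camera: str = "", style: str = "",
                 lighting: str = "", sound: str = "") -> str:
    """Join the non-empty parts into one comma-separated prompt string."""
    if not scene.strip():
        raise ValueError("a scene description is required")
    parts = [scene, camera, style, lighting, sound]
    # Drop empty parts and trailing punctuation so the pieces join cleanly.
    cleaned = [p.strip().rstrip(".,") for p in parts if p.strip()]
    return ", ".join(cleaned)


prompt = build_prompt(
    "A lighthouse on a cliff at dusk",
    camera="slow aerial push-in",
    lighting="warm golden light, light fog",
    sound="waves crashing and distant gulls",
)
```

Because the same text drives both the video and the audio, the `sound` part ends up in the single prompt string rather than in a separate field.
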
### Special Features
- **Cinematic prompt examples**: Three example prompts that include professional filming techniques
- **Real-time progress display**: Follow the generation process as it runs
- **One-click example loading**: Clicking an example applies its settings automatically

This tool is designed to make professional-level video content easy to generate, and it is well suited to quickly visualizing creative ideas.