Update README.md
README.md CHANGED
@@ -9,3 +9,100 @@ app_file: app.py
 pinned: false
 short_description: SText to Audio(Sound SFX) Generator
 ---
+## TangoFlux: Text-to-Audio Generation System
+
+TangoFlux is a text-to-audio generation system that converts text descriptions into high-quality audio. It is built on flow matching and CLAP-ranked preference optimization, which target fast synthesis that stays faithful to the natural language prompt.
+
+### Key Features
+
+**1. Advanced Audio Generation**
+- Converts detailed text descriptions into realistic audio
+- Supports complex soundscapes with multiple elements
+- Generates audio up to 30 seconds in duration
+- Produces audio output at a 44.1 kHz sample rate
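As a quick worked example of what the 30-second limit and the 44.1 kHz rate imply, the sketch below computes the number of samples and the uncompressed size of a clip. The mono channel count and 16-bit sample width are assumptions made for illustration; the README does not state them, and the helper names are hypothetical.

```python
SAMPLE_RATE_HZ = 44_100  # sample rate quoted in the feature list
MAX_DURATION_S = 30      # maximum clip length quoted in the feature list


def num_samples(duration_s: float, sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Samples per channel for a clip of the given duration."""
    return int(round(duration_s * sample_rate_hz))


def pcm_size_bytes(duration_s: float, channels: int = 1,
                   sample_width_bytes: int = 2) -> int:
    """Uncompressed PCM size; mono 16-bit defaults are assumptions."""
    return num_samples(duration_s) * channels * sample_width_bytes


print(num_samples(MAX_DURATION_S))     # 1323000
print(pcm_size_bytes(MAX_DURATION_S))  # 2646000
```

Under these assumptions a full-length clip is about 2.6 MB of raw audio before any container overhead.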
+
+**2. Flexible Generation Controls**
+- **Steps (10-100)**: Controls generation quality vs speed
+- **Guidance Scale (1-10)**: Adjusts how closely the audio follows the prompt
+- **Duration (1-30s)**: Sets the length of generated audio
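The three controls above can be captured in a small validated settings object. This is a minimal sketch, not the Space's actual code: the class name `GenerationSettings`, the field names, and the default values are all hypothetical; only the ranges come from the list above.

```python
from dataclasses import dataclass

# Ranges taken from the control list above.
STEPS_RANGE = (10, 100)
GUIDANCE_RANGE = (1.0, 10.0)
DURATION_RANGE_S = (1.0, 30.0)


def _check(name, value, bounds):
    """Raise ValueError when value falls outside the inclusive bounds."""
    low, high = bounds
    if not low <= value <= high:
        raise ValueError(f"{name}={value} is outside [{low}, {high}]")


@dataclass(frozen=True)
class GenerationSettings:
    """Hypothetical settings container; the defaults are illustrative only."""
    steps: int = 50
    guidance_scale: float = 4.5
    duration_s: float = 10.0

    def __post_init__(self):
        _check("steps", self.steps, STEPS_RANGE)
        _check("guidance_scale", self.guidance_scale, GUIDANCE_RANGE)
        _check("duration_s", self.duration_s, DURATION_RANGE_S)


settings = GenerationSettings(steps=25, guidance_scale=3.0, duration_s=5.0)
print(settings)
```

Rejecting out-of-range values up front, rather than silently clamping them, keeps a caller from believing a 60-second clip was requested when only 30 seconds can be produced.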
+
+**3. Diverse Audio Capabilities**
+- Natural sounds (ocean waves, thunder, rain)
+- Animal sounds (dogs barking, cats meowing, birds singing)
+- Human sounds (laughter, speaking, whistling, snoring)
+- Mechanical sounds (engines, vehicles, machinery)
+- Complex soundscapes (multiple layered sounds)
+
+**4. Technical Architecture**
+- Uses flow matching for efficient generation
+- CLAP-ranked preference optimization for quality
+- GPU-accelerated inference with CUDA support
+- Transformer-based text encoding
+- Optimized for fast generation with @spaces.GPU
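The `@spaces.GPU` item refers to the decorator from the Hugging Face `spaces` package, which requests a GPU for the duration of the decorated call on ZeroGPU hardware. The sketch below shows only the decoration pattern: the function name, its parameters, and its placeholder body are assumptions, and the fallback decorator exists solely so the snippet also runs where `spaces` is not installed.

```python
try:
    import spaces  # provided on Hugging Face Spaces
    gpu = spaces.GPU
except ImportError:
    def gpu(fn):
        """Fallback: leave the function unchanged when `spaces` is absent."""
        return fn


@gpu
def generate_audio(prompt: str, steps: int = 50) -> dict:
    # Placeholder body (assumption): the real app would run the model here
    # and return audio rather than echoing its arguments.
    return {"prompt": prompt, "steps": steps}


print(generate_audio("rain on a tin roof", steps=25))
```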
+
+### How It Works
+
+1. **Text Input**: Describe the desired audio in natural language
+2. **Parameter Adjustment**: Fine-tune generation settings
+3. **AI Processing**: The model interprets text and generates corresponding audio
+4. **Audio Output**: Download or play the generated WAV file
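The four steps above can be sketched end to end with the standard library alone. The model is replaced here by a stand-in that returns a plain 440 Hz tone, so this illustrates the data flow only, not TangoFlux itself; `fake_model`, `to_wav_bytes`, and `run_pipeline` are hypothetical names, and mono 16-bit output is an assumption.

```python
import io
import math
import struct
import wave

SAMPLE_RATE_HZ = 44_100  # sample rate quoted in the feature list


def fake_model(prompt: str, duration_s: float) -> list:
    """Stand-in for step 3: returns a 440 Hz tone and ignores the prompt."""
    n = int(duration_s * SAMPLE_RATE_HZ)
    return [0.3 * math.sin(2 * math.pi * 440 * i / SAMPLE_RATE_HZ)
            for i in range(n)]


def to_wav_bytes(samples) -> bytes:
    """Step 4: encode float samples in [-1, 1] as mono 16-bit WAV."""
    pcm = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(SAMPLE_RATE_HZ)
        wav.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
    return buf.getvalue()


def run_pipeline(prompt: str, duration_s: float = 1.0) -> bytes:
    """Steps 1-4: take a prompt and settings, synthesize, return WAV bytes."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    return to_wav_bytes(fake_model(prompt, duration_s))


wav_bytes = run_pipeline("dog barking in the distance", duration_s=0.5)
print(len(wav_bytes), wav_bytes[:4])
```

Returning bytes rather than writing to a fixed path leaves it to the caller to decide whether the result is saved, streamed, or handed to a player.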
+
+### Example Use Cases
+- **Film & Video Production**: Create custom sound effects and ambiences
+- **Game Development**: Generate dynamic environmental sounds
+- **Podcast Production**: Add realistic background sounds
+- **Music Production**: Create unique sound textures and effects
+- **Educational Content**: Generate illustrative audio examples
+- **Accessibility**: Convert text descriptions to audio experiences
+
+The system includes 20+ pre-configured examples demonstrating various audio generation capabilities, from simple single sounds to complex multi-layered soundscapes.
+
+---
+
+## TangoFlux: Text-to-Audio Generation System (translated from the Korean)
+
+TangoFlux is a text-to-audio generation system that converts text descriptions into high-quality audio. Built on flow matching and CLAP-ranked preference optimization, it provides fast and accurate audio synthesis from natural language prompts.
+
+### Main Features
+
+**1. Advanced Audio Generation**
+- Converts detailed text descriptions into realistic audio
+- Supports complex soundscapes containing multiple elements
+- Generates audio up to 30 seconds long
+- Outputs audio at a 44.1 kHz sample rate
+
+**2. Flexible Generation Controls**
+- **Steps (10-100)**: Balances generation quality against speed
+- **Guidance Scale (1-10)**: Adjusts prompt adherence
+- **Duration (1-30s)**: Sets the length of the generated audio
+
+**3. Diverse Audio Generation Capabilities**
+- Natural sounds (waves, thunder, rain)
+- Animal sounds (dogs barking, cats meowing, birds chirping)
+- Human sounds (laughter, speech, whistling, snoring)
+- Mechanical sounds (engines, vehicles, machinery)
+- Composite soundscapes (combinations of several sound layers)
+
+**4. Technical Architecture**
+- Uses flow matching for efficient generation
+- CLAP-ranked preference optimization for improved quality
+- GPU-accelerated inference with CUDA support
+- Transformer-based text encoding
+- Optimized for fast generation with @spaces.GPU
+
+### How It Works
+
+1. **Text Input**: Describe the desired audio in natural language
+2. **Parameter Adjustment**: Fine-tune the generation settings
+3. **AI Processing**: The model interprets the text and generates the corresponding audio
+4. **Audio Output**: Download or play the generated WAV file
+
+### Usage Examples
+- **Film and Video Production**: Create custom sound effects and ambient sounds
+- **Game Development**: Generate dynamic environmental sounds
+- **Podcast Production**: Add realistic background sounds
+- **Music Production**: Create distinctive sound textures and effects
+- **Educational Content**: Generate audio examples for explanation
+- **Accessibility**: Convert text descriptions into audio experiences
+
+The system includes more than 20 pre-configured examples that demonstrate a range of audio generation capabilities, from simple single sounds to complex multi-layered soundscapes.