Spaces: fantaxy/Sound-AI-SFX

fantaxy committed on Jul 4, 2025

Commit 72a850a (verified) · 1 Parent(s): 7abbc1b

Update README.md

Files changed (1)

README.md +97 -0

@@ -9,3 +9,100 @@ app_file: app.py
 pinned: false
 short_description: SText to Audio(Sound SFX) Generator
 ---

+## TangoFlux: Text-to-Audio Generation System
+TangoFlux is a text-to-audio generation system that converts natural-language descriptions into audio. It is built on flow matching and CLAP-ranked preference optimization, and is designed for fast synthesis that stays faithful to the prompt.
+### Key Features
+**1. Advanced Audio Generation**
+- Converts detailed text descriptions into realistic audio
+- Supports complex soundscapes with multiple elements
+- Generates audio up to 30 seconds in duration
+- Produces audio output at a 44.1 kHz sample rate
+**2. Flexible Generation Controls**
+- **Steps (10-100)**: Controls generation quality vs speed
+- **Guidance Scale (1-10)**: Adjusts how closely the audio follows the prompt
+- **Duration (1-30s)**: Sets the length of generated audio
+**3. Diverse Audio Capabilities**
+- Natural sounds (ocean waves, thunder, rain)
+- Animal sounds (dogs barking, cats meowing, birds singing)
+- Human sounds (laughter, speaking, whistling, snoring)
+- Mechanical sounds (engines, vehicles, machinery)
+- Complex soundscapes (multiple layered sounds)
+**4. Technical Architecture**
+- Uses flow matching for efficient generation
+- CLAP-ranked preference optimization for quality
+- GPU-accelerated inference with CUDA support
+- Transformer-based text encoding
+- Runs generation on Hugging Face ZeroGPU hardware via the `@spaces.GPU` decorator
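
The three generation controls listed above each have a fixed range, so out-of-range input has to be rejected or clamped somewhere. The sketch below shows one way to clamp values to those ranges. It is an illustration only: `GenerationSettings`, `clamp`, and the default values are hypothetical and are not taken from `app.py`; only the ranges come from the README.

```python
from dataclasses import dataclass

# Inclusive ranges copied from the "Flexible Generation Controls" list.
STEPS_RANGE = (10, 100)
GUIDANCE_RANGE = (1.0, 10.0)
DURATION_RANGE = (1, 30)  # seconds


def clamp(value, bounds):
    """Return value limited to the inclusive (low, high) bounds."""
    low, high = bounds
    return max(low, min(high, value))


@dataclass(frozen=True)
class GenerationSettings:
    """Hypothetical container for the three controls (not part of app.py).

    The defaults are assumptions chosen for illustration.
    """
    steps: int = 25
    guidance_scale: float = 4.5
    duration: int = 10

    def clamped(self):
        """Return a copy with every field forced into its documented range."""
        return GenerationSettings(
            steps=int(clamp(self.steps, STEPS_RANGE)),
            guidance_scale=float(clamp(self.guidance_scale, GUIDANCE_RANGE)),
            duration=int(clamp(self.duration, DURATION_RANGE)),
        )
```

For example, `GenerationSettings(steps=500, duration=45).clamped()` yields 100 steps and a 30-second duration. Clamping rather than raising an error is a design choice that suits slider-style inputs; an API might prefer to reject invalid values instead.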
+### How It Works
+1. **Text Input**: Describe the desired audio in natural language
+2. **Parameter Adjustment**: Fine-tune generation settings
+3. **AI Processing**: The model interprets text and generates corresponding audio
+4. **Audio Output**: Download or play the generated WAV file
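
The final step above (a WAV file at the stated 44.1 kHz rate) can be sketched with the Python standard library alone. This is not the app's code: `write_placeholder_wav` is a hypothetical helper, and the sine tone merely stands in for samples that a real model would produce. The mono, 16-bit layout is also an assumption.

```python
import math
import struct
import wave

SAMPLE_RATE = 44_100  # Hz, the output rate stated in the README


def write_placeholder_wav(path, duration_s, freq_hz=440.0):
    """Write a mono 16-bit sine tone; a stand-in for real model output.

    Returns the number of samples written.
    """
    n_samples = int(SAMPLE_RATE * duration_s)
    frames = bytearray()
    for i in range(n_samples):
        # Scale to 30% of full range to leave headroom.
        sample = 0.3 * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
        frames += struct.pack("<h", int(sample * 32767))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)      # mono (assumption)
        wav.setsampwidth(2)      # 16-bit PCM (assumption)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(bytes(frames))
    return n_samples
```

At this sample rate, the sample count is simply `44100 * duration`, so a clip at the 30-second maximum holds 1,323,000 samples per channel.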
+### Example Use Cases
+- **Film & Video Production**: Create custom sound effects and ambiences
+- **Game Development**: Generate dynamic environmental sounds
+- **Podcast Production**: Add realistic background sounds
+- **Music Production**: Create unique sound textures and effects
+- **Educational Content**: Generate illustrative audio examples
+- **Accessibility**: Convert text descriptions to audio experiences
+The system includes 20+ pre-configured examples demonstrating various audio generation capabilities, from simple single sounds to complex multi-layered soundscapes.
+---
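+## TangoFlux: Text-to-Audio Generation System
+TangoFlux is a state-of-the-art text-to-audio generation system that converts text descriptions into high-quality audio. Built on flow matching and CLAP-ranked preference optimization, it provides fast and accurate audio synthesis from natural-language prompts.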
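+### Key Features
+**1. Advanced Audio Generation**
+- Converts detailed text descriptions into realistic audio
+- Supports complex soundscapes containing multiple elements
+- Generates audio up to 30 seconds long
+- Outputs high-quality 44.1 kHz audio
+**2. Flexible Generation Controls**
+- **Steps (10-100)**: Trades generation quality against speed
+- **Guidance Scale (1-10)**: Adjusts adherence to the prompt
+- **Duration (1-30s)**: Sets the length of the generated audio
+**3. Diverse Audio Generation Capabilities**
+- Natural sounds (waves, thunder, rain)
+- Animal sounds (dogs barking, cats meowing, birds chirping)
+- Human sounds (laughter, speech, whistling, snoring)
+- Mechanical sounds (engines, vehicles, machinery)
+- Composite soundscapes (combinations of several layered sounds)
+**4. Technical Structure**
+- Uses flow matching for efficient generation
+- CLAP-ranked preference optimization for improved quality
+- GPU-accelerated inference with CUDA support
+- Transformer-based text encoding
+- Optimized for fast generation with @spaces.GPU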
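+### How It Works
+1. **Text Input**: Describe the desired audio in natural language
+2. **Parameter Adjustment**: Fine-tune the generation settings
+3. **AI Processing**: The model interprets the text and generates the corresponding audio
+4. **Audio Output**: Download or play the generated WAV file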
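+### Example Use Cases
+- **Film & Video Production**: Create custom sound effects and ambient sounds
+- **Game Development**: Generate dynamic environmental sounds
+- **Podcast Production**: Add realistic background sounds
+- **Music Production**: Create unique sound textures and effects
+- **Educational Content**: Generate illustrative audio examples
+- **Accessibility**: Convert text descriptions into audio experiences
+The system includes 20+ pre-configured examples that demonstrate a range of audio generation capabilities, from simple single sounds to complex multi-layered soundscapes.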