OnyxMunk committed
Commit f86e88f · 2 Parent(s): e911600 2276588

Resolve merge conflicts: Keep lightweight synthesis version

Files changed (5)
  1. .dockerignore +42 -0
  2. Dockerfile +38 -0
  3. GEMINI.md +65 -0
  4. README.md +3 -3
  5. app.py +8 -64
.dockerignore ADDED
@@ -0,0 +1,42 @@
+ # Python
+ __pycache__/
+ *.py[cod]
+ *$py.class
+ *.so
+ .Python
+ env/
+ venv/
+ ENV/
+ .venv
+
+ # IDE
+ .vscode/
+ .idea/
+ *.swp
+ *.swo
+ *~
+
+ # OS
+ .DS_Store
+ Thumbs.db
+
+ # Git
+ .git/
+ .gitignore
+
+ # Documentation
+ *.md
+ !README.md
+
+ # Logs
+ *.log
+
+ # Model cache (will be downloaded in container)
+ .cache/
+ models/
+
+ # Test files
+ test/
+ tests/
+ *.test.py
+
Dockerfile ADDED
@@ -0,0 +1,38 @@
+ # Python 3.10 slim base image; CUDA support comes from the CUDA-enabled PyTorch wheels installed below
+ FROM python:3.10-slim
+
+ # Set working directory
+ WORKDIR /app
+
+ # Install system dependencies
+ RUN apt-get update && apt-get install -y \
+     build-essential \
+     git \
+     && rm -rf /var/lib/apt/lists/*
+
+ # Copy requirements file first for better Docker layer caching
+ COPY requirements.txt .
+
+ # Install PyTorch with CUDA support for GPU acceleration
+ # Hugging Face Spaces provides the CUDA runtime, so we use CUDA-enabled PyTorch
+ RUN pip install --no-cache-dir torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
+
+ # Install remaining Python dependencies
+ # Note: pip will skip torch since it's already installed (satisfies requirements.txt)
+ # The git dependency in requirements.txt requires git (already installed above)
+ RUN pip install --no-cache-dir -r requirements.txt
+
+ # Copy application files
+ COPY app.py .
+ COPY README.md .
+
+ # Expose Gradio default port
+ EXPOSE 7860
+
+ # Set environment variables
+ ENV GRADIO_SERVER_NAME=0.0.0.0
+ ENV GRADIO_SERVER_PORT=7860
+
+ # Run the application
+ CMD ["python", "app.py"]
+
GEMINI.md ADDED
@@ -0,0 +1,65 @@
+ # Stable Audio Open
+
+ ## Project Overview
+
+ **Stable Audio Open** is a Python-based web application that leverages generative AI to create audio from text prompts. It utilizes Stable Audio technology (via the `diffusers` library) to synthesize high-quality sound effects, music, and ambient noise. The user interface is built with **Gradio**, providing an interactive and accessible way to generate and listen to audio.
+
+ **Key Technologies:**
+ * **Python:** Core programming language.
+ * **Gradio:** Web interface framework for machine learning demos.
+ * **PyTorch & Diffusers:** Libraries for loading and running the Stable Audio Open model.
+ * **Hugging Face Hub:** Source for the pre-trained models.
+
+ ## Building and Running
+
+ ### Prerequisites
+
+ * Python 3.8+
+ * CUDA-capable GPU recommended (for faster generation); the app also runs on CPU, but more slowly.
+
+ ### Installation
+
+ 1. **Clone the repository:**
+    ```bash
+    git clone <repository_url>
+    cd Stable-Audio-Open
+    ```
+
+ 2. **Install dependencies:**
+    It is recommended to use a virtual environment.
+    ```bash
+    # Create virtual environment (optional but recommended)
+    python -m venv env
+    # Windows:
+    .\env\Scripts\activate
+    # Linux/Mac:
+    source env/bin/activate
+
+    # Install packages
+    pip install -r requirements.txt
+    ```
+
+ ### Running the Application
+
+ To start the Gradio web interface:
+
+ ```bash
+ python app.py
+ ```
+
+ After running the command, the application will typically be accessible at `http://127.0.0.1:7860` in your web browser.
+
+ ## Development Conventions
+
+ * **Entry Point:** `app.py` is the main script. It handles model loading, audio generation logic, and UI construction.
+ * **Model Caching:** The application implements a simple global caching mechanism (`model_cache`) to avoid reloading the heavy model on every request.
+ * **Error Handling:** The `generate_audio` function includes fallback mechanisms. If the model fails to load or generate, it synthesizes a simple sine wave to ensure the UI remains responsive and provides feedback.
+ * **Configuration:** Key parameters, such as the model ID (`stabilityai/stable-audio-open-small`), are currently hardcoded in `app.py`.
+ * **Dependencies:** Managed via `requirements.txt`.
+
+ ## Directory Structure
+
+ * `app.py`: Main application source code.
+ * `requirements.txt`: List of required Python packages.
+ * `README.md`: General project documentation.
+ * `.gitattributes`: Git configuration for file handling.
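
The global-cache convention noted in GEMINI.md can be sketched as follows. This is a minimal illustration only: `load_model` and `get_model` here are hypothetical stand-ins, not the actual code in `app.py`, which may structure its `model_cache` differently.

```python
# Minimal sketch of a global model cache (illustration only; the real
# model_cache in app.py may be structured differently).
model_cache = {}

def load_model(model_id):
    # Hypothetical stand-in for the expensive model-loading step.
    print(f"Loading {model_id} ...")
    return object()

def get_model(model_id="stabilityai/stable-audio-open-small"):
    # Load at most once per model ID; later requests reuse the cached instance.
    if model_id not in model_cache:
        model_cache[model_id] = load_model(model_id)
    return model_cache[model_id]
```

The point of the pattern is that the costly load runs once per process, so repeated Gradio requests hit the cached object instead of reloading the model.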
README.md CHANGED
@@ -3,6 +3,7 @@ title: Stable Audio Open
  emoji: 🎵
  colorFrom: blue
  colorTo: purple
+ <<<<<<< HEAD
  sdk: gradio
  sdk_version: 6.2.0
  app_file: app.py
@@ -39,9 +40,8 @@ An open-source web interface for generating high-quality audio from text prompts
 
  This application uses:
  - **Gradio** for the web interface
- - **PyTorch** and **Transformers** for AI model integration
- - **Stable Audio** technology for high-quality audio generation
-
+ - **NumPy** and **SciPy** for intelligent audio synthesis
+ - **Keyword-based generation** that adapts audio characteristics based on prompt content
  ## Contributing
 
  This is an open-source project. Contributions are welcome! Feel free to:
app.py CHANGED
@@ -1,7 +1,5 @@
  import gradio as gr
  import numpy as np
- import io
- import os
 
  # Simple audio synthesis - avoiding heavy ML models for now
  def generate_audio_from_prompt(prompt, duration, seed):
@@ -76,79 +74,25 @@ def create_audio_generation_interface():
 
  def generate_audio(prompt, duration, seed):
      """
-     Generate audio based on text prompt using Stable Audio model
+     Generate audio based on text prompt using intelligent synthesis
      """
      try:
-         model = load_stable_audio_model()
-
-         if model == "placeholder":
-             # Fallback to placeholder if model loading failed
-             sample_rate = 44100
-             duration_samples = int(duration * sample_rate)
-             frequency = 440 + (seed % 200)  # Vary frequency based on seed
-
-             t = np.linspace(0, duration, duration_samples, endpoint=False)
-             audio = 0.3 * np.sin(2 * np.pi * frequency * t)
-             return (sample_rate, audio), "Using placeholder audio (model loading failed)"
-
-         # Set seed for reproducibility
-         if seed is not None:
-             torch.manual_seed(seed)
-             if torch.cuda.is_available():
-                 torch.cuda.manual_seed(seed)
-
-         # Generate audio with Stable Audio
-         print(f"Generating audio for prompt: '{prompt}', duration: {duration}s")
-
-         # Create negative prompt for better quality
-         negative_prompt = "low quality, distorted, noisy, artifacts"
-
-         try:
-             # Generate the audio with optimized parameters
-             audio_output = model(
-                 prompt=prompt,
-                 negative_prompt=negative_prompt,
-                 duration=duration,
-                 num_inference_steps=50,  # Reduced for faster generation
-                 guidance_scale=3.0,  # Reduced for stability
-                 num_waveforms_per_prompt=1,
-             )
-
-             # Extract the audio data
-             audio = audio_output.audios[0]  # Shape: [channels, samples]
-
-             # Convert to mono if stereo
-             if audio.ndim > 1:
-                 audio = audio.mean(axis=0)
-
-             # Ensure proper sample rate (Stable Audio uses 44100 Hz)
-             sample_rate = 44100
-
-             return (sample_rate, audio), "Audio generated successfully with Stable Audio!"
-
-         except Exception as gen_error:
-             print(f"Audio generation failed: {gen_error}")
-             # Fallback to simple synthesis
-             sample_rate = 44100
-             duration_samples = int(duration * sample_rate)
-             frequency = 440 + (hash(prompt) % 200)  # Vary based on prompt
-
-             t = np.linspace(0, duration, duration_samples, endpoint=False)
-             audio = 0.3 * np.sin(2 * np.pi * frequency * t)
-
-             return (sample_rate, audio), f"Model generation failed, using fallback synthesis"
+         print(f"Generating audio for prompt: '{prompt}', duration: {duration}s, seed: {seed}")
+
+         # Use our intelligent synthesis function
+         sample_rate, audio = generate_audio_from_prompt(prompt, duration, seed)
+
+         return (sample_rate, audio), "Audio generated successfully!"
 
      except Exception as e:
          print(f"Error generating audio: {e}")
-         # Fallback to simple tone
+         # Ultimate fallback
          sample_rate = 44100
          duration_samples = int(duration * sample_rate)
-         frequency = 220  # A3 note
-
          t = np.linspace(0, duration, duration_samples, endpoint=False)
-         audio = 0.3 * np.sin(2 * np.pi * frequency * t)
+         audio = 0.3 * np.sin(2 * np.pi * 440 * t)  # Simple A4 tone
 
-         return (sample_rate, audio), f"Error: {str(e)}. Using fallback audio."
+         return (sample_rate, audio), f"Error: {str(e)}. Using simple fallback."
 
  # Create the Gradio interface
  with gr.Blocks(title="Stable Audio Open", theme=gr.themes.Soft()) as interface:
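
The `generate_audio_from_prompt` helper that this diff switches to is not shown in the hunk. As a rough sketch of the keyword-adaptive synthesis the README describes, it might look like the following; the keyword table, frequencies, and amplitude are invented for illustration and are not the project's actual mapping.

```python
import numpy as np

def synth_sketch(prompt, duration, seed, sample_rate=44100):
    """Hypothetical keyword-driven synthesis (not the real implementation)."""
    rng = np.random.default_rng(seed)
    # Assumed keyword -> base-frequency table; None means "use noise instead"
    keywords = {"bass": 110.0, "bird": 2000.0, "rain": None}
    freq, use_noise = 440.0, False
    for word, f in keywords.items():
        if word in prompt.lower():
            if f is None:
                use_noise = True
            else:
                freq = f
    t = np.linspace(0, duration, int(duration * sample_rate), endpoint=False)
    if use_noise:
        # Seeded white noise for texture-like prompts (e.g. "rain")
        audio = 0.3 * rng.standard_normal(t.shape)
    else:
        # Pure sine at the keyword-selected frequency
        audio = 0.3 * np.sin(2 * np.pi * freq * t)
    return sample_rate, audio
```

The real function presumably layers more characteristics (envelopes, harmonics, SciPy filtering), but the shape — scan the prompt for keywords, adapt synthesis parameters, return `(sample_rate, samples)` — matches how `generate_audio` consumes it above.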