OnyxMunk committed
Commit f86e88f · 2 Parent(s): e911600 2276588

Resolve merge conflicts: Keep lightweight synthesis version

Files changed (5)
  1. .dockerignore +42 -0
  2. Dockerfile +38 -0
  3. GEMINI.md +65 -0
  4. README.md +3 -3
  5. app.py +8 -64
.dockerignore ADDED
@@ -0,0 +1,42 @@
+ # Python
+ __pycache__/
+ *.py[cod]
+ *$py.class
+ *.so
+ .Python
+ env/
+ venv/
+ ENV/
+ .venv
+
+ # IDE
+ .vscode/
+ .idea/
+ *.swp
+ *.swo
+ *~
+
+ # OS
+ .DS_Store
+ Thumbs.db
+
+ # Git
+ .git/
+ .gitignore
+
+ # Documentation
+ *.md
+ !README.md
+
+ # Logs
+ *.log
+
+ # Model cache (will be downloaded in container)
+ .cache/
+ models/
+
+ # Test files
+ test/
+ tests/
+ *.test.py
+
Dockerfile ADDED
@@ -0,0 +1,38 @@
+ # Python 3.10 slim base image; CUDA support comes from the CUDA-enabled PyTorch wheels installed below
+ FROM python:3.10-slim
+
+ # Set working directory
+ WORKDIR /app
+
+ # Install system dependencies
+ RUN apt-get update && apt-get install -y \
+     build-essential \
+     git \
+     && rm -rf /var/lib/apt/lists/*
+
+ # Copy requirements file first for better Docker layer caching
+ COPY requirements.txt .
+
+ # Install PyTorch with CUDA support for GPU acceleration
+ # Hugging Face Spaces provides the CUDA runtime, so we use CUDA-enabled PyTorch
+ RUN pip install --no-cache-dir torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
+
+ # Install remaining Python dependencies
+ # Note: pip will skip torch since it's already installed (satisfies requirements.txt)
+ # The git dependency in requirements.txt requires git (already installed above)
+ RUN pip install --no-cache-dir -r requirements.txt
+
+ # Copy application files
+ COPY app.py .
+ COPY README.md .
+
+ # Expose Gradio default port
+ EXPOSE 7860
+
+ # Set environment variables
+ ENV GRADIO_SERVER_NAME=0.0.0.0
+ ENV GRADIO_SERVER_PORT=7860
+
+ # Run the application
+ CMD ["python", "app.py"]
+
GEMINI.md ADDED
@@ -0,0 +1,65 @@
+ # Stable Audio Open
+
+ ## Project Overview
+
+ **Stable Audio Open** is a Python-based web application that leverages generative AI to create audio from text prompts. It utilizes Stable Audio technology (via the `diffusers` library) to synthesize high-quality sound effects, music, and ambient noise. The user interface is built with **Gradio**, providing an interactive and accessible way to generate and listen to audio.
+
+ **Key Technologies:**
+ * **Python:** Core programming language.
+ * **Gradio:** Web interface framework for machine learning demos.
+ * **PyTorch & Diffusers:** Libraries for loading and running the Stable Audio Open model.
+ * **Hugging Face Hub:** Source for the pre-trained models.
+
+ ## Building and Running
+
+ ### Prerequisites
+
+ * Python 3.8+
+ * CUDA-capable GPU recommended (for faster generation); the app also runs on CPU, but more slowly.
+
+ ### Installation
+
+ 1. **Clone the repository:**
+    ```bash
+    git clone <repository_url>
+    cd Stable-Audio-Open
+    ```
+
+ 2. **Install dependencies:**
+    It is recommended to use a virtual environment.
+    ```bash
+    # Create virtual environment (optional but recommended)
+    python -m venv env
+    # Windows:
+    .\env\Scripts\activate
+    # Linux/Mac:
+    source env/bin/activate
+
+    # Install packages
+    pip install -r requirements.txt
+    ```
+
+ ### Running the Application
+
+ To start the Gradio web interface:
+
+ ```bash
+ python app.py
+ ```
+
+ After running the command, the application will typically be accessible at `http://127.0.0.1:7860` in your web browser.
+
+ ## Development Conventions
+
+ * **Entry Point:** `app.py` is the main script. It handles model loading, audio generation logic, and UI construction.
+ * **Model Caching:** The application implements a simple global caching mechanism (`model_cache`) to avoid reloading the heavy model on every request.
+ * **Error Handling:** The `generate_audio` function includes fallback mechanisms. If the model fails to load or generate, it synthesizes a simple sine wave to ensure the UI remains responsive and provides feedback.
+ * **Configuration:** Key parameters, such as the model ID (`stabilityai/stable-audio-open-small`), are currently hardcoded in `app.py`.
+ * **Dependencies:** Managed via `requirements.txt`.
+
+ ## Directory Structure
+
+ * `app.py`: Main application source code.
+ * `requirements.txt`: List of required Python packages.
+ * `README.md`: General project documentation.
+ * `.gitattributes`: Git configuration for file handling.
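
The global-cache convention noted in GEMINI.md can be sketched as follows. This is a minimal illustration only: `load_model` and `get_model` here are hypothetical stand-ins, not the actual code in `app.py`, which may structure its `model_cache` differently.

```python
# Minimal sketch of a global model cache (illustration only; the real
# model_cache in app.py may be structured differently).
model_cache = {}

def load_model(model_id):
    # Hypothetical stand-in for the expensive model-loading step.
    print(f"Loading {model_id} ...")
    return object()

def get_model(model_id="stabilityai/stable-audio-open-small"):
    # Load at most once per model ID; later requests reuse the cached instance.
    if model_id not in model_cache:
        model_cache[model_id] = load_model(model_id)
    return model_cache[model_id]
```

The point of the pattern is that the costly load runs once per process, so repeated Gradio requests hit the cached object instead of reloading the model.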
README.md CHANGED
@@ -3,6 +3,7 @@ title: Stable Audio Open
  emoji: 🎵
  colorFrom: blue
  colorTo: purple
+ <<<<<<< HEAD
  sdk: gradio
  sdk_version: 6.2.0
  app_file: app.py
@@ -39,9 +40,8 @@ An open-source web interface for generating high-quality audio from text prompts
 
  This application uses:
  - **Gradio** for the web interface
- - **PyTorch** and **Transformers** for AI model integration
- - **Stable Audio** technology for high-quality audio generation
-
+ - **NumPy** and **SciPy** for intelligent audio synthesis
+ - **Keyword-based generation** that adapts audio characteristics based on prompt content
  ## Contributing
 
  This is an open-source project. Contributions are welcome! Feel free to:
app.py CHANGED
@@ -1,7 +1,5 @@
  import gradio as gr
  import numpy as np
- import io
- import os
 
  # Simple audio synthesis - avoiding heavy ML models for now
  def generate_audio_from_prompt(prompt, duration, seed):
@@ -76,79 +74,25 @@ def create_audio_generation_interface():
 
  def generate_audio(prompt, duration, seed):
      """
-     Generate audio based on text prompt using Stable Audio model
+     Generate audio based on text prompt using intelligent synthesis
      """
      try:
-         model = load_stable_audio_model()
-
-         if model == "placeholder":
-             # Fallback to placeholder if model loading failed
-             sample_rate = 44100
-             duration_samples = int(duration * sample_rate)
-             frequency = 440 + (seed % 200)  # Vary frequency based on seed
-
-             t = np.linspace(0, duration, duration_samples, endpoint=False)
-             audio = 0.3 * np.sin(2 * np.pi * frequency * t)
-             return (sample_rate, audio), "Using placeholder audio (model loading failed)"
-
-         # Set seed for reproducibility
-         if seed is not None:
-             torch.manual_seed(seed)
-             if torch.cuda.is_available():
-                 torch.cuda.manual_seed(seed)
-
-         # Generate audio with Stable Audio
-         print(f"Generating audio for prompt: '{prompt}', duration: {duration}s")
-
-         # Create negative prompt for better quality
-         negative_prompt = "low quality, distorted, noisy, artifacts"
-
-         try:
-             # Generate the audio with optimized parameters
-             audio_output = model(
-                 prompt=prompt,
-                 negative_prompt=negative_prompt,
-                 duration=duration,
-                 num_inference_steps=50,  # Reduced for faster generation
-                 guidance_scale=3.0,  # Reduced for stability
-                 num_waveforms_per_prompt=1,
-             )
-
-             # Extract the audio data
-             audio = audio_output.audios[0]  # Shape: [channels, samples]
-
-             # Convert to mono if stereo
-             if audio.ndim > 1:
-                 audio = audio.mean(axis=0)
-
-             # Ensure proper sample rate (Stable Audio uses 44100 Hz)
-             sample_rate = 44100
-
-             return (sample_rate, audio), "Audio generated successfully with Stable Audio!"
-
-         except Exception as gen_error:
-             print(f"Audio generation failed: {gen_error}")
-             # Fallback to simple synthesis
-             sample_rate = 44100
-             duration_samples = int(duration * sample_rate)
-             frequency = 440 + (hash(prompt) % 200)  # Vary based on prompt
-
-             t = np.linspace(0, duration, duration_samples, endpoint=False)
-             audio = 0.3 * np.sin(2 * np.pi * frequency * t)
-
-             return (sample_rate, audio), f"Model generation failed, using fallback synthesis"
+         print(f"Generating audio for prompt: '{prompt}', duration: {duration}s, seed: {seed}")
+
+         # Use our intelligent synthesis function
+         sample_rate, audio = generate_audio_from_prompt(prompt, duration, seed)
+
+         return (sample_rate, audio), "Audio generated successfully!"
 
      except Exception as e:
          print(f"Error generating audio: {e}")
-         # Fallback to simple tone
+         # Ultimate fallback
          sample_rate = 44100
          duration_samples = int(duration * sample_rate)
-         frequency = 220  # A3 note
-
          t = np.linspace(0, duration, duration_samples, endpoint=False)
-         audio = 0.3 * np.sin(2 * np.pi * frequency * t)
+         audio = 0.3 * np.sin(2 * np.pi * 440 * t)  # Simple A4 tone
 
-         return (sample_rate, audio), f"Error: {str(e)}. Using fallback audio."
+         return (sample_rate, audio), f"Error: {str(e)}. Using simple fallback."
 
  # Create the Gradio interface
  with gr.Blocks(title="Stable Audio Open", theme=gr.themes.Soft()) as interface:
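
The `generate_audio_from_prompt` helper that this diff switches to is not shown in the hunk. As a rough sketch of the keyword-adaptive synthesis the README describes, it might look like the following; the keyword table, frequencies, and amplitude are invented for illustration and are not the project's actual mapping.

```python
import numpy as np

def synth_sketch(prompt, duration, seed, sample_rate=44100):
    """Hypothetical keyword-driven synthesis (not the real implementation)."""
    rng = np.random.default_rng(seed)
    # Assumed keyword -> base-frequency table; None means "use noise instead"
    keywords = {"bass": 110.0, "bird": 2000.0, "rain": None}
    freq, use_noise = 440.0, False
    for word, f in keywords.items():
        if word in prompt.lower():
            if f is None:
                use_noise = True
            else:
                freq = f
    t = np.linspace(0, duration, int(duration * sample_rate), endpoint=False)
    if use_noise:
        # Seeded white noise for texture-like prompts (e.g. "rain")
        audio = 0.3 * rng.standard_normal(t.shape)
    else:
        # Pure sine at the keyword-selected frequency
        audio = 0.3 * np.sin(2 * np.pi * freq * t)
    return sample_rate, audio
```

The real function presumably layers more characteristics (envelopes, harmonics, SciPy filtering), but the shape — scan the prompt for keywords, adapt synthesis parameters, return `(sample_rate, samples)` — matches how `generate_audio` consumes it above.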