Commit 99bda05 · Parent(s): c30d48f
Update requirements: Add missing dependencies for audio processing and improve documentation
GEMINI.md (added)
@@ -0,0 +1,65 @@

# Stable Audio Open

## Project Overview

**Stable Audio Open** is a Python web application that generates audio from text prompts. It uses the Stable Audio Open model (via the `diffusers` library) to synthesize sound effects, music, and ambient noise. The user interface is built with **Gradio**, so audio can be generated and played back directly in the browser.

**Key Technologies:**
* **Python:** Core programming language.
* **Gradio:** Web interface framework for machine learning demos.
* **PyTorch & Diffusers:** Libraries for loading and running the Stable Audio Open model.
* **Hugging Face Hub:** Source for the pre-trained models.

## Building and Running

### Prerequisites

* Python 3.8+
* A CUDA-capable GPU is recommended for faster generation. The application also runs on CPU, but more slowly.

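The prerequisites above can be checked before launching the app. The following is a minimal sketch, not code from `app.py`: the function names `python_is_supported` and `pick_device` are hypothetical, and the only PyTorch call used is `torch.cuda.is_available()`. If PyTorch is not installed yet, the sketch simply reports CPU.

```python
import sys


def python_is_supported(minimum=(3, 8)):
    """Return True if the running interpreter meets the stated minimum version."""
    return sys.version_info >= minimum


def pick_device():
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu".

    Hypothetical helper: app.py may select its device differently.
    """
    try:
        import torch  # imported lazily so the check also works before installation
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print("Python OK:", python_is_supported())
    print("Device:", pick_device())
```

Running this once after installation shows whether generation will use the GPU or fall back to the slower CPU path.
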
### Installation

1.  **Clone the repository:**
    ```bash
    git clone <repository_url>
    cd Stable-Audio-Open
    ```

2.  **Install dependencies:**
    It is recommended to use a virtual environment.
    ```bash
    # Create virtual environment (optional but recommended)
    python -m venv env
    # Windows:
    .\env\Scripts\activate
    # Linux/Mac:
    source env/bin/activate

    # Install packages
    pip install -r requirements.txt
    ```

### Running the Application

To start the Gradio web interface:

```bash
python app.py
```

Once the server has started, the application is typically available at `http://127.0.0.1:7860` in your web browser.

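Port 7860 is Gradio's default, which is why the address above is only "typically" correct. As a rough sketch, the helper below (a hypothetical name, `is_port_open`, not part of `app.py`) uses only the standard library to check whether anything is accepting connections on that address.

```python
import socket


def is_port_open(host="127.0.0.1", port=7860, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    if is_port_open():
        print("Something is listening on http://127.0.0.1:7860")
    else:
        print("Nothing is listening on port 7860 yet")
```

If the check fails after `python app.py` has started, the console output of the app shows the address it actually bound to.
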
## Development Conventions

* **Entry Point:** `app.py` is the main script. It handles model loading, audio generation, and UI construction.
* **Model Caching:** A simple global cache (`model_cache`) keeps the loaded model in memory so that it is not reloaded on every request.
* **Error Handling:** The `generate_audio` function has a fallback path. If the model fails to load or generate, it synthesizes a simple sine wave so that the UI stays responsive and still returns feedback.
* **Configuration:** Key parameters such as the model ID (`stabilityai/stable-audio-open-small`) are currently hardcoded in `app.py`.
* **Dependencies:** Managed via `requirements.txt`.

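The caching and fallback conventions above can be illustrated with a small, dependency-free sketch. It is not the code in `app.py`: only the names `model_cache` and `generate_audio` come from this document, while the signatures, the `loader` callable, the `sine_wave` helper, and the 44100 Hz sample rate are assumptions made for illustration. The real application presumably works with PyTorch tensors or NumPy arrays rather than plain lists.

```python
import math

SAMPLE_RATE = 44100  # assumed value for the sketch, not taken from app.py

# Global cache: model ID -> loaded model object.
model_cache = {}


def get_model(model_id, loader):
    """Load a model once per process and reuse it on later requests."""
    if model_id not in model_cache:
        # A failing loader raises here, so nothing is cached on failure.
        model_cache[model_id] = loader(model_id)
    return model_cache[model_id]


def sine_wave(duration_s=1.0, freq_hz=440.0, sample_rate=SAMPLE_RATE):
    """Return a half-amplitude sine tone as a list of float samples."""
    n = int(duration_s * sample_rate)
    return [0.5 * math.sin(2.0 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


def generate_audio(prompt, model_id, loader, duration_s=1.0):
    """Return (sample_rate, samples, status), falling back to a sine tone on error."""
    try:
        model = get_model(model_id, loader)
        samples = model(prompt, duration_s)  # hypothetical model call signature
        return SAMPLE_RATE, samples, "ok"
    except Exception as exc:
        # Keep the UI responsive: return audible output plus the reason.
        return SAMPLE_RATE, sine_wave(duration_s), f"fallback: {exc}"
```

Passing the loader in as an argument keeps the sketch testable without downloading a model. In the application itself the loader would wrap whatever `diffusers` call `app.py` uses to load the hardcoded model ID.
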
## Directory Structure

* `app.py`: Main application source code.
* `requirements.txt`: List of Python packages required.
* `README.md`: General project documentation.
* `.gitattributes`: Git configuration for file handling.