OnyxlMunkey committed on
Commit
99bda05
·
1 Parent(s): c30d48f

Update requirements: Add missing dependencies for audio processing and improve documentation

Files changed (1)
  1. GEMINI.md +65 -0
GEMINI.md ADDED
@@ -0,0 +1,65 @@
+ # Stable Audio Open
+
+ ## Project Overview
+
+ **Stable Audio Open** is a Python-based web application that leverages generative AI to create audio from text prompts. It utilizes the Stable Audio technology (via the `diffusers` library) to synthesize high-quality sound effects, music, and ambient noise. The user interface is built with **Gradio**, providing an interactive and accessible way to generate and listen to audio.
+
+ **Key Technologies:**
+ * **Python:** Core programming language.
+ * **Gradio:** Web interface framework for machine learning demos.
+ * **PyTorch & Diffusers:** Libraries for loading and running the Stable Audio Open model.
+ * **Hugging Face Hub:** Source for the pre-trained models.
+
+ ## Building and Running
+
+ ### Prerequisites
+
+ * Python 3.8+
+ * A CUDA-capable GPU is recommended for faster generation; the application also runs on CPU, only more slowly.
+
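The GPU/CPU choice in the prerequisites can be checked before launching the app. The snippet below is a minimal sketch, not code from `app.py`: the helper name `pick_device` is hypothetical, and it assumes PyTorch is the library that reports GPU availability.

```python
import importlib.util


def pick_device() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu".

    `pick_device` is a hypothetical helper for illustration only.
    """
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed yet, so there is nothing to query.
        return "cpu"
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"Generation would run on: {pick_device()}")
```

If this reports `cpu` on a machine with an NVIDIA GPU, the installed PyTorch build may lack CUDA support.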
+ ### Installation
+
+ 1. **Clone the repository:**
+ ```bash
+ git clone <repository_url>
+ cd Stable-Audio-Open
+ ```
+
+ 2. **Install dependencies:**
+ It is recommended to use a virtual environment.
+ ```bash
+ # Create virtual environment (optional but recommended)
+ python -m venv env
+ # Windows:
+ .\env\Scripts\activate
+ # Linux/Mac:
+ source env/bin/activate
+
+ # Install packages
+ pip install -r requirements.txt
+ ```
+
+ ### Running the Application
+
+ To start the Gradio web interface:
+
+ ```bash
+ python app.py
+ ```
+
+ After running the command, the application will typically be accessible at `http://127.0.0.1:7860` in your web browser.
+
+ ## Development Conventions
+
+ * **Entry Point:** `app.py` is the main script. It handles model loading, audio generation logic, and UI construction.
+ * **Model Caching:** The application keeps the loaded model in a simple global cache (`model_cache`) so the heavy model is not reloaded on every request.
+ * **Error Handling:** The `generate_audio` function includes a fallback. If the model fails to load or generate, it synthesizes a simple sine wave so the UI stays responsive and still returns feedback.
+ * **Configuration:** Key parameters such as the model ID (`stabilityai/stable-audio-open-small`) are currently hardcoded in `app.py`.
+ * **Dependencies:** Managed via `requirements.txt`.
+
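The caching and fallback conventions above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `app.py`: only the names `model_cache` and `generate_audio` come from the text. The function signatures, the `loader` callable, the `get_model` and `sine_fallback` helpers, and the 44100 Hz sample rate are all hypothetical.

```python
import math
from typing import Callable, Dict, List, Tuple

SAMPLE_RATE = 44100  # assumed sample rate for the fallback tone

# Global cache keyed by model ID, mirroring the `model_cache` idea.
model_cache: Dict[str, object] = {}


def get_model(model_id: str, loader: Callable[[str], object]) -> object:
    """Load a model once and return the cached instance on later calls."""
    if model_id not in model_cache:
        model_cache[model_id] = loader(model_id)
    return model_cache[model_id]


def sine_fallback(duration_s: float = 1.0, freq_hz: float = 440.0) -> List[float]:
    """Synthesize a plain sine wave as a list of float samples."""
    n = int(SAMPLE_RATE * duration_s)
    return [0.5 * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]


def generate_audio(
    prompt: str,
    model_id: str,
    loader: Callable[[str], object],
    duration_s: float = 1.0,
) -> Tuple[int, List[float], str]:
    """Return (sample_rate, samples, status); fall back to a sine wave on any error."""
    try:
        model = get_model(model_id, loader)
        # Hypothetical call shape: the cached model is treated as a callable.
        samples = model(prompt, duration_s)
        return SAMPLE_RATE, samples, "ok"
    except Exception as exc:
        return SAMPLE_RATE, sine_fallback(duration_s), f"fallback: {exc}"
```

Returning a status string alongside the audio lets the UI tell the user that a placeholder tone was played instead of a generated clip.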
+ ## Directory Structure
+
+ * `app.py`: Main application source code.
+ * `requirements.txt`: List of Python packages required.
+ * `README.md`: General project documentation.
+ * `.gitattributes`: Git configuration for file handling.