---
title: Audio Agent
colorFrom: yellow
colorTo: blue
sdk: gradio
app_file: src/ui.py
pinned: true
license: apache-2.0
short_description: An intelligent audio processing assistant powered by AI
sdk_version: 5.33.0
tags:
  - agent-demo-track
---
# Audio Agent - Your AI Audio Assistant

Audio Agent is an AI-powered assistant that manipulates, analyzes, and transcribes audio files through a simple web interface.

A demo video is available on YouTube: [Demo](https://youtu.be/BYYnWm-yJMo)
## Features

**Audio Manipulation**

- Merge multiple audio files into one continuous track
- Cut or trim specific sections from any file
- Increase or decrease volume levels
- Normalize audio levels for consistency
- Apply fade-in or fade-out effects for smooth transitions (mono audio only)
- Change playback speed (faster or slower; the pitch changes with it)
- Reverse audio for creative effects
- Remove silence from the beginning or end of a file
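
The operations above are carried out by tools on the MCP server, whose implementation this README does not describe. Purely as an illustration of what they mean, here is a minimal sketch that applies several of them to mono audio held as a plain list of float samples in [-1.0, 1.0]. All function names are hypothetical, and the techniques (linear fade, nearest-sample resampling, threshold trimming, peak normalization) are simple stand-ins, not necessarily what the server uses. The speed change also shows why pitch shifts: squeezing the same waveform into fewer samples scales every frequency by the speed factor.

```python
def fade_in(samples: list[float], sample_rate: int, seconds: float) -> list[float]:
    """Linearly ramp the gain from 0.0 toward 1.0 over the first `seconds` of mono audio."""
    n = min(len(samples), int(sample_rate * seconds))
    out = list(samples)
    for i in range(n):
        out[i] *= i / n  # gain grows linearly from 0 toward 1
    return out


def reverse(samples: list[float]) -> list[float]:
    """Play the audio backwards."""
    return samples[::-1]


def change_speed(samples: list[float], factor: float) -> list[float]:
    """Naive speed change by nearest-sample resampling.

    factor > 1 plays faster (shorter, higher pitch); factor < 1 plays slower
    (longer, lower pitch). No interpolation or anti-aliasing is applied.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    new_len = int(len(samples) / factor)
    return [samples[int(i * factor)] for i in range(new_len)]


def trim_silence(samples: list[float], threshold: float = 0.01) -> list[float]:
    """Drop leading and trailing samples whose magnitude is below `threshold`."""
    loud = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    if not loud:
        return []
    return samples[loud[0]:loud[-1] + 1]


def normalize(samples: list[float], peak: float = 1.0) -> list[float]:
    """Peak-normalize so the loudest sample reaches `peak`."""
    current = max((abs(s) for s in samples), default=0.0)
    if current == 0.0:
        return list(samples)  # silent input: nothing to scale
    return [s * peak / current for s in samples]


# Usage: a 1.5x speed-up keeps roughly two of every three samples.
print(change_speed([0.0, 1.0, 2.0, 3.0, 4.0, 5.0], 1.5))  # [0.0, 1.0, 3.0, 4.0]
```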
**Analysis & Transcription** (English only)

- Transcribe speech in audio files to text
- Analyze audio properties (duration, sample rate, etc.)

**Supported audio formats**: MP3, WAV, M4A, FLAC, AAC, OGG
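
Because unsupported formats are a common cause of processing errors (see Troubleshooting), a client could pre-check uploads before handing them to the agent. The minimal sketch below is a hypothetical helper, not part of this project: it only inspects the file extension against the list above and does not verify that the contents actually decode.

```python
from pathlib import Path

# Extensions taken from the supported-formats list above.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".aac", ".ogg"}


def is_supported_audio(path: str) -> bool:
    """Return True if the file extension is a supported format (case-insensitive)."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def split_by_support(paths: list[str]) -> tuple[list[str], list[str]]:
    """Partition paths into (supported, unsupported) lists, preserving order."""
    supported = [p for p in paths if is_supported_audio(p)]
    unsupported = [p for p in paths if not is_supported_audio(p)]
    return supported, unsupported
```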
## Requirements

- Python 3.13 or higher
- An OpenAI API key
- An MCP (Model Context Protocol) server that provides the audio tools
## Installation

1. **Clone the repository**

   ```bash
   git clone <repository-url>
   cd audio-agent
   ```
2. **Install dependencies**

   The project uses Poetry for dependency management; all dependencies are defined in `pyproject.toml`.

   Using Poetry (recommended):

   ```bash
   poetry install
   ```

   Or using pip:

   ```bash
   pip install -e .
   ```
## Configuration

### Environment Variables

Create a `.env` file in the project root, or set the following environment variables:

```bash
# Required: MCP server endpoint for the audio tools
MCP_SERVER=your_mcp_server_endpoint

# Optional: OpenAI API key (can also be provided in the UI)
OPENAI_API_KEY=sk-your-openai-api-key-here
```
### Environment Variable Details

- **`MCP_SERVER`** (required): The endpoint URL of the MCP server that provides the audio processing tools
- **`OPENAI_API_KEY`** (optional): Your OpenAI API key. If it is not set here, you can enter it in the web interface
## Usage

### Running the Application

Start the web interface with:

```bash
python -m src.ui
```

The application launches a Gradio web interface, available at:

- Local: `http://localhost:7861`
- A public share URL (if sharing is enabled)
### Using the Interface

1. **Configure the Model**: Select your preferred AI model and adjust its settings in the right-hand panel.
2. **Provide API Key**: Enter your OpenAI API key if it is not set as an environment variable.
3. **Upload Audio Files**: Drag and drop or browse for the audio files to process.
4. **Describe Your Task**: Type what you want to do with the audio files.
5. **Get Results**: The agent processes your request and returns the results.
### Example Requests

- *"Merge these two audio files and add a fade-in effect"*
- *"Remove the silence at the beginning of this recording"*
- *"Transcribe the speech in this audio file"*
- *"Increase the volume of the first track and normalize both files"*
- *"Cut out the middle section from 1:30 to 2:45"*
- *"Make this audio play 1.5x faster"*
- *"Apply a fade-out effect to the end of this track"*
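
Whatever parameters the MCP server's cut tool actually takes, a request like *"Cut out the middle section from 1:30 to 2:45"* reduces to simple arithmetic: 1:30 is 90 s and 2:45 is 165 s, so 165 - 90 = 75 s of audio is removed, which is 75 × 44,100 = 3,307,500 samples at a 44.1 kHz sample rate. The sketch below shows that conversion on a mono sample list; both function names are hypothetical.

```python
def parse_timestamp(text: str) -> float:
    """Convert "ss", "m:ss", or "h:mm:ss" into seconds."""
    seconds = 0.0
    for part in text.strip().split(":"):
        seconds = seconds * 60 + float(part)  # shift previous units up by one place
    return seconds


def cut_section(samples: list[float], sample_rate: int, start: str, end: str) -> list[float]:
    """Remove the audio between the `start` and `end` timestamps and join the remainder."""
    first = int(parse_timestamp(start) * sample_rate)
    last = int(parse_timestamp(end) * sample_rate)
    if not 0 <= first <= last <= len(samples):
        raise ValueError("cut range is outside the audio")
    return samples[:first] + samples[last:]
```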
## Dependencies

The project relies on several key libraries:

- **LangGraph** (0.4.8+): For building the AI agent workflow
- **Gradio** (5.33.0+): For the web interface
- **LangChain OpenAI** (0.3.21+): For OpenAI model integration
- **LangChain MCP Adapters** (0.1.7+): For Model Context Protocol integration
- **dotenv** (0.9.9+): For environment variable management

See `pyproject.toml` for the complete list of dependencies.
## Troubleshooting

### Common Issues

1. **"Please configure the agent first"**
   - Ensure you have provided a valid OpenAI API key
   - Check that the selected model is available
2. **Audio processing errors**
   - Verify that the `MCP_SERVER` environment variable is set correctly
   - Ensure your audio files are in a supported format
   - Check that the MCP server is running and accessible
3. **Import errors**
   - Make sure all dependencies are installed: `poetry install` or `pip install -e .`
   - Verify that you are using Python 3.13 or higher
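
Two of the checks above, the Python version and whether the MCP server is reachable, can be automated with a short standard-library script. This is a hypothetical helper, not shipped with the project. It assumes `MCP_SERVER` is an `http://` or `https://` URL, and it only confirms that a TCP connection to that host and port succeeds, which does not prove the endpoint actually speaks MCP.

```python
import os
import socket
import sys
from urllib.parse import urlsplit


def python_version_ok(version: tuple[int, ...] = tuple(sys.version_info[:2])) -> bool:
    """Check the README's requirement of Python 3.13 or higher."""
    return tuple(version[:2]) >= (3, 13)


def mcp_server_reachable(url: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    try:
        port = parts.port or (443 if parts.scheme == "https" else 80)
    except ValueError:  # non-numeric or out-of-range port in the URL
        return False
    try:
        with socket.create_connection((parts.hostname, port), timeout=timeout):
            return True
    except OSError:  # DNS failure, refused connection, or timeout
        return False


if __name__ == "__main__":
    print("Python >= 3.13:", python_version_ok())
    server = os.environ.get("MCP_SERVER", "")
    print("MCP_SERVER set:", bool(server))
    if server:
        print("MCP server reachable:", mcp_server_reachable(server))
```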
### Getting Help

If you encounter issues:

1. Check the console output for error messages
2. Verify that your environment variables are set correctly
3. Ensure your audio files are in a supported format
4. Try a different AI model if one isn't working