Spaces: frascuchon/music-mcp

frascuchon HF Staff committed on Dec 2, 2025

Commit

cf28ba5

1 Parent(s): 20de2d7

review UI and docs

Files changed (2)

README.md +185 -3
mcp_server.py +78 -11

README.md CHANGED

@@ -1,6 +1,6 @@
 ---
-title: Music Mcp
-emoji: 👀
+title: Music AI Tools
+emoji: 🎵🎶
 colorFrom: indigo
 colorTo: pink
 sdk: gradio
@@ -9,4 +9,186 @@ app_file: mcp_server.py
 pinned: false
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+# 🎵 Music AI Tools - Fun Audio Processing Playground
+A comprehensive demo project showcasing 25+ audio processing tools powered by cutting-edge AI models and traditional audio processing libraries. This playground provides both web-based and MCP (Model Context Protocol) interfaces for exploring audio manipulation, analysis, and creative possibilities.
+## 🎯 What's Inside
+### 🤖 AI-Powered Features
+- **🎵 Stem Separation** using [Demucs](https://github.com/facebookresearch/demucs) by Facebook Research
+- **🎤 Voice Replacement** using [Seed-VC](https://huggingface.co/spaces/Plachta/Seed-VC) on Hugging Face
+- **🧠 Music Understanding** using [Music-Flamingo](https://github.com/NVIDIA/music-flamingo) by NVIDIA
+### 🎛️ Audio Processing Capabilities
+- **⚙️ Audio Analysis** with [Librosa](https://librosa.org/) for feature extraction
+- **🎬 Audio Conversion** with [FFmpeg](https://ffmpeg.org/) for format processing
+- **🚀 High Performance** with GPU acceleration and parallel processing
+## 🎪 Demo Features
+### Stem Processing Tools
+- **Stem Separation** - Full 4-stem separation (vocals, drums, bass, other)
+- **Selective Stems** - Extract only specific stems to save processing time
+- **Vocal/Instrumental** - Separate vocals from instrumental components
+- **Karaoke Creation** - One-click instrumental track generation
+### Audio Manipulation Tools
+- **Pitch Alignment** - Shift audio pitch by semitones
+- **Key Estimation** - Estimate musical key using harmonic analysis
+- **Shift to Key** - Shift audio to specific musical key
+- **Align Songs by Key** - Harmonically align multiple tracks
+- **Time Stretching** - Change tempo without affecting pitch
+- **BPM Alignment** - Align two tracks to same BPM
+- **Medley Creation** - Fun vocal/instrumental mixing
+### Audio Editing Tools
+- **Audio Cutting** - Extract segments between time points
+- **Mute Windows** - Mute specific time ranges with smooth fades
+- **Extract Segments** - Extract multiple segments with joining options
+- **Trim Audio** - Trim from beginning/end with precision
+- **Insert Section** - Insert audio sections at precise positions
+- **Replace Section** - Replace audio segments with crossfades
+### Analysis & Information Tools
+- **Audio Information** - Get detailed file information
+- **Music Understanding** - AI-powered music analysis
+- **Song Structure** - Identify song sections (verse, chorus, bridge)
+- **Cutting Points** - AI-suggested optimal edit points
+- **Genre Analysis** - Detailed genre and style analysis
+### Special Features
+- **Voice Replacement** - Replace voice using Seed-VC AI model
+- **Audio Cleaning** - Remove noise (hiss, hum, background)
+- **YouTube Extraction** - Extract audio from YouTube videos
+## 🚀 Quick Start
+### Prerequisites
+```bash
+# Install dependencies
+pip install -r requirements.txt
+# For GPU acceleration (optional but recommended)
+pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
+```
+### Running the Demo
+#### Web Interface (Recommended for Demo)
+```bash
+python mcp_server.py
+```
+Then open http://localhost:7860 in your browser to access the fun playground interface with 25+ tools!
+#### MCP Server Mode
+```bash
+python mcp_server.py --mcp
+```
+Run as MCP server for integration with AI assistants and other tools.
+## 🎮 Using the Tools
+### Web Interface
+1. **Upload Audio** - Drag & drop or browse for audio files (WAV, MP3, FLAC, M4A)
+2. **Select Tool** - Choose from 25+ different audio processing tools
+3. **Configure Settings** - Adjust parameters for each tool
+4. **Process & Download** - Get results instantly with real-time progress
+### Supported Formats
+- **Input:** WAV, MP3, FLAC, M4A
+- **Output:** WAV, MP3 (configurable)
+- **URL Support:** Direct processing from YouTube and other URLs
+## 🛠️ Project Structure
+```
+music-mcp/
+├── mcp_server.py              # Main server with Gradio interface
+├── requirements.txt           # Python dependencies
+├── tools/                     # Audio processing modules
+│   ├── stems_separation.py    # Demucs-based stem separation
+│   ├── voice_replacement.py   # Seed-VC voice conversion
+│   ├── music_understanding.py  # Music-Flamingo AI analysis
+│   ├── pitch_alignment.py     # Key detection and pitch shifting
+│   ├── time_strech.py        # BPM alignment and time stretching
+│   ├── audio_cutting.py      # Audio editing and manipulation
+│   ├── audio_cleaning.py     # Noise removal and cleaning
+│   ├── combine_tracks.py      # Track mixing and medley creation
+│   ├── audio_info.py         # File information and validation
+│   └── youtube_extract.py   # YouTube audio extraction
+├── examples/                 # Sample audio files for testing
+├── output/                  # Generated audio files
+└── youtube_downloads/        # Cached YouTube downloads
+```
+## 🎯 AI Model Details
+### 🤖 AI Models Used
+- **Demucs** (Facebook Research) - State-of-the-art source separation
+- **Seed-VC** (Hugging Face) - High-quality voice conversion
+- **Music-Flamingo** (NVIDIA) - Advanced music understanding and analysis
+### 🎛️ Audio Processing Libraries
+- **Librosa** - Audio feature extraction and analysis
+- **FFmpeg** - Audio format conversion and processing
+- **PyTorch** - Deep learning framework for AI models
+## 🎨 Customization
+### Adding New Tools
+1. Create new function in appropriate `tools/` module
+2. Add wrapper function with MCP compatibility
+3. Register in `mcp_server.py` interface creation
+4. Update documentation
+## 🔧 Development
+### Code Quality
+```bash
+# Linting
+ruff check .
+# Formatting
+ruff format .
+# Type checking
+mypy . --follow-untyped-imports
+```
+### Dependencies
+- **Core:** gradio, torch, librosa, soundfile
+- **AI Models:** demucs, transformers
+- **Audio Processing:** ffmpeg-python, numpy, scipy
+- **Web:** yt-dlp, requests, gradio-client
+## 🎪 Demo Use Cases
+### 🎵 Music Production
+- Create karaoke tracks by removing vocals
+- Extract stems for remixing and sampling
+- Align songs for seamless DJ mixes
+- Generate medleys and mashups
+### 🎧 Audio Editing
+- Clean up noisy recordings
+- Extract specific sections for clips
+- Create ringtones and social media content
+- Repair damaged audio files
+### 🤖 AI Experimentation
+- Voice conversion for creative projects
+- Genre analysis and music understanding
+- Intelligent cutting point suggestions
+- Structure analysis for music theory
+## 🎉 Have Fun!
+This is a **demo playground** for exploring agent capabilities with audio processing.
+---
+**Built with ❤️ using cutting-edge AI models and open-source audio processing libraries**
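
The pitch, key, and tempo tools listed in the README rest on simple arithmetic. The following is a minimal sketch of that arithmetic, assuming 12-tone equal temperament and sharp-only note names. The helper names `semitone_ratio`, `stretch_rate`, and `semitones_between` are hypothetical and are not taken from the repository.

```python
# Hypothetical helpers illustrating the arithmetic behind pitch shifting,
# BPM alignment, and key alignment. Not the repository's implementation.

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def semitone_ratio(semitones: float) -> float:
    """Frequency ratio for a pitch shift in 12-tone equal temperament."""
    return 2.0 ** (semitones / 12.0)


def stretch_rate(source_bpm: float, target_bpm: float) -> float:
    """Playback-rate factor that brings source_bpm to target_bpm (>1 is faster)."""
    if source_bpm <= 0 or target_bpm <= 0:
        raise ValueError("BPM values must be positive")
    return target_bpm / source_bpm


def semitones_between(source_tonic: str, target_tonic: str) -> int:
    """Smallest signed shift, in semitones, from one tonic to another.

    The result lies in the range -5..6, so a shift up a fifth (7 semitones)
    is returned as a shift down a fourth (-5 semitones).
    """
    diff = (NOTE_NAMES.index(target_tonic) - NOTE_NAMES.index(source_tonic)) % 12
    return diff - 12 if diff > 6 else diff


if __name__ == "__main__":
    print(semitone_ratio(12))           # one octave up doubles the frequency
    print(stretch_rate(100, 120))       # speed up a 100 BPM track to 120 BPM
    print(semitones_between("C", "G"))  # shortest path from C to G
```

Choosing the smallest signed interval keeps the shift as short as possible, which generally means fewer audible artifacts from the pitch shifter.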

mcp_server.py CHANGED

@@ -621,7 +621,7 @@ def separate_audio_mcp(
             try:
                 import psutil
-                available_gb = psutil.virtual_memory().available / (1024 ** 3)
+                available_gb = psutil.virtual_memory().available / (1024**3)
                 if available_gb > 16:
                     segment = None  # Let Demucs decide
                 elif available_gb > 8:
@@ -1169,8 +1169,8 @@ def replace_voice_mcp(
     diffusion_steps: int = 35
     length_adjust: float = 1.0
     inference_cfg_rate: float = 0.5
-    f0_condition: bool = False
-    auto_f0_adjust: bool = False
+    f0_condition: bool = True
+    auto_f0_adjust: bool = True
     pitch_shift: int = 0
     return replace_voice_wrapper(
@@ -1185,16 +1185,16 @@ def replace_voice_mcp(
     )
-def create_interface() -> gr.TabbedInterface:
+def create_interface() -> gr.Blocks:
     """
     Create and configure the complete Gradio interface with all audio processing tools.
-    This function sets up a comprehensive web interface with 19 different tabs,
+    This function sets up a fun web interface with 25+ different tabs,
     each providing access to specific audio processing capabilities. The interface
-    is organized into logical categories for ease of use.
+    is organized into logical categories for easy exploration and experimentation.
     Returns:
-        gr.TabbedInterface: A fully configured Gradio tabbed interface containing:
+        gr.Blocks: A fully configured Gradio interface containing:
         **Stem Processing Tabs:**
         - Stem Separation: Full 4-stem separation (vocals, drums, bass, other)
@@ -1211,7 +1211,7 @@ def create_interface() -> gr.TabbedInterface:
         - Stereo Mix: Create stereo mix with left/right channels
         - Time Stretching: Change tempo without affecting pitch
         - BPM Alignment: Align two tracks to same BPM
-        - Medley Creation: Professional vocal/instrumental mixing
+        - Medley Creation: Fun vocal/instrumental mixing
         **Audio Editing Tabs:**
         - Audio Cutting: Extract segments between time points
@@ -1237,11 +1237,11 @@ def create_interface() -> gr.TabbedInterface:
         - Server runs on 0.0.0.0:7860 with MCP server enabled
         - All examples disabled for security (cache_examples=False)
         - Flagging disabled to prevent data collection
+        - This is a demo project for exploring audio processing capabilities
     """
     # Tab 1: Stem Separation
     stem_interface = gr.Interface(
         fn=separate_audio_mcp,
         inputs=[
             gr.Audio(type="filepath", label="Upload Audio File", sources=["upload"]),
@@ -1692,7 +1692,7 @@ def create_interface() -> gr.TabbedInterface:
                 type="filepath",
                 label="Target Audio (voice to use) - Local file or URL",
                 sources=["upload"],
-            )
+            ),
         ],
         outputs=gr.Audio(label="Voice-Replaced Audio", type="filepath"),
         title="Voice Replacement with Seed-VC",
@@ -1702,7 +1702,8 @@ def create_interface() -> gr.TabbedInterface:
         flagging_mode="never",
     )
-    return gr.TabbedInterface(
+    # Create TabbedInterface with custom header
+    tabbed_interface = gr.TabbedInterface(
         [
             stem_interface,
             pitch_interface,
@@ -1753,8 +1754,74 @@ def create_interface() -> gr.TabbedInterface:
             "Replace Section",
             "Voice Replacement",
         ],
+        title="🎵 Music AI Tools - Professional Audio Processing Suite",
     )
+    # Add custom CSS for header styling
+    tabbed_interface.head = """
+    <style>
+        .gradio-container {
+            font-family: 'Inter', system-ui, -apple-system, sans-serif !important;
+        }
+        .tab-nav {
+            border-bottom: 2px solid #e5e7eb !important;
+        }
+        .tab-nav button {
+            font-weight: 500 !important;
+        }
+    </style>
+    """
+    # Add header HTML to the interface
+    header_html = """
+    <div style="text-align: center; padding: 30px 20px; background: linear-gradient(135deg, #667eea 0%, #764ba2 100%); border-radius: 15px; margin: 20px auto; max-width: 1200px; box-shadow: 0 10px 30px rgba(0,0,0,0.2);">
+        <h1 style="color: white; font-size: 2.8em; margin-bottom: 15px; font-weight: 700; text-shadow: 2px 2px 4px rgba(0,0,0,0.3);">
+            🎵 Music AI Tools 🎶
+        </h1>
+        <h2 style="color: #f0f0f0; font-size: 1.4em; margin-bottom: 20px; font-weight: 400;">
+            Fun Audio Processing Playground
+        </h2>
+        <p style="color: #e0e0e0; font-size: 1.1em; max-width: 900px; margin: 0 auto 25px auto; line-height: 1.7; text-align: left; background: rgba(0,0,0,0.2); padding: 20px; border-radius: 10px;">
+            <strong style="color: white; font-size: 1.2em;">🎧 Cool Audio Tricks:</strong> Stem separation (Demucs), pitch shifting, time stretching, and key alignment<br>
+            <strong style="color: white; font-size: 1.2em;">🎹 Smart AI Analysis:</strong> Genre detection, structure analysis, and cutting suggestions (Music-Flamingo)<br>
+            <strong style="color: white; font-size: 1.2em;">🎛️ Fun Audio Editing:</strong> Noise removal, track combination, and precise audio manipulation<br>
+            <strong style="color: white; font-size: 1.2em;">🤖 Awesome AI Tools:</strong> Voice replacement (Seed-VC) and music understanding (Music-Flamingo)<br>
+            <strong style="color: white; font-size: 1.2em;">🚀 Fast & Powerful:</strong> GPU boost, parallel processing, and live progress updates
+        </p>
+        <div style="margin-top: 20px; display: flex; justify-content: center; gap: 10px; flex-wrap: wrap;">
+            <span style="background: rgba(255,255,255,0.25); padding: 8px 20px; border-radius: 25px; margin: 5px; color: white; font-weight: 600; backdrop-filter: blur(10px);">
+                🎼 25+ Tools
+            </span>
+            <span style="background: rgba(255,255,255,0.25); padding: 8px 20px; border-radius: 25px; margin: 5px; color: white; font-weight: 600; backdrop-filter: blur(10px);">
+                🎯 AI-Powered
+            </span>
+            <span style="background: rgba(255,255,255,0.25); padding: 8px 20px; border-radius: 25px; margin: 5px; color: white; font-weight: 600; backdrop-filter: blur(10px);">
+                🌐 URL Support
+            </span>
+            <span style="background: rgba(255,255,255,0.25); padding: 8px 20px; border-radius: 25px; margin: 5px; color: white; font-weight: 600; backdrop-filter: blur(10px);">
+                🎪 Demo Fun
+            </span>
+        </div>
+        <div style="margin-top: 25px; text-align: center; color: rgba(255,255,255,0.9); font-size: 0.9em; line-height: 1.6;">
+            <strong style="color: white;">🤖 AI Models Used:</strong><br>
+            🎵 <strong>Stem Separation:</strong> <a href="https://github.com/facebookresearch/demucs" target="_blank" style="color: #ffd700; text-decoration: underline;">Demucs</a> by Facebook Research<br>
+            🎤 <strong>Voice Replacement:</strong> <a href="https://huggingface.co/spaces/Plachta/Seed-VC" target="_blank" style="color: #ffd700; text-decoration: underline;">Seed-VC</a> on Hugging Face<br>
+            🧠 <strong>Music Understanding:</strong> <a href="https://github.com/NVIDIA/music-flamingo" target="_blank" style="color: #ffd700; text-decoration: underline;">Music-Flamingo</a> by NVIDIA<br>
+            <br>
+            <strong style="color: white;">🎛️ Audio Processing Libraries:</strong><br>
+            ⚙️ <strong>Audio Analysis:</strong> <a href="https://librosa.org/" target="_blank" style="color: #87ceeb; text-decoration: underline;">Librosa</a> for audio feature extraction<br>
+            🎬 <strong>Audio Conversion:</strong> <a href="https://ffmpeg.org/" target="_blank" style="color: #87ceeb; text-decoration: underline;">FFmpeg</a> for format conversion and processing
+        </div>
+    </div>
+    """
+    # Create a wrapper interface that includes the header
+    with gr.Blocks() as wrapper_interface:
+        gr.HTML(header_html)
+        tabbed_interface.render()
+    return wrapper_interface
 if __name__ == "__main__":
     interface = create_interface()
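
The memory check in `separate_audio_mcp` picks a Demucs segment setting from the amount of free RAM. The diff only shows the 16 GB and 8 GB thresholds, so the sketch below is an illustration rather than the repository's code. It uses a hypothetical `choose_segment` helper that takes the available memory as an argument, so the tiering can be exercised without `psutil`. The segment lengths for the lower tiers (40 and 10) are placeholders, not values from the repository.

```python
from typing import Optional


def choose_segment(available_gb: float) -> Optional[int]:
    """Map free memory (in GiB) to a segment length, or None for the default.

    The 16 and 8 thresholds come from the diff; the returned lengths for
    the two lower tiers are hypothetical placeholders.
    """
    if available_gb > 16:
        return None  # plenty of memory: let Demucs decide
    elif available_gb > 8:
        return 40  # hypothetical value for the mid tier
    return 10  # hypothetical value for low-memory machines


if __name__ == "__main__":
    # In the real function the input would come from something like
    # psutil.virtual_memory().available / (1024**3).
    for gb in (32.0, 12.0, 4.0):
        print(gb, choose_segment(gb))
```

Keeping the tiering in a pure function separates the policy from the `psutil` call, so the thresholds can be unit-tested on any machine.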