music-mcp / README.md

frascuchon HF Staff

metadata

title: Music AI Tools
emoji: 🎵🎶
colorFrom: indigo
colorTo: pink
sdk: gradio
sdk_version: 5.49.1
app_file: mcp_server.py
pinned: false

🎵 Music AI Tools - Fun Audio Processing Playground

A demo project showcasing 25+ audio processing tools built on AI models (Demucs, Seed-VC, Music-Flamingo) and traditional audio processing libraries (Librosa, FFmpeg). The playground exposes both a web interface and an MCP (Model Context Protocol) interface for exploring audio manipulation, analysis, and creative possibilities.

🎯 What's Inside

🤖 AI-Powered Features

🎵 Stem Separation using Demucs by Facebook Research
🎤 Voice Replacement using Seed-VC on Hugging Face
🧠 Music Understanding using Music-Flamingo by NVIDIA

🎛️ Audio Processing Capabilities

⚙️ Audio Analysis with Librosa for feature extraction
🎬 Audio Conversion with FFmpeg for format processing
🚀 High Performance with GPU acceleration and parallel processing

🎪 Demo Features

Stem Processing Tools

Stem Separation - Full 4-stem separation (vocals, drums, bass, other)
Selective Stems - Extract only specific stems to save processing time
Vocal/Instrumental - Separate vocals from instrumental components
Karaoke Creation - One-click instrumental track generation
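
As a rough illustration of the stem tools above, the sketch below builds a command line for the Demucs CLI. It assumes the `demucs` executable is on the PATH and uses its `-n`, `-o` and `--two-stems` options; the helper names `demucs_command` and `karaoke_path` are hypothetical, and the project itself may call Demucs through its Python API instead.

```python
from pathlib import Path


def demucs_command(track, stem=None, model="htdemucs", out_dir="separated"):
    """Build an argv list for the Demucs CLI (hypothetical helper).

    stem=None requests the full separation (vocals, drums, bass, other);
    stem="vocals" requests two-stem mode: vocals plus everything else.
    """
    cmd = ["demucs", "-n", model, "-o", out_dir]
    if stem is not None:
        cmd.append(f"--two-stems={stem}")
    cmd.append(str(track))
    return cmd


def karaoke_path(track, model="htdemucs", out_dir="separated"):
    """Where Demucs writes the instrumental in two-stem vocals mode."""
    return Path(out_dir) / model / Path(track).stem / "no_vocals.wav"
```

Passing the list to `subprocess.run(cmd, check=True)` would run the separation; in two-stem mode the `no_vocals.wav` file is the karaoke track.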

Audio Manipulation Tools

Pitch Alignment - Shift audio pitch by semitones
Key Estimation - Estimate musical key using harmonic analysis
Shift to Key - Shift audio to a specific musical key
Align Songs by Key - Harmonically align multiple tracks
Time Stretching - Change tempo without affecting pitch
BPM Alignment - Align two tracks to the same BPM
Medley Creation - Fun vocal/instrumental mixing
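
The arithmetic behind pitch shifting, key shifting and BPM alignment is small enough to sketch directly. The helpers below are illustrative only (their names are not taken from the project) and assume equal temperament and sharp-only note names.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def semitone_ratio(semitones):
    """Frequency ratio for a shift of `semitones` in equal temperament."""
    return 2.0 ** (semitones / 12.0)


def shortest_shift(from_key, to_key):
    """Smallest signed semitone shift (range -5..6) between two tonics."""
    diff = (NOTE_NAMES.index(to_key) - NOTE_NAMES.index(from_key)) % 12
    return diff - 12 if diff > 6 else diff


def stretch_rate(source_bpm, target_bpm):
    """Playback rate that moves a track from source_bpm to target_bpm.

    A rate above 1 speeds the track up, below 1 slows it down.
    """
    return target_bpm / source_bpm
```

These values map onto Librosa's effects: `librosa.effects.pitch_shift(y, sr=sr, n_steps=shortest_shift("C", "G"))` and `librosa.effects.time_stretch(y, rate=stretch_rate(120, 90))`. Whether the project's tools call Librosa in exactly this way is an assumption.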

Audio Editing Tools

Audio Cutting - Extract segments between time points
Mute Windows - Mute specific time ranges with smooth fades
Extract Segments - Extract multiple segments with joining options
Trim Audio - Trim from beginning/end with precision
Insert Section - Insert audio sections at precise positions
Replace Section - Replace audio segments with crossfades
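
To make "mute with smooth fades" concrete, here is a minimal sketch that works on a plain list of samples. It assumes linear fades placed inside the muted window; the function name and the default fade length are illustrative, not the project's actual implementation.

```python
def mute_window(samples, sample_rate, start_s, end_s, fade_s=0.01):
    """Return a copy of `samples` with [start_s, end_s) muted.

    The first and last `fade_s` seconds of the window ramp linearly
    down to silence and back up, which avoids audible clicks.
    """
    out = list(samples)
    start = max(0, int(start_s * sample_rate))
    end = min(len(out), int(end_s * sample_rate))
    # The two fades may not overlap, so cap each at half the window.
    fade = min(int(fade_s * sample_rate), (end - start) // 2)
    for i in range(start, end):
        if i < start + fade:
            gain = 1.0 - (i - start + 1) / fade  # fade out
        elif i >= end - fade:
            gain = (i - (end - fade)) / fade  # fade in
        else:
            gain = 0.0  # fully muted
        out[i] *= gain
    return out
```

With a sample rate of 4 Hz, muting 0.5 s to 2.0 s with 0.5 s fades turns ten samples of 1.0 into `[1, 1, 0.5, 0, 0, 0, 0, 0.5, 1, 1]`.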

Analysis & Information Tools

Audio Information - Get detailed file information
Music Understanding - AI-powered music analysis
Song Structure - Identify song sections (verse, chorus, bridge)
Cutting Points - AI-suggested optimal edit points
Genre Analysis - Detailed genre and style analysis

Special Features

Voice Replacement - Replace voice using Seed-VC AI model
Audio Cleaning - Remove noise (hiss, hum, background)
YouTube Extraction - Extract audio from YouTube videos

🚀 Quick Start

Prerequisites

# Install dependencies
pip install -r requirements.txt

# For GPU acceleration (optional but recommended)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
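
After installing, a quick check shows whether PyTorch can see a GPU. This is a small sketch; `pick_device` is a hypothetical helper name and falls back to the CPU when PyTorch is missing or no CUDA device is available.

```python
def pick_device():
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so only CPU processing is possible.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"Audio models would run on: {pick_device()}")
```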

Running the Demo

Web Interface (Recommended for Demo)

python mcp_server.py

Then open http://localhost:7860 in your browser to access the fun playground interface with 25+ tools!

MCP Server Mode

python mcp_server.py --mcp

Run as an MCP server for integration with AI assistants and other tools.
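
One plausible way a script like `mcp_server.py` could map the `--mcp` flag onto Gradio is through the `mcp_server` argument of `launch()`. The sketch below only builds the keyword arguments, so it runs without Gradio installed; `build_launch_kwargs` is a hypothetical name and the real script may be organised differently.

```python
import argparse


def build_launch_kwargs(argv=None):
    """Translate command-line flags into keyword arguments for launch()."""
    parser = argparse.ArgumentParser(description="Music AI Tools server")
    parser.add_argument(
        "--mcp",
        action="store_true",
        help="also expose the tools over the Model Context Protocol",
    )
    args = parser.parse_args(argv)
    # 7860 is the port mentioned above for the web interface.
    return {"server_port": 7860, "mcp_server": args.mcp}
```

With a Gradio app object named `demo` (an assumed name), the call would be `demo.launch(**build_launch_kwargs())`.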

🎮 Using the Tools

Web Interface

Upload Audio - Drag & drop or browse for audio files (WAV, MP3, FLAC, M4A)
Select Tool - Choose from 25+ different audio processing tools
Configure Settings - Adjust parameters for each tool
Process & Download - Run the tool, follow real-time progress, and download the result

Supported Formats

Input: WAV, MP3, FLAC, M4A
Output: WAV, MP3 (configurable)
URL Support: Direct processing from YouTube and other URLs
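
The format lists above translate into a simple extension check before conversion. The sketch below is an assumption about how such validation could look; the helper names are hypothetical, and the command it builds uses only FFmpeg's `-y` (overwrite) and `-i` (input) options.

```python
from pathlib import Path

INPUT_EXTENSIONS = {".wav", ".mp3", ".flac", ".m4a"}
OUTPUT_EXTENSIONS = {".wav", ".mp3"}


def is_supported_input(path):
    """True when the file extension is one of the supported input formats."""
    return Path(path).suffix.lower() in INPUT_EXTENSIONS


def ffmpeg_convert_command(src, dst):
    """Build an argv list that converts `src` to `dst` with FFmpeg."""
    if not is_supported_input(src):
        raise ValueError(f"unsupported input format: {src}")
    if Path(dst).suffix.lower() not in OUTPUT_EXTENSIONS:
        raise ValueError(f"unsupported output format: {dst}")
    # FFmpeg infers the output codec from the destination extension.
    return ["ffmpeg", "-y", "-i", str(src), str(dst)]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on a machine where FFmpeg is installed.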

🛠️ Project Structure

music-mcp/
├── mcp_server.py               # Main server with Gradio interface
├── requirements.txt            # Python dependencies
├── tools/                      # Audio processing modules
│   ├── stems_separation.py     # Demucs-based stem separation
│   ├── voice_replacement.py    # Seed-VC voice conversion
│   ├── music_understanding.py  # Music-Flamingo AI analysis
│   ├── pitch_alignment.py      # Key detection and pitch shifting
│   ├── time_strech.py          # BPM alignment and time stretching
│   ├── audio_cutting.py        # Audio editing and manipulation
│   ├── audio_cleaning.py       # Noise removal and cleaning
│   ├── combine_tracks.py       # Track mixing and medley creation
│   ├── audio_info.py           # File information and validation
│   └── youtube_extract.py      # YouTube audio extraction
├── examples/                   # Sample audio files for testing
├── output/                     # Generated audio files
└── youtube_downloads/          # Cached YouTube downloads

🎯 AI Model Details

🤖 AI Models Used

Demucs (Facebook Research) - State-of-the-art source separation
Seed-VC (Hugging Face) - High-quality voice conversion
Music-Flamingo (NVIDIA) - Advanced music understanding and analysis

🎛️ Audio Processing Libraries

Librosa - Audio feature extraction and analysis
FFmpeg - Audio format conversion and processing
PyTorch - Deep learning framework for AI models

🎨 Customization

Adding New Tools

Create a new function in the appropriate tools/ module
Add a wrapper function with MCP compatibility
Register the wrapper in the interface creation code in mcp_server.py
Update the documentation
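
The steps above can be sketched as follows. Everything here is hypothetical: the `TOOLS` registry, the `register_tool` decorator and the gain example stand in for whatever the project actually uses. The sketch assumes the wrapper should take simple typed arguments and carry a docstring, since Gradio derives MCP tool descriptions from type hints and docstrings.

```python
from typing import Callable

TOOLS: dict[str, Callable] = {}  # hypothetical registry of MCP-facing wrappers


def register_tool(func):
    """Record a wrapper so the interface code can pick it up later."""
    TOOLS[func.__name__] = func
    return func


def apply_gain(samples, gain_db):
    """Core function: scale samples by a gain given in decibels."""
    if not samples:
        raise ValueError("samples must not be empty")
    factor = 10.0 ** (gain_db / 20.0)
    return [s * factor for s in samples]


@register_tool
def apply_gain_tool(samples: list[float], gain_db: float = 0.0) -> dict:
    """Apply a gain in decibels to a list of audio samples.

    Args:
        samples: Audio samples as floats.
        gain_db: Gain in dB; positive values make the audio louder.
    """
    try:
        return {"ok": True, "samples": apply_gain(samples, gain_db)}
    except ValueError as exc:
        # Report errors as data so an MCP client receives a readable message.
        return {"ok": False, "error": str(exc)}
```

Keeping the core function separate from the wrapper lets the audio logic be tested on its own, while the wrapper handles the error reporting an assistant sees.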

🔧 Development

Code Quality

# Linting
ruff check .

# Formatting
ruff format .

# Type checking
mypy . --follow-untyped-imports

Dependencies

Core: gradio, torch, librosa, soundfile
AI Models: demucs, transformers
Audio Processing: ffmpeg-python, numpy, scipy
Web: yt-dlp, requests, gradio-client

🎪 Demo Use Cases

🎵 Music Production

Create karaoke tracks by removing vocals
Extract stems for remixing and sampling
Align songs for seamless DJ mixes
Generate medleys and mashups

🎧 Audio Editing

Clean up noisy recordings
Extract specific sections for clips
Create ringtones and social media content
Repair damaged audio files

🤖 AI Experimentation

Voice conversion for creative projects
Genre analysis and music understanding
Intelligent cutting point suggestions
Structure analysis for music theory

🎉 Have Fun!

This is a demo playground for exploring agent capabilities in audio processing.

Built with ❤️ using cutting-edge AI models and open-source audio processing libraries