frascuchon HF Staff

metadata
title: Music AI Tools
emoji: 🎵🎶
colorFrom: indigo
colorTo: pink
sdk: gradio
sdk_version: 5.49.1
app_file: mcp_server.py
pinned: false

🎵 Music AI Tools - Fun Audio Processing Playground

A demo project showcasing 25+ audio processing tools built on AI models and traditional audio processing libraries. The playground exposes both a web interface and an MCP (Model Context Protocol) interface for audio manipulation, analysis, and creative experimentation.

🎯 What's Inside

🤖 AI-Powered Features

  • 🎵 Stem Separation using Demucs by Facebook Research
  • 🎤 Voice Replacement using Seed-VC on Hugging Face
  • 🧠 Music Understanding using Music-Flamingo by NVIDIA

🎛️ Audio Processing Capabilities

  • ⚙️ Audio Analysis with Librosa for feature extraction
  • 🎬 Audio Conversion with FFmpeg for format processing
  • 🚀 High Performance with GPU acceleration and parallel processing

🎪 Demo Features

Stem Processing Tools

  • Stem Separation - Full 4-stem separation (vocals, drums, bass, other)
  • Selective Stems - Extract only specific stems to save processing time
  • Vocal/Instrumental - Separate vocals from instrumental components
  • Karaoke Creation - One-click instrumental track generation
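
As a rough illustration of how the stem tools above could map onto Demucs, the sketch below builds a Demucs command line and predicts where the stems land. It assumes the Demucs CLI is installed and that the default `htdemucs` model is used; the helper names (`build_demucs_command`, `expected_stem_paths`, `separate`) are hypothetical and are not taken from this project's `tools/stems_separation.py`.

```python
import subprocess
import sys
from pathlib import Path


def build_demucs_command(audio_path, out_dir="output", two_stems=None, model="htdemucs"):
    """Build a Demucs CLI invocation as an argument list (hypothetical helper).

    two_stems=None requests the full 4-stem split; two_stems="vocals"
    requests vocals plus a combined "no_vocals" (instrumental) track.
    """
    cmd = [sys.executable, "-m", "demucs", "-n", model, "-o", str(out_dir)]
    if two_stems is not None:
        cmd.append(f"--two-stems={two_stems}")
    cmd.append(str(audio_path))
    return cmd


def expected_stem_paths(audio_path, out_dir="output", two_stems=None, model="htdemucs"):
    """Paths where the Demucs CLI normally writes its WAV stems."""
    track_dir = Path(out_dir) / model / Path(audio_path).stem
    if two_stems is None:
        names = ["vocals", "drums", "bass", "other"]
    else:
        names = [two_stems, f"no_{two_stems}"]
    return [track_dir / f"{name}.wav" for name in names]


def separate(audio_path, **kwargs):
    """Run Demucs and return the expected stem paths (requires demucs installed)."""
    subprocess.run(build_demucs_command(audio_path, **kwargs), check=True)
    return expected_stem_paths(audio_path, **kwargs)
```

With this sketch, a karaoke track would be the `no_vocals.wav` file returned by `separate("song.mp3", two_stems="vocals")`; skipping the full 4-stem split is also what makes the selective mode cheaper.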

Audio Manipulation Tools

  • Pitch Alignment - Shift audio pitch by semitones
  • Key Estimation - Estimate musical key using harmonic analysis
  • Shift to Key - Shift audio to a specific musical key
  • Align Songs by Key - Harmonically align multiple tracks
  • Time Stretching - Change tempo without affecting pitch
  • BPM Alignment - Align two tracks to the same BPM
  • Medley Creation - Fun vocal/instrumental mixing
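
The arithmetic behind "Shift to Key" and "BPM Alignment" can be sketched in a few lines. This is an illustration only, not the project's implementation: the function names are hypothetical, keys are reduced to their tonic note, and choosing the smallest shift is an assumption. The resulting numbers are the kind of values that functions such as `librosa.effects.pitch_shift` (`n_steps`) and `librosa.effects.time_stretch` (`rate`) accept.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def semitones_between(source_key, target_key):
    """Smallest signed semitone shift from one tonic to another (range -5..+6)."""
    diff = (NOTE_NAMES.index(target_key) - NOTE_NAMES.index(source_key)) % 12
    return diff - 12 if diff > 6 else diff


def pitch_ratio(semitones):
    """Frequency ratio for a shift in equal-tempered semitones."""
    return 2.0 ** (semitones / 12.0)


def stretch_rate(source_bpm, target_bpm):
    """Playback-rate factor that brings source_bpm to target_bpm (>1 speeds up)."""
    if source_bpm <= 0 or target_bpm <= 0:
        raise ValueError("BPM values must be positive")
    return target_bpm / source_bpm


if __name__ == "__main__":
    print(semitones_between("C", "G"))  # shifting down 5 is shorter than up 7
    print(stretch_rate(120, 128))
```

Shifting by the smallest interval keeps artifacts lower than always shifting upward, which is why a move from C to G is expressed here as five semitones down.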

Audio Editing Tools

  • Audio Cutting - Extract segments between time points
  • Mute Windows - Mute specific time ranges with smooth fades
  • Extract Segments - Extract multiple segments with joining options
  • Trim Audio - Trim from beginning/end with precision
  • Insert Section - Insert audio sections at precise positions
  • Replace Section - Replace audio segments with crossfades

Analysis & Information Tools

  • Audio Information - Get detailed file information
  • Music Understanding - AI-powered music analysis
  • Song Structure - Identify song sections (verse, chorus, bridge)
  • Cutting Points - AI-suggested optimal edit points
  • Genre Analysis - Detailed genre and style analysis

Special Features

  • Voice Replacement - Replace voice using Seed-VC AI model
  • Audio Cleaning - Remove noise (hiss, hum, background)
  • YouTube Extraction - Extract audio from YouTube videos
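
Since `yt-dlp` is listed among the dependencies, YouTube extraction could be sketched as below. The helper names and the chosen defaults (MP3 at 192 kbps, the `youtube_downloads/` folder) are assumptions for illustration; the option keys themselves are standard `yt_dlp.YoutubeDL` options, and audio extraction requires FFmpeg to be installed.

```python
def build_ytdlp_options(out_dir="youtube_downloads", codec="mp3", quality="192"):
    """Options for yt_dlp.YoutubeDL that download and extract audio only."""
    return {
        "format": "bestaudio/best",                  # prefer the best audio-only stream
        "outtmpl": f"{out_dir}/%(title)s.%(ext)s",   # file name template
        "noplaylist": True,                          # a single video, not a whole playlist
        "postprocessors": [
            {
                "key": "FFmpegExtractAudio",         # convert with FFmpeg after download
                "preferredcodec": codec,
                "preferredquality": quality,
            }
        ],
    }


def download_audio(url, **kwargs):
    """Download the audio track of a video (requires yt-dlp and FFmpeg)."""
    import yt_dlp  # imported lazily so the option builder works without yt-dlp

    with yt_dlp.YoutubeDL(build_ytdlp_options(**kwargs)) as ydl:
        ydl.download([url])
```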

🚀 Quick Start

Prerequisites

# Install dependencies
pip install -r requirements.txt

# For GPU acceleration (optional but recommended)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
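
To check whether the optional GPU install took effect, a small device probe like the following may help. The `pick_device` name is hypothetical; it falls back to CPU when PyTorch is missing or no CUDA device is visible.

```python
def pick_device():
    """Return "cuda" when PyTorch sees a CUDA GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so only CPU processing is possible.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"Audio models would run on: {pick_device()}")
```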

Running the Demo

Web Interface (Recommended for Demo)

python mcp_server.py

Then open http://localhost:7860 in your browser to access the fun playground interface with 25+ tools!

MCP Server Mode

python mcp_server.py --mcp

Run as MCP server for integration with AI assistants and other tools.
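
How the `--mcp` flag is wired up is not shown here, so the following is only a sketch of one plausible approach: parse the flag with `argparse` and pass it to Gradio's `launch(mcp_server=...)` parameter. The `--port` option, the helper names, and the placeholder `echo_path` tool are assumptions, not part of the actual `mcp_server.py`.

```python
import argparse


def parse_args(argv=None):
    """Parse the command-line flags of the (hypothetical) server entry point."""
    parser = argparse.ArgumentParser(description="Music AI Tools server (sketch)")
    parser.add_argument("--mcp", action="store_true", help="expose tools over MCP")
    parser.add_argument("--port", type=int, default=7860, help="assumed option")
    return parser.parse_args(argv)


def launch_kwargs(args):
    """Translate parsed flags into keyword arguments for Gradio's launch()."""
    return {"server_port": args.port, "mcp_server": args.mcp}


def main(argv=None):
    import gradio as gr  # imported lazily; only needed when actually serving

    def echo_path(audio_path: str) -> str:
        """Return the given audio path unchanged (placeholder tool)."""
        return audio_path

    demo = gr.Interface(fn=echo_path, inputs=gr.Textbox(), outputs=gr.Textbox())
    demo.launch(**launch_kwargs(parse_args(argv)))
```

Calling `main(["--mcp"])` would start the app with MCP enabled. In Gradio 5 the MCP endpoint is typically served under `/gradio_api/mcp/sse` on the same host and port, but check the startup log for the exact URL.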

🎮 Using the Tools

Web Interface

  1. Upload Audio - Drag & drop or browse for audio files (WAV, MP3, FLAC, M4A)
  2. Select Tool - Choose from 25+ different audio processing tools
  3. Configure Settings - Adjust parameters for each tool
  4. Process & Download - Run the tool, follow its real-time progress, and download the result

Supported Formats

  • Input: WAV, MP3, FLAC, M4A
  • Output: WAV, MP3 (configurable)
  • URL Support: Direct processing from YouTube and other URLs
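
A simple way to route an input according to the formats listed above is sketched below. The `classify_input` helper and its return values are hypothetical; the extension set mirrors the supported input formats, and URL handling is assumed to accept any http(s) address.

```python
from pathlib import Path
from urllib.parse import urlparse

SUPPORTED_INPUT_EXTENSIONS = {".wav", ".mp3", ".flac", ".m4a"}


def classify_input(source):
    """Return "url", "file", or "unsupported" for a user-supplied source."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    # Compare extensions case-insensitively so "SONG.FLAC" is accepted.
    if Path(source).suffix.lower() in SUPPORTED_INPUT_EXTENSIONS:
        return "file"
    return "unsupported"
```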

🛠️ Project Structure

music-mcp/
├── mcp_server.py              # Main server with Gradio interface
├── requirements.txt           # Python dependencies
├── tools/                     # Audio processing modules
│   ├── stems_separation.py    # Demucs-based stem separation
│   ├── voice_replacement.py   # Seed-VC voice conversion
│   ├── music_understanding.py # Music-Flamingo AI analysis
│   ├── pitch_alignment.py     # Key detection and pitch shifting
│   ├── time_strech.py         # BPM alignment and time stretching
│   ├── audio_cutting.py       # Audio editing and manipulation
│   ├── audio_cleaning.py      # Noise removal and cleaning
│   ├── combine_tracks.py      # Track mixing and medley creation
│   ├── audio_info.py          # File information and validation
│   └── youtube_extract.py     # YouTube audio extraction
├── examples/                  # Sample audio files for testing
├── output/                    # Generated audio files
└── youtube_downloads/         # Cached YouTube downloads

🎯 AI Model Details

🤖 AI Models Used

  • Demucs (Facebook Research) - State-of-the-art source separation
  • Seed-VC (Hugging Face) - High-quality voice conversion
  • Music-Flamingo (NVIDIA) - Advanced music understanding and analysis

🎛️ Audio Processing Libraries

  • Librosa - Audio feature extraction and analysis
  • FFmpeg - Audio format conversion and processing
  • PyTorch - Deep learning framework for AI models

🎨 Customization

Adding New Tools

  1. Create a new function in the appropriate tools/ module
  2. Add an MCP-compatible wrapper function
  3. Register the wrapper in the interface creation code in mcp_server.py
  4. Update the documentation
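
The first two steps might look like the sketch below, which uses only the standard library so it runs anywhere. Everything in it is hypothetical: `reverse_wav` stands in for a core function in a `tools/` module, and `reverse_wav_mcp` is a wrapper that validates input, returns plain strings, and carries type hints and a docstring, which Gradio uses to describe a function when exposing it as an MCP tool.

```python
import wave
from pathlib import Path


def reverse_wav(input_path: str, output_path: str) -> str:
    """Core tool (hypothetical): write a time-reversed copy of a PCM WAV file."""
    with wave.open(input_path, "rb") as src:
        params = src.getparams()
        frames = src.readframes(src.getnframes())
    frame_size = params.sampwidth * params.nchannels  # bytes per audio frame
    chunks = [frames[i:i + frame_size] for i in range(0, len(frames), frame_size)]
    with wave.open(output_path, "wb") as dst:
        dst.setparams(params)
        dst.writeframes(b"".join(reversed(chunks)))
    return output_path


def reverse_wav_mcp(audio_path: str) -> str:
    """Reverse a WAV file in time.

    Args:
        audio_path: Path to a PCM WAV file.

    Returns:
        Path to the reversed WAV file, or a message starting with "Error:".
    """
    source = Path(audio_path)
    if not source.is_file():
        return f"Error: file not found: {audio_path}"
    target = source.with_name(f"{source.stem}_reversed.wav")
    try:
        return reverse_wav(str(source), str(target))
    except (wave.Error, EOFError, OSError) as exc:
        return f"Error: {exc}"


# Step 3 (assumption): in mcp_server.py the wrapper would be attached to the
# Gradio UI, for example with gr.Interface(fn=reverse_wav_mcp, ...).
```

Returning an error string instead of raising keeps the wrapper's output predictable for an AI assistant calling it over MCP, at the cost of the caller having to check for the "Error:" prefix.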

🔧 Development

Code Quality

# Linting
ruff check .

# Formatting
ruff format .

# Type checking
mypy . --follow-untyped-imports

Dependencies

  • Core: gradio, torch, librosa, soundfile
  • AI Models: demucs, transformers
  • Audio Processing: ffmpeg-python, numpy, scipy
  • Web: yt-dlp, requests, gradio-client

🎪 Demo Use Cases

🎵 Music Production

  • Create karaoke tracks by removing vocals
  • Extract stems for remixing and sampling
  • Align songs for seamless DJ mixes
  • Generate medleys and mashups

🎧 Audio Editing

  • Clean up noisy recordings
  • Extract specific sections for clips
  • Create ringtones and social media content
  • Repair damaged audio files

🤖 AI Experimentation

  • Voice conversion for creative projects
  • Genre analysis and music understanding
  • Intelligent cutting point suggestions
  • Structure analysis for music theory

🎉 Have Fun!

This is a demo playground for exploring what AI agents can do with audio processing.


Built with ❤️ using cutting-edge AI models and open-source audio processing libraries