frascuchon HF Staff committed on
Commit cf28ba5 · 1 Parent(s): 20de2d7

review UI and docs

Files changed (2)
  1. README.md +185 -3
  2. mcp_server.py +78 -11
README.md CHANGED
@@ -1,6 +1,6 @@
1
  ---
2
- title: Music Mcp
3
- emoji: 👀
4
  colorFrom: indigo
5
  colorTo: pink
6
  sdk: gradio
@@ -9,4 +9,186 @@ app_file: mcp_server.py
9
  pinned: false
10
  ---
11
 
12
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
1
  ---
2
+ title: Music AI Tools
3
+ emoji: 🎵🎶
4
  colorFrom: indigo
5
  colorTo: pink
6
  sdk: gradio
 
9
  pinned: false
10
  ---
11
 
12
+ # 🎵 Music AI Tools - Fun Audio Processing Playground
13
+
14
+ A comprehensive demo project showcasing 25+ audio processing tools powered by cutting-edge AI models and traditional audio processing libraries. This playground provides both web-based and MCP (Model Context Protocol) interfaces for exploring audio manipulation, analysis, and creative possibilities.
15
+
16
+ ## 🎯 What's Inside
17
+
18
+ ### 🤖 AI-Powered Features
19
+ - **🎵 Stem Separation** using [Demucs](https://github.com/facebookresearch/demucs) by Facebook Research
20
+ - **🎤 Voice Replacement** using [Seed-VC](https://huggingface.co/spaces/Plachta/Seed-VC) on Hugging Face
21
+ - **🧠 Music Understanding** using [Music-Flamingo](https://github.com/NVIDIA/music-flamingo) by NVIDIA
22
+
23
+ ### 🎛️ Audio Processing Capabilities
24
+ - **⚙️ Audio Analysis** with [Librosa](https://librosa.org/) for feature extraction
25
+ - **🎬 Audio Conversion** with [FFmpeg](https://ffmpeg.org/) for format processing
26
+ - **🚀 High Performance** with GPU acceleration and parallel processing
27
+
28
+ ## 🎪 Demo Features
29
+
30
+ ### Stem Processing Tools
31
+ - **Stem Separation** - Full 4-stem separation (vocals, drums, bass, other)
32
+ - **Selective Stems** - Extract only specific stems to save processing time
33
+ - **Vocal/Instrumental** - Separate vocals from instrumental components
34
+ - **Karaoke Creation** - One-click instrumental track generation
35
+
36
+ ### Audio Manipulation Tools
37
+ - **Pitch Alignment** - Shift audio pitch by semitones
38
+ - **Key Estimation** - Estimate musical key using harmonic analysis
39
+ - **Shift to Key** - Shift audio to specific musical key
40
+ - **Align Songs by Key** - Harmonically align multiple tracks
41
+ - **Time Stretching** - Change tempo without affecting pitch
42
+ - **BPM Alignment** - Align two tracks to same BPM
43
+ - **Medley Creation** - Fun vocal/instrumental mixing
44
+
45
+ ### Audio Editing Tools
46
+ - **Audio Cutting** - Extract segments between time points
47
+ - **Mute Windows** - Mute specific time ranges with smooth fades
48
+ - **Extract Segments** - Extract multiple segments with joining options
49
+ - **Trim Audio** - Trim from beginning/end with precision
50
+ - **Insert Section** - Insert audio sections at precise positions
51
+ - **Replace Section** - Replace audio segments with crossfades
52
+
53
+ ### Analysis & Information Tools
54
+ - **Audio Information** - Get detailed file information
55
+ - **Music Understanding** - AI-powered music analysis
56
+ - **Song Structure** - Identify song sections (verse, chorus, bridge)
57
+ - **Cutting Points** - AI-suggested optimal edit points
58
+ - **Genre Analysis** - Detailed genre and style analysis
59
+
60
+ ### Special Features
61
+ - **Voice Replacement** - Replace voice using Seed-VC AI model
62
+ - **Audio Cleaning** - Remove noise (hiss, hum, background)
63
+ - **YouTube Extraction** - Extract audio from YouTube videos
64
+
65
+ ## 🚀 Quick Start
66
+
67
+ ### Prerequisites
68
+ ```bash
69
+ # Install dependencies
70
+ pip install -r requirements.txt
71
+
72
+ # For GPU acceleration (optional but recommended)
73
+ pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
74
+ ```
75
+
76
+ ### Running the Demo
77
+
78
+ #### Web Interface (Recommended for Demo)
79
+ ```bash
80
+ python mcp_server.py
81
+ ```
82
+ Then open http://localhost:7860 in your browser to access the fun playground interface with 25+ tools!
83
+
84
+ #### MCP Server Mode
85
+ ```bash
86
+ python mcp_server.py --mcp
87
+ ```
88
+ Run as MCP server for integration with AI assistants and other tools.
89
+
90
+ ## 🎮 Using the Tools
91
+
92
+ ### Web Interface
93
+ 1. **Upload Audio** - Drag & drop or browse for audio files (WAV, MP3, FLAC, M4A)
94
+ 2. **Select Tool** - Choose from 25+ different audio processing tools
95
+ 3. **Configure Settings** - Adjust parameters for each tool
96
+ 4. **Process & Download** - Get results instantly with real-time progress
97
+
98
+ ### Supported Formats
99
+ - **Input:** WAV, MP3, FLAC, M4A
100
+ - **Output:** WAV, MP3 (configurable)
101
+ - **URL Support:** Direct processing from YouTube and other URLs
102
+
103
+ ## 🛠️ Project Structure
104
+
105
+ ```
106
+ music-mcp/
107
+ ├── mcp_server.py # Main server with Gradio interface
108
+ ├── requirements.txt # Python dependencies
109
+ ├── tools/ # Audio processing modules
110
+ │ ├── stems_separation.py # Demucs-based stem separation
111
+ │ ├── voice_replacement.py # Seed-VC voice conversion
112
+ │ ├── music_understanding.py # Music-Flamingo AI analysis
113
+ │ ├── pitch_alignment.py # Key detection and pitch shifting
114
+ │ ├── time_strech.py # BPM alignment and time stretching
115
+ │ ├── audio_cutting.py # Audio editing and manipulation
116
+ │ ├── audio_cleaning.py # Noise removal and cleaning
117
+ │ ├── combine_tracks.py # Track mixing and medley creation
118
+ │ ├── audio_info.py # File information and validation
119
+ │ └── youtube_extract.py # YouTube audio extraction
120
+ ├── examples/ # Sample audio files for testing
121
+ ├── output/ # Generated audio files
122
+ └── youtube_downloads/ # Cached YouTube downloads
123
+ ```
124
+
125
+ ## 🎯 AI Model Details
126
+
127
+ ### 🤖 AI Models Used
128
+ - **Demucs** (Facebook Research) - State-of-the-art source separation
129
+ - **Seed-VC** (Hugging Face) - High-quality voice conversion
130
+ - **Music-Flamingo** (NVIDIA) - Advanced music understanding and analysis
131
+
132
+ ### 🎛️ Audio Processing Libraries
133
+ - **Librosa** - Audio feature extraction and analysis
134
+ - **FFmpeg** - Audio format conversion and processing
135
+ - **PyTorch** - Deep learning framework for AI models
136
+
137
+
138
+ ## 🎨 Customization
139
+
140
+ ### Adding New Tools
141
+ 1. Create new function in appropriate `tools/` module
142
+ 2. Add wrapper function with MCP compatibility
143
+ 3. Register in `mcp_server.py` interface creation
144
+ 4. Update documentation
145
+
146
+ ## 🔧 Development
147
+
148
+ ### Code Quality
149
+ ```bash
150
+ # Linting
151
+ ruff check .
152
+
153
+ # Formatting
154
+ ruff format .
155
+
156
+ # Type checking
157
+ mypy . --follow-untyped-imports
158
+ ```
159
+
160
+ ### Dependencies
161
+ - **Core:** gradio, torch, librosa, soundfile
162
+ - **AI Models:** demucs, transformers
163
+ - **Audio Processing:** ffmpeg-python, numpy, scipy
164
+ - **Web:** yt-dlp, requests, gradio-client
165
+
166
+ ## 🎪 Demo Use Cases
167
+
168
+ ### 🎵 Music Production
169
+ - Create karaoke tracks by removing vocals
170
+ - Extract stems for remixing and sampling
171
+ - Align songs for seamless DJ mixes
172
+ - Generate medleys and mashups
173
+
174
+ ### 🎧 Audio Editing
175
+ - Clean up noisy recordings
176
+ - Extract specific sections for clips
177
+ - Create ringtones and social media content
178
+ - Repair damaged audio files
179
+
180
+ ### 🤖 AI Experimentation
181
+ - Voice conversion for creative projects
182
+ - Genre analysis and music understanding
183
+ - Intelligent cutting point suggestions
184
+ - Structure analysis for music theory
185
+
186
+
187
+
188
+ ## 🎉 Have Fun!
189
+
190
+ This is a **demo playground** for exploring agent capabilities with audio processing.
191
+
192
+ ---
193
+
194
+ **Built with ❤️ using cutting-edge AI models and open-source audio processing libraries**
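The "Adding New Tools" steps in the README above can be sketched roughly as follows. This is a minimal illustration, not code from the repository: `get_wav_duration`, `get_wav_duration_mcp`, and `build_duration_tab` are hypothetical names, and the tool uses only the standard-library `wave` module so that it runs without the project's dependencies. Gradio is imported lazily, so only step 3 needs it.

```python
import wave


# Step 1: the core function, as it might live in a tools/ module (hypothetical).
def get_wav_duration(path: str) -> float:
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


# Step 2: a thin wrapper with type hints and a docstring, which Gradio can
# use to describe the tool when its MCP server is enabled.
def get_wav_duration_mcp(audio_path: str) -> str:
    """
    Report the duration of a WAV file.

    Args:
        audio_path: Path to a local WAV file.

    Returns:
        A human-readable duration string, or an error message.
    """
    try:
        return f"{get_wav_duration(audio_path):.2f} seconds"
    except (wave.Error, OSError, EOFError) as exc:
        return f"Error: {exc}"


# Step 3: build a tab for the wrapper; it would then be added to the list
# of interfaces and tab names passed to gr.TabbedInterface.
def build_duration_tab():
    import gradio as gr  # imported lazily; only needed for the UI

    return gr.Interface(
        fn=get_wav_duration_mcp,
        inputs=gr.Audio(type="filepath", label="Upload Audio File", sources=["upload"]),
        outputs=gr.Textbox(label="Duration"),
        title="WAV Duration",
        flagging_mode="never",
    )
```

Keeping the core function separate from the wrapper means the audio logic can be tested without starting the web interface; the wrapper only turns results and errors into strings suitable for display.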
mcp_server.py CHANGED
@@ -621,7 +621,7 @@ def separate_audio_mcp(
621
  try:
622
  import psutil
623
 
624
- available_gb = psutil.virtual_memory().available / (1024 ** 3)
625
  if available_gb > 16:
626
  segment = None # Let Demucs decide
627
  elif available_gb > 8:
@@ -1169,8 +1169,8 @@ def replace_voice_mcp(
1169
  diffusion_steps: int = 35
1170
  length_adjust: float = 1.0
1171
  inference_cfg_rate: float = 0.5
1172
- f0_condition: bool = False
1173
- auto_f0_adjust: bool = False
1174
  pitch_shift: int = 0
1175
 
1176
  return replace_voice_wrapper(
@@ -1185,16 +1185,16 @@ def replace_voice_mcp(
1185
  )
1186
 
1187
 
1188
- def create_interface() -> gr.TabbedInterface:
1189
  """
1190
  Create and configure the complete Gradio interface with all audio processing tools.
1191
 
1192
- This function sets up a comprehensive web interface with 19 different tabs,
1193
  each providing access to specific audio processing capabilities. The interface
1194
- is organized into logical categories for ease of use.
1195
 
1196
  Returns:
1197
- gr.TabbedInterface: A fully configured Gradio tabbed interface containing:
1198
 
1199
  **Stem Processing Tabs:**
1200
  - Stem Separation: Full 4-stem separation (vocals, drums, bass, other)
@@ -1211,7 +1211,7 @@ def create_interface() -> gr.TabbedInterface:
1211
  - Stereo Mix: Create stereo mix with left/right channels
1212
  - Time Stretching: Change tempo without affecting pitch
1213
  - BPM Alignment: Align two tracks to same BPM
1214
- - Medley Creation: Professional vocal/instrumental mixing
1215
 
1216
  **Audio Editing Tabs:**
1217
  - Audio Cutting: Extract segments between time points
@@ -1237,11 +1237,11 @@ def create_interface() -> gr.TabbedInterface:
1237
  - Server runs on 0.0.0.0:7860 with MCP server enabled
1238
  - All examples disabled for security (cache_examples=False)
1239
  - Flagging disabled to prevent data collection
 
1240
  """
1241
 
1242
  # Tab 1: Stem Separation
1243
  stem_interface = gr.Interface(
1244
-
1245
  fn=separate_audio_mcp,
1246
  inputs=[
1247
  gr.Audio(type="filepath", label="Upload Audio File", sources=["upload"]),
@@ -1692,7 +1692,7 @@ def create_interface() -> gr.TabbedInterface:
1692
  type="filepath",
1693
  label="Target Audio (voice to use) - Local file or URL",
1694
  sources=["upload"],
1695
- )
1696
  ],
1697
  outputs=gr.Audio(label="Voice-Replaced Audio", type="filepath"),
1698
  title="Voice Replacement with Seed-VC",
@@ -1702,7 +1702,8 @@ def create_interface() -> gr.TabbedInterface:
1702
  flagging_mode="never",
1703
  )
1704
 
1705
- return gr.TabbedInterface(
 
1706
  [
1707
  stem_interface,
1708
  pitch_interface,
@@ -1753,8 +1754,74 @@ def create_interface() -> gr.TabbedInterface:
1753
  "Replace Section",
1754
  "Voice Replacement",
1755
  ],
1756
  )
1757
 
1758
 
1759
  if __name__ == "__main__":
1760
  interface = create_interface()
 
621
  try:
622
  import psutil
623
 
624
+ available_gb = psutil.virtual_memory().available / (1024**3)
625
  if available_gb > 16:
626
  segment = None # Let Demucs decide
627
  elif available_gb > 8:
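The hunk above picks a Demucs segment length from the amount of available RAM. A minimal sketch of that tiering follows. Only the `> 16` branch (segment `None`) and the `> 8` threshold appear in the diff, so the values `MID_SEGMENT` and `LOW_SEGMENT`, the function names, and the fallback when `psutil` is missing are all assumptions made for illustration, not the project's actual behavior.

```python
from typing import Optional

# Placeholder values (assumptions): the diff does not show the real ones.
MID_SEGMENT = 10
LOW_SEGMENT = 5


def available_memory_gb() -> Optional[float]:
    """Return available RAM in GiB, or None if psutil is not installed."""
    try:
        import psutil
    except ImportError:
        return None
    return psutil.virtual_memory().available / (1024**3)


def choose_segment(available_gb: Optional[float]) -> Optional[int]:
    """Map available memory to a segment length; None lets Demucs decide."""
    if available_gb is None:
        # Unknown memory: assume the most conservative tier (an assumption).
        return LOW_SEGMENT
    if available_gb > 16:
        return None
    if available_gb > 8:
        return MID_SEGMENT
    return LOW_SEGMENT
```

Taking the memory figure as a parameter, rather than reading it inside the function, keeps the tier logic testable on any machine: `choose_segment(available_memory_gb())` is the production call, while tests can pass fixed numbers.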
 
1169
  diffusion_steps: int = 35
1170
  length_adjust: float = 1.0
1171
  inference_cfg_rate: float = 0.5
1172
+ f0_condition: bool = True
1173
+ auto_f0_adjust: bool = True
1174
  pitch_shift: int = 0
1175
 
1176
  return replace_voice_wrapper(
 
1185
  )
1186
 
1187
 
1188
+ def create_interface() -> gr.Blocks:
1189
  """
1190
  Create and configure the complete Gradio interface with all audio processing tools.
1191
 
1192
+ This function sets up a fun web interface with 25+ different tabs,
1193
  each providing access to specific audio processing capabilities. The interface
1194
+ is organized into logical categories for easy exploration and experimentation.
1195
 
1196
  Returns:
1197
+ gr.Blocks: A fully configured Gradio interface containing:
1198
 
1199
  **Stem Processing Tabs:**
1200
  - Stem Separation: Full 4-stem separation (vocals, drums, bass, other)
 
1211
  - Stereo Mix: Create stereo mix with left/right channels
1212
  - Time Stretching: Change tempo without affecting pitch
1213
  - BPM Alignment: Align two tracks to same BPM
1214
+ - Medley Creation: Fun vocal/instrumental mixing
1215
 
1216
  **Audio Editing Tabs:**
1217
  - Audio Cutting: Extract segments between time points
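The Time Stretching and BPM Alignment entries in the docstring above, like the semitone-based pitch and key tools listed in the README, come down to a few ratios. The helpers below are an illustrative sketch with hypothetical names, not the project's functions. In equal temperament a shift of n semitones multiplies frequency by 2^(n/12); matching one tempo to another means stretching by target/source; and moving between keys means choosing the smallest semitone offset between the two tonics.

```python
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def semitone_ratio(n_semitones: float) -> float:
    """Frequency ratio for a shift of n semitones (12-tone equal temperament)."""
    return 2.0 ** (n_semitones / 12.0)


def stretch_rate(source_bpm: float, target_bpm: float) -> float:
    """Playback-rate factor that turns source_bpm into target_bpm (>1 is faster)."""
    if source_bpm <= 0 or target_bpm <= 0:
        raise ValueError("BPM values must be positive")
    return target_bpm / source_bpm


def semitones_between(source_key: str, target_key: str) -> int:
    """Smallest signed shift, in the range -6..+5, from source tonic to target tonic."""
    diff = PITCH_CLASSES.index(target_key) - PITCH_CLASSES.index(source_key)
    return (diff + 6) % 12 - 6


print(semitone_ratio(12))           # 2.0: one octave up doubles the frequency
print(stretch_rate(100, 120))       # 1.2
print(semitones_between("A", "C"))  # 3
print(semitones_between("C", "A"))  # -3
```

With librosa, values like these would typically be passed as `rate` to `librosa.effects.time_stretch` and as `n_steps` to `librosa.effects.pitch_shift`; whether the project does exactly that is not visible in this diff.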
 
1237
  - Server runs on 0.0.0.0:7860 with MCP server enabled
1238
  - All examples disabled for security (cache_examples=False)
1239
  - Flagging disabled to prevent data collection
1240
+ - This is a demo project for exploring audio processing capabilities
1241
  """
1242
 
1243
  # Tab 1: Stem Separation
1244
  stem_interface = gr.Interface(
 
1245
  fn=separate_audio_mcp,
1246
  inputs=[
1247
  gr.Audio(type="filepath", label="Upload Audio File", sources=["upload"]),
 
1692
  type="filepath",
1693
  label="Target Audio (voice to use) - Local file or URL",
1694
  sources=["upload"],
1695
+ ),
1696
  ],
1697
  outputs=gr.Audio(label="Voice-Replaced Audio", type="filepath"),
1698
  title="Voice Replacement with Seed-VC",
 
1702
  flagging_mode="never",
1703
  )
1704
 
1705
+ # Create TabbedInterface with custom header
1706
+ tabbed_interface = gr.TabbedInterface(
1707
  [
1708
  stem_interface,
1709
  pitch_interface,
 
1754
  "Replace Section",
1755
  "Voice Replacement",
1756
  ],
1757
+ title="🎵 Music AI Tools - Professional Audio Processing Suite",
1758
  )
1759
 
1760
+ # Add custom CSS for header styling
1761
+ tabbed_interface.head = """
1762
+ <style>
1763
+ .gradio-container {
1764
+ font-family: 'Inter', system-ui, -apple-system, sans-serif !important;
1765
+ }
1766
+ .tab-nav {
1767
+ border-bottom: 2px solid #e5e7eb !important;
1768
+ }
1769
+ .tab-nav button {
1770
+ font-weight: 500 !important;
1771
+ }
1772
+ </style>
1773
+ """
1774
+
1775
+ # Add header HTML to the interface
1776
+ header_html = """
1777
+ <div style="text-align: center; padding: 30px 20px; background: linear-gradient(135deg, #667eea 0%, #764ba2 100%); border-radius: 15px; margin: 20px auto; max-width: 1200px; box-shadow: 0 10px 30px rgba(0,0,0,0.2);">
1778
+ <h1 style="color: white; font-size: 2.8em; margin-bottom: 15px; font-weight: 700; text-shadow: 2px 2px 4px rgba(0,0,0,0.3);">
1779
+ 🎵 Music AI Tools 🎶
1780
+ </h1>
1781
+ <h2 style="color: #f0f0f0; font-size: 1.4em; margin-bottom: 20px; font-weight: 400;">
1782
+ Fun Audio Processing Playground
1783
+ </h2>
1784
+ <p style="color: #e0e0e0; font-size: 1.1em; max-width: 900px; margin: 0 auto 25px auto; line-height: 1.7; text-align: left; background: rgba(0,0,0,0.2); padding: 20px; border-radius: 10px;">
1785
+ <strong style="color: white; font-size: 1.2em;">🎧 Cool Audio Tricks:</strong> Stem separation (Demucs), pitch shifting, time stretching, and key alignment<br>
1786
+ <strong style="color: white; font-size: 1.2em;">🎹 Smart AI Analysis:</strong> Genre detection, structure analysis, and cutting suggestions (Music-Flamingo)<br>
1787
+ <strong style="color: white; font-size: 1.2em;">🎛️ Fun Audio Editing:</strong> Noise removal, track combination, and precise audio manipulation<br>
1788
+ <strong style="color: white; font-size: 1.2em;">🤖 Awesome AI Tools:</strong> Voice replacement (Seed-VC) and music understanding (Music-Flamingo)<br>
1789
+ <strong style="color: white; font-size: 1.2em;">🚀 Fast & Powerful:</strong> GPU boost, parallel processing, and live progress updates
1790
+ </p>
1791
+ <div style="margin-top: 20px; display: flex; justify-content: center; gap: 10px; flex-wrap: wrap;">
1792
+ <span style="background: rgba(255,255,255,0.25); padding: 8px 20px; border-radius: 25px; margin: 5px; color: white; font-weight: 600; backdrop-filter: blur(10px);">
1793
+ 🎼 25+ Tools
1794
+ </span>
1795
+ <span style="background: rgba(255,255,255,0.25); padding: 8px 20px; border-radius: 25px; margin: 5px; color: white; font-weight: 600; backdrop-filter: blur(10px);">
1796
+ 🎯 AI-Powered
1797
+ </span>
1798
+ <span style="background: rgba(255,255,255,0.25); padding: 8px 20px; border-radius: 25px; margin: 5px; color: white; font-weight: 600; backdrop-filter: blur(10px);">
1799
+ 🌐 URL Support
1800
+ </span>
1801
+ <span style="background: rgba(255,255,255,0.25); padding: 8px 20px; border-radius: 25px; margin: 5px; color: white; font-weight: 600; backdrop-filter: blur(10px);">
1802
+ 🎪 Demo Fun
1803
+ </span>
1804
+ </div>
1805
+ <div style="margin-top: 25px; text-align: center; color: rgba(255,255,255,0.9); font-size: 0.9em; line-height: 1.6;">
1806
+ <strong style="color: white;">🤖 AI Models Used:</strong><br>
1807
+ 🎵 <strong>Stem Separation:</strong> <a href="https://github.com/facebookresearch/demucs" target="_blank" style="color: #ffd700; text-decoration: underline;">Demucs</a> by Facebook Research<br>
1808
+ 🎤 <strong>Voice Replacement:</strong> <a href="https://huggingface.co/spaces/Plachta/Seed-VC" target="_blank" style="color: #ffd700; text-decoration: underline;">Seed-VC</a> on Hugging Face<br>
1809
+ 🧠 <strong>Music Understanding:</strong> <a href="https://github.com/NVIDIA/music-flamingo" target="_blank" style="color: #ffd700; text-decoration: underline;">Music-Flamingo</a> by NVIDIA<br>
1810
+ <br>
1811
+ <strong style="color: white;">🎛️ Audio Processing Libraries:</strong><br>
1812
+ ⚙️ <strong>Audio Analysis:</strong> <a href="https://librosa.org/" target="_blank" style="color: #87ceeb; text-decoration: underline;">Librosa</a> for audio feature extraction<br>
1813
+ 🎬 <strong>Audio Conversion:</strong> <a href="https://ffmpeg.org/" target="_blank" style="color: #87ceeb; text-decoration: underline;">FFmpeg</a> for format conversion and processing
1814
+ </div>
1815
+ </div>
1816
+ """
1817
+
1818
+ # Create a wrapper interface that includes the header
1819
+ with gr.Blocks() as wrapper_interface:
1820
+ gr.HTML(header_html)
1821
+ tabbed_interface.render()
1822
+
1823
+ return wrapper_interface
1824
+
1825
 
1826
  if __name__ == "__main__":
1827
  interface = create_interface()
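The new `create_interface()` wraps the tabbed interface in a `gr.Blocks` so that a header can sit above the tabs. A stripped-down sketch of that pattern follows, assuming a Gradio version that accepts `flagging_mode` as the diff does. `build_header_html`, `build_app`, and the single echo tab are hypothetical stand-ins for the real header and tool tabs.

```python
import html


def build_header_html(title: str, subtitle: str, badges: list[str]) -> str:
    """Build a small header block; all text is escaped before being embedded."""
    badge_html = "".join(
        f'<span style="margin: 5px;">{html.escape(b)}</span>' for b in badges
    )
    return (
        '<div style="text-align: center;">'
        f"<h1>{html.escape(title)}</h1>"
        f"<h2>{html.escape(subtitle)}</h2>"
        f"<div>{badge_html}</div>"
        "</div>"
    )


def build_app():
    import gradio as gr  # imported lazily so the header helper has no dependencies

    echo_tab = gr.Interface(
        fn=lambda text: text,
        inputs=gr.Textbox(label="Input"),
        outputs=gr.Textbox(label="Output"),
        flagging_mode="never",
    )
    tabs = gr.TabbedInterface([echo_tab], ["Echo"])

    # Render the header first, then the existing tabbed interface beneath it.
    with gr.Blocks() as app:
        gr.HTML(
            build_header_html(
                "Music AI Tools", "Fun Audio Processing Playground", ["25+ Tools"]
            )
        )
        tabs.render()
    return app
```

Launching would then look something like `build_app().launch(server_name="0.0.0.0", server_port=7860, mcp_server=True)`, mirroring the settings named in the docstring; the actual launch call lies outside the hunks shown here.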