refactor: rename Voice Profiler to Voice Tools throughout codebase
Update all references from 'Voice Profiler' to 'Voice Tools':
- CLI command headers and descriptions
- Web interface title and description
- Benchmark script output
- Package docstrings
- remove .space and .spacesrc (not needed for HF Spaces deployment)
- update benchmark.py docstring to Voice Tools
This aligns with the project rename completed in previous commits.
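A rename that touches this many files is easy to leave half-finished. As a minimal sketch (not part of this commit), the hypothetical helper `find_stale_references` below scans a source tree for leftover occurrences of the old name. The function name, the suffix filter, and the case-insensitive match are assumptions made for illustration.

```python
from pathlib import Path

OLD_NAME = "Voice Profiler"
# File types worth scanning; an assumption, adjust for the real repository.
TEXT_SUFFIXES = {".py", ".md", ".toml"}


def find_stale_references(root: Path, needle: str = OLD_NAME) -> list[tuple[Path, int, str]]:
    """Return (path, line_number, stripped_line) for each line under root containing needle."""
    hits = []
    for path in sorted(root.rglob("*")):
        # Skip directories and anything that is not a known text file type.
        if not path.is_file() or path.suffix not in TEXT_SUFFIXES:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            # Case-insensitive so "VOICE PROFILER" banners are caught too.
            if needle.lower() in line.lower():
                hits.append((path, number, line.strip()))
    return hits
```

Calling `find_stale_references(Path("."))` from the repository root and printing each hit as `path:line: text` gives a quick checklist; an empty result suggests the rename reached every scanned file.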
- .space/README.md +0 -45
- .spacesrc +0 -2
- README.md +6 -6
- app.py +2 -2
- pyproject.toml +1 -1
- scripts/benchmark.py +3 -3
- src/cli/__init__.py +1 -1
- src/cli/denoise.py +1 -1
- src/cli/extract_speaker.py +1 -1
- src/cli/main.py +7 -7
- src/cli/separate.py +1 -1
- src/web/__init__.py +1 -1
- src/web/app.py +5 -5
.space/README.md
DELETED
@@ -1,45 +0,0 @@
----
-title: Voice Profiler
-emoji: 🎤
-colorFrom: blue
-colorTo: purple
-sdk: gradio
-sdk_version: 5.49.1
-app_file: app.py
-pinned: false
-license: mit
-hardware: zero-gpu
----
-
-# Voice Profiler
-
-AI-powered voice separation, extraction, and denoising tool.
-
-## Features
-
-- **Speaker Separation**: Automatically separate multiple speakers from mixed audio
-- **Speaker Extraction**: Extract a specific speaker using a reference clip
-- **Voice Denoising**: Remove background noise and silence from audio
-
-## Technology
-
-Powered by:
-- PyAnnote Audio for speaker diarization and embeddings
-- Silero VAD for voice activity detection
-- HuggingFace ZeroGPU for fast GPU-accelerated processing
-
-## Usage
-
-1. Select a workflow from the tabs
-2. Upload your audio file
-3. Configure settings (optional)
-4. Click "Process" and wait for results
-
-## Requirements
-
-- Audio files in M4A, WAV, or MP3 format
-- For speaker extraction, provide a clean reference clip (minimum 3 seconds)
-
-## License
-
-MIT License - See LICENSE file for details
.spacesrc
DELETED
@@ -1,2 +0,0 @@
-#!/bin/bash
-pip install -e .
README.md
CHANGED
@@ -1,5 +1,5 @@
 ---
-title: Voice
+title: Voice Tools
 emoji: 🎤
 colorFrom: blue
 colorTo: purple
@@ -11,11 +11,11 @@ license: mit
 hardware: zero-gpu
 ---
 
-# Voice
+# Voice Tools
 
 **Extract target voice from mixed audio files for video generation**
 
-Voice
+Voice Tools is a tool that extracts a specific person's voice (speech and nonverbal sounds) from audio files containing background noise, music, and other speakers. It uses open-source AI models running locally on CPU to identify and isolate your target voice.
 
 ## Features
 
@@ -72,7 +72,7 @@ sudo apt-get install ffmpeg
 **Windows**:
 Download from [ffmpeg.org](https://ffmpeg.org/download.html)
 
-### 2. Install Voice
+### 2. Install Voice Tools
 
 ```bash
 # Clone the repository
@@ -107,7 +107,7 @@ huggingface-cli login
 
 ### Web Interface (Recommended for Beginners)
 
-The easiest way to use Voice
+The easiest way to use Voice Tools is through the web interface:
 
 ```bash
 voice-tools web
@@ -235,7 +235,7 @@ voice-tools scan input.m4a
 
 ## HuggingFace Spaces Deployment
 
-Voice
+Voice Tools supports deployment to HuggingFace Spaces with GPU acceleration using ZeroGPU. This provides 10-20x faster processing compared to CPU-only execution.
 
 ### Prerequisites
 
app.py
CHANGED
@@ -1,6 +1,6 @@
 #!/usr/bin/env python3
 """
-HuggingFace Spaces entry point for Voice
+HuggingFace Spaces entry point for Voice Tools.
 
 This file serves as the main entry point when deploying to HuggingFace Spaces
 with ZeroGPU support.
@@ -36,7 +36,7 @@ logger = logging.getLogger(__name__)
 # Log environment information
 from src.config.gpu_config import GPUConfig
 
-logger.info("Voice
+logger.info("Voice Tools starting on HuggingFace Spaces")
 logger.info(f"Environment: {GPUConfig.get_environment_type()}")
 logger.info(f"GPU Available: {GPUConfig.GPU_AVAILABLE}")
 logger.info(f"ZeroGPU Mode: {GPUConfig.IS_ZEROGPU}")
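The diff logs three values from `GPUConfig` but does not show how they are computed. As a rough sketch only, environment detection of this kind could be driven by environment variables. Everything below is an assumption: the function `detect_environment` is hypothetical, and the variable names `SPACE_ID` and `SPACES_ZERO_GPU` should be checked against the HuggingFace Spaces documentation before relying on them. This is not the project's `GPUConfig`.

```python
import os
from typing import Mapping


def detect_environment(env: Mapping[str, str] = os.environ) -> dict:
    """Classify the runtime as 'zerogpu', 'spaces', or 'local' from environment variables."""
    # Assumed marker: a non-empty SPACE_ID means the code runs inside a Space.
    on_spaces = bool(env.get("SPACE_ID"))
    # Assumed marker: SPACES_ZERO_GPU set to a truthy string means ZeroGPU hardware.
    zero_gpu = env.get("SPACES_ZERO_GPU", "").lower() in {"1", "true"}

    if zero_gpu:
        kind = "zerogpu"
    elif on_spaces:
        kind = "spaces"
    else:
        kind = "local"
    return {"environment": kind, "is_zerogpu": zero_gpu}
```

Passing the mapping as a parameter keeps the function easy to exercise with a plain dictionary instead of the real process environment.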
pyproject.toml
CHANGED
@@ -6,7 +6,7 @@ readme = "README.md"
 requires-python = ">=3.10"
 license = {text = "MIT"}
 authors = [
-{name = "Voice
+{name = "Voice Tools Contributors"}
 ]
 keywords = ["audio", "voice-extraction", "speaker-diarization", "ml", "huggingface"]
 classifiers = [
scripts/benchmark.py
CHANGED
@@ -1,6 +1,6 @@
 #!/usr/bin/env python3
 """
-Performance benchmarking script for Voice
+Performance benchmarking script for Voice Tools.
 
 Validates all success criteria (SC-001 through SC-008) from the specification.
 """
@@ -324,7 +324,7 @@ def benchmark_quality_preservation(results: BenchmarkResults):
 
 
 def main():
-parser = argparse.ArgumentParser(description="Benchmark Voice
+parser = argparse.ArgumentParser(description="Benchmark Voice Tools performance")
 parser.add_argument(
 "--audio-dir",
 type=Path,
@@ -343,7 +343,7 @@ def main():
 results = BenchmarkResults()
 
 print("\n" + "=" * 80)
-print("VOICE
+print("VOICE TOOLS PERFORMANCE BENCHMARK")
 print("=" * 80 + "\n")
 
 # Find test files
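The hunks above show only fragments of `main()`. As a self-contained sketch of the same pattern, the snippet below builds an `argparse` parser with an `--audio-dir` option typed as `Path` and renders the 80-column banner. The default directory, the help text, and the helper names `build_parser` and `banner` are hypothetical; only the description string, the option name, `type=Path`, and the banner title come from the diff.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Build a parser shaped like the one in scripts/benchmark.py."""
    parser = argparse.ArgumentParser(description="Benchmark Voice Tools performance")
    parser.add_argument(
        "--audio-dir",
        type=Path,  # argparse converts the raw string into a Path
        default=Path("audio_fixtures"),  # hypothetical default, not from the diff
        help="Directory containing test audio files",  # hypothetical help text
    )
    return parser


def banner(title: str, width: int = 80) -> str:
    """Return the title framed by two rules of '=' characters."""
    rule = "=" * width
    return f"\n{rule}\n{title}\n{rule}\n"
```

With these helpers, `print(banner("VOICE TOOLS PERFORMANCE BENCHMARK"))` produces the same layout as the three `print` calls in the diff, and `build_parser().parse_args()` exposes the directory as `args.audio_dir`.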
src/cli/__init__.py
CHANGED
@@ -1 +1 @@
-"""CLI package for Voice
+"""CLI package for Voice Tools."""
src/cli/denoise.py
CHANGED
@@ -102,7 +102,7 @@ def denoise(
 # Keep more audio (less aggressive)
 voice-tools denoise noisy_audio.m4a --vad-threshold 0.3 --silence-threshold 3.0
 """
-console.print("\n[bold cyan]Voice
+console.print("\n[bold cyan]Voice Tools - Voice Denoising[/bold cyan]\n")
 
 # Validate input file
 if not input_file.exists():
src/cli/extract_speaker.py
CHANGED
@@ -130,7 +130,7 @@ def extract_speaker(
 --no-concatenate --output ./alice_segments/
 """
 console.print()
-console.print("[bold]Voice
+console.print("[bold]Voice Tools - Speaker Extraction[/bold]")
 console.print()
 
 try:
src/cli/main.py
CHANGED
@@ -1,5 +1,5 @@
 """
-Main CLI entry point for Voice
+Main CLI entry point for Voice Tools.
 
 Provides command-line interface for voice extraction and profiling tasks.
 """
@@ -41,7 +41,7 @@ logger = logging.getLogger(__name__)
 @click.version_option(version="0.1.0", prog_name="voice-tools")
 def cli():
 """
-Voice
+Voice Tools - Extract and profile voices from audio files.
 
 This tool helps you extract specific voices from audio files using
 speaker diarization and voice matching. It can separate speech from
@@ -161,7 +161,7 @@ def extract(
 if verbose:
 logging.getLogger().setLevel(logging.DEBUG)
 
-display_header("Voice
+display_header("Voice Tools - Extract Voice Segments")
 
 # Validate reference file
 if not reference_file.exists():
@@ -292,7 +292,7 @@ def scan(audio_file: Path, vad_threshold: float):
 \b
 voice-tools scan input.m4a
 """
-display_header("Voice
+display_header("Voice Tools - Voice Activity Scan")
 
 processor = BatchProcessor(vad_threshold=vad_threshold)
 
@@ -361,7 +361,7 @@ def web(host: str, port: int, share: bool):
 """
 from ..web.app import launch
 
-display_header("Voice
+display_header("Voice Tools - Web Interface")
 display_info(f"Starting web server on http://{host}:{port}")
 
 if share:
@@ -381,7 +381,7 @@ def web(host: str, port: int, share: bool):
 @cli.command()
 def info():
 """
-Display information about Voice
+Display information about Voice Tools.
 
 Shows configuration, model information, and system details.
 """
@@ -391,7 +391,7 @@ def info():
 from ..services.model_manager import ModelManager
 from .progress import console
 
-display_header("Voice
+display_header("Voice Tools - System Information")
 
 # Version info
 info_table = Table(title="Version", show_header=False)
src/cli/separate.py
CHANGED
@@ -126,7 +126,7 @@ def separate(
 
 # Display header
 if not quiet:
-console.print("\n[bold cyan]Voice
+console.print("\n[bold cyan]Voice Tools - Speaker Separation[/bold cyan]\n")
 
 # Create output directory
 output_dir.mkdir(parents=True, exist_ok=True)
src/web/__init__.py
CHANGED
@@ -1 +1 @@
-"""Web interface package for Voice
+"""Web interface package for Voice Tools."""
src/web/app.py
CHANGED
@@ -1,5 +1,5 @@
 """
-Gradio web interface for Voice
+Gradio web interface for Voice Tools.
 
 Provides a user-friendly web UI for uploading audio files, configuring
 extraction parameters, and downloading results.
@@ -49,11 +49,11 @@ def create_app() -> gr.Blocks:
 Configured Gradio Blocks app
 """
 
-with gr.Blocks(title="Voice
+with gr.Blocks(title="Voice Tools") as app:
 # Header
 gr.Markdown(
 """
-# 🎤 Voice
+# 🎤 Voice Tools
 
 Extract and profile specific voices from audio files using AI-powered
 speaker diarization and voice matching.
@@ -238,7 +238,7 @@ def create_app() -> gr.Blocks:
 """
 ---
 <div class="footer">
-Voice
+Voice Tools v0.1.0 | Powered by Gradio, PyAnnote, and Transformers
 </div>
 """,
 elem_classes=["footer"],
@@ -266,7 +266,7 @@ def launch(
 
 app = create_app()
 
-logger.info(f"Launching Voice
+logger.info(f"Launching Voice Tools web interface on {server_name}:{server_port}")
 
 app.launch(
 server_name=server_name,