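# 🎵 Genie TTS API: Complete Usage Guide

## 🎯 Overview

Genie TTS API is a Japanese text-to-speech service built on the **GPT-SoVITS V2** architecture. The codebase is now fully modular and provides the following core features:

- **🎤 Basic TTS synthesis**: synthesize speech with pretrained characters
- **🎭 Voice cloning**: quickly create a personalized voice from reference audio
- **📊 Audio analysis**: analyze the quality and characteristics of an audio file
- **🔄 Batch processing**: efficient batch text-to-speech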

## 🏗️ Modular Architecture

```
genie/GENIE/
├── api.py            # Main API application
├── api_routes.py     # Route definitions
├── voice_cloning.py  # Core voice-cloning logic
├── models.py         # Data model definitions
├── tts_engine.py     # TTS engine
├── config.py         # Configuration
└── requirements.txt  # Dependencies
```

## 🚀 Quick Start

### Basic Information

- **API base URL**: `http://localhost:8000` (local development)
- **API docs**: `http://localhost:8000/docs` (Swagger UI)
- **ReDoc docs**: `http://localhost:8000/redoc`

### Starting the Service

```bash
# Option 1: run the API service directly
python api.py

# Option 2: run together with Gradio
python app.py  # only if app.py integrates the API
```

## 📚 API Endpoint Reference

### 1. Basics

#### Root Information

```http
GET /
```

**Example response**:

```json
{
  "name": "Genie TTS API",
  "version": "1.0.0",
  "description": "High-quality Japanese text-to-speech API service",
  "engine": "GPT-SoVITS V2 (ONNX)",
  "supported_languages": ["ja"],
  "features": ["TTS synthesis", "Voice cloning", "Character management"],
  "docs": "/docs"
}
```

#### Health Check

```http
GET /health
```

**Example response**:

```json
{
  "status": "healthy",
  "version": "1.0.0",
  "engine_status": "ready",
  "available_characters": ["misono_mika"],
  "predefined_characters": 1,
  "custom_characters": 0,
  "cloned_voices": 2,
  "timestamp": "2024-01-01T12:00:00"
}
```
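
Because the engine may still be loading right after startup, a client can poll `/health` before sending work. The sketch below is one possible way to do that using only the standard library. It assumes that `"healthy"` and `"ready"` (the values in the example response) are the values that signal readiness; the function names `fetch_health` and `wait_until_ready` and the retry parameters are illustrative, not part of the API.

```python
import json
import time
import urllib.request


def fetch_health(base_url="http://localhost:8000"):
    """GET /health and decode the JSON body."""
    url = f"{base_url.rstrip('/')}/health"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def wait_until_ready(fetch=fetch_health, attempts=30, delay=2.0, sleep=time.sleep):
    """Poll until the service reports status "healthy" and engine_status "ready".

    Returns the health payload on success, or None if every attempt failed.
    The readiness values are assumptions taken from the example response.
    """
    for attempt in range(attempts):
        try:
            health = fetch()
        except (OSError, ValueError):
            # Connection refused, timeout, or a non-JSON body: treat as "not ready yet".
            health = None
        if health and health.get("status") == "healthy" and health.get("engine_status") == "ready":
            return health
        if attempt < attempts - 1:
            sleep(delay)
    return None
```

Passing `fetch` and `sleep` as parameters keeps the polling loop easy to exercise without a running server.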

### 2. Text-to-Speech (TTS)

#### Basic Synthesis (POST)

```http
POST /tts/synthesize
Content-Type: application/json

{
  "text": "こんにちは、世界!",
  "character": "misono_mika",
  "speed": 1.0,
  "format": "wav"
}
```

#### Quick Synthesis (GET)

```http
GET /tts/synthesize?text=こんにちは&character=misono_mika
```
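
The GET form above shows the Japanese text unescaped for readability. Many HTTP clients do not escape query strings automatically when the URL is assembled by hand, so the text should be percent-encoded as UTF-8. A minimal sketch using the standard library follows; `build_quick_tts_url` is a hypothetical helper name, and only the `text` and `character` parameters shown above are used.

```python
from urllib.parse import urlencode


def build_quick_tts_url(text, character="misono_mika", base_url="http://localhost:8000"):
    """Build the quick-synthesis URL with a percent-encoded (UTF-8) query string."""
    query = urlencode({"text": text, "character": character})
    return f"{base_url.rstrip('/')}/tts/synthesize?{query}"


if __name__ == "__main__":
    print(build_quick_tts_url("こんにちは"))
```

Libraries such as `requests` perform the same encoding when the values are passed through `params=`, as the Python client later in this guide does.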

#### Batch Synthesis

```http
POST /tts/batch-synthesize
Content-Type: application/json

[
  {"text": "おはようございます", "character": "misono_mika"},
  {"text": "こんばんは", "character": "misono_mika"}
]
```
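
The batch body is a plain JSON array of objects with the same `text` and `character` fields as a single request. The sketch below shows one way to build that array from a list of strings; `build_batch_payload` is a hypothetical helper, and dropping blank entries is a design choice of this sketch rather than documented server behavior. The shape of the batch response is not described in this guide, so the example stops at building the request body.

```python
def build_batch_payload(texts, character="misono_mika"):
    """Turn a list of strings into the JSON array body shown above.

    Blank or whitespace-only entries are skipped (a choice made in this sketch).
    """
    return [
        {"text": text.strip(), "character": character}
        for text in texts
        if text and text.strip()
    ]


if __name__ == "__main__":
    payload = build_batch_payload(["おはようございます", "  ", "こんばんは"])
    print(payload)
    # To send it (requires the third-party `requests` package):
    # requests.post("http://localhost:8000/tts/batch-synthesize", json=payload)
```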

### 3. Voice Cloning

#### Analyze Reference Audio

```http
POST /voice-clone/analyze-audio?audio_url=/path/to/audio.wav
```

**Example response**:

```json
{
  "success": true,
  "analysis": {
    "duration": 5.2,
    "sample_rate": 22050,
    "quality_score": 0.85,
    "recommendations": []
  },
  "timestamp": "2024-01-01T12:00:00"
}
```

#### Create a Cloned Voice

```http
POST /voice-clone/create
Content-Type: application/json

{
  "voice_name": "my_voice",
  "reference_audio_url": "/path/to/reference.wav",
  "reference_text": "Japanese transcript of the reference audio",
  "description": "My personal voice clone"
}
```

#### List Cloned Voices

```http
GET /voice-clone/list
```

**Example response**:

```json
{
  "success": true,
  "message": "Found 1 cloned voice",
  "cloned_voices": {
    "my_voice": {
      "name": "my_voice",
      "description": "My personal voice clone",
      "reference_text": "Japanese transcript of the reference audio...",
      "quality_score": 0.85,
      "duration": 5.2,
      "created_time": "2024-01-01T12:00:00"
    }
  },
  "count": 1,
  "timestamp": "2024-01-01T12:00:00"
}
```

#### Synthesize with a Cloned Voice

```http
POST /voice-clone/synthesize?voice_name=my_voice&text=こんにちは、今日はいい天気ですね
```

#### Delete a Cloned Voice

```http
DELETE /voice-clone/my_voice
```
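
The client examples later in this guide do not cover deletion, so here is a small sketch of how the `DELETE` call could be issued with the standard library. The helper names are hypothetical, and the response format is not documented in this guide: the sketch assumes the endpoint returns a JSON body like the other endpoints. The voice name is percent-encoded so that names containing spaces or slashes still map to a single path segment.

```python
import json
import urllib.request
from urllib.parse import quote


def build_delete_request(voice_name, base_url="http://localhost:8000"):
    """Build a DELETE request for /voice-clone/{voice_name}."""
    url = f"{base_url.rstrip('/')}/voice-clone/{quote(voice_name, safe='')}"
    return urllib.request.Request(url, method="DELETE")


def delete_cloned_voice(voice_name, base_url="http://localhost:8000"):
    """Send the request and decode the body (assumed to be JSON)."""
    request = build_delete_request(voice_name, base_url)
    with urllib.request.urlopen(request, timeout=10) as resp:
        return json.load(resp)
```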

### 4. Character Management

#### List All Characters

```http
GET /characters/
```

### 5. Audio File Download

#### Fetch Generated Audio

```http
GET /audio/{filename}
```

## 💻 Code Examples

### Complete Python Example

```python
from pathlib import Path

import requests


class GenieTTSClient:
    def __init__(self, base_url="http://localhost:8000"):
        self.base_url = base_url.rstrip('/')

    def health_check(self):
        """Check the API health status."""
        response = requests.get(f"{self.base_url}/health")
        return response.json()

    def basic_tts(self, text, character="misono_mika"):
        """Basic text-to-speech; returns the saved filename or None."""
        data = {
            "text": text,
            "character": character,
            "speed": 1.0,
            "format": "wav"
        }
        response = requests.post(f"{self.base_url}/tts/synthesize", json=data)
        result = response.json()
        if result.get("success"):
            # Download the generated audio file
            audio_url = result["audio_url"]
            audio_response = requests.get(f"{self.base_url}{audio_url}")
            filename = f"tts_output_{result['timestamp'][:19].replace(':', '-')}.wav"
            with open(filename, "wb") as f:
                f.write(audio_response.content)
            print(f"✅ TTS synthesis succeeded, saved as: {filename}")
            return filename
        else:
            print(f"❌ TTS synthesis failed: {result.get('error', 'unknown error')}")
            return None

    def analyze_audio(self, audio_path):
        """Analyze audio quality."""
        params = {"audio_url": str(Path(audio_path).absolute())}
        response = requests.post(f"{self.base_url}/voice-clone/analyze-audio", params=params)
        return response.json()

    def create_cloned_voice(self, voice_name, audio_path, reference_text, description=""):
        """Create a cloned voice."""
        data = {
            "voice_name": voice_name,
            "reference_audio_url": str(Path(audio_path).absolute()),
            "reference_text": reference_text,
            "description": description
        }
        response = requests.post(f"{self.base_url}/voice-clone/create", json=data)
        result = response.json()
        if result.get("success"):
            print(f"✅ Cloned voice created: {voice_name}")
        else:
            print(f"❌ Failed to create cloned voice: {result.get('error', 'unknown error')}")
        return result

    def clone_voice_tts(self, voice_name, text):
        """Synthesize with a cloned voice; returns the saved filename or None."""
        params = {"voice_name": voice_name, "text": text}
        response = requests.post(f"{self.base_url}/voice-clone/synthesize", params=params)
        result = response.json()
        if result.get("success"):
            # Download the generated audio file
            audio_url = result["audio_url"]
            audio_response = requests.get(f"{self.base_url}{audio_url}")
            filename = f"cloned_voice_{voice_name}_{result['timestamp'][:19].replace(':', '-')}.wav"
            with open(filename, "wb") as f:
                f.write(audio_response.content)
            print(f"✅ Cloned-voice synthesis succeeded, saved as: {filename}")
            return filename
        else:
            print(f"❌ Cloned-voice synthesis failed: {result.get('error', 'unknown error')}")
            return None

    def list_cloned_voices(self):
        """List all cloned voices."""
        response = requests.get(f"{self.base_url}/voice-clone/list")
        return response.json()


# Usage example
def main():
    client = GenieTTSClient()

    # 1. Health check
    print("1. Checking API status...")
    health = client.health_check()
    print(f"   Status: {health['status']}")
    print(f"   Engine: {health['engine_status']}")
    print(f"   Available characters: {health['available_characters']}")

    # 2. Basic TTS synthesis
    print("\n2. Basic TTS synthesis...")
    tts_file = client.basic_tts("こんにちは、今日はいい天気ですね。")

    # 3. Voice cloning (requires a reference audio file)
    reference_audio = "reference_voice.wav"  # replace with the real audio file path
    reference_text = "Japanese transcript of the reference audio"  # replace with the real transcript
    if Path(reference_audio).exists():
        print(f"\n3. Analyzing reference audio: {reference_audio}")
        analysis = client.analyze_audio(reference_audio)
        print(f"   Quality score: {analysis.get('analysis', {}).get('quality_score', 'N/A')}")

        print("\n4. Creating cloned voice...")
        clone_result = client.create_cloned_voice(
            voice_name="test_voice",
            audio_path=reference_audio,
            reference_text=reference_text,
            description="Cloned voice for testing"
        )
        if clone_result.get("success"):
            print("\n5. Synthesizing with the cloned voice...")
            cloned_file = client.clone_voice_tts(
                "test_voice",
                "これはクローン音声のテストです。"
            )

    # 6. List all cloned voices
    print("\n6. Listing all cloned voices...")
    voices = client.list_cloned_voices()
    if voices.get("success"):
        print(f"   {voices['count']} cloned voice(s):")
        for name, info in voices.get("cloned_voices", {}).items():
            print(f"   - {name}: {info['description']}")


if __name__ == "__main__":
    main()
```

### JavaScript/Node.js Example

```javascript
const axios = require('axios');

class GenieTTSClient {
  constructor(baseUrl = 'http://localhost:8000') {
    this.baseUrl = baseUrl.replace(/\/+$/, '');
  }

  async healthCheck() {
    const response = await axios.get(`${this.baseUrl}/health`);
    return response.data;
  }

  async basicTTS(text, character = 'misono_mika') {
    try {
      const response = await axios.post(`${this.baseUrl}/tts/synthesize`, {
        text: text,
        character: character,
        speed: 1.0,
        format: 'wav'
      });
      if (response.data.success) {
        console.log('✅ TTS synthesis succeeded:', response.data.audio_url);
        return response.data;
      } else {
        console.error('❌ TTS synthesis failed:', response.data.error);
      }
    } catch (error) {
      console.error('Request failed:', error.message);
    }
  }

  async createClonedVoice(voiceName, audioPath, referenceText, description = '') {
    try {
      const response = await axios.post(`${this.baseUrl}/voice-clone/create`, {
        voice_name: voiceName,
        reference_audio_url: audioPath,
        reference_text: referenceText,
        description: description
      });
      if (response.data.success) {
        console.log('✅ Cloned voice created:', voiceName);
      } else {
        console.error('❌ Failed to create cloned voice:', response.data.error);
      }
      return response.data;
    } catch (error) {
      console.error('Request failed:', error.message);
    }
  }

  async synthesizeWithClonedVoice(voiceName, text) {
    try {
      // The endpoint takes query parameters, so the request body is null
      const response = await axios.post(`${this.baseUrl}/voice-clone/synthesize`, null, {
        params: { voice_name: voiceName, text: text }
      });
      if (response.data.success) {
        console.log('✅ Cloned-voice synthesis succeeded:', response.data.audio_url);
        return response.data;
      } else {
        console.error('❌ Cloned-voice synthesis failed:', response.data.error);
      }
    } catch (error) {
      console.error('Request failed:', error.message);
    }
  }
}

// Usage example
async function main() {
  const client = new GenieTTSClient();

  // 1. Health check
  console.log('1. Checking API status...');
  const health = await client.healthCheck();
  console.log('   Status:', health.status);

  // 2. Basic TTS
  console.log('\n2. Basic TTS synthesis...');
  await client.basicTTS('こんにちは、世界!');

  // 3. Voice cloning
  console.log('\n3. Creating cloned voice...');
  await client.createClonedVoice(
    'test_voice',
    '/path/to/reference.wav',
    'Japanese transcript of the reference audio',
    'Cloned voice for testing'
  );

  // 4. Use the cloned voice
  console.log('\n4. Synthesizing with the cloned voice...');
  await client.synthesizeWithClonedVoice('test_voice', 'テストです。');
}

main().catch(console.error);
```
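
## ⚠️ Important Notes

### Voice Cloning Requirements

1. **Audio quality**: clear, noise-free, with a sample rate of 22 kHz or higher
2. **Audio length**: 3 to 30 seconds works best
3. **Language**: Japanese only
4. **Text match**: the reference text must match the audio content exactly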
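
The sample-rate and length requirements above can be checked locally before uploading a reference file. The following is a rough sketch using Python's built-in `wave` module, so it only handles uncompressed WAV files. The exact thresholds (22050 Hz as the reading of "22 kHz", and hard limits of 3 and 30 seconds) and the function name `check_reference_wav` are assumptions of this sketch; the service's own `analyze-audio` endpoint remains the authority on quality.

```python
import wave

# Assumed thresholds derived from the requirements list above.
MIN_SAMPLE_RATE = 22050
MIN_SECONDS = 3.0
MAX_SECONDS = 30.0


def check_reference_wav(path):
    """Return a list of problems found in a WAV file; an empty list means it passes."""
    problems = []
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        duration = wav.getnframes() / rate
    if rate < MIN_SAMPLE_RATE:
        problems.append(f"sample rate {rate} Hz is below {MIN_SAMPLE_RATE} Hz")
    if not MIN_SECONDS <= duration <= MAX_SECONDS:
        problems.append(
            f"duration {duration:.1f}s is outside {MIN_SECONDS:.0f}-{MAX_SECONDS:.0f}s"
        )
    return problems
```

Noise level and transcript accuracy cannot be verified this way and still need a manual check.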
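
### Performance Considerations

- **First use**: the model must be downloaded (about 30 seconds)
- **Speech synthesis**: 5 to 10 seconds per request
- **Voice cloning**: 2 to 5 seconds to create a voice
- **Concurrency**: keeping it to 5 or fewer simultaneous requests is recommended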
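
One simple way for a client to respect the concurrency recommendation is to route requests through a bounded thread pool. The sketch below assumes a caller-supplied `synthesize` function (for example, a method of the Python client in this guide); `synthesize_all` and `MAX_CONCURRENT_REQUESTS` are illustrative names.

```python
from concurrent.futures import ThreadPoolExecutor

# Upper bound taken from the recommendation above.
MAX_CONCURRENT_REQUESTS = 5


def synthesize_all(texts, synthesize, max_workers=MAX_CONCURRENT_REQUESTS):
    """Call synthesize(text) for every text with at most max_workers calls in flight.

    Results are returned in the same order as the input texts.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(synthesize, texts))
```

With synthesis taking several seconds per request, the pool size directly limits how many requests the server sees at once; any exception raised by `synthesize` propagates when the results are collected.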
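
### Storage

- Generated audio files are stored temporarily in the system temp directory
- Cloned-voice configurations are persisted
- Expired audio files should be cleaned up periodically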
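
This guide does not say whether the service removes old audio itself, so a periodic cleanup job may be needed. The sketch below deletes files older than a given age from a directory. The directory to clean, the one-hour default, and the `*.wav` pattern are all assumptions to adjust for the actual deployment; `remove_expired_audio` is a hypothetical helper name.

```python
import time
from pathlib import Path


def remove_expired_audio(directory, max_age_seconds=3600, pattern="*.wav", now=None):
    """Delete files matching pattern whose modification time is older than max_age_seconds.

    Returns the sorted names of the files that were removed.
    """
    now = time.time() if now is None else now
    removed = []
    for path in Path(directory).glob(pattern):
        if path.is_file() and now - path.stat().st_mtime > max_age_seconds:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)
```

Such a function could be run from a cron job or a background task. Point it at a dedicated output directory rather than the whole system temp directory, so that unrelated files are never touched.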

## 🔧 Development and Deployment

### Local Development

```bash
pip install -r requirements.txt
python api.py
```

### Production Deployment

```bash
# With uvicorn
uvicorn api:api_app --host 0.0.0.0 --port 8000 --workers 1

# With gunicorn (uvicorn supplies the worker class)
pip install gunicorn uvicorn
gunicorn api:api_app -w 1 -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:8000
```

### Docker Deployment

```dockerfile
FROM python:3.10
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["python", "api.py"]
```
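
## 🐛 Troubleshooting

### Common Problems

1. **Engine initialization fails**
   - Check that onnxruntime is installed correctly
   - Confirm the network connection works (the model must be downloaded)
2. **Audio analysis fails**
   - Check the audio file format (WAV, FLAC, OGG, and others are supported)
   - Confirm the file path is correct and readable
3. **Cloned voice sounds poor**
   - Improve the quality of the reference audio
   - Make sure the reference text matches the audio exactly
   - Try reference audio of a different length

### Debug Mode

```python
import logging
logging.basicConfig(level=logging.DEBUG)
```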

## 📞 Support

- **Official repository**: [Genie TTS](https://github.com/High-Logic/Genie)
- **Bug reports**: open a GitHub Issue
- **Discussion**: GitHub Discussions

---

**Version**: 1.0.0 | **Last updated**: January 2024