# 🎵 Genie TTS API Complete Usage Guide

## 🎯 Overview

Genie TTS API is a Japanese text-to-speech service built on the **GPT-SoVITS V2** architecture. It is now fully modular and provides the following core features:

- **🎤 Basic TTS synthesis**: Synthesize speech with pretrained characters
- **🎭 Voice cloning**: Quickly create a personalized voice from reference audio
- **📊 Audio analysis**: Automatically analyze audio quality and characteristics
- **🔄 Batch processing**: Efficient batch text-to-speech

## 🏗️ Modular Architecture

```
genie/GENIE/
├── api.py              # Main API application
├── api_routes.py       # Route definitions
├── voice_cloning.py    # Core voice cloning functionality
├── models.py           # Data model definitions
├── tts_engine.py       # TTS engine
├── config.py           # Configuration
└── requirements.txt    # Dependencies
```

## 🚀 Quick Start

### Basic Information

- **API base URL**: `http://localhost:8000` (local development)
- **API docs**: `http://localhost:8000/docs` (Swagger UI)
- **ReDoc docs**: `http://localhost:8000/redoc`

### Starting the Service

```bash
# Option 1: run the API service directly
python api.py

# Option 2: run together with Gradio
python app.py  # if app.py integrates the API
```

## 📚 API Endpoint Reference

### 1. Basics

#### Root Info

```http
GET /
```

**Example response**:

```json
{
  "name": "Genie TTS API",
  "version": "1.0.0",
  "description": "High-quality Japanese text-to-speech API service",
  "engine": "GPT-SoVITS V2 (ONNX)",
  "supported_languages": ["ja"],
  "features": ["TTS synthesis", "Voice cloning", "Character management"],
  "docs": "/docs"
}
```

#### Health Check

```http
GET /health
```

**Example response**:

```json
{
  "status": "healthy",
  "version": "1.0.0",
  "engine_status": "ready",
  "available_characters": ["misono_mika"],
  "predefined_characters": 1,
  "custom_characters": 0,
  "cloned_voices": 2,
  "timestamp": "2024-01-01T12:00:00"
}
```

### 2. Text-to-Speech (TTS)

#### Basic Synthesis (POST)

```http
POST /tts/synthesize
Content-Type: application/json

{
  "text": "こんにちは、世界!",
  "character": "misono_mika",
  "speed": 1.0,
  "format": "wav"
}
```

#### Quick Synthesis (GET)

```http
GET /tts/synthesize?text=こんにちは&character=misono_mika
```

#### Batch Synthesis

```http
POST /tts/batch-synthesize
Content-Type: application/json

[
  {"text": "おはようございます", "character": "misono_mika"},
  {"text": "こんばんは", "character": "misono_mika"}
]
```

### 3. Voice Cloning

#### Analyze Reference Audio

```http
POST /voice-clone/analyze-audio?audio_url=/path/to/audio.wav
```

**Example response**:

```json
{
  "success": true,
  "analysis": {
    "duration": 5.2,
    "sample_rate": 22050,
    "quality_score": 0.85,
    "recommendations": []
  },
  "timestamp": "2024-01-01T12:00:00"
}
```

#### Create a Cloned Voice

```http
POST /voice-clone/create
Content-Type: application/json

{
  "voice_name": "my_voice",
  "reference_audio_url": "/path/to/reference.wav",
  "reference_text": "Japanese text spoken in the reference audio",
  "description": "My personal voice clone"
}
```

#### List Cloned Voices

```http
GET /voice-clone/list
```

**Example response**:

```json
{
  "success": true,
  "message": "Found 1 cloned voice",
  "cloned_voices": {
    "my_voice": {
      "name": "my_voice",
      "description": "My personal voice clone",
      "reference_text": "Japanese text spoken in the reference audio...",
      "quality_score": 0.85,
      "duration": 5.2,
      "created_time": "2024-01-01T12:00:00"
    }
  },
  "count": 1,
  "timestamp": "2024-01-01T12:00:00"
}
```

#### Synthesize with a Cloned Voice

```http
POST /voice-clone/synthesize?voice_name=my_voice&text=こんにちは、今日はいい天気ですね
```

#### Delete a Cloned Voice

```http
DELETE /voice-clone/my_voice
```

### 4. Character Management

#### List All Characters

```http
GET /characters/
```

### 5. Audio File Download

#### Get Generated Audio

```http
GET /audio/{filename}
```

## 💻 Programming Examples

### Complete Python Example

```python
import requests
from pathlib import Path


class GenieTTSClient:
    def __init__(self, base_url="http://localhost:8000"):
        self.base_url = base_url.rstrip('/')

    def health_check(self):
        """Check the API health status."""
        response = requests.get(f"{self.base_url}/health")
        return response.json()

    def basic_tts(self, text, character="misono_mika"):
        """Basic text-to-speech."""
        data = {
            "text": text,
            "character": character,
            "speed": 1.0,
            "format": "wav"
        }

        response = requests.post(f"{self.base_url}/tts/synthesize", json=data)
        result = response.json()

        if result["success"]:
            # Download the audio file
            audio_url = result["audio_url"]
            audio_response = requests.get(f"{self.base_url}{audio_url}")

            filename = f"tts_output_{result['timestamp'][:19].replace(':', '-')}.wav"
            with open(filename, "wb") as f:
                f.write(audio_response.content)

            print(f"✅ TTS synthesis succeeded, saved as: {filename}")
            return filename
        else:
            print(f"❌ TTS synthesis failed: {result.get('error', 'unknown error')}")
            return None

    def analyze_audio(self, audio_path):
        """Analyze audio quality."""
        params = {"audio_url": str(Path(audio_path).absolute())}
        response = requests.post(f"{self.base_url}/voice-clone/analyze-audio", params=params)
        return response.json()

    def create_cloned_voice(self, voice_name, audio_path, reference_text, description=""):
        """Create a cloned voice."""
        data = {
            "voice_name": voice_name,
            "reference_audio_url": str(Path(audio_path).absolute()),
            "reference_text": reference_text,
            "description": description
        }

        response = requests.post(f"{self.base_url}/voice-clone/create", json=data)
        result = response.json()

        if result["success"]:
            print(f"✅ Cloned voice created: {voice_name}")
        else:
            print(f"❌ Failed to create cloned voice: {result.get('error', 'unknown error')}")

        return result

    def clone_voice_tts(self, voice_name, text):
        """Synthesize speech with a cloned voice."""
        params = {"voice_name": voice_name, "text": text}
        response = requests.post(f"{self.base_url}/voice-clone/synthesize", params=params)
        result = response.json()

        if result["success"]:
            # Download the audio file
            audio_url = result["audio_url"]
            audio_response = requests.get(f"{self.base_url}{audio_url}")

            filename = f"cloned_voice_{voice_name}_{result['timestamp'][:19].replace(':', '-')}.wav"
            with open(filename, "wb") as f:
                f.write(audio_response.content)

            print(f"✅ Cloned voice synthesis succeeded, saved as: {filename}")
            return filename
        else:
            print(f"❌ Cloned voice synthesis failed: {result.get('error', 'unknown error')}")
            return None

    def list_cloned_voices(self):
        """List all cloned voices."""
        response = requests.get(f"{self.base_url}/voice-clone/list")
        return response.json()


# Usage example
def main():
    client = GenieTTSClient()

    # 1. Health check
    print("1. Checking API status...")
    health = client.health_check()
    print(f"   Status: {health['status']}")
    print(f"   Engine: {health['engine_status']}")
    print(f"   Available characters: {health['available_characters']}")

    # 2. Basic TTS synthesis
    print("\n2. Basic TTS synthesis...")
    tts_file = client.basic_tts("こんにちは、今日はいい天気ですね。")

    # 3. Voice cloning (requires a prepared reference audio file)
    reference_audio = "reference_voice.wav"  # replace with the actual audio file path
    reference_text = "Japanese text spoken in the reference audio"  # replace with the actual transcript

    if Path(reference_audio).exists():
        print(f"\n3. Analyzing reference audio: {reference_audio}")
        analysis = client.analyze_audio(reference_audio)
        print(f"   Audio quality score: {analysis.get('analysis', {}).get('quality_score', 'N/A')}")

        print("\n4. Creating cloned voice...")
        clone_result = client.create_cloned_voice(
            voice_name="test_voice",
            audio_path=reference_audio,
            reference_text=reference_text,
            description="Cloned voice for testing"
        )

        if clone_result.get("success"):
            print("\n5. Synthesizing with the cloned voice...")
            cloned_file = client.clone_voice_tts(
                "test_voice",
                "これはクローン音声のテストです。"
            )

    # 6. List all cloned voices
    print("\n6. Listing all cloned voices...")
    voices = client.list_cloned_voices()
    if voices.get("success"):
        print(f"   {voices['count']} cloned voice(s) in total:")
        for name, info in voices.get("cloned_voices", {}).items():
            print(f"   - {name}: {info['description']}")


if __name__ == "__main__":
    main()
```

### JavaScript/Node.js Example

```javascript
const axios = require('axios');

class GenieTTSClient {
  constructor(baseUrl = 'http://localhost:8000') {
    this.baseUrl = baseUrl.replace(/\/+$/, '');
  }

  async healthCheck() {
    const response = await axios.get(`${this.baseUrl}/health`);
    return response.data;
  }

  async basicTTS(text, character = 'misono_mika') {
    try {
      const response = await axios.post(`${this.baseUrl}/tts/synthesize`, {
        text: text,
        character: character,
        speed: 1.0,
        format: 'wav'
      });

      if (response.data.success) {
        console.log('✅ TTS synthesis succeeded:', response.data.audio_url);
        return response.data;
      } else {
        console.error('❌ TTS synthesis failed:', response.data.error);
      }
    } catch (error) {
      console.error('Request failed:', error.message);
    }
  }

  async createClonedVoice(voiceName, audioPath, referenceText, description = '') {
    try {
      const response = await axios.post(`${this.baseUrl}/voice-clone/create`, {
        voice_name: voiceName,
        reference_audio_url: audioPath,
        reference_text: referenceText,
        description: description
      });

      if (response.data.success) {
        console.log('✅ Cloned voice created:', voiceName);
      } else {
        console.error('❌ Failed to create cloned voice:', response.data.error);
      }

      return response.data;
    } catch (error) {
      console.error('Request failed:', error.message);
    }
  }

  async synthesizeWithClonedVoice(voiceName, text) {
    try {
      const response = await axios.post(`${this.baseUrl}/voice-clone/synthesize`, null, {
        params: { voice_name: voiceName, text: text }
      });

      if (response.data.success) {
        console.log('✅ Cloned voice synthesis succeeded:', response.data.audio_url);
        return response.data;
      } else {
        console.error('❌ Cloned voice synthesis failed:', response.data.error);
      }
    } catch (error) {
      console.error('Request failed:', error.message);
    }
  }
}

// Usage example
async function main() {
  const client = new GenieTTSClient();

  // 1. Health check
  console.log('1. Checking API status...');
  const health = await client.healthCheck();
  console.log('   Status:', health.status);

  // 2. Basic TTS
  console.log('\n2. Basic TTS synthesis...');
  await client.basicTTS('こんにちは、世界!');

  // 3. Voice cloning
  console.log('\n3. Creating cloned voice...');
  await client.createClonedVoice(
    'test_voice',
    '/path/to/reference.wav',
    'Text spoken in the reference audio',
    'Cloned voice for testing'
  );

  // 4. Use the cloned voice
  console.log('\n4. Synthesizing with the cloned voice...');
  await client.synthesizeWithClonedVoice('test_voice', 'テストです。');
}

main().catch(console.error);
```

## ⚠️ Important Notes

### Voice Cloning Requirements

1. **Audio quality**: Clear, noise-free, with a sample rate of 22 kHz or higher
2. **Audio length**: 3-30 seconds works best
3. **Language**: Japanese only
4. **Text match**: The reference text must match the audio content exactly

### Performance Considerations

- **First use**: Models must be downloaded (about 30 seconds)
- **Speech synthesis**: 5-10 seconds per request
- **Voice cloning**: 2-5 seconds to create a voice
- **Concurrency**: Keeping to 5 or fewer concurrent requests is recommended

### Storage

- Audio files are stored temporarily in the system temp directory
- Cloned voice configurations are persisted
- Expired audio files are cleaned up periodically

## 🔧 Development and Deployment

### Local Development

```bash
pip install -r requirements.txt
python api.py
```

### Production Deployment

```bash
# Using uvicorn
uvicorn api:api_app --host 0.0.0.0 --port 8000 --workers 1

# Using gunicorn with uvicorn workers
pip install gunicorn uvicorn
gunicorn api:api_app -w 1 -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:8000
```

### Docker Deployment

```dockerfile
FROM python:3.10

WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt

COPY . .

EXPOSE 8000
CMD ["python", "api.py"]
```

## 🐛 Troubleshooting

### Common Issues

1. **Engine initialization fails**
   - Check that onnxruntime is installed correctly
   - Confirm the network connection works (models need to be downloaded)

2. **Audio analysis fails**
   - Check the audio file format (wav, flac, ogg, and others are supported)
   - Confirm the file path is correct and the file is readable

3. **Voice cloning quality is poor**
   - Improve the quality of the reference audio
   - Make sure the reference text matches the audio accurately
   - Try reference audio of different lengths

### Debug Mode

```python
import logging
logging.basicConfig(level=logging.DEBUG)
```

## 📞 Support

- **Official repository**: [Genie TTS](https://github.com/High-Logic/Genie)
- **Bug reports**: Open a GitHub Issue
- **Discussion**: GitHub Discussions

---

**Version**: 1.0.0 | **Last updated**: January 2024
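The voice cloning requirements listed under Important Notes (a sample rate of 22 kHz or higher and a length of 3-30 seconds) can be checked locally before a file is sent to `/voice-clone/create`. The following is only a minimal sketch using Python's standard `wave` module, so it handles uncompressed PCM WAV files only. The function name `check_reference_wav` and the exact thresholds are assumptions made for illustration; they are not part of the Genie API, and the server's own `/voice-clone/analyze-audio` analysis may apply different criteria.

```python
import wave

# Thresholds taken from this guide's recommendations; treating "22 kHz"
# as 22000 Hz is an assumption of this sketch.
MIN_SAMPLE_RATE = 22000
MIN_SECONDS = 3.0
MAX_SECONDS = 30.0


def check_reference_wav(path):
    """Return a list of problems found; an empty list means the file passes.

    Hypothetical helper: reads only the WAV header via the stdlib `wave` module.
    """
    problems = []
    with wave.open(str(path), "rb") as wav:
        sample_rate = wav.getframerate()
        duration = wav.getnframes() / float(sample_rate)

    if sample_rate < MIN_SAMPLE_RATE:
        problems.append(f"sample rate {sample_rate} Hz is below {MIN_SAMPLE_RATE} Hz")
    if not MIN_SECONDS <= duration <= MAX_SECONDS:
        problems.append(
            f"duration {duration:.1f} s is outside {MIN_SECONDS}-{MAX_SECONDS} s"
        )
    return problems
```

A caller could run `check_reference_wav("reference_voice.wav")` and skip the clone request when the returned list is non-empty. Noise level and transcript accuracy cannot be verified this way and still need to be checked by listening.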