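# 🎵 Genie TTS API Complete Usage Guide
## 🎯 Overview
The Genie TTS API is a Japanese text-to-speech service built on the **GPT-SoVITS V2** architecture. It is now fully modular and provides the following core features:
- **🎤 Basic TTS synthesis**: synthesize speech with pretrained characters
- **🎭 Voice cloning**: quickly create a personalized voice from reference audio
- **📊 Audio analysis**: automatically assess the quality and characteristics of an audio file
- **🔄 Batch processing**: efficient batch text-to-speech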
## 🏗️ Modular Architecture
```
genie/GENIE/
├── api.py            # Main API application
├── api_routes.py     # Route definitions
├── voice_cloning.py  # Core voice-cloning logic
├── models.py         # Data model definitions
├── tts_engine.py     # TTS engine
├── config.py         # Configuration
└── requirements.txt  # Dependencies
```
## 🚀 Quick Start
### Basics
- **API base URL**: `http://localhost:8000` (local development)
- **API docs**: `http://localhost:8000/docs` (Swagger UI)
- **ReDoc docs**: `http://localhost:8000/redoc`
### Starting the service
```bash
# Option 1: run the API service directly
python api.py
# Option 2: run together with Gradio
python app.py  # only if app.py integrates the API
```
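## 📚 API Endpoint Reference
### 1. Basics
#### Root info
```http
GET /
```
**Example response**:
```json
{
  "name": "Genie TTS API",
  "version": "1.0.0",
  "description": "高质量日语文本转语音API服务",
  "engine": "GPT-SoVITS V2 (ONNX)",
  "supported_languages": ["ja"],
  "features": ["TTS合成", "语音克隆", "角色管理"],
  "docs": "/docs"
}
```
The `description` and `features` strings are returned in Chinese; they translate to "High-quality Japanese text-to-speech API service" and "TTS synthesis", "voice cloning", "character management".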
#### Health check
```http
GET /health
```
**Example response**:
```json
{
  "status": "healthy",
  "version": "1.0.0",
  "engine_status": "ready",
  "available_characters": ["misono_mika"],
  "predefined_characters": 1,
  "custom_characters": 0,
  "cloned_voices": 2,
  "timestamp": "2024-01-01T12:00:00"
}
```
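Because the engine may still be loading right after startup, a client can poll the health endpoint before sending work. The following is a minimal sketch, not part of the API itself: `is_ready` and `wait_until_ready` are hypothetical helper names, and the only values it relies on are the `"healthy"` and `"ready"` strings from the example response above. Any other status values the service may return are unknown here and are simply treated as "not ready".

```python
import time
from typing import Callable


def is_ready(health: dict) -> bool:
    """True when the payload reports a healthy service with a ready engine."""
    return (health.get("status") == "healthy"
            and health.get("engine_status") == "ready")


def wait_until_ready(fetch_health: Callable[[], dict],
                     timeout: float = 60.0,
                     interval: float = 2.0) -> bool:
    """Poll fetch_health() until is_ready() holds or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            if is_ready(fetch_health()):
                return True
        except Exception:
            # Connection errors are treated as "not ready yet"; keep polling.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

With `requests` installed, the probe could be passed as `lambda: requests.get("http://localhost:8000/health", timeout=5).json()`. Injecting the probe as a callable keeps the polling logic testable without a running server.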
### 2. Text-to-Speech (TTS)
#### Basic synthesis (POST)
```http
POST /tts/synthesize
Content-Type: application/json

{
  "text": "こんにちは、世界!",
  "character": "misono_mika",
  "speed": 1.0,
  "format": "wav"
}
```
#### Quick synthesis (GET)
```http
GET /tts/synthesize?text=こんにちは&character=misono_mika
```
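The GET form carries Japanese text in the query string, and non-ASCII characters there generally need to be percent-encoded before the request is sent. Here is a minimal sketch using only the standard library. The helper name `quick_synthesize_url` is hypothetical; the parameter names `text` and `character` are taken from the request above.

```python
from urllib.parse import urlencode


def quick_synthesize_url(text: str,
                         character: str = "misono_mika",
                         base_url: str = "http://localhost:8000") -> str:
    """Build the GET /tts/synthesize URL with a percent-encoded query string."""
    query = urlencode({"text": text, "character": character})
    return f"{base_url.rstrip('/')}/tts/synthesize?{query}"


print(quick_synthesize_url("こんにちは"))
# http://localhost:8000/tts/synthesize?text=%E3%81%93%E3%82%93%E3%81%AB%E3%81%A1%E3%81%AF&character=misono_mika
```

HTTP clients such as `requests` perform the same encoding when the values are passed through `params=`, so building the URL by hand is only needed when it is pasted into a browser or a shell command.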
#### Batch synthesis
```http
POST /tts/batch-synthesize
Content-Type: application/json

[
  {"text": "おはようございます", "character": "misono_mika"},
  {"text": "こんばんは", "character": "misono_mika"}
]
```
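The batch body is a plain JSON array of per-item objects. The sketch below shows one way to assemble it from a list of strings; `build_batch` is a hypothetical helper, and skipping blank lines is a choice made here, not documented server behavior.

```python
import json
from typing import Dict, Iterable, List


def build_batch(texts: Iterable[str],
                character: str = "misono_mika") -> List[Dict[str, str]]:
    """Turn plain strings into the list-of-objects body, skipping blank entries."""
    return [{"text": t.strip(), "character": character}
            for t in texts if t.strip()]


batch = build_batch(["おはようございます", "   ", "こんばんは"])
# ensure_ascii=False keeps the Japanese text readable in the serialized body.
body = json.dumps(batch, ensure_ascii=False)
print(body)
```

With `requests`, the list can be sent directly via `requests.post(f"{base_url}/tts/batch-synthesize", json=batch)`.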
### 3. Voice Cloning
#### Analyze reference audio
```http
POST /voice-clone/analyze-audio?audio_url=/path/to/audio.wav
```
**Example response**:
```json
{
  "success": true,
  "analysis": {
    "duration": 5.2,
    "sample_rate": 22050,
    "quality_score": 0.85,
    "recommendations": []
  },
  "timestamp": "2024-01-01T12:00:00"
}
```
#### Create a cloned voice
```http
POST /voice-clone/create
Content-Type: application/json

{
  "voice_name": "my_voice",
  "reference_audio_url": "/path/to/reference.wav",
  "reference_text": "Japanese text spoken in the reference audio",
  "description": "My personal voice clone"
}
```
#### List cloned voices
```http
GET /voice-clone/list
```
**Example response**:
```json
{
  "success": true,
  "message": "获取到 1 个克隆声音",
  "cloned_voices": {
    "my_voice": {
      "name": "my_voice",
      "description": "My personal voice clone",
      "reference_text": "Japanese text spoken in the reference...",
      "quality_score": 0.85,
      "duration": 5.2,
      "created_time": "2024-01-01T12:00:00"
    }
  },
  "count": 1,
  "timestamp": "2024-01-01T12:00:00"
}
```
The `message` string is returned in Chinese; here it reads "Retrieved 1 cloned voice".
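Since `cloned_voices` is an object keyed by voice name, a client usually iterates over its values. As an illustration, the sketch below ranks voices by `quality_score`. The helper name `best_voices` and the 0.8 cutoff are assumptions made for this example; the guide does not define what counts as a good score.

```python
from typing import List


def best_voices(listing: dict, min_quality: float = 0.8) -> List[str]:
    """Return voice names with quality_score >= min_quality, best first."""
    voices = listing.get("cloned_voices", {})
    ranked = sorted(voices.values(),
                    key=lambda v: v.get("quality_score", 0.0),
                    reverse=True)
    return [v["name"] for v in ranked
            if v.get("quality_score", 0.0) >= min_quality]


# Trimmed copy of the example response, plus one made-up low-scoring entry.
listing = {
    "success": True,
    "cloned_voices": {
        "my_voice": {"name": "my_voice", "quality_score": 0.85, "duration": 5.2},
        "noisy_voice": {"name": "noisy_voice", "quality_score": 0.40, "duration": 2.1},
    },
}
print(best_voices(listing))
```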
#### Synthesize with a cloned voice
```http
POST /voice-clone/synthesize?voice_name=my_voice&text=こんにちは、今日はいい天気ですね
```
#### Delete a cloned voice
```http
DELETE /voice-clone/my_voice
```
### 4. Character Management
#### List all characters
```http
GET /characters/
```
### 5. Audio File Download
#### Fetch generated audio
```http
GET /audio/{filename}
```
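Synthesis responses carry an `audio_url` that the client then fetches. The sketch below resolves that value against the base URL and derives a local file name from it. It assumes `audio_url` is a server-relative path such as `/audio/tts_123.wav`. That shape is inferred from how the Python client in this guide concatenates it with the base URL, and the file name itself is made up. `audio_download_target` is a hypothetical helper.

```python
from pathlib import PurePosixPath
from typing import Tuple
from urllib.parse import urljoin, urlsplit


def audio_download_target(base_url: str, audio_url: str) -> Tuple[str, str]:
    """Return (absolute download URL, bare file name) for an audio_url value."""
    full_url = urljoin(base_url.rstrip("/") + "/", audio_url)
    # Keep only the last path segment so the local name has no directories.
    filename = PurePosixPath(urlsplit(full_url).path).name
    return full_url, filename


print(audio_download_target("http://localhost:8000", "/audio/tts_123.wav"))
# ('http://localhost:8000/audio/tts_123.wav', 'tts_123.wav')
```

Using only the final path segment as the local name avoids writing outside the working directory if the server ever returns a nested path.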
## 💻 Code Examples
### Complete Python example
```python
import requests
from pathlib import Path


class GenieTTSClient:
    def __init__(self, base_url="http://localhost:8000"):
        self.base_url = base_url.rstrip('/')

    def health_check(self):
        """Check the API health status."""
        response = requests.get(f"{self.base_url}/health")
        return response.json()

    def basic_tts(self, text, character="misono_mika"):
        """Basic text-to-speech."""
        data = {
            "text": text,
            "character": character,
            "speed": 1.0,
            "format": "wav"
        }
        response = requests.post(f"{self.base_url}/tts/synthesize", json=data)
        result = response.json()
        if result["success"]:
            # Download the audio file
            audio_url = result["audio_url"]
            audio_response = requests.get(f"{self.base_url}{audio_url}")
            filename = f"tts_output_{result['timestamp'][:19].replace(':', '-')}.wav"
            with open(filename, "wb") as f:
                f.write(audio_response.content)
            print(f"✅ TTS synthesis succeeded, saved as: {filename}")
            return filename
        else:
            print(f"❌ TTS synthesis failed: {result.get('error', 'unknown error')}")
            return None

    def analyze_audio(self, audio_path):
        """Analyze audio quality."""
        params = {"audio_url": str(Path(audio_path).absolute())}
        response = requests.post(f"{self.base_url}/voice-clone/analyze-audio", params=params)
        return response.json()

    def create_cloned_voice(self, voice_name, audio_path, reference_text, description=""):
        """Create a cloned voice."""
        data = {
            "voice_name": voice_name,
            "reference_audio_url": str(Path(audio_path).absolute()),
            "reference_text": reference_text,
            "description": description
        }
        response = requests.post(f"{self.base_url}/voice-clone/create", json=data)
        result = response.json()
        if result["success"]:
            print(f"✅ Cloned voice created: {voice_name}")
        else:
            print(f"❌ Failed to create cloned voice: {result.get('error', 'unknown error')}")
        return result

    def clone_voice_tts(self, voice_name, text):
        """Synthesize speech with a cloned voice."""
        params = {"voice_name": voice_name, "text": text}
        response = requests.post(f"{self.base_url}/voice-clone/synthesize", params=params)
        result = response.json()
        if result["success"]:
            # Download the audio file
            audio_url = result["audio_url"]
            audio_response = requests.get(f"{self.base_url}{audio_url}")
            filename = f"cloned_voice_{voice_name}_{result['timestamp'][:19].replace(':', '-')}.wav"
            with open(filename, "wb") as f:
                f.write(audio_response.content)
            print(f"✅ Cloned-voice synthesis succeeded, saved as: {filename}")
            return filename
        else:
            print(f"❌ Cloned-voice synthesis failed: {result.get('error', 'unknown error')}")
            return None

    def list_cloned_voices(self):
        """List all cloned voices."""
        response = requests.get(f"{self.base_url}/voice-clone/list")
        return response.json()


# Usage example
def main():
    client = GenieTTSClient()

    # 1. Health check
    print("1. Checking API status...")
    health = client.health_check()
    print(f"   Status: {health['status']}")
    print(f"   Engine: {health['engine_status']}")
    print(f"   Available characters: {health['available_characters']}")

    # 2. Basic TTS synthesis
    print("\n2. Basic TTS synthesis...")
    tts_file = client.basic_tts("こんにちは、今日はいい天気ですね。")

    # 3. Voice cloning (requires a reference audio file)
    reference_audio = "reference_voice.wav"  # replace with the actual audio file path
    reference_text = "Japanese text spoken in the reference audio"  # replace with the actual transcript
    if Path(reference_audio).exists():
        print(f"\n3. Analyzing reference audio: {reference_audio}")
        analysis = client.analyze_audio(reference_audio)
        print(f"   Audio quality score: {analysis.get('analysis', {}).get('quality_score', 'N/A')}")

        print("\n4. Creating cloned voice...")
        clone_result = client.create_cloned_voice(
            voice_name="test_voice",
            audio_path=reference_audio,
            reference_text=reference_text,
            description="Test voice clone"
        )
        if clone_result.get("success"):
            print("\n5. Synthesizing with the cloned voice...")
            cloned_file = client.clone_voice_tts(
                "test_voice",
                "これはクローン音声のテストです。"
            )

    # 6. List all cloned voices
    print("\n6. Listing all cloned voices...")
    voices = client.list_cloned_voices()
    if voices.get("success"):
        print(f"   {voices['count']} cloned voice(s):")
        for name, info in voices.get("cloned_voices", {}).items():
            print(f"   - {name}: {info['description']}")


if __name__ == "__main__":
    main()
```
### JavaScript/Node.js example
```javascript
const axios = require('axios');

class GenieTTSClient {
  constructor(baseUrl = 'http://localhost:8000') {
    this.baseUrl = baseUrl.replace(/\/+$/, '');
  }

  async healthCheck() {
    const response = await axios.get(`${this.baseUrl}/health`);
    return response.data;
  }

  async basicTTS(text, character = 'misono_mika') {
    try {
      const response = await axios.post(`${this.baseUrl}/tts/synthesize`, {
        text: text,
        character: character,
        speed: 1.0,
        format: 'wav'
      });
      if (response.data.success) {
        console.log('✅ TTS synthesis succeeded:', response.data.audio_url);
        return response.data;
      } else {
        console.error('❌ TTS synthesis failed:', response.data.error);
      }
    } catch (error) {
      console.error('Request failed:', error.message);
    }
  }

  async createClonedVoice(voiceName, audioPath, referenceText, description = '') {
    try {
      const response = await axios.post(`${this.baseUrl}/voice-clone/create`, {
        voice_name: voiceName,
        reference_audio_url: audioPath,
        reference_text: referenceText,
        description: description
      });
      if (response.data.success) {
        console.log('✅ Cloned voice created:', voiceName);
      } else {
        console.error('❌ Failed to create cloned voice:', response.data.error);
      }
      return response.data;
    } catch (error) {
      console.error('Request failed:', error.message);
    }
  }

  async synthesizeWithClonedVoice(voiceName, text) {
    try {
      // Parameters go in the query string, so the request body is null.
      const response = await axios.post(`${this.baseUrl}/voice-clone/synthesize`, null, {
        params: { voice_name: voiceName, text: text }
      });
      if (response.data.success) {
        console.log('✅ Cloned-voice synthesis succeeded:', response.data.audio_url);
        return response.data;
      } else {
        console.error('❌ Cloned-voice synthesis failed:', response.data.error);
      }
    } catch (error) {
      console.error('Request failed:', error.message);
    }
  }
}

// Usage example
async function main() {
  const client = new GenieTTSClient();

  // 1. Health check
  console.log('1. Checking API status...');
  const health = await client.healthCheck();
  console.log('   Status:', health.status);

  // 2. Basic TTS
  console.log('\n2. Basic TTS synthesis...');
  await client.basicTTS('こんにちは、世界!');

  // 3. Voice cloning
  console.log('\n3. Creating cloned voice...');
  await client.createClonedVoice(
    'test_voice',
    '/path/to/reference.wav',
    'Text spoken in the reference audio',
    'Test voice clone'
  );

  // 4. Use the cloned voice
  console.log('\n4. Synthesizing with the cloned voice...');
  await client.synthesizeWithClonedVoice('test_voice', 'テストです。');
}

main().catch(console.error);
```
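## ⚠️ Important Notes
### Voice cloning requirements
1. **Audio quality**: clear, noise-free, with a sample rate of 22 kHz or higher
2. **Audio length**: 3 to 30 seconds works best
3. **Language**: Japanese only
4. **Text match**: the reference text must match the audio content exactly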
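The duration and sample-rate requirements above can be checked on the client against the `analysis` object returned by `/voice-clone/analyze-audio` before a clone is created. This is a rough sketch: `reference_issues` is a hypothetical helper, the 3 to 30 second window and the 22 kHz floor are taken from the list above, and language or transcript accuracy cannot be verified this way.

```python
from typing import List


def reference_issues(analysis: dict) -> List[str]:
    """Return human-readable problems with a reference clip; empty means OK."""
    issues = []
    duration = analysis.get("duration")
    sample_rate = analysis.get("sample_rate")
    if duration is None or not 3.0 <= duration <= 30.0:
        issues.append("duration should be between 3 and 30 seconds")
    if sample_rate is None or sample_rate < 22_000:
        issues.append("sample rate should be at least 22 kHz")
    return issues


# Values from the example analysis response in this guide.
print(reference_issues({"duration": 5.2, "sample_rate": 22050}))  # []
```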
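### Performance considerations
- **First use**: the model must be downloaded (about 30 seconds)
- **Speech synthesis**: 5 to 10 seconds per request
- **Voice cloning**: 2 to 5 seconds to create a voice
- **Concurrency**: keep to 5 or fewer concurrent requests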
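### Storage
- Audio files are stored temporarily in the system temp directory
- Cloned-voice configurations are persisted
- Periodic cleanup of expired audio files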
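If cleanup has to be run by hand or from a scheduled job, it can be as simple as deleting files past a certain age. The sketch below makes several assumptions that are not stated in this guide: the directory to clean, the `*.wav` pattern, and the one-hour expiry are all placeholders, and `remove_expired_audio` is a hypothetical helper. The demo runs against a throwaway directory, not the service's real storage.

```python
import os
import tempfile
import time
from pathlib import Path
from typing import List, Optional


def remove_expired_audio(directory: str,
                         max_age_seconds: float = 3600.0,
                         now: Optional[float] = None) -> List[Path]:
    """Delete *.wav files older than max_age_seconds; return the removed paths."""
    now = time.time() if now is None else now
    removed = []
    # Materialize the listing first so deletion does not disturb iteration.
    for path in list(Path(directory).glob("*.wav")):
        if path.is_file() and now - path.stat().st_mtime > max_age_seconds:
            path.unlink()
            removed.append(path)
    return removed


# Demo in a temporary directory: one file aged two hours, one fresh file.
with tempfile.TemporaryDirectory() as tmp:
    old, fresh = Path(tmp, "old.wav"), Path(tmp, "fresh.wav")
    old.write_bytes(b"")
    fresh.write_bytes(b"")
    two_hours_ago = time.time() - 7200
    os.utime(old, (two_hours_ago, two_hours_ago))
    removed_names = [p.name for p in remove_expired_audio(tmp)]
    remaining_names = sorted(p.name for p in Path(tmp).iterdir())

print(removed_names, remaining_names)
```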
## 🔧 Development and Deployment
### Local development
```bash
pip install -r requirements.txt
python api.py
```
### Production deployment
```bash
# With uvicorn
uvicorn api:api_app --host 0.0.0.0 --port 8000 --workers 1
# With gunicorn (uvicorn supplies the worker class)
pip install gunicorn uvicorn
gunicorn api:api_app -w 1 -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:8000
```
### Docker deployment
```dockerfile
FROM python:3.10
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["python", "api.py"]
```
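## 🐛 Troubleshooting
### Common problems
1. **Engine initialization fails**
   - Check that onnxruntime is installed correctly
   - Make sure the network connection works (the model must be downloaded)
2. **Audio analysis fails**
   - Check the audio file format (wav, flac, ogg, and others are supported)
   - Make sure the file path is correct and the file is readable
3. **Voice cloning quality is poor**
   - Use higher-quality reference audio
   - Make sure the reference text matches the audio exactly
   - Try reference audio of a different length
### Debug mode
```python
import logging
logging.basicConfig(level=logging.DEBUG)
```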
## 📞 Support
- **Official repository**: [Genie TTS](https://github.com/High-Logic/Genie)
- **Bug reports**: open a GitHub Issue
- **Discussion**: GitHub Discussions
---
**Version**: 1.0.0 | **Last updated**: January 2024