🎵 Genie TTS API: Complete Usage Guide
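🎯 Overview
Genie TTS API is a Japanese text-to-speech service built on the GPT-SoVITS V2 architecture. The codebase is now fully modular and provides these core features:
- 🎤 Basic TTS synthesis: generate speech with pretrained characters
- 🎭 Voice cloning: quickly create a custom voice from a reference recording
- 📊 Audio analysis: assess the quality and characteristics of an audio file
- 🔄 Batch processing: convert multiple texts to speech in a single request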
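🏗️ Modular Architecture
```
genie/GENIE/
├── api.py            # Main API application
├── api_routes.py     # Route definitions
├── voice_cloning.py  # Core voice-cloning logic
├── models.py         # Data model definitions
├── tts_engine.py     # TTS engine
├── config.py         # Configuration
└── requirements.txt  # Dependencies
```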
🚀 Quick Start
Basic information
- API base URL: http://localhost:8000 (local development)
- API docs: http://localhost:8000/docs (Swagger UI)
- ReDoc docs: http://localhost:8000/redoc
Starting the service
```bash
# Option 1: run the API service directly
python api.py
# Option 2: run it together with the Gradio app
python app.py  # only if app.py mounts the API
```
📚 API Endpoint Reference
1. Basic Endpoints
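Service information
```http
GET /
```
Example response:
```json
{
  "name": "Genie TTS API",
  "version": "1.0.0",
  "description": "High-quality Japanese text-to-speech API service",
  "engine": "GPT-SoVITS V2 (ONNX)",
  "supported_languages": ["ja"],
  "features": ["TTS synthesis", "Voice cloning", "Character management"],
  "docs": "/docs"
}
```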
Health check
```http
GET /health
```
Example response:
```json
{
  "status": "healthy",
  "version": "1.0.0",
  "engine_status": "ready",
  "available_characters": ["misono_mika"],
  "predefined_characters": 1,
  "custom_characters": 0,
  "cloned_voices": 2,
  "timestamp": "2024-01-01T12:00:00"
}
```
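Because the first request may trigger a model download (about 30 seconds, per the performance notes), a client may want to poll `/health` before sending work. The sketch below uses only the Python standard library; `fetch_health`, `is_ready`, and `wait_until_ready` are hypothetical helper names, and treating `status == "healthy"` together with `engine_status == "ready"` as "ready" is an assumption based on the sample response above.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_health(base_url: str = "http://localhost:8000", timeout: float = 5.0) -> dict:
    """GET /health and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url.rstrip('/')}/health", timeout=timeout) as resp:
        return json.load(resp)


def is_ready(health: dict) -> bool:
    # Assumption: these two fields (as in the sample response) signal readiness.
    return health.get("status") == "healthy" and health.get("engine_status") == "ready"


def wait_until_ready(fetch: Callable[[], dict] = fetch_health,
                     max_wait: float = 60.0, interval: float = 2.0) -> bool:
    """Poll the health endpoint until the engine is ready or max_wait seconds elapse."""
    deadline = time.monotonic() + max_wait
    while True:
        try:
            if is_ready(fetch()):
                return True
        except OSError:
            pass  # server not reachable yet (connection refused, timeout, ...)
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Passing `fetch` as a parameter keeps the polling logic independent of the HTTP client, so a `requests`-based function works just as well as the `urllib` default.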
2. Text-to-Speech (TTS)
Basic synthesis (POST)
```http
POST /tts/synthesize
Content-Type: application/json

{
  "text": "こんにちは、世界!",
  "character": "misono_mika",
  "speed": 1.0,
  "format": "wav"
}
```
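A minimal standard-library sketch of the POST above. `build_synthesize_request` is a hypothetical helper that only builds the request so it can be inspected; sending it with `urllib.request.urlopen(req, timeout=60)` should return the JSON result that the complete Python example later in this guide reads (`success`, `audio_url`, `timestamp`).

```python
import json
import urllib.request


def build_synthesize_request(text: str, character: str = "misono_mika",
                             speed: float = 1.0, fmt: str = "wav",
                             base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build (but do not send) a POST /tts/synthesize request."""
    payload = {"text": text, "character": character, "speed": speed, "format": fmt}
    # ensure_ascii=False keeps the Japanese text as-is; the body is encoded as UTF-8.
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/tts/synthesize",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```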
Quick synthesis (GET)
```http
GET /tts/synthesize?text=こんにちは&character=misono_mika
```
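In the GET form, the Japanese text has to be percent-encoded (as UTF-8) before it goes into the query string; the line above shows it unencoded for readability. A small sketch with `urllib.parse.urlencode`, where `quick_synthesize_url` is a hypothetical helper name. The same encoding applies to `POST /voice-clone/synthesize`, which also takes `text` as a query parameter.

```python
from urllib.parse import urlencode


def quick_synthesize_url(text: str, character: str = "misono_mika",
                         base_url: str = "http://localhost:8000") -> str:
    """Build the GET /tts/synthesize URL with percent-encoded query parameters."""
    query = urlencode({"text": text, "character": character})
    return f"{base_url.rstrip('/')}/tts/synthesize?{query}"


print(quick_synthesize_url("こんにちは"))
# http://localhost:8000/tts/synthesize?text=%E3%81%93%E3%82%93%E3%81%AB%E3%81%A1%E3%81%AF&character=misono_mika
```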
Batch synthesis
```http
POST /tts/batch-synthesize
Content-Type: application/json

[
  {"text": "おはようございます", "character": "misono_mika"},
  {"text": "こんばんは", "character": "misono_mika"}
]
```
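The guide does not state a maximum batch size. For a long script, one client-side option is to split the lines into several batch requests. In this sketch `make_batches` is a hypothetical helper, blank lines are skipped, and the default chunk size of 5 is an arbitrary choice rather than a documented server limit.

```python
from typing import Iterator, List


def make_batches(texts: List[str], character: str = "misono_mika",
                 chunk_size: int = 5) -> Iterator[List[dict]]:
    """Yield request bodies for POST /tts/batch-synthesize, at most chunk_size items each."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    # Drop empty or whitespace-only lines before building the items.
    items = [{"text": t, "character": character} for t in texts if t.strip()]
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]
```

Each yielded list can be sent as the JSON body of one `POST /tts/batch-synthesize` call.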
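3. Voice Cloning
Analyze reference audio
```http
POST /voice-clone/analyze-audio?audio_url=/path/to/audio.wav
```
Here `audio_url` is a file path (the Python example sends an absolute path), so the file must be readable by the API server process.
Example response:
```json
{
  "success": true,
  "analysis": {
    "duration": 5.2,
    "sample_rate": 22050,
    "quality_score": 0.85,
    "recommendations": []
  },
  "timestamp": "2024-01-01T12:00:00"
}
```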
Create a cloned voice
```http
POST /voice-clone/create
Content-Type: application/json

{
  "voice_name": "my_voice",
  "reference_audio_url": "/path/to/reference.wav",
  "reference_text": "<Japanese transcript of the reference audio>",
  "description": "My personal voice clone"
}
```
List cloned voices
```http
GET /voice-clone/list
```
Example response:
```json
{
  "success": true,
  "message": "Retrieved 1 cloned voice",
  "cloned_voices": {
    "my_voice": {
      "name": "my_voice",
      "description": "My personal voice clone",
      "reference_text": "Japanese transcript of the reference audio...",
      "quality_score": 0.85,
      "duration": 5.2,
      "created_time": "2024-01-01T12:00:00"
    }
  },
  "count": 1,
  "timestamp": "2024-01-01T12:00:00"
}
```
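Note that `cloned_voices` is an object keyed by voice name, not a list. A hypothetical helper that turns the response into `(name, quality_score)` pairs, best first, assuming the field names shown in the sample above:

```python
from typing import List, Tuple


def summarize_cloned_voices(resp: dict) -> List[Tuple[str, float]]:
    """Return (name, quality_score) pairs from a /voice-clone/list response, best first."""
    if not resp.get("success"):
        return []
    voices = resp.get("cloned_voices", {})
    # Missing scores sort last by defaulting to 0.0.
    pairs = [(name, info.get("quality_score", 0.0)) for name, info in voices.items()]
    return sorted(pairs, key=lambda p: p[1], reverse=True)
```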
Synthesize with a cloned voice
```http
POST /voice-clone/synthesize?voice_name=my_voice&text=こんにちは、今日はいい天気ですね
```
Delete a cloned voice
```http
DELETE /voice-clone/my_voice
```
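Since the voice name is placed directly in the URL path, it is safer to escape it. `delete_voice_request` is a hypothetical helper; escaping with `quote(..., safe="")` is a defensive client-side choice, and the guide does not say which characters a voice name may contain.

```python
import urllib.request
from urllib.parse import quote


def delete_voice_request(voice_name: str,
                         base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build a DELETE /voice-clone/{voice_name} request with the name escaped."""
    # safe="" also escapes "/", so a name can never add extra path segments.
    path = f"/voice-clone/{quote(voice_name, safe='')}"
    return urllib.request.Request(f"{base_url.rstrip('/')}{path}", method="DELETE")
```

Send it with `urllib.request.urlopen(req, timeout=30)`.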
4. Character Management
List all characters
```http
GET /characters/
```
5. Audio File Download
Fetch generated audio
```http
GET /audio/{filename}
```
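A standard-library sketch of the download step, assuming (as the complete Python example does) that the `audio_url` returned by a synthesis call is a path relative to the base URL, such as one under `/audio/{filename}`. `download_audio` is a hypothetical helper; it streams the body to disk and then checks the RIFF/WAVE header, which only makes sense when `format` was `"wav"`.

```python
import shutil
import urllib.request
from pathlib import Path


def download_audio(audio_url: str, dest: Path,
                   base_url: str = "http://localhost:8000", timeout: float = 30.0) -> Path:
    """Stream GET {base_url}{audio_url} to dest and sanity-check the WAV header."""
    url = f"{base_url.rstrip('/')}{audio_url}"
    with urllib.request.urlopen(url, timeout=timeout) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)  # copies in chunks instead of loading it all into memory
    with open(dest, "rb") as f:
        header = f.read(12)
    # A WAV file starts with b"RIFF" and has b"WAVE" at byte offset 8.
    if header[:4] != b"RIFF" or header[8:12] != b"WAVE":
        raise ValueError(f"{dest} does not look like a WAV file")
    return Path(dest)
```

Because generated files are only stored temporarily, it is worth downloading them soon after synthesis.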
💻 Programming Examples
Complete Python example
```python
import requests
from pathlib import Path


class GenieTTSClient:
    def __init__(self, base_url="http://localhost:8000", timeout=60):
        self.base_url = base_url.rstrip('/')
        self.timeout = timeout  # seconds; first use may include a model download

    def _request(self, method, path, **kwargs):
        """Send a request, raise on HTTP errors, and return the decoded JSON body."""
        response = requests.request(method, f"{self.base_url}{path}",
                                    timeout=self.timeout, **kwargs)
        response.raise_for_status()
        return response.json()

    def _save_audio(self, audio_url, filename):
        """Download a generated file; audio_url is a path relative to base_url."""
        response = requests.get(f"{self.base_url}{audio_url}", timeout=self.timeout)
        response.raise_for_status()
        with open(filename, "wb") as f:
            f.write(response.content)
        return filename

    def health_check(self):
        """Check the API health status."""
        return self._request("GET", "/health")

    def basic_tts(self, text, character="misono_mika"):
        """Basic text-to-speech with a predefined character."""
        data = {"text": text, "character": character, "speed": 1.0, "format": "wav"}
        result = self._request("POST", "/tts/synthesize", json=data)
        if result.get("success"):
            # Turn e.g. 2024-01-01T12:00:00 into a file-name-safe 2024-01-01T12-00-00
            stamp = result["timestamp"][:19].replace(":", "-")
            filename = self._save_audio(result["audio_url"], f"tts_output_{stamp}.wav")
            print(f"✅ TTS synthesis succeeded, saved as: {filename}")
            return filename
        print(f"❌ TTS synthesis failed: {result.get('error', 'unknown error')}")
        return None

    def analyze_audio(self, audio_path):
        """Analyze audio quality. The path is sent as-is, so the server must be able to read it."""
        params = {"audio_url": str(Path(audio_path).absolute())}
        return self._request("POST", "/voice-clone/analyze-audio", params=params)

    def create_cloned_voice(self, voice_name, audio_path, reference_text, description=""):
        """Create a cloned voice from a reference recording."""
        data = {
            "voice_name": voice_name,
            "reference_audio_url": str(Path(audio_path).absolute()),
            "reference_text": reference_text,
            "description": description,
        }
        result = self._request("POST", "/voice-clone/create", json=data)
        if result.get("success"):
            print(f"✅ Cloned voice created: {voice_name}")
        else:
            print(f"❌ Failed to create cloned voice: {result.get('error', 'unknown error')}")
        return result

    def clone_voice_tts(self, voice_name, text):
        """Synthesize speech with a cloned voice."""
        params = {"voice_name": voice_name, "text": text}  # requests URL-encodes these
        result = self._request("POST", "/voice-clone/synthesize", params=params)
        if result.get("success"):
            stamp = result["timestamp"][:19].replace(":", "-")
            filename = self._save_audio(result["audio_url"],
                                        f"cloned_voice_{voice_name}_{stamp}.wav")
            print(f"✅ Cloned-voice synthesis succeeded, saved as: {filename}")
            return filename
        print(f"❌ Cloned-voice synthesis failed: {result.get('error', 'unknown error')}")
        return None

    def list_cloned_voices(self):
        """List all cloned voices."""
        return self._request("GET", "/voice-clone/list")


# Usage example
def main():
    client = GenieTTSClient()

    # 1. Health check
    print("1. Checking API status...")
    health = client.health_check()
    print(f"   Status: {health['status']}")
    print(f"   Engine: {health['engine_status']}")
    print(f"   Available characters: {health['available_characters']}")

    # 2. Basic TTS synthesis
    print("\n2. Basic TTS synthesis...")
    client.basic_tts("こんにちは、今日はいい天気ですね。")

    # 3. Voice cloning (requires a reference recording)
    reference_audio = "reference_voice.wav"  # replace with your audio file path
    reference_text = "REPLACE WITH THE JAPANESE TRANSCRIPT"  # must match the audio exactly
    if Path(reference_audio).exists():
        print(f"\n3. Analyzing reference audio: {reference_audio}")
        analysis = client.analyze_audio(reference_audio)
        print(f"   Quality score: {analysis.get('analysis', {}).get('quality_score', 'N/A')}")

        print("\n4. Creating cloned voice...")
        clone_result = client.create_cloned_voice(
            voice_name="test_voice",
            audio_path=reference_audio,
            reference_text=reference_text,
            description="Cloned voice for testing",
        )
        if clone_result.get("success"):
            print("\n5. Synthesizing with the cloned voice...")
            client.clone_voice_tts("test_voice", "これはクローン音声のテストです。")
    else:
        print(f"\n3-5. Skipped: {reference_audio} not found")

    # 6. List all cloned voices
    print("\n6. Listing all cloned voices...")
    voices = client.list_cloned_voices()
    if voices.get("success"):
        print(f"   {voices['count']} cloned voice(s):")
        for name, info in voices.get("cloned_voices", {}).items():
            print(f"   - {name}: {info.get('description', '')}")


if __name__ == "__main__":
    main()
```
JavaScript/Node.js example
```javascript
const axios = require('axios');

class GenieTTSClient {
  constructor(baseUrl = 'http://localhost:8000') {
    this.baseUrl = baseUrl.replace(/\/+$/, '');
  }

  async healthCheck() {
    const response = await axios.get(`${this.baseUrl}/health`);
    return response.data;
  }

  async basicTTS(text, character = 'misono_mika') {
    try {
      const response = await axios.post(`${this.baseUrl}/tts/synthesize`, {
        text: text,
        character: character,
        speed: 1.0,
        format: 'wav'
      });
      if (response.data.success) {
        console.log('✅ TTS synthesis succeeded:', response.data.audio_url);
        return response.data;
      } else {
        console.error('❌ TTS synthesis failed:', response.data.error);
      }
    } catch (error) {
      console.error('Request failed:', error.message);
    }
  }

  // audioPath must be a path that the API server itself can read
  async createClonedVoice(voiceName, audioPath, referenceText, description = '') {
    try {
      const response = await axios.post(`${this.baseUrl}/voice-clone/create`, {
        voice_name: voiceName,
        reference_audio_url: audioPath,
        reference_text: referenceText,
        description: description
      });
      if (response.data.success) {
        console.log('✅ Cloned voice created:', voiceName);
      } else {
        console.error('❌ Failed to create cloned voice:', response.data.error);
      }
      return response.data;
    } catch (error) {
      console.error('Request failed:', error.message);
    }
  }

  async synthesizeWithClonedVoice(voiceName, text) {
    try {
      // No request body; axios URL-encodes the query parameters
      const response = await axios.post(`${this.baseUrl}/voice-clone/synthesize`, null, {
        params: { voice_name: voiceName, text: text }
      });
      if (response.data.success) {
        console.log('✅ Cloned-voice synthesis succeeded:', response.data.audio_url);
        return response.data;
      } else {
        console.error('❌ Cloned-voice synthesis failed:', response.data.error);
      }
    } catch (error) {
      console.error('Request failed:', error.message);
    }
  }
}

// Usage example
async function main() {
  const client = new GenieTTSClient();

  // 1. Health check
  console.log('1. Checking API status...');
  const health = await client.healthCheck();
  console.log('   Status:', health.status);

  // 2. Basic TTS
  console.log('\n2. Basic TTS synthesis...');
  await client.basicTTS('こんにちは、世界!');

  // 3. Voice cloning
  console.log('\n3. Creating cloned voice...');
  await client.createClonedVoice(
    'test_voice',
    '/path/to/reference.wav',
    'REPLACE WITH THE JAPANESE TRANSCRIPT',
    'Cloned voice for testing'
  );

  // 4. Use the cloned voice
  console.log('\n4. Synthesizing with the cloned voice...');
  await client.synthesizeWithClonedVoice('test_voice', 'テストです。');
}

main().catch(console.error);
```
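⚠️ Important Notes
Voice cloning requirements
- Audio quality: clear, free of noise, sample rate of 22 kHz or higher
- Audio length: 3 to 30 seconds works best
- Language: Japanese only
- Transcript match: the reference text must match the spoken content exactly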
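Sample rate and duration can be checked locally before creating a clone. This sketch uses Python's built-in `wave` module, so it only reads uncompressed PCM WAV files; `check_reference_wav` is a hypothetical helper, and the 22 kHz and 3 to 30 second thresholds come from the list above. Noise and transcript accuracy still need a human check or the server's analyze endpoint.

```python
import wave
from pathlib import Path
from typing import List


def check_reference_wav(path: Path, min_rate: int = 22_000,
                        min_seconds: float = 3.0, max_seconds: float = 30.0) -> List[str]:
    """Return a list of problems with a reference WAV (an empty list means it passes)."""
    problems = []
    with wave.open(str(path), "rb") as w:
        rate = w.getframerate()
        duration = w.getnframes() / rate  # frames / (frames per second) = seconds
    if rate < min_rate:
        problems.append(f"sample rate {rate} Hz is below {min_rate} Hz")
    if not (min_seconds <= duration <= max_seconds):
        problems.append(f"duration {duration:.1f} s is outside {min_seconds}-{max_seconds} s")
    return problems
```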
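Performance
- First use: the models must be downloaded first (about 30 seconds)
- Speech synthesis: 5 to 10 seconds per request
- Voice cloning: 2 to 5 seconds to create a cloned voice
- Concurrency: keep it to 5 or fewer simultaneous requests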
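One way to respect the five-request concurrency guideline on the client side is a thread pool capped at five workers. In this sketch `synthesize_all` is a hypothetical helper, and `synthesize` stands for any callable that sends one synthesis request (for example, a function wrapping `POST /tts/synthesize`).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional


def synthesize_all(texts: List[str], synthesize: Callable[[str], Optional[str]],
                   max_concurrency: int = 5) -> List[Optional[str]]:
    """Run synthesize(text) for every text with at most max_concurrency calls in flight.

    Results are returned in the same order as texts.
    """
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        return list(pool.map(synthesize, texts))
```

Threads suit this workload because each call mostly waits on the network; `ThreadPoolExecutor.map` also preserves input order, so results line up with the original texts.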
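Storage
- Audio files are stored temporarily in the system temp directory
- Cloned voice configurations are persisted
- Expired audio files are cleaned up periodically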
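The guide does not say how or how often expired audio is removed. A cleanup job could look roughly like the sketch below; `cleanup_expired`, the target directory, the `*.wav` pattern, and the one-hour maximum age are all assumptions, not the service's actual settings. The `now` parameter only exists to make the function easy to test.

```python
import time
from pathlib import Path
from typing import List, Optional


def cleanup_expired(directory: Path, max_age_seconds: float = 3600.0,
                    pattern: str = "*.wav", now: Optional[float] = None) -> List[Path]:
    """Delete files matching pattern whose modification time is older than max_age_seconds."""
    now = time.time() if now is None else now
    removed = []
    for path in Path(directory).glob(pattern):
        if path.is_file() and now - path.stat().st_mtime > max_age_seconds:
            path.unlink(missing_ok=True)  # tolerate another cleaner deleting it first
            removed.append(path)
    return removed
```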
🔧 Development and Deployment
Local development
```bash
pip install -r requirements.txt
python api.py
```
Production deployment
```bash
# With uvicorn
uvicorn api:api_app --host 0.0.0.0 --port 8000 --workers 1
# With gunicorn (UvicornWorker is provided by the uvicorn package)
pip install gunicorn uvicorn
gunicorn api:api_app -w 1 -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:8000
```
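Docker deployment
```dockerfile
FROM python:3.10
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8000
# Bind to 0.0.0.0 so the port is reachable from outside the container
CMD ["uvicorn", "api:api_app", "--host", "0.0.0.0", "--port", "8000"]
```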
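🐛 Troubleshooting
Common issues
Engine initialization fails
- Check that onnxruntime is installed correctly
- Make sure the network connection works (the models have to be downloaded)
Audio analysis fails
- Check the audio file format (wav, flac, ogg, and similar formats are supported)
- Make sure the file path is correct and the server has read permission
Poor voice cloning results
- Improve the quality of the reference audio
- Make sure the reference text matches the audio exactly
- Try reference clips of different lengths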
Debug mode
```python
import logging
logging.basicConfig(level=logging.DEBUG)
```
📞 Support
- Official repository: Genie TTS
- Bug reports: open a GitHub Issue
- Discussion: GitHub Discussions
Version: 1.0.0 | Last updated: January 2024