Tom1986

feat: Implement Genie TTS API with text-to-speech and voice cloning functionalities

9bc318e 9 months ago

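# 🎵 Genie TTS API Complete Usage Guide

## 🎯 Overview

Genie TTS API is a Japanese text-to-speech service built on the GPT-SoVITS V2 architecture. It is now fully modular and provides the following core features:

- 🎤 Basic TTS synthesis: synthesize speech with pretrained characters
- 🎭 Voice cloning: quickly create a personalized voice from reference audio
- 📊 Audio analysis: automatically analyze audio quality and characteristics
- 🔄 Batch processing: efficient batch text-to-speech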

## 🏗️ Modular Architecture

```
genie/GENIE/
├── api.py            # Main API application
├── api_routes.py     # Route definitions
├── voice_cloning.py  # Core voice cloning logic
├── models.py         # Data model definitions
├── tts_engine.py     # TTS engine
├── config.py         # Configuration
└── requirements.txt  # Dependencies
```

## 🚀 Quick Start

### Basic Information

- API base URL: `http://localhost:8000` (local development)
- API docs: `http://localhost:8000/docs` (Swagger UI)
- ReDoc docs: `http://localhost:8000/redoc`

### Starting the Service

```bash
# Option 1: run the API service directly
python api.py

# Option 2: run together with Gradio
python app.py  # if app.py integrates the API
```

## 📚 API Endpoint Reference

### 1. Basic Functions

#### Root Information
```http
GET /
```

Example response:
```json
{
  "name": "Genie TTS API",
  "version": "1.0.0",
  "description": "High-quality Japanese text-to-speech API service",
  "engine": "GPT-SoVITS V2 (ONNX)",
  "supported_languages": ["ja"],
  "features": ["TTS synthesis", "Voice cloning", "Character management"],
  "docs": "/docs"
}
```

#### Health Check
```http
GET /health
```

Example response:
```json
{
  "status": "healthy",
  "version": "1.0.0",
  "engine_status": "ready",
  "available_characters": ["misono_mika"],
  "predefined_characters": 1,
  "custom_characters": 0,
  "cloned_voices": 2,
  "timestamp": "2024-01-01T12:00:00"
}
```
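
Because the engine may still be loading when the service starts, a client can poll `/health` before sending synthesis requests. The following is a minimal sketch using only the standard library. It assumes the response carries the `status` and `engine_status` fields shown above; `fetch_health`, `is_ready`, `wait_until_ready`, and the retry settings are hypothetical names and values, not part of the API.

```python
import json
import time
import urllib.request


def fetch_health(base_url="http://localhost:8000"):
    """Fetch and decode the /health response (assumes the JSON shape shown above)."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def is_ready(health):
    """Return True when both the service and the engine report a usable state."""
    return health.get("status") == "healthy" and health.get("engine_status") == "ready"


def wait_until_ready(fetch=fetch_health, attempts=10, delay=3.0):
    """Poll until the service is ready; return True on success, False otherwise.

    `fetch` is injectable so the loop can be exercised without a running server.
    """
    for attempt in range(attempts):
        try:
            if is_ready(fetch()):
                return True
        except (OSError, ValueError):
            pass  # not reachable yet (urllib errors subclass OSError) or invalid JSON
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Calling `wait_until_ready()` once at startup keeps the first synthesis request from failing while the model is still downloading.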

### 2. Text-to-Speech (TTS)

#### Basic Synthesis (POST)
```http
POST /tts/synthesize
Content-Type: application/json

{
  "text": "こんにちは、世界！",
  "character": "misono_mika",
  "speed": 1.0,
  "format": "wav"
}
```

#### Quick Synthesis (GET)
```http
GET /tts/synthesize?text=こんにちは&character=misono_mika
```
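
The Japanese text is shown raw in the query string for readability, but HTTP clients need to percent-encode it. The sketch below shows one way to build the URL with the standard library; `build_synthesize_url` is a hypothetical helper name, and only the `text` and `character` parameters from the example above are used.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_synthesize_url(text, character="misono_mika", base_url="http://localhost:8000"):
    """Build the GET /tts/synthesize URL with percent-encoded query parameters."""
    query = urlencode({"text": text, "character": character})  # encodes as UTF-8 by default
    return f"{base_url.rstrip('/')}/tts/synthesize?{query}"


url = build_synthesize_url("こんにちは")
print(url)
# Decoding the query string recovers the original Japanese text.
print(parse_qs(urlsplit(url).query)["text"][0])
```

Libraries such as `requests` perform the same encoding automatically when the values are passed through `params=`.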

#### Batch Synthesis
```http
POST /tts/batch-synthesize
Content-Type: application/json

[
  {"text": "おはようございます", "character": "misono_mika"},
  {"text": "こんばんは", "character": "misono_mika"}
]
```
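
For longer scripts, the request body can be built programmatically. The sketch below turns a list of lines into one or more arrays in the shape shown above. `build_batch_payloads` and `chunk_size` are hypothetical: this guide does not state a batch size limit, so the chunking is only a client-side precaution.

```python
import json


def build_batch_payloads(texts, character="misono_mika", chunk_size=10):
    """Split texts into request bodies for POST /tts/batch-synthesize.

    chunk_size is a hypothetical client-side limit, not a documented API limit.
    Blank lines are skipped.
    """
    items = [{"text": t.strip(), "character": character} for t in texts if t.strip()]
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


payloads = build_batch_payloads(["おはようございます", "", "こんばんは"], chunk_size=1)
for body in payloads:
    # ensure_ascii=False keeps the Japanese text readable when logging the body
    print(json.dumps(body, ensure_ascii=False))
```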

### 3. Voice Cloning

#### Analyze Reference Audio
```http
POST /voice-clone/analyze-audio?audio_url=/path/to/audio.wav
```

Example response:
```json
{
  "success": true,
  "analysis": {
    "duration": 5.2,
    "sample_rate": 22050,
    "quality_score": 0.85,
    "recommendations": []
  },
  "timestamp": "2024-01-01T12:00:00"
}
```

#### Create a Cloned Voice
```http
POST /voice-clone/create
Content-Type: application/json

{
  "voice_name": "my_voice",
  "reference_audio_url": "/path/to/reference.wav",
  "reference_text": "Japanese transcript of the reference audio",
  "description": "My personal voice clone"
}
```

#### List Cloned Voices
```http
GET /voice-clone/list
```

Example response:
```json
{
  "success": true,
  "message": "Retrieved 1 cloned voice",
  "cloned_voices": {
    "my_voice": {
      "name": "my_voice",
      "description": "My personal voice clone",
      "reference_text": "Japanese transcript of the reference audio...",
      "quality_score": 0.85,
      "duration": 5.2,
      "created_time": "2024-01-01T12:00:00"
    }
  },
  "count": 1,
  "timestamp": "2024-01-01T12:00:00"
}
```
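
Since `cloned_voices` is an object keyed by voice name, a client usually flattens it before display. The sketch below ranks voices by `quality_score`, assuming the response shape shown above. `summarize_voices` is a hypothetical helper, and the second entry in `sample` is invented for illustration.

```python
def summarize_voices(response):
    """Return (name, quality_score, duration) tuples, highest quality first."""
    if not response.get("success"):
        return []
    voices = response.get("cloned_voices", {})
    rows = [
        (name, info.get("quality_score", 0.0), info.get("duration", 0.0))
        for name, info in voices.items()
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


sample = {
    "success": True,
    "cloned_voices": {
        "my_voice": {"quality_score": 0.85, "duration": 5.2},
        "other_voice": {"quality_score": 0.91, "duration": 8.0},  # hypothetical entry
    },
    "count": 2,
}
for name, score, duration in summarize_voices(sample):
    print(f"{name}: quality={score:.2f}, duration={duration:.1f}s")
```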

#### Synthesize with a Cloned Voice
```http
POST /voice-clone/synthesize?voice_name=my_voice&text=こんにちは、今日はいい天気ですね
```

#### Delete a Cloned Voice
```http
DELETE /voice-clone/my_voice
```

### 4. Character Management

#### List All Characters
```http
GET /characters/
```

### 5. Audio File Download

#### Get Generated Audio
```http
GET /audio/{filename}
```
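
The client examples later in this guide read `audio_url` and `timestamp` from a synthesis response and then fetch the file. A sketch of that download step with the standard library is shown below. The helper names are hypothetical, and it assumes `audio_url` is a path relative to the API base URL, as those examples imply.

```python
import shutil
import urllib.request


def resolve_audio_url(base_url, audio_url):
    """Join the API base URL with the relative audio_url from a synthesis response."""
    return f"{base_url.rstrip('/')}/{audio_url.lstrip('/')}"


def output_filename(prefix, timestamp, extension="wav"):
    """Derive a local file name from an ISO timestamp (colons are not valid on Windows)."""
    return f"{prefix}_{timestamp[:19].replace(':', '-')}.{extension}"


def download_audio(base_url, result, prefix="tts_output"):
    """Stream the generated audio to disk and return the local file name."""
    url = resolve_audio_url(base_url, result["audio_url"])
    filename = output_filename(prefix, result["timestamp"])
    with urllib.request.urlopen(url, timeout=60) as resp, open(filename, "wb") as out:
        shutil.copyfileobj(resp, out)  # copies in chunks instead of loading it all in memory
    return filename
```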

## 💻 Code Examples

### Complete Python Example

```python
import requests
from pathlib import Path


class GenieTTSClient:
    def __init__(self, base_url="http://localhost:8000"):
        self.base_url = base_url.rstrip('/')

    def health_check(self):
        """Check the API health status."""
        response = requests.get(f"{self.base_url}/health")
        return response.json()

    def basic_tts(self, text, character="misono_mika"):
        """Basic text-to-speech."""
        data = {
            "text": text,
            "character": character,
            "speed": 1.0,
            "format": "wav"
        }

        response = requests.post(f"{self.base_url}/tts/synthesize", json=data)
        result = response.json()

        if result.get("success"):
            # Download the audio file
            audio_url = result["audio_url"]
            audio_response = requests.get(f"{self.base_url}{audio_url}")

            filename = f"tts_output_{result['timestamp'][:19].replace(':', '-')}.wav"
            with open(filename, "wb") as f:
                f.write(audio_response.content)

            print(f"✅ TTS synthesis succeeded, saved as: {filename}")
            return filename
        else:
            print(f"❌ TTS synthesis failed: {result.get('error', 'unknown error')}")
            return None

    def analyze_audio(self, audio_path):
        """Analyze audio quality."""
        params = {"audio_url": str(Path(audio_path).absolute())}
        response = requests.post(f"{self.base_url}/voice-clone/analyze-audio", params=params)
        return response.json()

    def create_cloned_voice(self, voice_name, audio_path, reference_text, description=""):
        """Create a cloned voice."""
        data = {
            "voice_name": voice_name,
            "reference_audio_url": str(Path(audio_path).absolute()),
            "reference_text": reference_text,
            "description": description
        }

        response = requests.post(f"{self.base_url}/voice-clone/create", json=data)
        result = response.json()

        if result.get("success"):
            print(f"✅ Cloned voice created: {voice_name}")
        else:
            print(f"❌ Failed to create cloned voice: {result.get('error', 'unknown error')}")

        return result

    def clone_voice_tts(self, voice_name, text):
        """Synthesize speech with a cloned voice."""
        params = {"voice_name": voice_name, "text": text}
        response = requests.post(f"{self.base_url}/voice-clone/synthesize", params=params)
        result = response.json()

        if result.get("success"):
            # Download the audio file
            audio_url = result["audio_url"]
            audio_response = requests.get(f"{self.base_url}{audio_url}")

            filename = f"cloned_voice_{voice_name}_{result['timestamp'][:19].replace(':', '-')}.wav"
            with open(filename, "wb") as f:
                f.write(audio_response.content)

            print(f"✅ Cloned voice synthesis succeeded, saved as: {filename}")
            return filename
        else:
            print(f"❌ Cloned voice synthesis failed: {result.get('error', 'unknown error')}")
            return None

    def list_cloned_voices(self):
        """List all cloned voices."""
        response = requests.get(f"{self.base_url}/voice-clone/list")
        return response.json()


# Usage example
def main():
    client = GenieTTSClient()

    # 1. Health check
    print("1. Checking API status...")
    health = client.health_check()
    print(f"   Status: {health['status']}")
    print(f"   Engine: {health['engine_status']}")
    print(f"   Available characters: {health['available_characters']}")

    # 2. Basic TTS synthesis
    print("\n2. Basic TTS synthesis...")
    client.basic_tts("こんにちは、今日はいい天気ですね。")

    # 3. Voice cloning (requires a reference audio file)
    reference_audio = "reference_voice.wav"  # replace with the actual audio file path
    reference_text = "参考音频中的日语文本"  # replace with the actual Japanese transcript

    if Path(reference_audio).exists():
        print(f"\n3. Analyzing reference audio: {reference_audio}")
        analysis = client.analyze_audio(reference_audio)
        print(f"   Audio quality score: {analysis.get('analysis', {}).get('quality_score', 'N/A')}")

        print("\n4. Creating cloned voice...")
        clone_result = client.create_cloned_voice(
            voice_name="test_voice",
            audio_path=reference_audio,
            reference_text=reference_text,
            description="Cloned voice for testing"
        )

        if clone_result.get("success"):
            print("\n5. Synthesizing with the cloned voice...")
            client.clone_voice_tts(
                "test_voice",
                "これはクローン音声のテストです。"
            )

    # 6. List all cloned voices
    print("\n6. Listing all cloned voices...")
    voices = client.list_cloned_voices()
    if voices.get("success"):
        print(f"   {voices['count']} cloned voice(s):")
        for name, info in voices.get("cloned_voices", {}).items():
            print(f"   - {name}: {info['description']}")


if __name__ == "__main__":
    main()
```

### JavaScript/Node.js Example

```javascript
const axios = require('axios');

class GenieTTSClient {
  constructor(baseUrl = 'http://localhost:8000') {
    this.baseUrl = baseUrl.replace(/\/+$/, '');
  }

  async healthCheck() {
    const response = await axios.get(`${this.baseUrl}/health`);
    return response.data;
  }

  async basicTTS(text, character = 'misono_mika') {
    try {
      const response = await axios.post(`${this.baseUrl}/tts/synthesize`, {
        text: text,
        character: character,
        speed: 1.0,
        format: 'wav'
      });

      if (response.data.success) {
        console.log('✅ TTS synthesis succeeded:', response.data.audio_url);
        return response.data;
      } else {
        console.error('❌ TTS synthesis failed:', response.data.error);
      }
    } catch (error) {
      console.error('Request failed:', error.message);
    }
  }

  async createClonedVoice(voiceName, audioPath, referenceText, description = '') {
    try {
      const response = await axios.post(`${this.baseUrl}/voice-clone/create`, {
        voice_name: voiceName,
        reference_audio_url: audioPath,
        reference_text: referenceText,
        description: description
      });

      if (response.data.success) {
        console.log('✅ Cloned voice created:', voiceName);
      } else {
        console.error('❌ Failed to create cloned voice:', response.data.error);
      }

      return response.data;
    } catch (error) {
      console.error('Request failed:', error.message);
    }
  }

  async synthesizeWithClonedVoice(voiceName, text) {
    try {
      const response = await axios.post(`${this.baseUrl}/voice-clone/synthesize`, null, {
        params: { voice_name: voiceName, text: text }
      });

      if (response.data.success) {
        console.log('✅ Cloned voice synthesis succeeded:', response.data.audio_url);
        return response.data;
      } else {
        console.error('❌ Cloned voice synthesis failed:', response.data.error);
      }
    } catch (error) {
      console.error('Request failed:', error.message);
    }
  }
}

// Usage example
async function main() {
  const client = new GenieTTSClient();

  // 1. Health check
  console.log('1. Checking API status...');
  const health = await client.healthCheck();
  console.log('   Status:', health.status);

  // 2. Basic TTS
  console.log('\n2. Basic TTS synthesis...');
  await client.basicTTS('こんにちは、世界！');

  // 3. Voice cloning
  console.log('\n3. Creating cloned voice...');
  await client.createClonedVoice(
    'test_voice',
    '/path/to/reference.wav',
    '参考音频中的文本', // replace with the Japanese transcript of the reference audio
    'Cloned voice for testing'
  );

  // 4. Use the cloned voice
  console.log('\n4. Synthesizing with the cloned voice...');
  await client.synthesizeWithClonedVoice('test_voice', 'テストです。');
}

main().catch(console.error);
```

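## ⚠️ Important Notes

### Voice Cloning Requirements
1. Audio quality: clear, noise-free, with a sample rate of 22 kHz or higher
2. Audio length: 3-30 seconds works best
3. Language: Japanese only
4. Text match: the reference text must exactly match the audio content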
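
The length and sample-rate requirements can be checked locally before uploading a reference file. The sketch below does this for uncompressed WAV files with the standard `wave` module. `check_reference_wav` is a hypothetical helper, and treating "22 kHz" as 22050 Hz is an assumption based on the `sample_rate` value in the analysis example; noise and transcript accuracy cannot be checked this way.

```python
import wave


def check_reference_wav(path, min_seconds=3.0, max_seconds=30.0, min_rate=22050):
    """Return a list of problems found in a WAV file; an empty list means it passes."""
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        duration = wav.getnframes() / rate
    if rate < min_rate:
        problems.append(f"sample rate {rate} Hz is below {min_rate} Hz")
    if not min_seconds <= duration <= max_seconds:
        problems.append(f"duration {duration:.1f}s is outside {min_seconds}-{max_seconds}s")
    return problems
```

An empty result only means the file meets the numeric limits; the `/voice-clone/analyze-audio` endpoint remains the source of the actual quality score.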

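### Performance Considerations
- First use: the model must be downloaded (about 30 seconds)
- Speech synthesis: 5-10 seconds per request
- Voice cloning: 2-5 seconds to create a voice
- Concurrency: keep to 5 or fewer concurrent requests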
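
One way to respect the concurrency recommendation on the client side is a bounded thread pool. The sketch below is a generic illustration: `synthesize_all` is a hypothetical helper, and `synthesize` stands for any callable that performs a single request, such as a wrapper around `POST /tts/synthesize`.

```python
from concurrent.futures import ThreadPoolExecutor


def synthesize_all(texts, synthesize, max_concurrency=5):
    """Run synthesize(text) for every text with at most max_concurrency calls in flight.

    Results are returned in the same order as the input texts.
    """
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        return list(pool.map(synthesize, texts))
```

With the Python client from this guide, a call could look like `synthesize_all(lines, client.basic_tts)`.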

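### Storage
- Audio files are stored temporarily in the system temp directory
- Cloned voice configurations are persisted
- Periodic cleanup of expired audio files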
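
If cleanup has to be scripted, for example for audio files downloaded by a client, an age-based sweep is one simple approach. The sketch below is an assumption-laden illustration: the directory, the one-hour retention, the `*.wav` pattern, and the name `remove_expired_audio` are all hypothetical, and this guide does not describe how the service itself expires files.

```python
import time
from pathlib import Path


def remove_expired_audio(directory, max_age_seconds=3600, pattern="*.wav", now=None):
    """Delete files matching pattern that are older than max_age_seconds.

    Returns the sorted names of the files that were removed.
    """
    now = time.time() if now is None else now
    removed = []
    for path in Path(directory).glob(pattern):
        if path.is_file() and now - path.stat().st_mtime > max_age_seconds:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)
```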

## 🔧 Development and Deployment

### Local Development
```bash
pip install -r requirements.txt
python api.py
```

### Production Deployment
```bash
# Using uvicorn
uvicorn api:api_app --host 0.0.0.0 --port 8000 --workers 1

# Using gunicorn with uvicorn workers
pip install gunicorn uvicorn
gunicorn api:api_app -w 1 -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:8000
```

### Docker Deployment
```dockerfile
FROM python:3.10
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["python", "api.py"]
```

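## 🐛 Troubleshooting

### Common Issues

1. Engine initialization fails
   - Check that onnxruntime is installed correctly
   - Confirm the network connection works (the model must be downloaded)

2. Audio analysis fails
   - Check the audio file format (wav, flac, ogg, and others are supported)
   - Confirm the file path is correct and readable

3. Voice cloning quality is poor
   - Improve the quality of the reference audio
   - Make sure the reference text matches the audio exactly
   - Try reference audio of different lengths

### Debug Mode
```python
import logging
logging.basicConfig(level=logging.DEBUG)
```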

## 📞 Support

- Official repository: [Genie TTS](https://github.com/High-Logic/Genie)
- Bug reports: open a GitHub Issue
- Discussion: GitHub Discussions

---

Version: 1.0.0 | Last updated: January 2024