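🎵 Genie TTS API: Complete Usage Guide

🎯 Overview

Genie TTS API is a Japanese text-to-speech service built on the GPT-SoVITS V2 architecture. It is now fully modular and provides the following core features:

  • 🎤 Basic TTS synthesis: synthesize speech with pretrained characters
  • 🎭 Voice cloning: quickly create a personalized voice from reference audio
  • 📊 Audio analysis: automatically analyze audio quality and characteristics
  • 🔄 Batch processing: efficient batch text-to-speech

🏗️ Modular Architecture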

genie/GENIE/
├── api.py              # Main API application
├── api_routes.py       # Route definitions
├── voice_cloning.py    # Core voice-cloning logic
├── models.py           # Data model definitions
├── tts_engine.py       # TTS engine
├── config.py           # Configuration
└── requirements.txt    # Dependencies

🚀 Quick Start

Basics

  • API base URL: http://localhost:8000 (local development)
  • API docs: http://localhost:8000/docs (Swagger UI)
  • ReDoc docs: http://localhost:8000/redoc

Starting the service

# Option 1: run the API service directly
python api.py

# Option 2: run together with Gradio
python app.py  # only if app.py integrates the API

📚 API Endpoint Reference

1. Basics

Root information

GET /

Example response:

{
  "name": "Genie TTS API",
  "version": "1.0.0",
  "description": "高质量日语文本转语音API服务",
  "engine": "GPT-SoVITS V2 (ONNX)",
  "supported_languages": ["ja"],
  "features": ["TTS合成", "语音克隆", "角色管理"],
  "docs": "/docs"
}

Health check

GET /health

Example response:

{
  "status": "healthy",
  "version": "1.0.0",
  "engine_status": "ready",
  "available_characters": ["misono_mika"],
  "predefined_characters": 1,
  "custom_characters": 0,
  "cloned_voices": 2,
  "timestamp": "2024-01-01T12:00:00"
}
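
Because the first start may need to download the model, a client may want to wait until the engine is usable before sending work. The sketch below polls the health endpoint; it assumes `engine_status` becomes `"ready"` as in the example response above. The names `fetch_health` and `wait_until_ready`, and the retry settings, are illustrative and not part of the API.

```python
import json
import time
import urllib.request


def fetch_health(base_url="http://localhost:8000"):
    """GET /health and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=10) as resp:
        return json.load(resp)


def wait_until_ready(fetch=fetch_health, attempts=10, delay=3.0):
    """Poll the health endpoint until the engine reports ready.

    Assumes the response carries "engine_status": "ready" once the
    engine is usable, as in the example response.
    """
    for attempt in range(attempts):
        try:
            health = fetch()
            if health.get("engine_status") == "ready":
                return health
        except OSError:
            pass  # server not reachable yet; retry after the delay
        if attempt < attempts - 1:
            time.sleep(delay)
    raise TimeoutError("engine did not become ready in time")
```

Passing `fetch` as a parameter keeps the retry logic testable without a running server.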

2. Text-to-Speech (TTS)

Basic synthesis (POST)

POST /tts/synthesize
Content-Type: application/json

{
  "text": "こんにちは、世界!",
  "character": "misono_mika",
  "speed": 1.0,
  "format": "wav"
}

Quick synthesis (GET)

GET /tts/synthesize?text=こんにちは&character=misono_mika
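
Non-ASCII query values such as こんにちは have to be percent-encoded as UTF-8 before they are sent. Most HTTP clients do this automatically, but hand-built URLs do not. A minimal sketch using only the standard library; `build_synthesize_url` is a hypothetical helper name:

```python
from urllib.parse import urlencode


def build_synthesize_url(text, character="misono_mika",
                         base_url="http://localhost:8000"):
    """Return the GET /tts/synthesize URL with percent-encoded parameters."""
    query = urlencode({"text": text, "character": character})
    return f"{base_url}/tts/synthesize?{query}"


print(build_synthesize_url("こんにちは"))
```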

Batch synthesis

POST /tts/batch-synthesize
Content-Type: application/json

[
  {"text": "おはようございます", "character": "misono_mika"},
  {"text": "こんばんは", "character": "misono_mika"}
]
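
The request body is a plain JSON array, so it can be generated from a list of sentences. The sketch below builds that array; `build_batch_payload` is an illustrative name, and skipping blank entries is a design choice made here, not documented server behavior.

```python
def build_batch_payload(texts, character="misono_mika"):
    """Turn a list of sentences into the JSON array shown above.

    Blank entries are skipped so they do not produce empty requests.
    """
    return [
        {"text": text.strip(), "character": character}
        for text in texts
        if text and text.strip()
    ]


payload = build_batch_payload(["おはようございます", "こんばんは"])
print(payload)
```

The result can be sent with `requests.post(f"{base_url}/tts/batch-synthesize", json=payload)`. The shape of the batch response is not described in this guide, so inspect it before relying on specific fields.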

3. Voice Cloning

Analyze reference audio

POST /voice-clone/analyze-audio?audio_url=/path/to/audio.wav

Example response:

{
  "success": true,
  "analysis": {
    "duration": 5.2,
    "sample_rate": 22050,
    "quality_score": 0.85,
    "recommendations": []
  },
  "timestamp": "2024-01-01T12:00:00"
}
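
One way to act on this response is to compare its fields with the cloning requirements listed later in this guide (3-30 seconds, 22 kHz or higher). The sketch below does that; the function name, the exact 22050 Hz threshold, and the assumption that `recommendations` is a list of strings are all illustrative. How the server derives `quality_score` is not documented, so the score is not judged here.

```python
def reference_audio_issues(response, min_rate=22050,
                           min_seconds=3.0, max_seconds=30.0):
    """List reasons an analyzed clip may be a poor cloning reference."""
    if not response.get("success"):
        return ["analysis request failed"]
    analysis = response.get("analysis", {})
    # Start from whatever the server itself recommended (assumed strings).
    issues = list(analysis.get("recommendations", []))
    duration = analysis.get("duration")
    if duration is not None and not (min_seconds <= duration <= max_seconds):
        issues.append(f"duration {duration}s is outside {min_seconds}-{max_seconds}s")
    rate = analysis.get("sample_rate")
    if rate is not None and rate < min_rate:
        issues.append(f"sample rate {rate} Hz is below {min_rate} Hz")
    return issues
```

An empty list means none of these checks objected; it does not guarantee a good clone.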

Create a cloned voice

POST /voice-clone/create
Content-Type: application/json

{
  "voice_name": "my_voice",
  "reference_audio_url": "/path/to/reference.wav",
  "reference_text": "Japanese text spoken in the reference audio",
  "description": "My personal voice clone"
}

List cloned voices

GET /voice-clone/list

Example response:

{
  "success": true,
  "message": "获取到 1 个克隆声音",
  "cloned_voices": {
    "my_voice": {
      "name": "my_voice",
      "description": "My personal voice clone",
      "reference_text": "Japanese text spoken in the reference audio...",
      "quality_score": 0.85,
      "duration": 5.2,
      "created_time": "2024-01-01T12:00:00"
    }
  },
  "count": 1,
  "timestamp": "2024-01-01T12:00:00"
}

Synthesize with a cloned voice

POST /voice-clone/synthesize?voice_name=my_voice&text=こんにちは、今日はいい天気ですね

Delete a cloned voice

DELETE /voice-clone/my_voice
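
The voice name travels in the URL path, so names containing spaces or other reserved characters need percent-encoding. A minimal standard-library sketch that builds the request without sending it; `build_delete_request` is a hypothetical helper, and whether the server accepts such names at all is not stated in this guide.

```python
import urllib.request
from urllib.parse import quote


def build_delete_request(voice_name, base_url="http://localhost:8000"):
    """Build (but do not send) DELETE /voice-clone/{voice_name}."""
    # safe="" also encodes "/" so the name stays a single path segment.
    path = quote(voice_name, safe="")
    return urllib.request.Request(f"{base_url}/voice-clone/{path}", method="DELETE")


request = build_delete_request("my_voice")
print(request.get_method(), request.full_url)
```

Sending it takes one more call: `urllib.request.urlopen(request, timeout=30)`.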

4. Character Management

List all characters

GET /characters/

5. Audio File Download

Fetch generated audio

GET /audio/{filename}

💻 Code Examples

Complete Python example

import requests
from pathlib import Path

class GenieTTSClient:
    def __init__(self, base_url="http://localhost:8000"):
        self.base_url = base_url.rstrip('/')
        
    def health_check(self):
        """Check the API health status."""
        response = requests.get(f"{self.base_url}/health")
        return response.json()
    
    def basic_tts(self, text, character="misono_mika"):
        """Basic text-to-speech."""
        data = {
            "text": text,
            "character": character,
            "speed": 1.0,
            "format": "wav"
        }
        
        response = requests.post(f"{self.base_url}/tts/synthesize", json=data)
        result = response.json()
        
        # .get() avoids a KeyError when an error body has no "success" field
        if result.get("success"):
            # Download the generated audio file
            audio_url = result["audio_url"]
            audio_response = requests.get(f"{self.base_url}{audio_url}")
            audio_response.raise_for_status()
            
            filename = f"tts_output_{result['timestamp'][:19].replace(':', '-')}.wav"
            with open(filename, "wb") as f:
                f.write(audio_response.content)
            
            print(f"✅ TTS synthesis succeeded, saved as: {filename}")
            return filename
        else:
            print(f"❌ TTS synthesis failed: {result.get('error', 'unknown error')}")
            return None
    
    def analyze_audio(self, audio_path):
        """Analyze audio quality."""
        params = {"audio_url": str(Path(audio_path).absolute())}
        response = requests.post(f"{self.base_url}/voice-clone/analyze-audio", params=params)
        return response.json()
    
    def create_cloned_voice(self, voice_name, audio_path, reference_text, description=""):
        """Create a cloned voice."""
        data = {
            "voice_name": voice_name,
            "reference_audio_url": str(Path(audio_path).absolute()),
            "reference_text": reference_text,
            "description": description
        }
        
        response = requests.post(f"{self.base_url}/voice-clone/create", json=data)
        result = response.json()
        
        if result.get("success"):
            print(f"✅ Cloned voice created: {voice_name}")
        else:
            print(f"❌ Failed to create cloned voice: {result.get('error', 'unknown error')}")
        
        return result
    
    def clone_voice_tts(self, voice_name, text):
        """Synthesize speech with a cloned voice."""
        params = {"voice_name": voice_name, "text": text}
        response = requests.post(f"{self.base_url}/voice-clone/synthesize", params=params)
        result = response.json()
        
        if result.get("success"):
            # Download the generated audio file
            audio_url = result["audio_url"]
            audio_response = requests.get(f"{self.base_url}{audio_url}")
            audio_response.raise_for_status()
            
            filename = f"cloned_voice_{voice_name}_{result['timestamp'][:19].replace(':', '-')}.wav"
            with open(filename, "wb") as f:
                f.write(audio_response.content)
            
            print(f"✅ Cloned-voice synthesis succeeded, saved as: {filename}")
            return filename
        else:
            print(f"❌ Cloned-voice synthesis failed: {result.get('error', 'unknown error')}")
            return None
    
    def list_cloned_voices(self):
        """List all cloned voices."""
        response = requests.get(f"{self.base_url}/voice-clone/list")
        return response.json()


# Usage example
def main():
    client = GenieTTSClient()
    
    # 1. Health check
    print("1. Checking API status...")
    health = client.health_check()
    print(f"   Status: {health['status']}")
    print(f"   Engine: {health['engine_status']}")
    print(f"   Available characters: {health['available_characters']}")
    
    # 2. Basic TTS synthesis
    print("\n2. Basic TTS synthesis...")
    tts_file = client.basic_tts("こんにちは、今日はいい天気ですね。")
    
    # 3. Voice cloning (requires a prepared reference audio file)
    reference_audio = "reference_voice.wav"  # replace with the path to your audio file
    reference_text = "Japanese transcript of the reference audio"  # replace with the actual text
    
    if Path(reference_audio).exists():
        print(f"\n3. Analyzing reference audio: {reference_audio}")
        analysis = client.analyze_audio(reference_audio)
        print(f"   Audio quality score: {analysis.get('analysis', {}).get('quality_score', 'N/A')}")
        
        print("\n4. Creating cloned voice...")
        clone_result = client.create_cloned_voice(
            voice_name="test_voice",
            audio_path=reference_audio,
            reference_text=reference_text,
            description="Cloned voice for testing"
        )
        
        if clone_result.get("success"):
            print("\n5. Synthesizing with the cloned voice...")
            cloned_file = client.clone_voice_tts(
                "test_voice", 
                "これはクローン音声のテストです。"
            )
    
    # 6. List all cloned voices
    print("\n6. Listing all cloned voices...")
    voices = client.list_cloned_voices()
    if voices.get("success"):
        print(f"   {voices['count']} cloned voice(s):")
        for name, info in voices.get("cloned_voices", {}).items():
            print(f"   - {name}: {info['description']}")


if __name__ == "__main__":
    main()

JavaScript/Node.js example

const axios = require('axios');

class GenieTTSClient {
    constructor(baseUrl = 'http://localhost:8000') {
        this.baseUrl = baseUrl.replace(/\/+$/, '');
    }
    
    async healthCheck() {
        const response = await axios.get(`${this.baseUrl}/health`);
        return response.data;
    }
    
    async basicTTS(text, character = 'misono_mika') {
        try {
            const response = await axios.post(`${this.baseUrl}/tts/synthesize`, {
                text: text,
                character: character,
                speed: 1.0,
                format: 'wav'
            });
            
            if (response.data.success) {
                console.log('✅ TTS synthesis succeeded:', response.data.audio_url);
                return response.data;
            } else {
                console.error('❌ TTS synthesis failed:', response.data.error);
            }
        } catch (error) {
            console.error('Request failed:', error.message);
        }
    }
    
    async createClonedVoice(voiceName, audioPath, referenceText, description = '') {
        try {
            const response = await axios.post(`${this.baseUrl}/voice-clone/create`, {
                voice_name: voiceName,
                reference_audio_url: audioPath,
                reference_text: referenceText,
                description: description
            });
            
            if (response.data.success) {
                console.log('✅ Cloned voice created:', voiceName);
            } else {
                console.error('❌ Failed to create cloned voice:', response.data.error);
            }
            
            return response.data;
        } catch (error) {
            console.error('Request failed:', error.message);
        }
    }
    
    async synthesizeWithClonedVoice(voiceName, text) {
        try {
            const response = await axios.post(`${this.baseUrl}/voice-clone/synthesize`, null, {
                params: { voice_name: voiceName, text: text }
            });
            
            if (response.data.success) {
                console.log('✅ Cloned-voice synthesis succeeded:', response.data.audio_url);
                return response.data;
            } else {
                console.error('❌ Cloned-voice synthesis failed:', response.data.error);
            }
        } catch (error) {
            console.error('Request failed:', error.message);
        }
    }
}

// Usage example
async function main() {
    const client = new GenieTTSClient();
    
    // 1. Health check
    console.log('1. Checking API status...');
    const health = await client.healthCheck();
    console.log('   Status:', health.status);
    
    // 2. Basic TTS
    console.log('\n2. Basic TTS synthesis...');
    await client.basicTTS('こんにちは、世界!');
    
    // 3. Voice cloning
    console.log('\n3. Creating cloned voice...');
    await client.createClonedVoice(
        'test_voice',
        '/path/to/reference.wav',
        'Text spoken in the reference audio',
        'Cloned voice for testing'
    );
    
    // 4. Use the cloned voice
    console.log('\n4. Synthesizing with the cloned voice...');
    await client.synthesizeWithClonedVoice('test_voice', 'テストです。');
}

main().catch(console.error);

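⚠️ Important Notes

Voice cloning requirements

  1. Audio quality: clear, noise-free, with a sample rate of 22 kHz or higher
  2. Audio length: 3-30 seconds works best
  3. Language: Japanese only
  4. Text match: the reference text must match the audio content exactly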
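
The measurable parts of these requirements (sample rate and length) can be checked locally before uploading anything. The sketch below uses the standard-library `wave` module, so it only handles PCM WAV files; the function names and the exact 22050 Hz threshold are assumptions. Noise level and transcript accuracy cannot be verified this way.

```python
import os
import tempfile
import wave


def check_reference_wav(path, min_rate=22050, min_seconds=3.0, max_seconds=30.0):
    """Return a list of problems found in a PCM WAV reference clip."""
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        seconds = wav.getnframes() / rate
    problems = []
    if rate < min_rate:
        problems.append(f"sample rate {rate} Hz is below {min_rate} Hz")
    if not (min_seconds <= seconds <= max_seconds):
        problems.append(f"duration {seconds:.1f}s is outside {min_seconds}-{max_seconds}s")
    return problems


def write_silence(path, seconds, rate):
    """Write a mono 16-bit silent WAV file (used only to demo the check)."""
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))


with tempfile.TemporaryDirectory() as tmp:
    clip = os.path.join(tmp, "short.wav")
    write_silence(clip, seconds=1.0, rate=16000)
    for problem in check_reference_wav(clip):
        print(problem)
```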

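Performance considerations

  • First use: the model must be downloaded (about 30 seconds)
  • Speech synthesis: 5-10 seconds per request
  • Voice cloning: 2-5 seconds to create a voice
  • Concurrency: keep it to 5 or fewer concurrent requests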
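
On the client side, the concurrency limit can be respected with a bounded thread pool. A minimal sketch, assuming any callable that takes one text and returns a result; `synthesize_all` is an illustrative name.

```python
from concurrent.futures import ThreadPoolExecutor


def synthesize_all(texts, synthesize, max_workers=5):
    """Run synthesize(text) for every text with at most max_workers in flight.

    Results come back in the same order as the input texts.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(synthesize, texts))
```

With the Python client from this guide, a call could look like `synthesize_all(lines, GenieTTSClient().basic_tts)`. Threads fit here because each call mostly waits on the network.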

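Storage

  • Audio files are stored temporarily in the system temp directory
  • Cloned-voice configurations are persisted
  • Clean up expired audio files periodically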
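
This guide does not say whether the service removes old files itself, nor which subdirectory or retention period it uses. If cleanup falls to the operator, an age-based sweep along these lines is one option; the function name, the `*.wav` pattern, and the one-hour cutoff are assumptions.

```python
import os
import tempfile
import time
from pathlib import Path


def remove_old_audio(directory, max_age_seconds=3600, pattern="*.wav"):
    """Delete files matching pattern whose mtime is older than max_age_seconds."""
    directory = Path(directory)
    if not directory.is_dir():
        return []
    cutoff = time.time() - max_age_seconds
    removed = []
    for path in directory.glob(pattern):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)


# Demo in a throwaway directory: one stale file, one fresh file.
with tempfile.TemporaryDirectory() as tmp:
    stale = Path(tmp) / "old.wav"
    stale.write_bytes(b"")
    two_hours_ago = time.time() - 7200
    os.utime(stale, (two_hours_ago, two_hours_ago))
    (Path(tmp) / "new.wav").write_bytes(b"")
    print(remove_old_audio(tmp))  # only old.wav is past the one-hour cutoff
```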

🔧 Development and Deployment

Local development

pip install -r requirements.txt
python api.py

Production deployment

# Using uvicorn
uvicorn api:api_app --host 0.0.0.0 --port 8000 --workers 1

# Using gunicorn (gunicorn has no "uvicorn" extra, so install both packages)
pip install gunicorn uvicorn
gunicorn api:api_app -w 1 -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:8000

Docker deployment

FROM python:3.10
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["python", "api.py"]

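🐛 Troubleshooting

Common issues

  1. Engine initialization fails

    • Check that onnxruntime is installed correctly
    • Confirm the network connection works (the model must be downloaded)
  2. Audio analysis fails

    • Check the audio file format (wav, flac, ogg, and similar formats are supported)
    • Confirm the file path is correct and the file is readable
  3. Voice cloning quality is poor

    • Use higher-quality reference audio
    • Make sure the reference text matches the audio exactly
    • Try reference audio of different lengths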
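
For the first issue, a quick way to confirm that onnxruntime is visible to the interpreter running the service is to look the module up without importing it. A minimal sketch; `is_installed` is an illustrative helper name.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


if not is_installed("onnxruntime"):
    print("onnxruntime is missing; try: pip install onnxruntime")
```

Run it with the same Python environment that starts `api.py`, since a package installed in another environment will not be found.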

Debug mode

import logging
logging.basicConfig(level=logging.DEBUG)

📞 Support

  • Official repository: Genie TTS
  • Bug reports: open a GitHub Issue
  • Discussion: GitHub Discussions

Version: 1.0.0 | Last updated: January 2024