22333Misaka committed on
Commit a287c67 · verified
1 Parent(s): 4cf4281

Upload 5 files

Files changed (5)
  1. DEPLOYMENT.md +179 -0
  2. Dockerfile +102 -0
  3. README.md +153 -5
  4. app.py +376 -0
  5. requirements.txt +20 -0
DEPLOYMENT.md ADDED
@@ -0,0 +1,179 @@
+ # 🚀 Hugging Face Space Deployment Guide
+
+ This guide walks you through deploying the Genie-TTS OpenAI-compatible API to a Hugging Face Space.
+
+ ## 📋 Prerequisites
+
+ 1. A Hugging Face account
+ 2. Download URLs for the model files (.pth and .ckpt)
+ 3. A reference audio file
+
+ ## 🔧 Deployment Steps
+
+ ### Step 1: Create a Hugging Face Space
+
+ 1. Visit [Hugging Face Spaces](https://huggingface.co/spaces)
+ 2. Click "Create new Space"
+ 3. Fill in the details:
+    - **Space name**: pick a name, e.g. `genie-tts-api`
+    - **License**: MIT
+    - **SDK**: Docker
+    - **Hardware**: CPU Basic (free) or a higher tier
+ 4. Click "Create Space"
+
+ ### Step 2: Upload Files
+
+ Upload the following files to your Space repository:
+
+ ```
+ ├── Dockerfile
+ ├── app.py
+ ├── requirements.txt
+ ├── README.md
+ └── models/
+     └── liang/
+         └── config.json
+ ```
+
+ **Option 1: Git**
+
+ ```bash
+ # Clone your Space repository
+ git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
+ cd YOUR_SPACE_NAME
+
+ # Copy the files
+ cp -r /path/to/huggingface-space/* .
+
+ # Commit and push
+ git add .
+ git commit -m "Initial deployment"
+ git push
+ ```
+
+ **Option 2: Web UI**
+
+ 1. Open the "Files" tab on your Space page
+ 2. Click "Add file" > "Upload files"
+ 3. Upload all required files
+
+ ### Step 3: Wait for the Build
+
+ - The Space automatically starts building the Docker image
+ - The build:
+   1. Installs dependencies
+   2. Downloads the model files
+   3. Converts them to ONNX format
+ - This can take 10-30 minutes
+
+ ### Step 4: Verify the Deployment
+
+ Once the build finishes:
+
+ 1. Visit the health-check endpoint:
+ ```
+ https://YOUR_USERNAME-YOUR_SPACE_NAME.hf.space/health
+ ```
+
+ 2. Test the API:
+ ```bash
+ curl -X POST "https://YOUR_USERNAME-YOUR_SPACE_NAME.hf.space/v1/audio/speech" \
+   -H "Content-Type: application/json" \
+   -d '{"model": "liang", "input": "你好,这是测试。"}' \
+   --output test.wav
+ ```
+
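The `/health` JSON payload can also be interpreted programmatically before sending real TTS requests. A minimal sketch, assuming the response shape of this repo's `/health` handler (the `space_ready` helper name is mine, not part of the project):

```python
import json

def space_ready(health_payload: dict, model: str = "liang") -> bool:
    """Interpret a /health JSON payload: the Space is usable once it
    reports healthy and the requested voice model is loaded."""
    return (
        health_payload.get("status") == "healthy"
        and model in health_payload.get("available_models", [])
    )

# Example payload, as returned by GET /health once the build has finished
payload = json.loads(
    '{"status": "healthy", "models_loaded": 1, "available_models": ["liang"]}'
)
print(space_ready(payload))           # True
print(space_ready(payload, "other"))  # False
```

In a deployment script you would fetch the payload with any HTTP client and poll until `space_ready` returns `True`.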
+ ## ⚙️ Customization
+
+ ### Adding a New Voice Model
+
+ 1. Edit the `Dockerfile` to download and convert the new model:
+
+ ```dockerfile
+ # Download the new model
+ RUN wget -O /app/temp/new_model.ckpt "YOUR_CKPT_URL" && \
+     wget -O /app/temp/new_model.pth "YOUR_PTH_URL" && \
+     wget -O /app/models/new_voice/reference/audio.wav "YOUR_REF_AUDIO_URL"
+
+ # Convert the new model (keep the python -c command on a single line:
+ # a bare multi-line quoted string would break the RUN instruction)
+ RUN python -c "import genie_tts as genie; genie.convert_to_onnx(torch_ckpt_path='/app/temp/new_model.ckpt', torch_pth_path='/app/temp/new_model.pth', output_dir='/app/models/new_voice/onnx')"
+ ```
+
+ 2. Create the configuration file `models/new_voice/config.json`:
+
+ ```json
+ {
+   "reference_audio": "reference/audio.wav",
+   "reference_text": "参考音频的文本内容",
+   "language": "Chinese"
+ }
+ ```
+
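A new voice's `config.json` can be sanity-checked locally before uploading. The sketch below mirrors the validation that `app.py` performs at startup (required fields, supported language, reference audio present); the `check_voice_config` helper itself is mine, not part of the project:

```python
import json
import tempfile
from pathlib import Path

REQUIRED_FIELDS = ("reference_audio", "reference_text", "language")
SUPPORTED_LANGUAGES = {"Chinese", "English", "Japanese", "Korean"}

def check_voice_config(voice_dir) -> list:
    """Return a list of problems found in a voice directory (empty = OK)."""
    problems = []
    voice_path = Path(voice_dir)
    config_path = voice_path / "config.json"
    if not config_path.exists():
        return [f"missing {config_path}"]
    config = json.loads(config_path.read_text(encoding="utf-8"))
    for field in REQUIRED_FIELDS:
        if field not in config:
            problems.append(f"missing required field '{field}'")
    if config.get("language") not in SUPPORTED_LANGUAGES:
        problems.append(f"unsupported language: {config.get('language')!r}")
    if "reference_audio" in config and not (voice_path / config["reference_audio"]).exists():
        problems.append(f"reference audio not found: {config['reference_audio']}")
    return problems

# Quick self-check against a throwaway directory
demo = Path(tempfile.mkdtemp())
(demo / "config.json").write_text(json.dumps({
    "reference_audio": "reference/audio.wav",
    "reference_text": "参考音频的文本内容",
    "language": "Chinese",
}), encoding="utf-8")
print(check_voice_config(demo))  # reference audio is still missing at this point
```

Running this against your real `models/new_voice/` directory before pushing saves a 10-30 minute failed build.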
+ ### Changing the Model Download URLs
+
+ Edit this part of the `Dockerfile`:
+
+ ```dockerfile
+ RUN wget -O /app/temp/model.ckpt "YOUR_NEW_CKPT_URL" && \
+     wget -O /app/temp/model.pth "YOUR_NEW_PTH_URL" && \
+     wget -O /app/models/liang/reference/audio.wav "YOUR_NEW_REF_AUDIO_URL"
+ ```
+
+ ### Supported Languages
+
+ Set the `language` field in `config.json` to one of:
+
+ - `Chinese`
+ - `English`
+ - `Japanese`
+ - `Korean`
+
+ ## 🐛 Troubleshooting
+
+ ### Build Failures
+
+ 1. **Check the logs**: view the build log on the Space page
+ 2. **Model download failed**: make sure the download URLs are reachable
+ 3. **Out of memory**: upgrade to a higher hardware tier
+
+ ### Slow API Responses
+
+ - CPU inference on the free tier is slow; consider upgrading the hardware
+ - The first request loads the model; subsequent requests are faster
+
+ ### Model Fails to Load
+
+ 1. Check that the ONNX conversion succeeded
+ 2. Make sure `config.json` is correct
+ 3. Check that the reference audio file exists
+
+ ## 💡 Optimization Tips
+
+ ### Reducing Image Size
+
+ Uninstall PyTorch once the build no longer needs it:
+
+ ```dockerfile
+ RUN pip uninstall -y torch && rm -rf /root/.cache/pip
+ ```
+
+ ### Using a GPU
+
+ 1. Select GPU hardware in the Space settings
+ 2. Update `requirements.txt` to use GPU builds of the dependencies
+
+ ### Private Deployment
+
+ For a private deployment, set the Space to private and access it with a Hugging Face token.
+
+ ## 📞 Getting Help
+
+ - [Genie-TTS GitHub Issues](https://github.com/High-Logic/Genie-TTS/issues)
+ - [Hugging Face Spaces documentation](https://huggingface.co/docs/hub/spaces)
Dockerfile ADDED
@@ -0,0 +1,102 @@
+ # Genie-TTS OpenAI Compatible API - Docker Image
+ # ================================================
+ # This Dockerfile builds a container that:
+ # 1. Downloads PyTorch model files (.pth, .ckpt) from cloud URLs
+ # 2. Converts them to ONNX format
+ # 3. Runs the OpenAI-compatible TTS API server
+
+ FROM python:3.10-slim
+
+ # Set environment variables
+ ENV PYTHONUNBUFFERED=1 \
+     PYTHONDONTWRITEBYTECODE=1 \
+     PIP_NO_CACHE_DIR=1 \
+     PIP_DISABLE_PIP_VERSION_CHECK=1 \
+     DEBIAN_FRONTEND=noninteractive \
+     MODELS_DIR=/app/models \
+     GENIE_DATA_DIR=/app/genie_data
+
+ # Install system dependencies
+ RUN apt-get update && apt-get install -y --no-install-recommends \
+     build-essential \
+     libsndfile1 \
+     ffmpeg \
+     wget \
+     curl \
+     git \
+     && rm -rf /var/lib/apt/lists/*
+
+ # Create app directory
+ WORKDIR /app
+
+ # Copy requirements first for better layer caching
+ COPY requirements.txt .
+
+ # Install Python dependencies
+ RUN pip install --no-cache-dir -r requirements.txt
+
+ # Install PyTorch (CPU build, needed only for model conversion)
+ RUN pip install --no-cache-dir torch --index-url https://download.pytorch.org/whl/cpu
+
+ # Install genie-tts from PyPI
+ RUN pip install --no-cache-dir genie-tts
+
+ # Create directories
+ RUN mkdir -p /app/models/liang/onnx \
+     && mkdir -p /app/models/liang/reference \
+     && mkdir -p /app/genie_data \
+     && mkdir -p /app/temp
+
+ # Download model files
+ # Model: liang (Chinese, V2ProPlus)
+ RUN echo "Downloading model files..." && \
+     wget -q --show-progress -O /app/temp/model.ckpt \
+       "https://22333misaka-openlist.hf.space/d/od/shantianliang_proplus_e32.ckpt" && \
+     wget -q --show-progress -O /app/temp/model.pth \
+       "https://22333misaka-openlist.hf.space/d/od/shantianliang_proplus_e8_s192.pth" && \
+     wget -q --show-progress -O /app/models/liang/reference/audio.wav \
+       "https://22333misaka-openlist.hf.space/d/od/ref_shantianliang_1.wav" && \
+     echo "Download complete!"
+
+ # Convert the PyTorch checkpoints to ONNX. The python -c command must stay
+ # on a single (shell-continued) line: a bare multi-line quoted string would
+ # break the RUN instruction, since Docker only continues lines ending in '\'.
+ RUN echo "Converting models to ONNX format..." && \
+     python -c "import genie_tts as genie; genie.convert_to_onnx(torch_ckpt_path='/app/temp/model.ckpt', torch_pth_path='/app/temp/model.pth', output_dir='/app/models/liang/onnx')" && \
+     echo "Conversion complete!"
+
+ # Clean up temporary files and torch to reduce image size
+ RUN rm -rf /app/temp && \
+     pip uninstall -y torch && \
+     rm -rf /root/.cache/pip
+
+ # Create the model configuration (printf handles newlines portably,
+ # unlike echo escape sequences which vary between shells)
+ RUN printf '%s\n' \
+     '{' \
+     '  "reference_audio": "reference/audio.wav",' \
+     '  "reference_text": "这是一条参考音频,将此音频拖入参考内,再添加文本,即可合成音色",' \
+     '  "language": "Chinese"' \
+     '}' > /app/models/liang/config.json
+
+ # Copy application code
+ COPY app.py .
+
+ # Download Genie base data
+ RUN python -c "import genie_tts; genie_tts.download_genie_data()"
+
+ # Expose port (Hugging Face Spaces uses 7860)
+ EXPOSE 7860
+
+ # Health check
+ HEALTHCHECK --interval=30s --timeout=30s --start-period=60s --retries=3 \
+     CMD curl -f http://localhost:7860/health || exit 1
+
+ # Run the application
+ CMD ["python", "app.py"]
README.md CHANGED
@@ -1,10 +1,158 @@
  ---
- title: Ttsgenie
- emoji: 📚
- colorFrom: indigo
- colorTo: yellow
+ title: Genie-TTS OpenAI Compatible API
+ emoji: 🔮
+ colorFrom: purple
+ colorTo: blue
  sdk: docker
  pinned: false
+ license: mit
  ---
 
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # 🔮 Genie-TTS OpenAI Compatible API
+
+ An OpenAI-compatible TTS API service built on [Genie-TTS](https://github.com/High-Logic/Genie-TTS).
+
+ ## 🚀 Features
+
+ - ✅ **OpenAI API compatible** - serves `/v1/audio/speech` and works with the OpenAI SDK
+ - ✅ **High-quality speech synthesis** - based on the GPT-SoVITS V2ProPlus model
+ - ✅ **Chinese support** - currently synthesizes Chinese speech
+ - ✅ **WAV output** - 32 kHz high-quality audio
+
+ ## 📖 API Usage
+
+ ### Endpoint
+
+ ```
+ POST /v1/audio/speech
+ ```
+
+ ### Request Format
+
+ ```json
+ {
+   "model": "liang",
+   "input": "你好,这是一段测试文本。"
+ }
+ ```
+
+ ### Request Parameters
+
+ | Parameter | Type | Required | Description |
+ |------|------|------|------|
+ | `model` | string | ✅ | Voice model name |
+ | `input` | string | ✅ | Text to synthesize |
+ | `voice` | string | ❌ | Ignored - accepted for OpenAI compatibility |
+ | `response_format` | string | ❌ | Ignored - only wav is supported |
+ | `speed` | number | ❌ | Ignored - accepted for OpenAI compatibility |
+
+ ### Response
+
+ - Content-Type: `audio/wav`
+ - Returns the audio as WAV-format binary data
+
+ ## 💻 Examples
+
+ ### curl
+
+ ```bash
+ curl -X POST "https://your-space.hf.space/v1/audio/speech" \
+   -H "Content-Type: application/json" \
+   -d '{"model": "liang", "input": "你好,欢迎使用语音合成服务。"}' \
+   --output speech.wav
+ ```
+
+ ### Python requests
+
+ ```python
+ import requests
+
+ response = requests.post(
+     "https://your-space.hf.space/v1/audio/speech",
+     json={
+         "model": "liang",
+         "input": "你好,这是一段测试文本。"
+     }
+ )
+
+ with open("speech.wav", "wb") as f:
+     f.write(response.content)
+ ```
+
+ ### OpenAI Python SDK
+
+ ```python
+ from openai import OpenAI
+
+ client = OpenAI(
+     api_key="not-needed",  # no API key required
+     base_url="https://your-space.hf.space/v1"
+ )
+
+ response = client.audio.speech.create(
+     model="liang",
+     input="你好,这是一段测试文本。",
+     voice="alloy"  # ignored
+ )
+
+ response.stream_to_file("speech.wav")
+ ```
+
+ ## 🔧 Other Endpoints
+
+ ### Health Check
+
+ ```
+ GET /health
+ ```
+
+ Response:
+ ```json
+ {
+   "status": "healthy",
+   "models_loaded": 1,
+   "available_models": ["liang"]
+ }
+ ```
+
+ ### List Available Models
+
+ ```
+ GET /v1/models
+ ```
+
+ Response:
+ ```json
+ {
+   "object": "list",
+   "data": [
+     {
+       "id": "liang",
+       "object": "model",
+       "created": 1234567890,
+       "owned_by": "genie-tts"
+     }
+   ]
+ }
+ ```
+
+ ## 📝 Available Models
+
+ | Model | Language | Notes |
+ |----------|------|------|
+ | `liang` | Chinese | GPT-SoVITS V2ProPlus model |
+
+ ## ⚠️ Notes
+
+ 1. The first load can take some time
+ 2. CPU inference on the free tier can be slow
+ 3. Audio output is fixed to WAV format (32 kHz, 16-bit, mono)
+
+ ## 🔗 Links
+
+ - [Genie-TTS GitHub](https://github.com/High-Logic/Genie-TTS)
+ - [GPT-SoVITS](https://github.com/RVC-Boss/GPT-SoVITS)
+
+ ## 📄 License
+
+ MIT License
app.py ADDED
@@ -0,0 +1,376 @@
+ """
+ Genie-TTS OpenAI Compatible API Server
+ ======================================
+
+ This server provides an OpenAI-compatible TTS API endpoint (/v1/audio/speech)
+ for the Genie-TTS engine.
+
+ Usage:
+     POST /v1/audio/speech
+     {
+         "model": "liang",          # Voice model name
+         "input": "要合成的文本",     # Text to synthesize
+         "voice": "alloy",          # Ignored - for OpenAI compatibility
+         "response_format": "wav",  # Only wav is supported
+         "speed": 1.0               # Ignored - for OpenAI compatibility
+     }
+ """
+
+ import io
+ import json
+ import logging
+ import os
+ import time
+ from contextlib import asynccontextmanager
+ from pathlib import Path
+ from typing import Any, Dict, Optional
+
+ from fastapi import FastAPI, HTTPException, Request
+ from fastapi.responses import JSONResponse, Response
+ from pydantic import BaseModel, Field
+
+ # Configure logging
+ logging.basicConfig(
+     level=logging.INFO,
+     format="%(asctime)s - %(name)s - %(levelname)s - %(message)s"
+ )
+ logger = logging.getLogger(__name__)
+
+ # Model configuration
+ MODELS_DIR = Path(os.environ.get("MODELS_DIR", "/app/models"))
+ VOICES: Dict[str, Dict[str, Any]] = {}
+
+ # Audio settings
+ SAMPLE_RATE = 32000
+ CHANNELS = 1
+ BYTES_PER_SAMPLE = 2
+
+
+ class SpeechRequest(BaseModel):
+     """OpenAI-compatible speech request model."""
+     model: str = Field(..., description="The voice model to use")
+     input: str = Field(..., description="The text to synthesize")
+     voice: Optional[str] = Field(default="alloy", description="Ignored - for OpenAI compatibility")
+     response_format: Optional[str] = Field(default="wav", description="Only wav is supported")
+     speed: Optional[float] = Field(default=1.0, description="Ignored - for OpenAI compatibility")
+
+
+ class ErrorResponse(BaseModel):
+     """OpenAI-compatible error response."""
+     error: Dict[str, Any]
+
+
+ def load_voice_config(voice_dir: Path) -> Optional[Dict[str, Any]]:
+     """Load voice configuration from a directory."""
+     config_path = voice_dir / "config.json"
+     if not config_path.exists():
+         logger.warning(f"Config file not found: {config_path}")
+         return None
+
+     try:
+         with open(config_path, "r", encoding="utf-8") as f:
+             config = json.load(f)
+
+         # Validate required fields
+         required_fields = ["reference_audio", "reference_text", "language"]
+         for field in required_fields:
+             if field not in config:
+                 logger.error(f"Missing required field '{field}' in {config_path}")
+                 return None
+
+         # Check if ONNX models exist
+         onnx_dir = voice_dir / "onnx"
+         if not onnx_dir.exists():
+             logger.error(f"ONNX model directory not found: {onnx_dir}")
+             return None
+
+         config["onnx_dir"] = str(onnx_dir)
+         config["voice_dir"] = str(voice_dir)
+
+         return config
+     except Exception as e:
+         logger.error(f"Failed to load config from {config_path}: {e}")
+         return None
+
+
+ def discover_voices() -> Dict[str, Dict[str, Any]]:
+     """Discover all available voice models."""
+     voices = {}
+
+     if not MODELS_DIR.exists():
+         logger.warning(f"Models directory not found: {MODELS_DIR}")
+         return voices
+
+     for voice_dir in MODELS_DIR.iterdir():
+         if voice_dir.is_dir():
+             voice_name = voice_dir.name
+             config = load_voice_config(voice_dir)
+             if config:
+                 voices[voice_name] = config
+                 logger.info(f"Loaded voice: {voice_name} (language: {config.get('language', 'unknown')})")
+
+     return voices
+
+
+ def initialize_genie():
+     """Initialize Genie-TTS engine and load all voice models."""
+     global VOICES
+
+     logger.info("Initializing Genie-TTS engine...")
+
+     # Import genie_tts
+     try:
+         import genie_tts as genie
+     except ImportError as e:
+         logger.error(f"Failed to import genie_tts: {e}")
+         raise
+
+     # Download Genie data if needed
+     logger.info("Checking Genie data...")
+     genie.download_genie_data()
+
+     # Discover and load voices
+     VOICES = discover_voices()
+
+     if not VOICES:
+         logger.warning("No voice models found!")
+         return
+
+     # Load each voice model; collect failures and drop them afterwards
+     # (deleting from VOICES while iterating it would raise RuntimeError)
+     failed = []
+     for voice_name, config in VOICES.items():
+         try:
+             logger.info(f"Loading voice model: {voice_name}")
+             genie.load_character(
+                 character_name=voice_name,
+                 onnx_model_dir=config["onnx_dir"],
+                 language=config["language"]
+             )
+
+             # Set reference audio
+             ref_audio_path = os.path.join(config["voice_dir"], config["reference_audio"])
+             genie.set_reference_audio(
+                 character_name=voice_name,
+                 audio_path=ref_audio_path,
+                 audio_text=config["reference_text"],
+                 language=config["language"]
+             )
+
+             logger.info(f"Voice model loaded successfully: {voice_name}")
+         except Exception as e:
+             logger.error(f"Failed to load voice model {voice_name}: {e}")
+             failed.append(voice_name)
+     for voice_name in failed:
+         del VOICES[voice_name]
+
+     logger.info(f"Genie-TTS initialized with {len(VOICES)} voice(s)")
+
+
+ @asynccontextmanager
+ async def lifespan(app: FastAPI):
+     """Application lifespan manager."""
+     # Startup
+     initialize_genie()
+     yield
+     # Shutdown
+     logger.info("Shutting down Genie-TTS server...")
+
+
+ # Create FastAPI app
+ app = FastAPI(
+     title="Genie-TTS OpenAI Compatible API",
+     description="OpenAI-compatible Text-to-Speech API powered by Genie-TTS",
+     version="1.0.0",
+     lifespan=lifespan
+ )
+
+
+ @app.get("/")
+ async def root():
+     """Root endpoint - health check."""
+     return {
+         "status": "healthy",
+         "service": "Genie-TTS OpenAI Compatible API",
+         "available_models": list(VOICES.keys())
+     }
+
+
+ @app.get("/health")
+ async def health():
+     """Health check endpoint."""
+     return {
+         "status": "healthy",
+         "models_loaded": len(VOICES),
+         "available_models": list(VOICES.keys())
+     }
+
+
+ @app.get("/v1/models")
+ async def list_models():
+     """List available models (OpenAI-compatible)."""
+     models = []
+     for voice_name in VOICES.keys():
+         models.append({
+             "id": voice_name,
+             "object": "model",
+             "created": int(time.time()),
+             "owned_by": "genie-tts"
+         })
+
+     return {
+         "object": "list",
+         "data": models
+     }
+
+
+ def generate_wav_header(data_size: int) -> bytes:
+     """Generate WAV file header."""
+     header = io.BytesIO()
+
+     # RIFF header
+     header.write(b'RIFF')
+     header.write((data_size + 36).to_bytes(4, 'little'))  # File size - 8
+     header.write(b'WAVE')
+
+     # fmt chunk
+     header.write(b'fmt ')
+     header.write((16).to_bytes(4, 'little'))  # Chunk size
+     header.write((1).to_bytes(2, 'little'))  # Audio format (PCM)
+     header.write((CHANNELS).to_bytes(2, 'little'))  # Number of channels
+     header.write((SAMPLE_RATE).to_bytes(4, 'little'))  # Sample rate
+     header.write((SAMPLE_RATE * CHANNELS * BYTES_PER_SAMPLE).to_bytes(4, 'little'))  # Byte rate
+     header.write((CHANNELS * BYTES_PER_SAMPLE).to_bytes(2, 'little'))  # Block align
+     header.write((BYTES_PER_SAMPLE * 8).to_bytes(2, 'little'))  # Bits per sample
+
+     # data chunk
+     header.write(b'data')
+     header.write(data_size.to_bytes(4, 'little'))
+
+     return header.getvalue()
+
+
+ @app.post("/v1/audio/speech")
+ async def create_speech(request: SpeechRequest):
+     """
+     Generate speech from text (OpenAI-compatible endpoint).
+
+     This endpoint is compatible with the OpenAI TTS API format.
+     Only the 'model' and 'input' parameters are used.
+     """
+     # Imported lazily; the engine is initialized during startup
+     import genie_tts as genie
+
+     # Validate model
+     if request.model not in VOICES:
+         return JSONResponse(
+             status_code=404,
+             content={
+                 "error": {
+                     "message": f"Model '{request.model}' not found. Available models: {list(VOICES.keys())}",
+                     "type": "invalid_request_error",
+                     "code": "model_not_found"
+                 }
+             }
+         )
+
+     # Validate input
+     if not request.input or not request.input.strip():
+         return JSONResponse(
+             status_code=400,
+             content={
+                 "error": {
+                     "message": "Input text cannot be empty",
+                     "type": "invalid_request_error",
+                     "code": "invalid_input"
+                 }
+             }
+         )
+
+     try:
+         # Collect audio chunks
+         audio_chunks = []
+
+         async for chunk in genie.tts_async(
+             character_name=request.model,
+             text=request.input.strip(),
+             play=False,
+             split_sentence=True
+         ):
+             audio_chunks.append(chunk)
+
+         if not audio_chunks:
+             return JSONResponse(
+                 status_code=500,
+                 content={
+                     "error": {
+                         "message": "Failed to generate audio",
+                         "type": "server_error",
+                         "code": "generation_failed"
+                     }
+                 }
+             )
+
+         # Combine all chunks
+         audio_data = b''.join(audio_chunks)
+
+         # Generate complete WAV file
+         wav_header = generate_wav_header(len(audio_data))
+         wav_content = wav_header + audio_data
+
+         return Response(
+             content=wav_content,
+             media_type="audio/wav",
+             headers={
+                 "Content-Disposition": "attachment; filename=speech.wav"
+             }
+         )
+
+     except Exception as e:
+         logger.error(f"TTS generation failed: {e}", exc_info=True)
+         return JSONResponse(
+             status_code=500,
+             content={
+                 "error": {
+                     "message": f"TTS generation failed: {str(e)}",
+                     "type": "server_error",
+                     "code": "generation_failed"
+                 }
+             }
+         )
+
+
+ # Error handlers
+ @app.exception_handler(404)
+ async def not_found_handler(request: Request, exc: HTTPException):
+     return JSONResponse(
+         status_code=404,
+         content={
+             "error": {
+                 "message": "Not found",
+                 "type": "invalid_request_error",
+                 "code": "not_found"
+             }
+         }
+     )
+
+
+ @app.exception_handler(500)
+ async def internal_error_handler(request: Request, exc: Exception):
+     return JSONResponse(
+         status_code=500,
+         content={
+             "error": {
+                 "message": "Internal server error",
+                 "type": "server_error",
+                 "code": "internal_error"
+             }
+         }
+     )
+
+
+ if __name__ == "__main__":
+     import uvicorn
+
+     port = int(os.environ.get("PORT", 7860))
+     uvicorn.run(app, host="0.0.0.0", port=port)
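The hand-rolled header from `generate_wav_header` can be verified against Python's standard `wave` module. A standalone sketch (the function and audio constants are copied from `app.py` above; the silence payload is mine):

```python
import io
import wave

SAMPLE_RATE = 32000   # matches the server's fixed output format
CHANNELS = 1
BYTES_PER_SAMPLE = 2

def generate_wav_header(data_size: int) -> bytes:
    """Build a 44-byte PCM WAV header for `data_size` bytes of audio."""
    h = io.BytesIO()
    h.write(b'RIFF')
    h.write((data_size + 36).to_bytes(4, 'little'))  # file size - 8
    h.write(b'WAVE')
    h.write(b'fmt ')
    h.write((16).to_bytes(4, 'little'))              # fmt chunk size
    h.write((1).to_bytes(2, 'little'))               # PCM
    h.write(CHANNELS.to_bytes(2, 'little'))
    h.write(SAMPLE_RATE.to_bytes(4, 'little'))
    h.write((SAMPLE_RATE * CHANNELS * BYTES_PER_SAMPLE).to_bytes(4, 'little'))  # byte rate
    h.write((CHANNELS * BYTES_PER_SAMPLE).to_bytes(2, 'little'))                # block align
    h.write((BYTES_PER_SAMPLE * 8).to_bytes(2, 'little'))                       # bits per sample
    h.write(b'data')
    h.write(data_size.to_bytes(4, 'little'))
    return h.getvalue()

# One second of silence at 32 kHz, 16-bit mono
pcm = b'\x00' * (SAMPLE_RATE * BYTES_PER_SAMPLE)
wav_bytes = generate_wav_header(len(pcm)) + pcm

# The stdlib wave reader accepts the result, confirming the header fields
with wave.open(io.BytesIO(wav_bytes)) as w:
    print(w.getframerate(), w.getnchannels(), w.getsampwidth(), w.getnframes())
    # → 32000 1 2 32000
```

The same check works on a `speech.wav` downloaded from the live endpoint, which is a quick way to confirm the 32 kHz / 16-bit / mono claim in the README.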
requirements.txt ADDED
@@ -0,0 +1,20 @@
+ # Genie-TTS OpenAI Compatible API - Dependencies
+ # ================================================
+
+ # Web framework
+ fastapi>=0.100.0
+ uvicorn[standard]>=0.23.0
+
+ # Genie-TTS core
+ genie-tts>=2.0.0
+
+ # Audio processing
+ soundfile>=0.12.0
+ numpy>=1.24.0
+
+ # Additional utilities
+ pydantic>=2.0.0
+ python-multipart>=0.0.6
+
+ # HTTP client for health checks
+ httpx>=0.24.0