liumaolin committed on Jun 1, 2025

Commit

ef0d09e

1 Parent(s): 025ca3f

Refactor TTS architecture: implement runtime interface, TTS manager, universal registry, and factory pattern to support multiple engines.
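
The registry-plus-factory pattern named in the commit message can be sketched in a few lines. This is a minimal, self-contained illustration and not the repository's code: `Engine`, `ENGINE_REGISTRY`, `register_engine`, `EchoEngine`, and `create_engine` are hypothetical names, and the toy engine returns a string rather than audio.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Type


class Engine(ABC):
    """Hypothetical stand-in for a TTS runtime interface."""

    def __init__(self, config: dict):
        self.config = config

    @abstractmethod
    def synthesize(self, text: str) -> str:
        ...


# Registry table mapping an engine name to its class.
ENGINE_REGISTRY: Dict[str, Type[Engine]] = {}


def register_engine(name: str) -> Callable[[Type[Engine]], Type[Engine]]:
    """Class decorator that records the decorated class under `name`."""
    def decorator(cls: Type[Engine]) -> Type[Engine]:
        ENGINE_REGISTRY[name] = cls
        return cls
    return decorator


@register_engine("echo")
class EchoEngine(Engine):
    def synthesize(self, text: str) -> str:
        return f"[{self.config['voice']}] {text}"


def create_engine(config: dict) -> Engine:
    """Factory: look up the class by config['type'] and instantiate it."""
    engine_type = config["type"]
    if engine_type not in ENGINE_REGISTRY:
        raise ValueError(
            f"Unregistered engine type: {engine_type}. "
            f"Available: {sorted(ENGINE_REGISTRY)}"
        )
    return ENGINE_REGISTRY[engine_type](config)


engine = create_engine({"type": "echo", "voice": "demo"})
print(engine.synthesize("hello"))
```

Adding another engine under this scheme only requires a new decorated class; the caller keeps using the factory and never imports the concrete class.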

Files changed (14)

src/VoiceDialogue/main.py +2 -2
src/VoiceDialogue/models/voice_model/__init__.py +0 -19
src/VoiceDialogue/services/audio/audio_answer.py +6 -34
src/VoiceDialogue/services/audio/audio_generator/__init__.py +51 -0
src/VoiceDialogue/services/audio/audio_generator/configs/__init__.py +52 -0
src/VoiceDialogue/{models/voice_model/moyoyo_configs.py → services/audio/audio_generator/configs/moyoyo.py} +1 -1
src/VoiceDialogue/services/audio/audio_generator/models/__init__.py +72 -0
src/VoiceDialogue/{models/voice_model → services/audio/audio_generator/models}/base.py +0 -37
src/VoiceDialogue/{models/voice_model/moyoyo_tts.py → services/audio/audio_generator/models/moyoyo.py} +1 -18
src/VoiceDialogue/services/audio/audio_generator/runtime/__init__.py +32 -0
src/VoiceDialogue/services/audio/audio_generator/runtime/interface.py +103 -0
src/VoiceDialogue/services/audio/audio_generator/runtime/moyoyo.py +50 -0
src/VoiceDialogue/services/audio/audio_generator/tts_manager.py +168 -0
src/VoiceDialogue/services/audio/audio_player.py +4 -5

src/VoiceDialogue/main.py CHANGED

@@ -7,7 +7,7 @@ from config.paths import load_third_party
 load_third_party()
-from models.voice_model import tts_config_registry, TTSConfigType
 from services.audio.aec_audio_capture import EchoCancellingAudioCapture
 from services.audio.audio_answer import TTSAudioGenerator
 from services.audio.audio_player import AudioStreamPlayer
@@ -73,7 +73,7 @@ def launch_system(
     audio_generator_worker = TTSAudioGenerator(
         processed_answer_queue=generated_answer_queue,
         tts_generated_audio_queue=tts_generated_audio_queue,
-        voice_role=tts_speaker_config
     )
     audio_generator_worker.start()
     threads.append(audio_generator_worker)

 load_third_party()
+from services.audio.audio_generator.models import tts_config_registry, TTSConfigType
 from services.audio.aec_audio_capture import EchoCancellingAudioCapture
 from services.audio.audio_answer import TTSAudioGenerator
 from services.audio.audio_player import AudioStreamPlayer
     audio_generator_worker = TTSAudioGenerator(
         processed_answer_queue=generated_answer_queue,
         tts_generated_audio_queue=tts_generated_audio_queue,
+        tts_config=tts_speaker_config
     )
     audio_generator_worker.start()
     threads.append(audio_generator_worker)

src/VoiceDialogue/models/voice_model/__init__.py DELETED

@@ -1,19 +0,0 @@
-from .base import TTSConfigType, VoiceModelStatus, tts_config_registry
-from .moyoyo_configs import get_moyoyo_configs
-from .moyoyo_tts import MoYoYoTTSConfig, MoYoYoTTSInference
-# Register MoYoYo TTS
-moyoyo_inference = MoYoYoTTSInference()
-tts_config_registry.register_inference_engine(TTSConfigType.MOYOYO, moyoyo_inference)
-# Register all MoYoYo configs
-for config in get_moyoyo_configs():
-    tts_config_registry.register_config(config)
-__all__ = [
-    'TTSConfigType',
-    'VoiceModelStatus',
-    'tts_config_registry',
-    'MoYoYoTTSConfig',
-    'MoYoYoTTSInference',
-]

src/VoiceDialogue/services/audio/audio_answer.py CHANGED

@@ -1,56 +1,28 @@
 import time
-import typing
 from multiprocessing import Queue
 from queue import Empty
-from config.paths import load_third_party
-load_third_party()
-from moyoyo_tts import TTSModule, TTS_Config
 from models.voice_task import VoiceTask
 from services.core.base import BaseThread
 from services.core.constants import dropped_audio_cache, user_still_speaking_event, voice_state_manager
-from models.voice_model import MoYoYoTTSConfig
 class TTSAudioGenerator(BaseThread):
     """TTS audio generator: converts text to audio."""
     def __init__(self, group=None, target=None, name=None, args=(), kwargs={}, *, daemon=None,
-                 processed_answer_queue, tts_generated_audio_queue, voice_role: MoYoYoTTSConfig):
         super().__init__(group, target, name, args, kwargs, daemon=daemon)
         self.processed_answer_queue: Queue = processed_answer_queue
         self.tts_generated_audio_queue: Queue = tts_generated_audio_queue
-        self._device = "cpu"  # mps slower 11.66(cpu) vs 39.42(mps)
-        self._tts_config = voice_role
-        self.tts_module: typing.Optional[TTSModule] = None
-    def setup_tts_config(self, voice_role: MoYoYoTTSConfig):
-        tts_config = TTS_Config(voice_role.get_runtime_config())
-        return tts_config
-    def warmup(self, warmup_steps=1):
-        print('[INFO:] Warming up TTS engine...')
-        warmup_texts = ['Warming up TTS engine.', '预热文字转音频引擎。']
-        for _ in range(warmup_steps):
-            for warmup_text in warmup_texts:
-                self.tts_module.generate_audio(warmup_text)
-        print('[INFO:] Warm up TTS engine finished.')
     def run(self):
-        tts_config = self.setup_tts_config(self._tts_config)
-        self.tts_module = TTSModule(tts_config)
-        self.tts_module.setup_inference_params(
-            ref_audio=self._tts_config.reference_audio_path,
-            parallel_infer=False,
-            **self._tts_config.inference_parameters.model_dump()
-        )
-        self.warmup()
         self.is_ready = True
@@ -80,7 +52,7 @@ class TTSAudioGenerator(BaseThread):
                 continue
             voice_task.tts_start_time = time.time()
-            tts_generated_sentence_audio = self.tts_module.generate_audio(voice_task.answer_sentence)
             voice_task.tts_generated_sentence_audio = tts_generated_sentence_audio
             voice_task.tts_end_time = time.time()
             # print(f'Generated audio: {voice_task.answer_sentence}')

 import time
 from multiprocessing import Queue
 from queue import Empty
 from models.voice_task import VoiceTask
 from services.core.base import BaseThread
 from services.core.constants import dropped_audio_cache, user_still_speaking_event, voice_state_manager
+from .audio_generator import tts_manager, BaseTTSConfig
 class TTSAudioGenerator(BaseThread):
     """TTS audio generator: converts text to audio."""
     def __init__(self, group=None, target=None, name=None, args=(), kwargs={}, *, daemon=None,
+                 processed_answer_queue, tts_generated_audio_queue, tts_config: BaseTTSConfig):
         super().__init__(group, target, name, args, kwargs, daemon=daemon)
         self.processed_answer_queue: Queue = processed_answer_queue
         self.tts_generated_audio_queue: Queue = tts_generated_audio_queue
+        self.tts_instance = tts_manager.create_tts(tts_config)
     def run(self):
+        self.tts_instance.setup()
+        self.tts_instance.warmup()
         self.is_ready = True
                 continue
             voice_task.tts_start_time = time.time()
+            tts_generated_sentence_audio = self.tts_instance.synthesize(voice_task.answer_sentence)
             voice_task.tts_generated_sentence_audio = tts_generated_sentence_audio
             voice_task.tts_end_time = time.time()
             # print(f'Generated audio: {voice_task.answer_sentence}')

src/VoiceDialogue/services/audio/audio_generator/__init__.py ADDED

	@@ -0,0 +1,51 @@

+"""
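+Audio Generator Module
+A complete text-to-speech (TTS) solution, including:
+- TTS manager and registry system
+- Support for multiple TTS engines
+- Configuration management
+- Runtime interface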
+"""
+from .models import (
+    TTSConfigType,
+    VoiceModelStatus,
+    tts_config_registry,
+    BaseTTSConfig
+)
+from .runtime import (
+    TTSInterface,
+    TTSFactory
+)
+from .tts_manager import (
+    TTSManager,
+    TTSRegistryTables,
+    tts_manager,
+    tts_tables,
+    register_all_tts
+)
+__version__ = "1.0.0"
+__all__ = [
+    # Manager and registry tables
+    'TTSManager',
+    'TTSRegistryTables',
+    'tts_manager',
+    'tts_tables',
+    'register_all_tts',
+    # Config models
+    'TTSConfigType',
+    'VoiceModelStatus',
+    'tts_config_registry',
+    'BaseTTSConfig',
+    # Runtime interface
+    'TTSInterface',
+    'TTSFactory',
+]
+# All TTS implementations are registered automatically when the module is initialized
+# register_all_tts() is already called automatically in the tts_manager module

src/VoiceDialogue/services/audio/audio_generator/configs/__init__.py ADDED

	@@ -0,0 +1,52 @@

+"""
+Configs Module
+TTS configuration module, containing:
+- Presets for the various TTS engines
+- Config loading functions
+"""
+# Import the config loading functions
+try:
+    from .moyoyo import get_moyoyo_configs
+    __all__ = [
+        'get_moyoyo_configs',
+    ]
+    # Mapping of config getter functions
+    CONFIG_GETTERS = {
+        'moyoyo': get_moyoyo_configs,
+    }
+except ImportError as e:
+    import logging
+    logging.warning(f"Failed to import some config modules: {e}")
+    __all__ = []
+    CONFIG_GETTERS = {}
+def get_all_configs():
+    """Return all available TTS configs."""
+    all_configs = []
+    for getter_func in CONFIG_GETTERS.values():
+        try:
+            configs = getter_func()
+            all_configs.extend(configs)
+        except Exception as e:
+            import logging
+            logging.error(f"Failed to load configs from {getter_func.__name__}: {e}")
+    return all_configs
+def get_configs_by_type(tts_type: str):
+    """Return the configs for the given TTS type."""
+    if tts_type in CONFIG_GETTERS:
+        try:
+            return CONFIG_GETTERS[tts_type]()
+        except Exception as e:
+            import logging
+            logging.error(f"Failed to load configs for {tts_type}: {e}")
+            return []
+    return []

src/VoiceDialogue/{models/voice_model/moyoyo_configs.py → services/audio/audio_generator/configs/moyoyo.py} RENAMED

@@ -1,4 +1,4 @@
-from .moyoyo_tts import MoYoYoTTSConfig
 # Mapping of base pretrained model files
 BASE_PRETRAINED_FILES = {

+from services.audio.audio_generator.models.moyoyo import MoYoYoTTSConfig
 # Mapping of base pretrained model files
 BASE_PRETRAINED_FILES = {

src/VoiceDialogue/services/audio/audio_generator/models/__init__.py ADDED

	@@ -0,0 +1,72 @@

+"""
+Models Module
+TTS model definition module, containing:
+- Abstract base config class
+- Config models for the various TTS engines
+- Global config registry
+"""
+from .base import (
+    TTSConfigType,
+    VoiceModelStatus,
+    BaseTTSConfig,
+    TTSConfigRegistry,
+    tts_config_registry
+)
+# Import the concrete config models
+try:
+    from .moyoyo import MoYoYoTTSConfig
+    _moyoyo_available = True
+except ImportError:
+    _moyoyo_available = False
+    import logging
+    logging.warning("MoYoYo TTS config not available")
+# Build the export list dynamically
+__all__ = [
+    'TTSConfigType',
+    'VoiceModelStatus',
+    'BaseTTSConfig',
+    'TTSConfigRegistry',
+    'tts_config_registry',
+]
+if _moyoyo_available:
+    __all__.append('MoYoYoTTSConfig')
+# Automatically register all available configs
+def _auto_register_configs():
+    """Automatically register all TTS configs."""
+    try:
+        if _moyoyo_available:
+            from ..configs.moyoyo import get_moyoyo_configs
+            for config in get_moyoyo_configs():
+                tts_config_registry.register_config(config)
+    except Exception as e:
+        import logging
+        logging.error(f"Failed to auto-register configs: {e}")
+# Register the configs automatically when the module is loaded
+_auto_register_configs()
+# Config statistics
+def get_config_stats():
+    """Return statistics about the registered configs."""
+    all_configs = tts_config_registry.get_all_configs()
+    stats = {
+        'total_configs': len(all_configs),
+        'configs_by_type': {}
+    }
+    for config_type in TTSConfigType:
+        type_configs = tts_config_registry.get_configs_by_type(config_type)
+        stats['configs_by_type'][config_type.value] = len(type_configs)
+    return stats

src/VoiceDialogue/{models/voice_model → services/audio/audio_generator/models}/base.py RENAMED

@@ -52,36 +52,17 @@ class BaseTTSConfig(BaseModel, ABC):
         pass
-class BaseTTSInference(ABC):
-    """Base class for TTS inference."""
-    @abstractmethod
-    def generate_speech(self, text: str, config: BaseTTSConfig, **kwargs) -> bytes:
-        """Generate speech."""
-        pass
-    @abstractmethod
-    def is_supported_config(self, config: BaseTTSConfig) -> bool:
-        """Check whether this config is supported."""
-        pass
 class TTSConfigRegistry:
     """TTS registry that manages all TTS engines and configs."""
     def __init__(self):
         self._configs: dict[str, BaseTTSConfig] = {}
-        self._inference_engines: dict[TTSConfigType, BaseTTSInference] = {}
     def register_config(self, config: BaseTTSConfig):
         """Register a TTS config."""
         key = f"{config.tts_type.value}:{config.character_name}"
         self._configs[key] = config
-    def register_inference_engine(self, tts_type: TTSConfigType, engine: BaseTTSInference):
-        """Register a TTS inference engine."""
-        self._inference_engines[tts_type] = engine
     def get_config(self, tts_type: TTSConfigType, character_name: str) -> BaseTTSConfig:
         """Return the specified config."""
         key = f"{tts_type.value}:{character_name}"
@@ -96,24 +77,6 @@ class TTSConfigRegistry:
         """Return all configs."""
         return list(self._configs.values())
-    def get_inference_engine(self, tts_type: TTSConfigType) -> BaseTTSInference:
-        """Return the inference engine."""
-        return self._inference_engines.get(tts_type)
-    def generate_speech(self, tts_type: TTSConfigType, character_name: str,
-                        text: str, **kwargs) -> bytes:
-        """Unified interface for generating speech."""
-        config = self.get_config(tts_type, character_name)
-        engine = self.get_inference_engine(tts_type)
-        if not config or not engine:
-            raise ValueError(f"TTS config or engine does not exist: {tts_type.value}:{character_name}")
-        if not engine.is_supported_config(config):
-            raise ValueError(f"Inference engine does not support this config: {tts_type.value}:{character_name}")
-        return engine.generate_speech(text, config, **kwargs)
 # Global TTS registry instance
 tts_config_registry = TTSConfigRegistry()

         pass
 class TTSConfigRegistry:
     """TTS registry that manages all TTS engines and configs."""
     def __init__(self):
         self._configs: dict[str, BaseTTSConfig] = {}
     def register_config(self, config: BaseTTSConfig):
         """Register a TTS config."""
         key = f"{config.tts_type.value}:{config.character_name}"
         self._configs[key] = config
     def get_config(self, tts_type: TTSConfigType, character_name: str) -> BaseTTSConfig:
         """Return the specified config."""
         key = f"{tts_type.value}:{character_name}"
         """Return all configs."""
         return list(self._configs.values())
 # Global TTS registry instance
 tts_config_registry = TTSConfigRegistry()

src/VoiceDialogue/{models/voice_model/moyoyo_tts.py → services/audio/audio_generator/models/moyoyo.py} RENAMED

@@ -6,7 +6,7 @@ from pydantic import BaseModel, Field
 from config.settings import settings
 from utils.download_utils import download_file_from_huggingface
-from .base import BaseTTSConfig, BaseTTSInference, TTSConfigType, VoiceModelStatus
 class InferenceParameters(BaseModel):
@@ -140,20 +140,3 @@ class MoYoYoTTSConfig(BaseTTSConfig):
                 'bert_base_path': self.bert_model_path,
             }
         }
-class MoYoYoTTSInference(BaseTTSInference):
-    """MoYoYo TTS inference engine."""
-    def generate_speech(self, text: str, config: BaseTTSConfig, **kwargs) -> bytes:
-        """Generate speech."""
-        if not isinstance(config, MoYoYoTTSConfig):
-            raise ValueError("Config type mismatch: MoYoYoTTSConfig required")
-        # The actual MoYoYo TTS inference logic goes here
-        # Returns empty bytes for now; a real implementation needs to call the TTS model
-        return b""
-    def is_supported_config(self, config: BaseTTSConfig) -> bool:
-        """Check whether this config is supported."""
-        return isinstance(config, MoYoYoTTSConfig)

 from config.settings import settings
 from utils.download_utils import download_file_from_huggingface
+from .base import BaseTTSConfig, TTSConfigType, VoiceModelStatus
 class InferenceParameters(BaseModel):
                 'bert_base_path': self.bert_model_path,
             }
         }

src/VoiceDialogue/services/audio/audio_generator/runtime/__init__.py ADDED

	@@ -0,0 +1,32 @@

+"""
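+Runtime Module
+TTS runtime module, containing:
+- The abstract TTS interface definition
+- The TTS factory class
+- Concrete TTS implementations
+"""
+from .interface import TTSInterface, TTSFactory
+# Import every TTS implementation so that the registration decorators run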
+try:
+    from .moyoyo import MoYoYoTTS
+    __all__ = [
+        'TTSInterface',
+        'TTSFactory',
+        'MoYoYoTTS'
+    ]
+except ImportError as e:
+    # If some TTS implementations cannot be imported, the rest still works
+    import logging
+    logging.warning(f"Failed to import some TTS implementations: {e}")
+    __all__ = [
+        'TTSInterface',
+        'TTSFactory'
+    ]
+# List of available TTS implementations
+AVAILABLE_TTS_IMPLEMENTATIONS = [impl for impl in __all__ if impl.endswith('TTS')]

src/VoiceDialogue/services/audio/audio_generator/runtime/interface.py ADDED

	@@ -0,0 +1,103 @@

+from abc import ABC, abstractmethod
+from typing import Tuple
+import numpy as np
+from ..models.base import BaseTTSConfig
+class TTSInterface(ABC):
+    """Abstract interface for a TTS service."""
+    def __init__(self, config: BaseTTSConfig):
+        self.config = config
+        self._is_ready = False
+    @abstractmethod
+    def setup(self, **kwargs) -> None:
+        """
+        Initialize the TTS service.
+        Args:
+            **kwargs: Additional initialization parameters
+        """
+        pass
+    @abstractmethod
+    def warmup(self, warmup_steps: int = 1) -> None:
+        """
+        Warm up the TTS engine.
+        Args:
+            warmup_steps: Number of warm-up steps
+        """
+        pass
+    @abstractmethod
+    def synthesize(self, text: str, **kwargs) -> Tuple[np.ndarray, int]:
+        """
+        Convert text to speech.
+        Args:
+            text: The text to convert
+            **kwargs: Additional synthesis parameters
+        Returns:
+            Tuple[np.ndarray, int]: (audio data, sample rate)
+        """
+        pass
+    @property
+    def is_ready(self) -> bool:
+        """
+        Check whether the TTS service is ready.
+        Returns:
+            bool: Whether the service is ready
+        """
+        return self._is_ready
+    @is_ready.setter
+    def is_ready(self, value: bool):
+        self._is_ready = value
+    def get_config(self) -> BaseTTSConfig:
+        """Return the current config."""
+        return self.config
+class TTSFactory:
+    """TTS factory class used to create the different TTS implementations."""
+    _registry = {}
+    @classmethod
+    def register(cls, provider_name: str, tts_class):
+        """Register a TTS provider."""
+        cls._registry[provider_name] = tts_class
+    @classmethod
+    def create(cls, config: BaseTTSConfig) -> TTSInterface:
+        """
+        Create a TTS instance from a config.
+        Args:
+            config: TTS config
+        Returns:
+            TTSInterface: TTS instance
+        Raises:
+            ValueError: Unsupported TTS provider
+        """
+        provider = config.provider.value
+        if provider not in cls._registry:
+            raise ValueError(f"Unsupported TTS provider: {provider}")
+        tts_class = cls._registry[provider]
+        return tts_class(config)
+    @classmethod
+    def list_providers(cls):
+        """List all registered TTS providers."""
+        return list(cls._registry.keys())
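
A new engine would plug in by subclassing the interface above and filling in `setup`, `warmup`, and `synthesize`. The sketch below is self-contained and uses a simplified copy of that contract: `SimpleTTSInterface` and `SilenceTTS` are hypothetical names, the config is a plain dict, and the audio is a list of floats instead of an `np.ndarray` so the example runs with the standard library only.

```python
from abc import ABC, abstractmethod
from typing import List, Tuple


class SimpleTTSInterface(ABC):
    """Simplified, hypothetical copy of the setup/warmup/synthesize contract."""

    def __init__(self, config: dict):
        self.config = config
        self.is_ready = False

    @abstractmethod
    def setup(self) -> None:
        ...

    @abstractmethod
    def warmup(self, warmup_steps: int = 1) -> None:
        ...

    @abstractmethod
    def synthesize(self, text: str) -> Tuple[List[float], int]:
        ...


class SilenceTTS(SimpleTTSInterface):
    """Toy engine: emits 10 ms of silence per input character."""

    def setup(self) -> None:
        self.sample_rate = self.config.get("sample_rate", 16000)
        self.is_ready = True

    def warmup(self, warmup_steps: int = 1) -> None:
        for _ in range(warmup_steps):
            self.synthesize("warm up")

    def synthesize(self, text: str) -> Tuple[List[float], int]:
        if not self.is_ready:
            raise RuntimeError("Call setup() before synthesize().")
        samples_per_char = self.sample_rate // 100  # 10 ms worth of samples
        return [0.0] * (samples_per_char * len(text)), self.sample_rate


tts = SilenceTTS({"sample_rate": 16000})
tts.setup()
tts.warmup()
audio, rate = tts.synthesize("hi")
print(len(audio), rate)
```

The return order `(audio data, sample rate)` matches what the player in this commit unpacks.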

src/VoiceDialogue/services/audio/audio_generator/runtime/moyoyo.py ADDED

	@@ -0,0 +1,50 @@

+import typing
+from typing import Tuple
+import numpy as np
+from config.paths import load_third_party
+from .interface import TTSInterface
+from ..models.moyoyo import MoYoYoTTSConfig
+from ..tts_manager import tts_tables
+load_third_party()
+from moyoyo_tts import TTSModule, TTS_Config
+@tts_tables.register("tts_classes", "moyoyo")
+class MoYoYoTTS(TTSInterface):
+    """MoYoYo TTS implementation."""
+    def __init__(self, config: MoYoYoTTSConfig):
+        super().__init__(config)
+        self.tts_module: typing.Optional[TTSModule] = None
+    def setup(self, **kwargs) -> None:
+        """Set up the TTS module."""
+        tts_config = TTS_Config(self.config.get_runtime_config())
+        self.tts_module = TTSModule(tts_config)
+        self.tts_module.setup_inference_params(
+            ref_audio=self.config.reference_audio_path,
+            parallel_infer=False,
+            **self.config.inference_parameters.model_dump()
+        )
+        self.is_ready = True
+    def warmup(self, warmup_steps: int = 1) -> None:
+        """Warm up the TTS engine."""
+        print('[INFO:] Warming up TTS engine...')
+        warmup_texts = ['Warming up TTS engine.', '预热文字转音频引擎。']
+        for _ in range(warmup_steps):
+            for warmup_text in warmup_texts:
+                self.tts_module.generate_audio(warmup_text, warmup=True)
+        print('[INFO:] Warm up TTS engine finished.')
+    def synthesize(self, text: str, **kwargs) -> Tuple[np.ndarray, int]:
+        """Synthesize speech."""
+        if not self.is_ready:
+            raise RuntimeError("TTS module is not ready. Please call setup() first.")
+        (sample_rate, audio_data), *_ = self.tts_module.generate_audio(text)
+        return audio_data, sample_rate

src/VoiceDialogue/services/audio/audio_generator/tts_manager.py ADDED

	@@ -0,0 +1,168 @@

+import logging
+import inspect
+from dataclasses import dataclass
+import re
+from typing import Dict, Type, Optional
+from .runtime.interface import TTSInterface
+from .models.base import BaseTTSConfig, TTSConfigType
+@dataclass
+class TTSRegistryTables:
+    """TTS registry table system used to manage the different TTS implementations."""
+    tts_classes: Dict[str, Type[TTSInterface]] = None
+    def __post_init__(self):
+        if self.tts_classes is None:
+            self.tts_classes = {}
+    def print(self, key: str = None) -> None:
+        """Print the registered TTS classes."""
+        print("\nTTS Registry Tables: \n")
+        headers = ["register name", "class name", "class location"]
+        if self.tts_classes and (key is None or "tts_classes" in key):
+            print(f"-----------    ** tts_classes **    --------------")
+            metas = []
+            for register_key, tts_class in self.tts_classes.items():
+                class_file = inspect.getfile(tts_class)
+                class_line = inspect.getsourcelines(tts_class)[1]
+                # Shorten the path for display
+                pattern = r"^.+/VoiceDialogue/"
+                class_file = re.sub(pattern, "VoiceDialogue/", class_file)
+                meta_data = [
+                    register_key,
+                    tts_class.__name__,
+                    f"{class_file}:{class_line}",
+                ]
+                metas.append(meta_data)
+            metas.sort(key=lambda x: x[0])
+            data = [headers] + metas
+            col_widths = [max(len(str(item)) for item in col) for col in zip(*data)]
+            for row in data:
+                print(
+                    "| "
+                    + " | ".join(str(item).ljust(width) for item, width in zip(row, col_widths))
+                    + " |"
+                )
+        print("\n")
+    def register(self, register_table_key: str, key: str = None) -> callable:
+        """Decorator used to register a TTS class."""
+        def decorator(target_class):
+            if not hasattr(self, register_table_key):
+                setattr(self, register_table_key, {})
+                logging.debug(f"New TTS registry table added: {register_table_key}")
+            registry = getattr(self, register_table_key)
+            registry_key = key if key is not None else target_class.__name__
+            if registry_key in registry:
+                logging.debug(
+                    f"Key {registry_key} already exists in {register_table_key}, re-register"
+                )
+            registry[registry_key] = target_class
+            logging.info(f"Registered TTS class: {registry_key} -> {target_class.__name__}")
+            return target_class
+        return decorator
+# Global TTS registry tables instance
+tts_tables = TTSRegistryTables()
+class TTSManager:
+    """TTS manager responsible for managing and creating TTS instances."""
+    def __init__(self):
+        self._tts_instances: Dict[str, TTSInterface] = {}
+    def create_tts(self, config: BaseTTSConfig) -> TTSInterface:
+        """
+        Create a TTS instance from a config.
+        Args:
+            config: TTS config object
+        Returns:
+            TTSInterface: TTS instance
+        Raises:
+            ValueError: If the TTS type is not registered
+        """
+        tts_type = config.tts_type.value
+        if tts_type not in tts_tables.tts_classes:
+            raise ValueError(f"Unregistered TTS type: {tts_type}. Available types: {list(tts_tables.tts_classes.keys())}")
+        tts_class = tts_tables.tts_classes[tts_type]
+        return tts_class(config)
+    def get_or_create_tts(self, config: BaseTTSConfig) -> TTSInterface:
+        """
+        Get or create a TTS instance (singleton pattern).
+        Args:
+            config: TTS config object
+        Returns:
+            TTSInterface: TTS instance
+        """
+        instance_key = f"{config.tts_type.value}:{config.character_name}"
+        if instance_key not in self._tts_instances:
+            self._tts_instances[instance_key] = self.create_tts(config)
+        return self._tts_instances[instance_key]
+    def list_registered_tts(self) -> Dict[str, Type[TTSInterface]]:
+        """List all registered TTS classes."""
+        return tts_tables.tts_classes.copy()
+    def is_tts_registered(self, tts_type: str) -> bool:
+        """Check whether the given TTS type is registered."""
+        return tts_type in tts_tables.tts_classes
+    def print_registry(self):
+        """Print the registry information."""
+        tts_tables.print()
+# Global TTS manager instance
+tts_manager = TTSManager()
+def register_all_tts():
+    """Automatically discover and register all TTS implementations in the runtime directory."""
+    import os
+    import importlib
+    from pathlib import Path
+    # Resolve the path of the runtime directory
+    runtime_dir = Path(__file__).parent / "runtime"
+    # Scan the Python files in the runtime directory
+    for py_file in runtime_dir.glob("*.py"):
+        if py_file.name in ["__init__.py", "interface.py"]:
+            continue
+        module_name = py_file.stem
+        try:
+            # Import the module dynamically
+            module = importlib.import_module(f".runtime.{module_name}",
+                                           package="VoiceDialogue.services.audio.audio_generator")
+            logging.info(f"Successfully imported TTS module: {module_name}")
+        except ImportError as e:
+            logging.warning(f"Failed to import TTS module {module_name}: {e}")
+        except Exception as e:
+            logging.error(f"Unexpected error importing TTS module {module_name}: {e}")
+# Register all TTS implementations automatically when the module is imported
+register_all_tts()
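
The `get_or_create_tts` method keeps one instance per `tts_type:character_name` key, while `create_tts` always builds a fresh one. A minimal sketch of that caching behavior follows; `FakeConfig`, `FakeTTS`, and `InstanceCache` are hypothetical stand-ins for the real config, engine, and manager types, and the `created` counter exists only to make the behavior observable.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class FakeConfig:
    """Hypothetical config carrying only the two fields used for the cache key."""
    tts_type: str
    character_name: str


class FakeTTS:
    """Hypothetical engine; a real one would load models in setup()."""

    def __init__(self, config: FakeConfig):
        self.config = config


class InstanceCache:
    def __init__(self):
        self._instances: Dict[str, FakeTTS] = {}
        self.created = 0  # counts how many engines were actually constructed

    def get_or_create(self, config: FakeConfig) -> FakeTTS:
        key = f"{config.tts_type}:{config.character_name}"
        if key not in self._instances:
            self._instances[key] = FakeTTS(config)
            self.created += 1
        return self._instances[key]


cache = InstanceCache()
first = cache.get_or_create(FakeConfig("moyoyo", "alice"))
again = cache.get_or_create(FakeConfig("moyoyo", "alice"))
other = cache.get_or_create(FakeConfig("moyoyo", "bob"))
print(first is again, first is other, cache.created)
```

Because the key ignores every other config field, two configs that differ only in, say, inference parameters would share one cached instance under this scheme.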

src/VoiceDialogue/services/audio/audio_player.py CHANGED

@@ -70,7 +70,8 @@ class AudioStreamPlayer(BaseThread):
                 voice_state_manager.set_audio_playing(task_id)
                 voice_state_manager.reset_task_id()
-                self.playing_audio(voice_task.tts_generated_sentence_audio)
                 if self.audio_playing_queue.empty():
                     print(f'Answer playback finished')
@@ -90,11 +91,9 @@ class AudioStreamPlayer(BaseThread):
         chat_history_cache[voice_task.session_id] = chat_history
-    def playing_audio(self, tts_generated_audio):
-        audio_data = tts_generated_audio[0][1]
-        samplerate = tts_generated_audio[0][0]
         with tempfile.NamedTemporaryFile('w+b', suffix='.wav') as soundfile:
             # print(f'================soundfile : {soundfile.name}')
-            sf.write(soundfile, audio_data, samplerate=samplerate, subtype='PCM_16', closefd=False)
             # print(soundfile.name)
             playsound(soundfile.name, block=True)

                 voice_state_manager.set_audio_playing(task_id)
                 voice_state_manager.reset_task_id()
+                audio_data, sample_rate = voice_task.tts_generated_sentence_audio
+                self.playing_audio(audio_data, sample_rate)
                 if self.audio_playing_queue.empty():
                     print(f'Answer playback finished')
         chat_history_cache[voice_task.session_id] = chat_history
+    def playing_audio(self, audio_data, sample_rate=16000):
         with tempfile.NamedTemporaryFile('w+b', suffix='.wav') as soundfile:
             # print(f'================soundfile : {soundfile.name}')
+            sf.write(soundfile, audio_data, samplerate=sample_rate, subtype='PCM_16', closefd=False)
             # print(soundfile.name)
             playsound(soundfile.name, block=True)
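
After this change the player receives an `(audio_data, sample_rate)` tuple instead of indexing into a nested result. The sketch below shows the same unpack-then-write flow using the standard library `wave` module in place of `soundfile`. `write_pcm16_wav` is a hypothetical helper, and mono float samples in the range [-1, 1] are an assumption about the audio format rather than something the diff states.

```python
import math
import os
import struct
import tempfile
import wave


def write_pcm16_wav(path, audio_data, sample_rate=16000):
    """Write mono float samples in [-1.0, 1.0] as 16-bit PCM."""
    frames = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, sample)) * 32767))
        for sample in audio_data
    )
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes per sample = 16-bit PCM
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(frames)


# Stand-in for a synthesize() result: 0.1 s of a 440 Hz tone at 16 kHz.
sample_rate = 16000
tone = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate // 10)]
tts_result = (tone, sample_rate)

# Unpack in the same order the player uses.
audio_data, rate = tts_result
wav_path = os.path.join(tempfile.mkdtemp(), "sentence.wav")
write_pcm16_wav(wav_path, audio_data, rate)

# Read the file back to confirm what was written.
with wave.open(wav_path, "rb") as wav_file:
    written_frames = wav_file.getnframes()
    written_rate = wav_file.getframerate()
print(written_frames, written_rate)
```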