liumaolin committed on Jun 5, 2025

Commit

1cbd55c

1 Parent(s): d231de5

Add Kokoro TTS support: integrate new TTS model, configuration, and runtime components for enhanced multilingual voice synthesis.

Files changed (9)

models/tts/kokoro-v1.0.int8.onnx +3 -0
models/tts/voices-v1.0.bin +3 -0
src/VoiceDialogue/services/audio/audio_generator/configs/kokoro.py +58 -0
src/VoiceDialogue/services/audio/audio_generator/models/__init__.py +21 -0
src/VoiceDialogue/services/audio/audio_generator/models/kokoro.py +62 -0
src/VoiceDialogue/services/audio/audio_generator/runtime/__init__.py +3 -1
src/VoiceDialogue/services/audio/audio_generator/runtime/interface.py +1 -1
src/VoiceDialogue/services/audio/audio_generator/runtime/kokoro.py +48 -0
src/VoiceDialogue/services/audio/audio_generator/runtime/moyoyo.py +2 -2

models/tts/kokoro-v1.0.int8.onnx ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6e742170d309016e5891a994e1ce1559c702a2ccd0075e67ef7157974f6406cb
+size 92361271

models/tts/voices-v1.0.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:bca610b8308e8d99f32e6fe4197e7ec01679264efed0cac9140fe9c29f1fbf7d
+size 28214398
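
The two binary assets above are committed as Git LFS pointer files: three `key value` lines giving the spec version, the SHA-256 object ID, and the size in bytes. A minimal sketch of parsing such a pointer and checking a downloaded file against it follows. The helper names `parse_lfs_pointer` and `matches_pointer` are illustrative and not part of this repository.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse the `key value` lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: Path, pointer: dict) -> bool:
    """Return True if the file's size and SHA-256 digest match the pointer."""
    if path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.sha256()
    with path.open("rb") as fh:
        # Hash in 1 MiB chunks so large model files are not loaded at once.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# Pointer contents copied from models/tts/voices-v1.0.bin in this commit.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:bca610b8308e8d99f32e6fe4197e7ec01679264efed0cac9140fe9c29f1fbf7d
size 28214398
"""
pointer = parse_lfs_pointer(POINTER_TEXT)
```

Checking the size first is a cheap way to reject a truncated download before hashing the whole file.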

src/VoiceDialogue/services/audio/audio_generator/configs/kokoro.py ADDED

	@@ -0,0 +1,58 @@

+from ..models.kokoro import KokoroTTSConfig
+ENGLISH_MODEL_FILES = {
+    'model': 'kokoro-v1.0.int8.onnx',
+    'voice': 'voices-v1.0.bin'
+}
+CHINESE_MODEL_FILES = {
+    'model': 'kokoro-v1.1-zh.onnx',
+    'voice': 'voices-v1.1-zh.bin',
+    'vocab_config': 'config.json'
+}
+KOKORO_TTS_CONFIGS = [
+    {
+        'character_name': 'Heart',
+        'cover_image': '',
+        'description': 'Heart is a warm, friendly English female voice with an emotionally rich tone, well suited to synthesizing expressive and heartwarming content.',
+        'file_size': '',
+        'is_chinese_voice': False,
+        'inference_parameters': {
+            'voice': 'af_heart',
+            'speed': 1.0,
+            'is_phonemes': True,
+        },
+        'model_files': ENGLISH_MODEL_FILES,
+    },
+    {
+        'character_name': 'Bella',
+        'cover_image': '',
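+        'description': 'Bella is a high-quality English female voice with clear, natural pronunciation and good expressiveness, suitable for synthesizing all kinds of English content.',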
+        'file_size': '',
+        'is_chinese_voice': False,
+        'inference_parameters': {
+            'voice': 'af_bella',
+            'speed': 1.0,
+            'is_phonemes': True,
+        },
+        'model_files': ENGLISH_MODEL_FILES,
+    },
+    {
+        'character_name': 'Nicole',
+        'cover_image': '',
+        'description': 'Nicole is a high-quality English female voice with clear, accurate pronunciation and natural, fluent intonation.',
+        'file_size': '',
+        'is_chinese_voice': False,
+        'inference_parameters': {
+            'voice': 'af_nicole',
+            'speed': 1.0,
+            'is_phonemes': True,
+        },
+        'model_files': ENGLISH_MODEL_FILES,
+    },
+]
+def get_kokoro_configs() -> list[KokoroTTSConfig]:
+    return [KokoroTTSConfig(**config) for config in KOKORO_TTS_CONFIGS]
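
`get_kokoro_configs()` turns each dictionary into a typed config object by keyword expansion. A rough sketch of the same pattern, plus a lookup by `character_name`, is given below. It uses a standard-library dataclass named `VoiceConfig` as a stand-in for the Pydantic `KokoroTTSConfig` (an assumption, so that the example runs without the project installed), and `find_voice` is a hypothetical helper that does not exist in the commit.

```python
from dataclasses import dataclass
from typing import Optional

ENGLISH_MODEL_FILES = {"model": "kokoro-v1.0.int8.onnx", "voice": "voices-v1.0.bin"}

# Trimmed copies of the three entries in KOKORO_TTS_CONFIGS.
VOICE_DICTS = [
    {"character_name": "Heart", "voice": "af_heart", "model_files": ENGLISH_MODEL_FILES},
    {"character_name": "Bella", "voice": "af_bella", "model_files": ENGLISH_MODEL_FILES},
    {"character_name": "Nicole", "voice": "af_nicole", "model_files": ENGLISH_MODEL_FILES},
]


@dataclass
class VoiceConfig:
    # Stand-in for KokoroTTSConfig; only the fields needed for the lookup.
    character_name: str
    voice: str
    model_files: dict
    speed: float = 1.0


def load_voices() -> list:
    """Build one VoiceConfig per dictionary, mirroring get_kokoro_configs()."""
    return [VoiceConfig(**entry) for entry in VOICE_DICTS]


def find_voice(name: str) -> Optional[VoiceConfig]:
    """Case-insensitive lookup by character name; None if nothing matches."""
    for cfg in load_voices():
        if cfg.character_name.lower() == name.lower():
            return cfg
    return None
```

For example, `find_voice("bella")` returns the entry whose `voice` is `af_bella`, and an unknown name returns `None`.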

src/VoiceDialogue/services/audio/audio_generator/models/__init__.py CHANGED

@@ -26,6 +26,16 @@ except ImportError:
     logging.warning("MoYoYo TTS config not available")
+try:
+    from .kokoro import KokoroTTSConfig
+    _kokoro_available = True
+except ImportError:
+    _kokoro_available = False
+    import logging
+    logging.warning("Kokoro TTS config not available")
 # Build the export list dynamically
 __all__ = [
     'TTSConfigType',
@@ -37,6 +47,8 @@ __all__ = [
 if _moyoyo_available:
     __all__.append('MoYoYoTTSConfig')
+if _kokoro_available:
+    __all__.append('KokoroTTSConfig')
 # Automatically register all available configs
@@ -51,6 +63,15 @@ def _auto_register_configs():
         import logging
         logging.error(f"Failed to auto-register configs: {e}")
+    try:
+        if _kokoro_available:
+            from ..configs.kokoro import get_kokoro_configs
+            for config in get_kokoro_configs():
+                tts_config_registry.register_config(config)
+    except Exception as e:
+        import logging
+        logging.error(f"Failed to auto-register configs: {e}")
 # Auto-register configs when the module is loaded
 _auto_register_configs()
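
The hunks above follow an optional-backend pattern: try the import, record whether it succeeded, and only export and register what is actually available. A self-contained sketch of that idea is shown here. `try_import`, `ConfigRegistry`, and the nonexistent module name are all hypothetical; the real `tts_config_registry` API is not shown in this diff beyond `register_config(config)`.

```python
import importlib
import logging

logger = logging.getLogger(__name__)


def try_import(module_name: str):
    """Return the imported module, or None if it cannot be imported."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        logger.warning("%s not available", module_name)
        return None


class ConfigRegistry:
    """Hypothetical stand-in for tts_config_registry."""

    def __init__(self) -> None:
        self._configs = {}

    def register_config(self, name: str, config) -> None:
        self._configs[name] = config

    def names(self) -> list:
        return sorted(self._configs)


registry = ConfigRegistry()
exported = []

# `json` ships with Python; the second name is deliberately nonexistent,
# playing the role of a TTS backend whose dependencies are not installed.
for backend in ("json", "backend_that_is_not_installed"):
    module = try_import(backend)
    if module is not None:
        exported.append(backend)
        registry.register_config(backend, module)
```

A missing backend only produces a warning, so the remaining engines stay usable.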

src/VoiceDialogue/services/audio/audio_generator/models/kokoro.py ADDED

	@@ -0,0 +1,62 @@

+import typing
+from pathlib import Path
+from pydantic import BaseModel, Field
+from .base import BaseTTSConfig, TTSConfigType
+from config import paths
+class InferenceParameters(BaseModel):
+    """Kokoro TTS inference parameters"""
+    voice: str = Field(description="Voice character name")
+    speed: float = Field(default=1.0, description="Speech playback speed")
+    is_phonemes: bool = Field(default=True, description="Whether the input is phonemes")
+class ModelFiles(BaseModel):
+    """Model file configuration"""
+    model: str = Field(default='', description="Model file name")
+    voice: str = Field(default='', description="Voice file name")
+    vocab_config: typing.Optional[str] = Field(default=None, description="Phoneme (vocab) config file name")
+class KokoroTTSConfig(BaseTTSConfig):
+    tts_type: TTSConfigType = TTSConfigType.KOKORO
+    inference_parameters: InferenceParameters
+    model_files: ModelFiles
+    def get_model_storage_path(self) -> Path:
+        storage_path = paths.MODELS_PATH / 'tts'
+        if not storage_path.exists():
+            storage_path.mkdir(parents=True, exist_ok=True)
+        return storage_path
+    def is_model_complete(self) -> bool:
+        storage_path = self.get_model_storage_path()
+        for model_file in self.model_files.model_dump().values():
+            if not model_file:
+                continue
+            file_path = storage_path / model_file
+            if not file_path.exists():
+                return False
+        return True
+    def download_model(self, progress_callback: typing.Callable = None):
+        pass
+    def delete_model(self):
+        pass
+    @property
+    def model_path(self):
+        return self.get_model_storage_path() / self.model_files.model
+    @property
+    def voices_path(self):
+        return self.get_model_storage_path() / self.model_files.voice
+    @property
+    def vocab_config_path(self):
+        return self.get_model_storage_path() / self.model_files.vocab_config
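
`is_model_complete` skips empty file names and reports the model as incomplete as soon as one required file is missing. The English configs leave `vocab_config` unset, so joining it onto a `Path` as `vocab_config_path` does would raise `TypeError` if that property were read for them. A small sketch of both points follows, using hypothetical helpers `missing_files` and `optional_path` that are not part of the commit.

```python
from pathlib import Path
from typing import Optional


def missing_files(storage_dir: Path, model_files: dict) -> list:
    """Return the names of required files that are absent from storage_dir."""
    return [
        name
        for name in model_files.values()
        if name and not (storage_dir / name).exists()
    ]


def optional_path(storage_dir: Path, filename: Optional[str]) -> Optional[Path]:
    """Join only when a filename is set; `Path / None` raises TypeError."""
    return storage_dir / filename if filename else None
```

A model counts as complete when `missing_files(...)` returns an empty list, and returning the missing names rather than a bare boolean makes the failure easier to report.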

src/VoiceDialogue/services/audio/audio_generator/runtime/__init__.py CHANGED

@@ -12,11 +12,13 @@ from .interface import TTSInterface, TTSFactory
 # Import all TTS implementations so that the registration decorators run
 try:
     from .moyoyo import MoYoYoTTS
+    from .kokoro import KokoroTTS
     __all__ = [
         'TTSInterface',
         'TTSFactory',
-        'MoYoYoTTS'
+        'MoYoYoTTS',
+        'KokoroTTS'
     ]
 except ImportError as e:
     # If some TTS implementations cannot be imported, overall functionality is unaffected

src/VoiceDialogue/services/audio/audio_generator/runtime/interface.py CHANGED

@@ -34,7 +34,7 @@ class TTSInterface(ABC):
         pass
     @abstractmethod
-    def synthesize(self, text: str, **kwargs) -> Tuple[int, np.ndarray]:
+    def synthesize(self, text: str, **kwargs) -> Tuple[np.ndarray, int]:
         """
         Convert text to speech
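
The annotation change swaps the return order to `(samples, sample_rate)`, which is the order both `KokoroTTS.synthesize` and `MoYoYoTTS.synthesize` declare. A caller that unpacks in this order might look like the sketch below. It uses a plain list of floats in place of `np.ndarray` and the standard-library `wave` module so that it runs without extra dependencies; `write_wav`, `duration_seconds`, and the 24000 Hz rate are illustrative assumptions.

```python
import struct
import wave
from typing import Sequence


def write_wav(path: str, samples: Sequence[float], sample_rate: int) -> None:
    """Write mono float samples in [-1.0, 1.0] as 16-bit PCM."""
    frames = b"".join(
        # Clamp each sample, then scale it to the signed 16-bit range.
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
        for s in samples
    )
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(frames)


def duration_seconds(samples: Sequence[float], sample_rate: int) -> float:
    """Length of the clip in seconds."""
    return len(samples) / sample_rate


# New contract: audio first, sample rate second.
samples, sample_rate = [0.0] * 24000, 24000
```

Unpacking in the old `(sample_rate, samples)` order would hand an integer to code that expects audio, so callers written against the previous annotation need the same swap.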

src/VoiceDialogue/services/audio/audio_generator/runtime/kokoro.py ADDED

	@@ -0,0 +1,48 @@

+from typing import Tuple, Optional
+import numpy as np
+from kokoro_onnx import Kokoro
+from .interface import TTSInterface
+from ..configs.kokoro import KokoroTTSConfig
+from ..manager import tts_tables
+@tts_tables.register("tts_classes", "kokoro")
+class KokoroTTS(TTSInterface):
+    def __init__(self, config: KokoroTTSConfig):
+        super().__init__(config)
+        self.tts_model: Optional[Kokoro] = None
+        self.espeak_ng = None
+    def setup(self, **kwargs) -> None:
+        if self.config.is_chinese_voice:
+            self.tts_model = Kokoro(
+                model_path=self.config.model_path,
+                voices_path=self.config.voices_path,
+                vocab_config=self.config.vocab_config_path,
+            )
+            from misaki import zh
+            self.espeak_ng = zh.ZHG2P(version="1.1")
+        else:
+            self.tts_model = Kokoro(
+                model_path=self.config.model_path,
+                voices_path=self.config.voices_path
+            )
+            from misaki import en, espeak
+            fallback = espeak.EspeakFallback(british=False)
+            self.espeak_ng = en.G2P(trf=False, british=False, fallback=fallback)
+    def warmup(self, warmup_steps: int = 1) -> None:
+        print('[INFO:] Warming up Kokoro TTS engine...')
+        warmup_texts = ['Warming up TTS engine.', '预热文字转音频引擎。']
+        for _ in range(warmup_steps):
+            for warmup_text in warmup_texts:
+                self.synthesize(warmup_text)
+        print('[INFO:] Warm up Kokoro TTS engine finished.')
+    def synthesize(self, text: str, **kwargs) -> Tuple[np.ndarray, int]:
+        phonemes, _ = self.espeak_ng(text)
+        samples, sample_rate = self.tts_model.create(phonemes, **self.config.inference_parameters.model_dump())
+        return samples, sample_rate
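
`KokoroTTS` is made discoverable through the `@tts_tables.register("tts_classes", "kokoro")` decorator, and its `synthesize` runs in two stages: grapheme-to-phoneme conversion, then audio generation. The sketch below imitates both ideas with a toy engine. The `Tables` class is a hypothetical minimal registry (the real `tts_tables` lives in `..manager` and is not shown in this diff), and `EchoTTS` produces silence rather than speech.

```python
from typing import Callable, Dict


class Tables:
    """Hypothetical minimal registry keyed by (table, name)."""

    def __init__(self) -> None:
        self._tables: Dict[str, Dict[str, type]] = {}

    def register(self, table: str, name: str) -> Callable[[type], type]:
        def decorator(cls: type) -> type:
            # Store the class under its table and name, then return it unchanged.
            self._tables.setdefault(table, {})[name] = cls
            return cls
        return decorator

    def get(self, table: str, name: str) -> type:
        return self._tables[table][name]


tts_tables = Tables()


@tts_tables.register("tts_classes", "echo")
class EchoTTS:
    """Toy engine that emits one silent sample per 'phoneme'."""

    def synthesize(self, text: str):
        phonemes = text.lower()          # stands in for the G2P step
        samples = [0.0] * len(phonemes)  # stands in for Kokoro.create
        return samples, 24000            # illustrative sample rate


# Look the class up by name, instantiate it, and synthesize.
engine = tts_tables.get("tts_classes", "echo")()
samples, sample_rate = engine.synthesize("Hi")
```

Registration happens when the class body is executed, which is why `runtime/__init__.py` imports every implementation module.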

src/VoiceDialogue/services/audio/audio_generator/runtime/moyoyo.py CHANGED

@@ -34,12 +34,12 @@ class MoYoYoTTS(TTSInterface):
     def warmup(self, warmup_steps: int = 1) -> None:
         """Warm up the TTS engine"""
-        print('[INFO:] Warming up TTS engine...')
+        print('[INFO:] Warming up MoYoYo TTS engine...')
         warmup_texts = ['Warming up TTS engine.', '预热文字转音频引擎。']
         for _ in range(warmup_steps):
             for warmup_text in warmup_texts:
                 self.tts_module.generate_audio(warmup_text, warmup=True)
-        print('[INFO:] Warm up TTS engine finished.')
+        print('[INFO:] Warm up MoYoYo TTS engine finished.')
     def synthesize(self, text: str, **kwargs) -> Tuple[np.ndarray, int]:
         """Synthesize speech"""