Xin Zhang committed on Apr 23, 2025

Commit

0c9fcfc

1 Parent(s): b12f0fd

[fix]: remove unused file.

Files changed (1)

transcribe/helpers/vad_dynamic.py +0 -430

transcribe/helpers/vad_dynamic.py DELETED

@@ -1,430 +0,0 @@
-from copy import deepcopy
-from queue import Queue, Empty
-from time import time
-from config import VAD_MODEL_PATH
-# from silero_vad import load_silero_vad
-import numpy as np
-import onnxruntime
-class OnnxWrapper():
-    def __init__(self, path, force_onnx_cpu=False):
-        opts = onnxruntime.SessionOptions()
-        opts.inter_op_num_threads = 1
-        opts.intra_op_num_threads = 1
-        if force_onnx_cpu and 'CPUExecutionProvider' in onnxruntime.get_available_providers():
-            self.session = onnxruntime.InferenceSession(path, providers=['CPUExecutionProvider'], sess_options=opts)
-        else:
-            self.session = onnxruntime.InferenceSession(path, sess_options=opts)
-        self.reset_states()
-        self.sample_rates = [16000]
-    def _validate_input(self, x: np.ndarray, sr: int):
-        if x.ndim == 1:
-            x = x[None]
-        if x.ndim > 2:
-            raise ValueError(f"Too many dimensions for input audio chunk {x.ndim}")
-        if sr != 16000 and (sr % 16000 == 0):
-            step = sr // 16000
-            x = x[:, ::step]
-            sr = 16000
-        if sr not in self.sample_rates:
-            raise ValueError(f"Supported sampling rates: {self.sample_rates} (or multiply of 16000)")
-        if sr / x.shape[1] > 31.25:
-            raise ValueError("Input audio chunk is too short")
-        return x, sr
-    def reset_states(self, batch_size=1):
-        self._state = np.zeros((2, batch_size, 128)).astype(np.float32)
-        self._context = np.zeros(0)
-        self._last_sr = 0
-        self._last_batch_size = 0
-    def __call__(self, x, sr: int):
-        x, sr = self._validate_input(x, sr)
-        num_samples = 512 if sr == 16000 else 256
-        if x.shape[-1] != num_samples:
-            raise ValueError(
-                f"Provided number of samples is {x.shape[-1]} (Supported values: 256 for 8000 sample rate, 512 for 16000)")
-        batch_size = x.shape[0]
-        context_size = 64 if sr == 16000 else 32
-        if not self._last_batch_size:
-            self.reset_states(batch_size)
-        if (self._last_sr) and (self._last_sr != sr):
-            self.reset_states(batch_size)
-        if (self._last_batch_size) and (self._last_batch_size != batch_size):
-            self.reset_states(batch_size)
-        if not len(self._context):
-            self._context = np.zeros((batch_size, context_size)).astype(np.float32)
-        x = np.concatenate([self._context, x], axis=1)
-        if sr in [8000, 16000]:
-            ort_inputs = {'input': x, 'state': self._state, 'sr': np.array(sr, dtype='int64')}
-            ort_outs = self.session.run(None, ort_inputs)
-            out, state = ort_outs
-            self._state = state
-        else:
-            raise ValueError()
-        self._context = x[..., -context_size:]
-        self._last_sr = sr
-        self._last_batch_size = batch_size
-        # out = torch.from_numpy(out)
-        return out
-    def audio_forward(self, audio: np.ndarray, sr: int):
-        outs = []
-        x, sr = self._validate_input(audio, sr)
-        self.reset_states()
-        num_samples = 512 if sr == 16000 else 256
-        if x.shape[1] % num_samples:
-            pad_num = num_samples - (x.shape[1] % num_samples)
-            x = np.pad(x, ((0, 0), (0, pad_num)), 'constant', constant_values=(0.0, 0.0))
-        for i in range(0, x.shape[1], num_samples):
-            wavs_batch = x[:, i:i + num_samples]
-            out_chunk = self.__call__(wavs_batch, sr)
-            outs.append(out_chunk)
-        stacked = np.concatenate(outs, axis=1)
-        return stacked
-class VADIteratorOnnx:
-    def __init__(self,
-                 threshold: float = 0.5,
-                 sampling_rate: int = 16000,
-                 min_silence_duration_ms: int = 100,
-                 max_speech_duration_s: float = float('inf'),
-                 long_speech_threshold_s: float = 6.0, # New: long-speech threshold (seconds)
-                 adjusted_min_silence_factor: float = 0.5 # New: scaling factor for the adjusted silence duration
-                 ):
-        self.model = OnnxWrapper(VAD_MODEL_PATH, True)
-        self.threshold = threshold
-        self.sampling_rate = sampling_rate
-        if sampling_rate not in [8000, 16000]:
-            raise ValueError('VADIterator does not support sampling rates other than [8000, 16000]')
-        self._original_min_silence_samples = sampling_rate * min_silence_duration_ms / 1000 # Store the original value
-        self.min_silence_samples = self._original_min_silence_samples # Value currently in use
-        self.adjusted_min_silence_samples = self._original_min_silence_samples * adjusted_min_silence_factor # Compute the adjusted value
-        self.long_speech_threshold_samples = sampling_rate * long_speech_threshold_s # Long-speech threshold (in samples)
-        self.max_speech_samples = int(sampling_rate * max_speech_duration_s)
-        # self.speech_pad_samples = sampling_rate * speech_pad_ms / 1000
-        self.reset_states()
-    def reset_states(self):
-        self.model.reset_states()
-        self.triggered = False
-        self.temp_end = 0
-        self.current_sample = 0
-        self.start = 0
-        self.speech_start_sample = 0 # New: sample index where continuous speech began
-        self.min_silence_samples = self._original_min_silence_samples # Reset to the original value
-    def __call__(self, x: np.ndarray, return_seconds=False):
-        """
-        x: np.ndarray
-            audio chunk (see examples in repo)
-        return_seconds: bool (default - False)
-            whether return timestamps in seconds (default - samples)
-        """
-        window_size_samples = 512 if self.sampling_rate == 16000 else 256
-        x = x[:window_size_samples]
-        if len(x) < window_size_samples:
-            x = np.pad(x, ((0, 0), (0, window_size_samples - len(x))), 'constant', constant_values=0.0)
-        self.current_sample += window_size_samples
-        speech_prob = self.model(x, self.sampling_rate)[0,0]
-        # print(f"{self.current_sample/self.sampling_rate:.2f}: {speech_prob}")
-        # --- Dynamic adjustment logic ---
-        current_min_silence_samples_to_use = self._original_min_silence_samples
-        if self.triggered and self.speech_start_sample > 0:
-            current_speech_duration_samples = self.current_sample - self.speech_start_sample
-            if current_speech_duration_samples > self.long_speech_threshold_samples:
-                # If continuous speech exceeds the threshold, use the adjusted (shorter) silence duration
-                current_min_silence_samples_to_use = self.adjusted_min_silence_samples
-        # --- End of dynamic adjustment logic ---
-        if (speech_prob >= self.threshold) and self.temp_end:
-            # Speech resumed after a tentative silence; clear the tentative end point
-            self.temp_end = 0
-        if (speech_prob >= self.threshold) and not self.triggered:
-            # Speech start detected
-            self.triggered = True
-            speech_start = max(0, self.current_sample - window_size_samples)
-            self.start = speech_start
-            self.speech_start_sample = self.start # Record the speech start point
-            # self.min_silence_samples = self._original_min_silence_samples # Reset in reset_states
-            return {'start': int(speech_start) if not return_seconds else round(speech_start / self.sampling_rate, 1)}
-        if (speech_prob >= self.threshold) and self.current_sample - self.start >= self.max_speech_samples:
-            # Maximum speech length reached (if set); force the segment to end
-            if self.temp_end:
-                self.temp_end = 0
-            speech_end = self.current_sample # Use the current sample as the end point
-            self.triggered = False # End the current segment
-            self.speech_start_sample = 0 # Reset the continuous-speech start point
-            # self.min_silence_samples = self._original_min_silence_samples # Reset in reset_states
-            # Return an end event and reset start so a new segment can begin immediately
-            end_val = int(speech_end) if not return_seconds else round(speech_end / self.sampling_rate, 1)
-            self.start = speech_end # Set start to the current end point in preparation for the next segment? Or handle this in VadV2? VadV2 resets start/end
-            return {'end': end_val}
-        if (speech_prob < self.threshold - 0.15) and self.triggered:
-            # Possible silence detected
-            if not self.temp_end:
-                self.temp_end = self.current_sample # Record the tentative end point
-            # Compare against the silence-duration threshold computed above (possibly adjusted)
-            if self.current_sample - self.temp_end < current_min_silence_samples_to_use:
-                # Silence is not long enough yet; ignore it
-                return None
-            else:
-                # Silence is long enough; confirm the end of speech
-                speech_end = self.temp_end - window_size_samples # End point is the tentative end minus one window
-                self.temp_end = 0
-                self.triggered = False
-                self.speech_start_sample = 0 # Reset the continuous-speech start point
-                # self.min_silence_samples = self._original_min_silence_samples # Reset in reset_states
-                return {'end': int(speech_end) if not return_seconds else round(speech_end / self.sampling_rate, 1)}
-        return None
-class VadV2:
-    def __init__(self,
-                threshold: float = 0.5,
-                sampling_rate: int = 16000,
-                min_silence_duration_ms: int = 100,
-                speech_pad_ms: int = 30,
-                max_speech_duration_s: float = float('inf'),
-                long_speech_threshold_s: float = 10.0, # Higher default, so dynamic adjustment kicks in less often
-                adjusted_min_silence_factor: float = 0.6 # Higher default, so the adjustment is less aggressive
-                ):
-        self.vad_iterator = VADIteratorOnnx(threshold, sampling_rate, min_silence_duration_ms, max_speech_duration_s,
-                                            long_speech_threshold_s, adjusted_min_silence_factor)
-        self.speech_pad_samples = int(sampling_rate * speech_pad_ms / 1000)
-        self.sampling_rate = sampling_rate
-        self.audio_buffer = np.array([], dtype=np.float32)
-        self.start = 0
-        self.end = 0
-        self.offset = 0
-        # Checking that speech_pad_ms does not exceed min_silence_duration_ms is good practice, but it is not enforced
-        # assert speech_pad_ms <= min_silence_duration_ms, "speech_pad_ms should be less than min_silence_duration_ms"
-        self.max_speech_samples = int(sampling_rate * max_speech_duration_s)
-        self.silence_chunk_size = 0
-        # Compute the silence threshold in windows (e.g. roughly 2 seconds of silence)
-        self.silence_chunk_threshold = int(2.0 / (512 / self.sampling_rate))
-    def reset(self):
-        self.audio_buffer = np.array([], dtype=np.float32)
-        self.start = 0
-        self.end = 0
-        self.offset = 0
-        self.vad_iterator.reset_states()
-        self.silence_chunk_size = 0 # Reset the silence counter
-    def __call__(self, x: np.ndarray = None):
-        if x is None:
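-            # Flush the audio remaining in the buffer
-            # Condition: the VAD is currently triggered, or it is not triggered but a start was detected and the buffer is non-empty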
-            if self.vad_iterator.triggered or (self.start > self.offset and len(self.audio_buffer) > 0):
-                start_global = max(self.offset, self.start - self.speech_pad_samples)
-                # The end point is the absolute end of the buffer
-                end_global = self.offset + len(self.audio_buffer)
-                # Ensure start < end
-                if start_global < end_global:
-                    start_ts = round(start_global / self.sampling_rate, 1)
-                    end_ts = round(end_global / self.sampling_rate, 1)
-                    # Extract data from the computed in-buffer index to the end of the buffer
-                    buffer_start_index = max(0, start_global - self.offset)
-                    audio_data = self.audio_buffer[buffer_start_index:]
-                    if len(audio_data) > 0:
-                        result = {
-                            "start": start_ts,
-                            "end": end_ts,
-                            "audio": audio_data,
-                        }
-                    else:
-                        result = None
-                else:
-                    result = None # start >= end: invalid segment
-            else:
-                result = None # No remaining audio to process
-            self.reset() # Reset state after flushing the remainder
-            return result
-        # Append the new audio chunk to the buffer
-        self.audio_buffer = np.append(self.audio_buffer, deepcopy(x))
-        # Run the VAD iterator on the new chunk
-        vad_result = self.vad_iterator(x)
-        if vad_result is not None:
-            self.silence_chunk_size = 0 # VAD activity; reset the silence counter
-            if 'start' in vad_result:
-                # Update start only if a new segment has not already begun
-                # (self.start <= self.offset means the previous segment has ended or none has started)
-                if self.start <= self.offset:
-                    self.start = vad_result['start'] + self.offset
-            if 'end' in vad_result:
-                # Update end only if a start has been detected
-                if self.start > self.offset:
-                    self.end = vad_result['end'] + self.offset
-        else:
-            # Count silence only when the VAD is not triggered and no speech start has been detected
-            if not self.vad_iterator.triggered and self.start <= self.offset:
-                self.silence_chunk_size += 1
-        # --- Buffer management ---
-        # 1. Trim leading silence (if no speech start has been detected)
-        if self.start <= self.offset and not self.vad_iterator.triggered and len(self.audio_buffer) > self.speech_pad_samples:
-            # Trim only when the VAD's internal state also confirms there is no speech
-            if self.vad_iterator.speech_start_sample == 0:
-                clearable_length = len(self.audio_buffer) - self.speech_pad_samples
-                self.offset += clearable_length
-                self.audio_buffer = self.audio_buffer[clearable_length:]
-                self.silence_chunk_size = 0 # Reset the counter after trimming
-        # 2. Clear the buffer after prolonged silence (if no speech start has been detected)
-        if self.start <= self.offset and not self.vad_iterator.triggered and self.silence_chunk_size >= self.silence_chunk_threshold:
-            clearable_length = len(self.audio_buffer) # Clear everything up to the current position
-            if clearable_length > 0:
-                self.offset += clearable_length
-                self.audio_buffer = np.array([], dtype=np.float32) # Empty the buffer
-            self.silence_chunk_size = 0 # Reset the counter
-        # --- End of buffer management ---
-        # --- Segment extraction ---
-        segment_to_return = None
-        if self.end > self.start:
-            # A complete speech segment [start, end] has been detected
-            start_global = max(self.offset, self.start - self.speech_pad_samples)
-            end_global = self.end + self.speech_pad_samples
-            # The extractable end point cannot go past the end of the current buffer
-            effective_end_global = min(end_global, self.offset + len(self.audio_buffer))
-            # Ensure start_global < effective_end_global
-            if start_global < effective_end_global:
-                start_ts = round(start_global / self.sampling_rate, 1)
-                # The timestamp uses the theoretical end_global
-                end_ts = round(end_global / self.sampling_rate, 1)
-                # Compute the indices within the current audio_buffer
-                buffer_start_index = max(0, start_global - self.offset)
-                buffer_end_index = effective_end_global - self.offset
-                if buffer_start_index < buffer_end_index: # Ensure the indices are valid
-                    audio_data = self.audio_buffer[buffer_start_index : buffer_end_index]
-                    # --- Update the buffer and offset ---
-                    # Keep the data that follows the extracted segment
-                    keep_from_index = buffer_end_index
-                    if keep_from_index < len(self.audio_buffer):
-                        self.audio_buffer = self.audio_buffer[keep_from_index:]
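-                        # *** Key fix ***: the new offset is the global start position of the retained buffer
-                        self.offset = effective_end_global
-                    else:
-                        # The extracted segment reaches or passes the end of the buffer
-                        self.audio_buffer = np.array([], dtype=np.float32)
-                        self.offset = effective_end_global # Move the offset to where the buffer ended
-                    # Reset start and end to look for the next segment
-                    # The next search should begin at the new offset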
-                    self.start = self.offset
-                    self.end = self.offset
-                    segment_to_return = {
-                        "start": start_ts,
-                        "end": end_ts,
-                        "audio": audio_data,
-                    }
-                else:
-                    # Invalid indices, possibly caused by rapid start/end events or padding
-                    # Reset state conservatively to avoid losing synchronization
-                    self.start = self.offset
-                    self.end = self.offset
-            else:
-                # start_global >= effective_end_global: invalid, reset state
-                self.start = self.offset
-                self.end = self.offset
-        return segment_to_return
-class VadProcessor:
-    def __init__(
-            self,
-            prob_threshold=0.5,
-            silence_s=0.2,
-            cache_s=0.15, # This parameter is now controlled by speech_pad_ms inside VadV2
-            sr=16000,
-            long_speech_threshold_s: float = 6.0, # New: default long-speech threshold
-            adjusted_min_silence_factor: float = 0.5 # New: default adjustment factor
-    ):
-        self.prob_threshold = prob_threshold
-        # self.cache_s = cache_s # cache_s is no longer used directly; speech_pad_ms is used instead
-        self.sr = sr
-        self.silence_s = silence_s # Used for min_silence_duration_ms
-        self.speech_pad_s = cache_s # Treat cache_s as speech_pad_ms
-        # Pass all parameters through to VadV2
-        self.vad = VadV2(
-            threshold=self.prob_threshold,
-            sampling_rate=self.sr,
-            min_silence_duration_ms=int(self.silence_s * 1000),
-            speech_pad_ms=int(self.speech_pad_s * 1000),
-            max_speech_duration_s=15, # Keep the original maximum-duration limit
-            long_speech_threshold_s=long_speech_threshold_s, # Pass the new parameter
-            adjusted_min_silence_factor=adjusted_min_silence_factor # Pass the new parameter
-        )
-    def process_audio(self, audio_buffer: np.ndarray):
-        audio = np.array([], np.float32)
-        chunk_size = 512 # Chunk size expected by the VAD model
-        for i in range(0, len(audio_buffer), chunk_size):
-            chunk = audio_buffer[i:i+chunk_size]
-            # If the last chunk is too short, VADIteratorOnnx handles the padding internally
-            ret = self.vad(chunk)
-            if ret:
-                audio = np.append(audio, ret['audio'])
-        # After processing, call vad(None) to retrieve the audio remaining in the buffer
-        final_ret = self.vad(None)
-        if final_ret:
-            audio = np.append(audio, final_ret['audio'])
-        return audio
-    # A reset method may be needed to reset the VAD state so a VadProcessor instance can be reused
-    def reset(self):
-        self.vad.reset()
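
For readers who want to see what the removed file's "dynamic adjustment" amounted to, the following is a minimal sketch of the end-of-speech rule in the deleted VADIteratorOnnx, not the project's code. It assumes hand-written speech probabilities in place of ONNX model output, and the names DynamicSilenceDetector and silent_windows_until_end are hypothetical. It also leaves out the max_speech_duration_s cutoff and the `speech_start_sample > 0` guard present in the deleted code, so it only illustrates the idea: once a segment has run longer than long_speech_threshold_s, less silence is required to close it.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class DynamicSilenceDetector:
    """Hypothetical, model-free re-creation of the adaptive end-of-speech rule."""
    threshold: float = 0.5
    sampling_rate: int = 16000
    window: int = 512                        # samples consumed per probability
    min_silence_ms: int = 100
    long_speech_threshold_s: float = 6.0
    adjusted_min_silence_factor: float = 0.5

    def __post_init__(self) -> None:
        self.min_silence = self.sampling_rate * self.min_silence_ms / 1000
        self.adjusted_min_silence = self.min_silence * self.adjusted_min_silence_factor
        self.long_speech = self.sampling_rate * self.long_speech_threshold_s
        self.triggered = False
        self.temp_end = 0        # 0 means "no tentative end recorded"
        self.current = 0         # samples seen so far
        self.speech_start = 0

    def step(self, prob: float) -> Optional[dict]:
        """Consume one window's speech probability; return a start/end event or None."""
        self.current += self.window
        # Long segments need less silence before they are closed.
        required = self.min_silence
        if self.triggered and self.current - self.speech_start > self.long_speech:
            required = self.adjusted_min_silence
        if prob >= self.threshold and self.temp_end:
            self.temp_end = 0                # speech resumed, drop the tentative end
        if prob >= self.threshold and not self.triggered:
            self.triggered = True
            self.speech_start = max(0, self.current - self.window)
            return {"start": self.speech_start}
        # Hysteresis: silence only counts when clearly below the threshold.
        if prob < self.threshold - 0.15 and self.triggered:
            if not self.temp_end:
                self.temp_end = self.current
            if self.current - self.temp_end < required:
                return None                  # not enough silence yet
            end = self.temp_end - self.window
            self.temp_end = 0
            self.triggered = False
            return {"end": end}
        return None


def silent_windows_until_end(det: DynamicSilenceDetector, speech_windows: int) -> Tuple[int, int]:
    """Feed `speech_windows` loud windows, then silence until an end event fires."""
    for _ in range(speech_windows):
        det.step(0.9)
    count = 0
    while True:
        count += 1
        event = det.step(0.1)
        if event and "end" in event:
            return count, event["end"]


if __name__ == "__main__":
    for n in (5, 40):  # a short segment and one longer than 1 second
        det = DynamicSilenceDetector(min_silence_ms=200, long_speech_threshold_s=1.0)
        print(n, silent_windows_until_end(det, n))
```

With these illustrative settings, the longer segment is closed after fewer silent windows than the short one, which is the behavior the `long_speech_threshold_s` and `adjusted_min_silence_factor` parameters were introduced to produce.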