RonanMcGovern committed · verified
Commit 17a759e · 1 Parent(s): 54d687d

Upload README.md with huggingface_hub

Files changed (1): README.md (+62 −3)
# Piper TTS: en_US-ryan-medium

Medium-size US English male voice.

## Model Details

| Property | Value |
| --- | --- |
| Format | ONNX |
| Language | English (US) |
| Gender | Male |
| Model Size | medium (~63 MB ONNX, ~15M params) |
| Sample Rate | 22050 Hz |
| License | CC BY-NC-SA 4.0 |

> **Note:** Piper uses the terms "medium", "high", etc. to refer to **model size**, not output quality.
> Medium models (~63 MB, ~15M params) and high models (~114 MB, ~28M params) both produce 22.05 kHz audio.

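The size figures above are consistent with fp32 weights: parameter count × 4 bytes roughly matches the ONNX file size, with the remainder being graph metadata. A quick sanity check (approximate figures from the table):

```python
# rough fp32 weight size: params * 4 bytes, expressed in MB
def fp32_size_mb(n_params: float) -> float:
    return n_params * 4 / 1e6

medium_mb = fp32_size_mb(15e6)  # ~60 MB, close to the ~63 MB medium ONNX file
high_mb = fp32_size_mb(28e6)    # ~112 MB, close to the ~114 MB high ONNX file
```

The small gap between the estimate and the actual file size is the ONNX graph structure itself.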
## Usage

### With piper-tts (GPL)

```python
from piper import PiperVoice

# load the downloaded ONNX model (local path assumed; adjust to your file)
voice = PiperVoice.load("model.onnx")
for chunk in voice.synthesize("Hello, this is a test."):
    pass
```

45
 
46
+ ### Standalone ONNX (MIT — no piper-tts dependency)
47
+
48
+ Requires `espeak-ng` installed (`brew install espeak-ng` / `apt install espeak-ng`).
49
+
50
+ ```python
51
+ import json, subprocess, numpy as np, onnxruntime as ort, soundfile as sf
52
+ from huggingface_hub import hf_hub_download
53
+
54
+ model_id = "Trelis/piper-en-us-ryan-medium"
55
+ onnx_path = hf_hub_download(model_id, "model.onnx")
56
+ config_path = hf_hub_download(model_id, "model.onnx.json")
57
+
58
+ with open(config_path) as f:
59
+ config = json.load(f)
60
+
61
+ session = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])
62
+ phoneme_id_map = config["phoneme_id_map"]
63
+ espeak_voice = config["espeak"]["voice"]
64
+
65
+ def phonemize(text, voice):
66
+ out = subprocess.run(
67
+ ["espeak-ng", "-v", voice, "-q", "--ipa=2", "-x", text],
68
+ capture_output=True, text=True,
69
+ ).stdout.strip()
70
+ return [list(line.replace("_", " ")) for line in out.split("\n") if line.strip()]
71
+
72
+ def to_ids(phonemes, pmap):
73
+ ids = [pmap["^"][0], pmap["_"][0]]
74
+ for p in phonemes:
75
+ if p in pmap:
76
+ ids.extend(pmap[p])
77
+ ids.append(pmap["_"][0])
78
+ ids.append(pmap["$"][0])
79
+ return ids
80
+
81
+ text = "Hello, this is a test."
82
+ audio_chunks = []
83
+ for sentence in phonemize(text, espeak_voice):
84
+ ids = to_ids(sentence, phoneme_id_map)
85
+ if len(ids) < 3:
86
+ continue
87
+ audio = session.run(None, {
88
+ "input": np.array([ids], dtype=np.int64),
89
+ "input_lengths": np.array([len(ids)], dtype=np.int64),
90
+ "scales": np.array([
91
+ config["inference"]["noise_scale"],
92
+ config["inference"]["length_scale"],
93
+ config["inference"]["noise_w"],
94
+ ], dtype=np.float32),
95
+ })[0]
96
+ audio_chunks.append(audio.squeeze())
97
+
98
+ audio = np.concatenate(audio_chunks).astype(np.float32)
99
+ sf.write("output.wav", audio, config["audio"]["sample_rate"])
100
+ ```
101
+
102
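To make the `to_ids` encoding concrete, here is a toy run with a made-up `phoneme_id_map` (the ids below are hypothetical for illustration; the real map comes from `model.onnx.json`). The sequence starts with BOS `^`, interleaves the pad symbol `_` after every known phoneme, skips unknown phonemes, and ends with EOS `$`:

```python
# hypothetical ids, for illustration only
pmap = {"^": [1], "$": [2], "_": [0], "h": [27], "a": [14]}

def to_ids(phonemes, pmap):
    ids = [pmap["^"][0], pmap["_"][0]]
    for p in phonemes:
        if p in pmap:
            ids.extend(pmap[p])
            ids.append(pmap["_"][0])
    ids.append(pmap["$"][0])
    return ids

to_ids(["h", "a"], pmap)  # [1, 0, 27, 0, 14, 0, 2]
```

Phonemes missing from the map are dropped silently (no id, no pad), which is why the main script skips sentences that encode to fewer than 3 ids.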
## Fine-tuning

You can fine-tune this model on your own voice data using [Trelis Studio](https://studio.trelis.com). Piper models can be trained on custom datasets to create personalized voices.