---
language:
- cy
- en
license: cc0-1.0
library_name: piper-tts
tags:
- text-to-speech
- tts
- welsh
- cymraeg
- audio
- onnx
- piper
- accessibility
- assistive-technology
- screen-reader
datasets:
- techiaith/bu-tts-cy-en
model-index:
- name: cy_GB-bu_tts
  results: []
---

# cy_GB-bu_tts - Welsh Neural Text-to-Speech

This is a Welsh (Cymraeg) neural text-to-speech model trained using [Piper](https://github.com/rhasspy/piper), a fast, local neural TTS system optimized for Raspberry Pi and other low-end devices.

**Developed by:** Uned Technolegau Iaith (Language Technologies Unit), Bangor University
**Model type:** Neural TTS (VITS-based architecture)
**Language:** Welsh (cy_GB)
**License:** CC0-1.0
**Format:** ONNX

## Model Details

- **Architecture:** Based on Piper's VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech)
- **Speakers:** Multi-speaker model with 3 speaker variants
- **Quality:** Medium quality (suitable for screen readers and assistive technology)
- **Model Size:** Approximately 77 MB
- **Inference Speed:** Optimized for real-time synthesis on CPU
- **Sample Rate:** 22050 Hz
- **Training Framework:** [Piper training pipeline](https://github.com/rhasspy/piper)

## Training Data

This model was trained on the [bu-tts-cy-en dataset](https://huggingface.co/datasets/techiaith/bu-tts-cy-en) (Bangor University Text to Speech Welsh-English dataset).

**Dataset characteristics:**
- **Size:** 10,000-100,000 samples
- **Languages:** Welsh and English (bilingual dataset)
- **License:** CC0 1.0 (Public Domain)
- **Content:** Audio recordings with corresponding text transcriptions
- **Source:** Language Technologies Unit, Bangor University

**Training data limitations:**
- Dataset consists of freely available recordings (public domain audiobooks and research-quality recordings)
- Coverage is not comprehensive across all Welsh vocabulary and contexts
- Some pronunciation patterns may be influenced by the limited speaker diversity in the training data
- Quality improvements would be possible with larger, more diverse, professionally recorded datasets

## Intended Use

**Primary use cases:**
- Screen readers and assistive technology (particularly [NVDA integration](https://github.com/techiaith/nvda-addon))
- Accessibility tools for Welsh speakers with visual impairments
- Welsh language learning applications
- Local, offline Welsh TTS applications
- Research in Welsh speech synthesis

**Supported platforms:**
- Compatible with the Piper TTS runtime
- Works with the [Sonata TTS engine](https://github.com/mush42/sonata)
- ONNX Runtime on x86/x64 architectures
- Raspberry Pi and other resource-constrained devices

## Usage

### With Piper

```bash
# Download model files
wget https://huggingface.co/techiaith/cy_GB-bu_tts/resolve/main/cy_GB-bu_tts.onnx
wget https://huggingface.co/techiaith/cy_GB-bu_tts/resolve/main/cy_GB-bu_tts.onnx.json

# Run synthesis
echo "Bore da, sut wyt ti?" | piper \
  --model cy_GB-bu_tts.onnx \
  --output_file output.wav
```

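Since this is a multi-speaker model with 3 variants, a specific voice can be chosen with Piper's `--speaker` flag. A sketch, assuming the model files downloaded above; the output filename is just an example:

```shell
# Select speaker 1 of the 3 bundled variants (speaker IDs 0-2)
echo "Croeso i Gymru." | piper \
  --model cy_GB-bu_tts.onnx \
  --speaker 1 \
  --output_file output_speaker1.wav
```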
### With NVDA Screen Reader

Install the [techiaith Welsh Neural Voices addon for NVDA](https://github.com/techiaith/nvda-addon):

1. Download the addon from the [releases page](https://github.com/techiaith/nvda-addon/releases/latest)
2. Install and restart NVDA
3. Voices will download automatically on first run (77 MB)
4. Select "Uned Technolegau Iaith - Welsh Neural Voices" in NVDA's speech settings

### With Python (ONNX Runtime)

```python
import onnxruntime as ort
import numpy as np
import json
import wave

# Load model
session = ort.InferenceSession("cy_GB-bu_tts.onnx")

# Load config
with open("cy_GB-bu_tts.onnx.json") as f:
    config = json.load(f)

# For complete implementation, refer to:
# https://github.com/rhasspy/piper/blob/master/src/python_run/piper/voice.py
```

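The session above takes phoneme IDs rather than raw text, so a phonemizer (Piper uses espeak-ng) has to run first. A minimal sketch of the input tensors, assuming the tensor names of Piper's VITS export (`input`, `input_lengths`, `scales`, `sid`) and using placeholder phoneme IDs rather than a real Welsh phonemization:

```python
import numpy as np

# Placeholder phoneme IDs, shape (batch, time); a real run would map
# espeak-ng phonemes through the config's phoneme_id_map instead.
phoneme_ids = np.array([[1, 14, 29, 5, 2]], dtype=np.int64)
input_lengths = np.array([phoneme_ids.shape[1]], dtype=np.int64)

# noise_scale, length_scale (speaking rate), noise_w; the values here
# are Piper's usual defaults, stored in the config's "inference" section
scales = np.array([0.667, 1.0, 0.8], dtype=np.float32)

# Speaker ID 0-2 for this 3-speaker model
speaker_id = np.array([0], dtype=np.int64)

inputs = {
    "input": phoneme_ids,
    "input_lengths": input_lengths,
    "scales": scales,
    "sid": speaker_id,
}
# audio = session.run(None, inputs)[0]  # with the session loaded above
```

Raising `length_scale` above 1.0 slows speech down; lowering it speeds it up.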
### With Sonata Engine

```python
from sonata import tts_engine

engine = tts_engine.TTSEngine()
engine.load_voice("cy_GB-bu_tts.onnx")

# Synthesize speech
audio = engine.synthesize("Bore da!")
engine.save_audio(audio, "output.wav")
```

## Sample Audio

Listen to voice samples at: [Piper Welsh samples](https://rhasspy.github.io/piper-samples/)

## Limitations

- **Pronunciation:** May exhibit incorrect or unusual pronunciation for some words, particularly:
  - Technical terms and neologisms
  - Place names not represented in training data
  - Words with ambiguous pronunciation rules
- **Audio Quality:** Medium quality - suitable for assistive technology but not studio-grade
- **Domain Coverage:** Best performance on general conversational text; may struggle with specialized domains
- **Expressivity:** Limited emotional range (neutral/informative tone)
- **Platform:** Optimized for CPU inference on x86/x64; ARM64 Windows not supported
- **Language Mixing:** While trained on bilingual data, best results when using pure Welsh text

## Performance

- **Real-time Factor:** < 1.0 on modern CPUs (faster than real-time synthesis)
- **Latency:** Low latency suitable for interactive applications
- **Memory Usage:** ~100 MB RAM during inference
- **Supported Platforms:** Windows 10/11 (x86/x64), Linux (x86/x64), Raspberry Pi

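The real-time factor quoted above is simply synthesis time divided by the duration of the generated audio. A small helper to measure it; the timing values below are illustrative, not benchmarks of this model:

```python
def real_time_factor(synthesis_seconds: float,
                     num_samples: int,
                     sample_rate: int = 22050) -> float:
    """RTF = time spent synthesizing / duration of generated audio.
    Values below 1.0 mean faster-than-real-time synthesis."""
    audio_seconds = num_samples / sample_rate
    return synthesis_seconds / audio_seconds

# Illustrative: 0.5 s to generate 2 s of audio at 22050 Hz
rtf = real_time_factor(0.5, 2 * 22050)
print(f"RTF: {rtf:.2f}")  # RTF: 0.25
```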
## Model Files

This repository contains:
- `cy_GB-bu_tts.onnx` - The neural TTS model in ONNX format
- `cy_GB-bu_tts.onnx.json` - Model configuration file (phoneme mapping, sample rate, etc.)

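Both files are needed at inference time; the `.onnx.json` is plain JSON, so the fields most clients read can be inspected directly. A sketch using a mock config in place of the real file, with key names following Piper's voice config schema:

```python
import json

# Mock config standing in for cy_GB-bu_tts.onnx.json; the real file uses
# the same top-level keys but a full Welsh phoneme_id_map.
config = json.loads(json.dumps({
    "audio": {"sample_rate": 22050},
    "num_speakers": 3,
    "phoneme_id_map": {"_": [0], "a": [14], "b": [15]},
}))

sample_rate = config["audio"]["sample_rate"]   # 22050 for this model
num_speakers = config["num_speakers"]          # 3 speaker variants
phoneme_count = len(config["phoneme_id_map"])
print(sample_rate, num_speakers, phoneme_count)
```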
## Citation

If you use this model, please cite:

```bibtex
@misc{cy_GB_bu_tts_2025,
  author = {{Language Technologies Unit, Bangor University}},
  title = {cy\_GB-bu\_tts: Welsh Neural Text-to-Speech Model},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/techiaith/cy_GB-bu_tts}}
}

@dataset{bu_tts_cy_en_2025,
  author = {{Language Technologies Unit, Bangor University}},
  title = {Bangor University Text to Speech Welsh-English Dataset},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/datasets/techiaith/bu-tts-cy-en}}
}

@misc{piper_tts,
  author = {{Rhasspy Community}},
  title = {Piper: A fast, local neural text to speech system},
  year = {2023},
  publisher = {GitHub},
  howpublished = {\url{https://github.com/rhasspy/piper}},
  note = {Now maintained at \url{https://github.com/OHF-Voice/piper1-gpl}}
}
```

## Acknowledgments

This work builds upon contributions from the wider open-source TTS community:

- **Piper TTS** and the **Rhasspy community** for developing the training framework and TTS architecture that makes high-quality, local neural TTS accessible
- **Musharraf Omer** for creating the [Sonata TTS engine](https://github.com/mush42/sonata) and the [Sonata-NVDA addon](https://github.com/mush42/sonata-nvda), which enable seamless integration with screen readers
- Contributors to the Welsh language TTS training data
- The broader open-source speech synthesis community for advancing accessible voice technology

## License

This model is released under **CC0-1.0 (Public Domain)**. You are free to use, modify, and distribute this model for any purpose without restriction.

The training code (Piper) is licensed under the MIT License.

## Contact & Support

**Organization:** Uned Technolegau Iaith / Language Technologies Unit, Bangor University
**Issues:** Report issues at [GitHub Issues](https://github.com/techiaith/nvda-addon/issues)
**Project Page:** [NVDA Welsh Neural Voices](https://github.com/techiaith/nvda-addon)

## Version History

- **2025.11.0 (Beta):** Initial public release with 3 speaker variants, medium quality

## Related Resources

- [NVDA Welsh Neural Voices Addon](https://github.com/techiaith/nvda-addon) - Screen reader integration
- [Piper TTS](https://github.com/rhasspy/piper) - Training and inference framework
- [Sonata Engine](https://github.com/mush42/sonata) - Cross-platform TTS engine
- [Training Dataset](https://huggingface.co/datasets/techiaith/bu-tts-cy-en) - Welsh-English TTS corpus

---

*This model was developed to support Welsh language accessibility and to preserve and promote the Welsh language through modern speech technology.*