---
language:
- cy
- en
license: cc0-1.0
library_name: piper-tts
tags:
- text-to-speech
- tts
- welsh
- cymraeg
- audio
- onnx
- piper
- accessibility
- assistive-technology
- screen-reader
datasets:
- techiaith/bu-tts-cy-en
model-index:
- name: cy_GB-bu_tts
  results: []
---

# cy_GB-bu_tts - Welsh Neural Text-to-Speech

This is a Welsh (Cymraeg) neural text-to-speech model trained using [Piper](https://github.com/rhasspy/piper), a fast, local neural TTS system optimized for Raspberry Pi and other low-end devices.

**Developed by:** Uned Technolegau Iaith (Language Technologies Unit), Bangor University
**Model type:** Neural TTS (VITS-based architecture)
**Language:** Welsh (cy_GB)
**License:** CC0-1.0
**Format:** ONNX

## Model Details

- **Architecture:** Based on Piper's VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech)
- **Speakers:** Multi-speaker model with 3 speaker variants
- **Quality:** Medium quality (suitable for screen readers and assistive technology)
- **Model Size:** Approximately 77 MB
- **Inference Speed:** Optimized for real-time synthesis on CPU
- **Sample Rate:** 22050 Hz
- **Training Framework:** [Piper training pipeline](https://github.com/rhasspy/piper)

## Training Data

This model was trained on the [bu-tts-cy-en dataset](https://huggingface.co/datasets/techiaith/bu-tts-cy-en) (Bangor University Text to Speech Welsh-English dataset).

**Dataset characteristics:**
- **Size:** 10,000-100,000 samples
- **Languages:** Welsh and English (bilingual dataset)
- **License:** CC0 1.0 (Public Domain)
- **Content:** Audio recordings with corresponding text transcriptions
- **Source:** Language Technologies Unit, Bangor University

**Training data limitations:**
- Dataset consists of freely available recordings (public domain audiobooks and research-quality recordings)
- Coverage is not comprehensive across all Welsh vocabulary and contexts
- Some pronunciation patterns may be influenced by the limited speaker diversity in the training data
- Quality improvements would be possible with larger, more diverse, professionally recorded datasets

## Intended Use

**Primary use cases:**
- Screen readers and assistive technology (particularly [NVDA integration](https://github.com/techiaith/nvda-addon))
- Accessibility tools for Welsh speakers with visual impairments
- Welsh language learning applications
- Local, offline Welsh TTS applications
- Research in Welsh speech synthesis

**Supported platforms:**
- Compatible with the Piper TTS runtime
- Works with the [Sonata TTS engine](https://github.com/mush42/sonata)
- ONNX Runtime on x86/x64 architectures
- Raspberry Pi and other resource-constrained devices

## Usage

### With Piper

```bash
# Download model files
wget https://huggingface.co/techiaith/cy_GB-bu_tts/resolve/main/cy_GB-bu_tts.onnx
wget https://huggingface.co/techiaith/cy_GB-bu_tts/resolve/main/cy_GB-bu_tts.onnx.json

# Run synthesis
echo "Bore da, sut wyt ti?" | piper \
  --model cy_GB-bu_tts.onnx \
  --output_file output.wav
```

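Since this is a multi-speaker model with 3 variants, a specific voice can be chosen with Piper's `--speaker` flag. A sketch, assuming the model files downloaded above; the output filename is just an example:

```shell
# Select speaker 1 of the 3 bundled variants (speaker IDs 0-2)
echo "Croeso i Gymru." | piper \
  --model cy_GB-bu_tts.onnx \
  --speaker 1 \
  --output_file output_speaker1.wav
```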
### With NVDA Screen Reader

Install the [techiaith Welsh Neural Voices addon for NVDA](https://github.com/techiaith/nvda-addon):

1. Download the addon from the [releases page](https://github.com/techiaith/nvda-addon/releases/latest)
2. Install and restart NVDA
3. Voices will download automatically on first run (77 MB)
4. Select "Uned Technolegau Iaith - Welsh Neural Voices" in NVDA's speech settings

### With Python (ONNX Runtime)

```python
import onnxruntime as ort
import numpy as np
import json
import wave

# Load model
session = ort.InferenceSession("cy_GB-bu_tts.onnx")

# Load config
with open("cy_GB-bu_tts.onnx.json") as f:
    config = json.load(f)

# For complete implementation, refer to:
# https://github.com/rhasspy/piper/blob/master/src/python_run/piper/voice.py
```

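The session above takes phoneme IDs rather than raw text, so a phonemizer (Piper uses espeak-ng) has to run first. A minimal sketch of the input tensors, assuming the tensor names of Piper's VITS export (`input`, `input_lengths`, `scales`, `sid`) and using placeholder phoneme IDs rather than a real Welsh phonemization:

```python
import numpy as np

# Placeholder phoneme IDs, shape (batch, time); a real run would map
# espeak-ng phonemes through the config's phoneme_id_map instead.
phoneme_ids = np.array([[1, 14, 29, 5, 2]], dtype=np.int64)
input_lengths = np.array([phoneme_ids.shape[1]], dtype=np.int64)

# noise_scale, length_scale (speaking rate), noise_w; the values here
# are Piper's usual defaults, stored in the config's "inference" section
scales = np.array([0.667, 1.0, 0.8], dtype=np.float32)

# Speaker ID 0-2 for this 3-speaker model
speaker_id = np.array([0], dtype=np.int64)

inputs = {
    "input": phoneme_ids,
    "input_lengths": input_lengths,
    "scales": scales,
    "sid": speaker_id,
}
# audio = session.run(None, inputs)[0]  # with the session loaded above
```

Raising `length_scale` above 1.0 slows speech down; lowering it speeds it up.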
### With Sonata Engine

```python
from sonata import tts_engine

engine = tts_engine.TTSEngine()
engine.load_voice("cy_GB-bu_tts.onnx")

# Synthesize speech
audio = engine.synthesize("Bore da!")
engine.save_audio(audio, "output.wav")
```

## Sample Audio

Listen to voice samples at: [Piper Welsh samples](https://rhasspy.github.io/piper-samples/)

## Limitations

- **Pronunciation:** May exhibit incorrect or unusual pronunciation for some words, particularly:
  - Technical terms and neologisms
  - Place names not represented in training data
  - Words with ambiguous pronunciation rules
- **Audio Quality:** Medium quality - suitable for assistive technology but not studio-grade
- **Domain Coverage:** Best performance on general conversational text; may struggle with specialized domains
- **Expressivity:** Limited emotional range (neutral/informative tone)
- **Platform:** Optimized for CPU inference on x86/x64; ARM64 Windows not supported
- **Language Mixing:** While trained on bilingual data, best results when using pure Welsh text

## Performance

- **Real-time Factor:** < 1.0 on modern CPUs (faster than real-time synthesis)
- **Latency:** Low latency suitable for interactive applications
- **Memory Usage:** ~100 MB RAM during inference
- **Supported Platforms:** Windows 10/11 (x86/x64), Linux (x86/x64), Raspberry Pi

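The real-time factor quoted above is simply synthesis time divided by the duration of the generated audio. A small helper to measure it; the timing values below are illustrative, not benchmarks of this model:

```python
def real_time_factor(synthesis_seconds: float,
                     num_samples: int,
                     sample_rate: int = 22050) -> float:
    """RTF = time spent synthesizing / duration of generated audio.
    Values below 1.0 mean faster-than-real-time synthesis."""
    audio_seconds = num_samples / sample_rate
    return synthesis_seconds / audio_seconds

# Illustrative: 0.5 s to generate 2 s of audio at 22050 Hz
rtf = real_time_factor(0.5, 2 * 22050)
print(f"RTF: {rtf:.2f}")  # RTF: 0.25
```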
## Model Files

This repository contains:
- `cy_GB-bu_tts.onnx` - The neural TTS model in ONNX format
- `cy_GB-bu_tts.onnx.json` - Model configuration file (phoneme mapping, sample rate, etc.)

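Both files are needed at inference time; the `.onnx.json` is plain JSON, so the fields most clients read can be inspected directly. A sketch using a mock config in place of the real file, with key names following Piper's voice config schema:

```python
import json

# Mock config standing in for cy_GB-bu_tts.onnx.json; the real file uses
# the same top-level keys but a full Welsh phoneme_id_map.
config = json.loads(json.dumps({
    "audio": {"sample_rate": 22050},
    "num_speakers": 3,
    "phoneme_id_map": {"_": [0], "a": [14], "b": [15]},
}))

sample_rate = config["audio"]["sample_rate"]   # 22050 for this model
num_speakers = config["num_speakers"]          # 3 speaker variants
phoneme_count = len(config["phoneme_id_map"])
print(sample_rate, num_speakers, phoneme_count)
```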
## Citation

If you use this model, please cite:

```bibtex
@misc{cy_GB_bu_tts_2025,
  author = {{Language Technologies Unit, Bangor University}},
  title = {cy\_GB-bu\_tts: Welsh Neural Text-to-Speech Model},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/techiaith/cy_GB-bu_tts}}
}

@dataset{bu_tts_cy_en_2025,
  author = {{Language Technologies Unit, Bangor University}},
  title = {Bangor University Text to Speech Welsh-English Dataset},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/datasets/techiaith/bu-tts-cy-en}}
}

@misc{piper_tts,
  author = {{Rhasspy Community}},
  title = {Piper: A fast, local neural text to speech system},
  year = {2023},
  publisher = {GitHub},
  howpublished = {\url{https://github.com/rhasspy/piper}},
  note = {Now maintained at \url{https://github.com/OHF-Voice/piper1-gpl}}
}
```

## Acknowledgments

This work builds upon contributions from the wider open-source TTS community:

- **Piper TTS** and the **Rhasspy community** for developing the training framework and TTS architecture that makes high-quality, local neural TTS accessible
- **Musharraf Omer** for creating the [Sonata TTS engine](https://github.com/mush42/sonata) and the [Sonata-NVDA addon](https://github.com/mush42/sonata-nvda), which enable seamless integration with screen readers
- Contributors to the Welsh language TTS training data
- The broader open-source speech synthesis community for advancing accessible voice technology

## License

This model is released under **CC0-1.0 (Public Domain)**. You are free to use, modify, and distribute this model for any purpose without restriction.

The training code (Piper) is licensed under the MIT License.

## Contact & Support

**Organization:** Uned Technolegau Iaith / Language Technologies Unit, Bangor University
**Issues:** Report issues at [GitHub Issues](https://github.com/techiaith/nvda-addon/issues)
**Project Page:** [NVDA Welsh Neural Voices](https://github.com/techiaith/nvda-addon)

## Version History

- **2025.11.0 (Beta):** Initial public release with 3 speaker variants, medium quality

## Related Resources

- [NVDA Welsh Neural Voices Addon](https://github.com/techiaith/nvda-addon) - Screen reader integration
- [Piper TTS](https://github.com/rhasspy/piper) - Training and inference framework
- [Sonata Engine](https://github.com/mush42/sonata) - Cross-platform TTS engine
- [Training Dataset](https://huggingface.co/datasets/techiaith/bu-tts-cy-en) - Welsh-English TTS corpus

---

*This model was developed to support Welsh language accessibility and to preserve and promote the Welsh language through modern speech technology.*