---
license: apache-2.0
language:
- eu
tags:
- TTS
- PL-BERT
- WordPiece
- hitz-aholab
---
# PL-BERT-eu
## Overview
<details>
<summary>Click to expand</summary>
- [Model Description](#model-description)
- [Intended Uses and Limitations](#intended-uses-and-limitations)
- [How to Get Started with the Model](#how-to-get-started-with-the-model)
- [Training Details](#training-details)
- [Citation](#citation)
- [Additional information](#additional-information)
</details>
---
## Model Description
**PL-BERT-eu** is a phoneme-level masked language model trained on Basque Wikipedia text. It is based on the [PL-BERT architecture](https://github.com/yl4579/PL-BERT) and learns phoneme representations via a masked language modeling objective.
This model supports **phoneme-based text-to-speech (TTS) systems** such as [StyleTTS2](https://github.com/yl4579/StyleTTS2) by providing a Basque-specific phoneme vocabulary and contextual phoneme embeddings.
Features of our PL-BERT:
- It is trained **exclusively on Basque** phonemized Wikipedia text.
- It uses a reduced **phoneme vocabulary of 178 tokens**.
- It utilizes a WordPiece tokenizer for phonemized Basque text.
- It includes a custom `token_maps_eu.pkl` and an adapted `util.py`.
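As a quick sanity check after downloading the files, the bundled token map can be inspected with a few lines of Python. This is a minimal sketch: in the original PL-BERT pipeline the token map relates tokenizer token ids to the reduced phoneme-level vocabulary, the exact entry structure may differ, and the local file path is illustrative.
```python
import pickle

# Minimal sketch: inspect the bundled token map shipped with this model.
# Assumes token_maps_eu.pkl has been downloaded next to this script (path is illustrative).
with open("token_maps_eu.pkl", "rb") as f:
    token_maps = pickle.load(f)

# Report what was loaded; the exact entry structure is not documented here.
print(type(token_maps), len(token_maps))
```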
---
## Intended Uses and Limitations
### Intended uses
- Integration into phoneme-based TTS pipelines such as StyleTTS2.
- Speech synthesis and phoneme embedding extraction for Basque.
### Limitations
- Not designed for general NLP tasks.
- Only supports Basque phoneme tokens.
---
## How to Get Started with the Model
Here is an example of how to use this model within the StyleTTS2 framework:
1. Clone the StyleTTS2 repository: https://github.com/yl4579/StyleTTS2
2. Inside the `Utils` directory, create a new folder, for example: `PLBERT_eu`.
3. Copy the following files into that folder:
- `config.yml` (training configuration)
- `step_4000000.t7` (trained checkpoint)
- `util.py` (modified to fix position ID loading)
4. In your StyleTTS2 configuration file, update the `PLBERT_dir` entry to:
`PLBERT_dir: Utils/PLBERT_eu`
5. Update the import statement in your code to:
`from Utils.PLBERT_eu.util import load_plbert`
6. We used code developed by [Aholab](https://aholab.ehu.eus/aholab/) to generate IPA phonemes for training the model. You can see a demo of the Basque phonemizer at [arrandi/phonemizer-eus-esp](https://huggingface.co/spaces/arrandi/phonemizer-eus-esp). Likewise, the code used to generate IPA phonemes can be found in the `phonemizer` directory. We collapsed multi-character phonemes into single-character phonemes for better grapheme–phoneme alignment.
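With steps 1–5 in place, loading the Basque encoder from inside a StyleTTS2 checkout might look like the sketch below. The folder name `PLBERT_eu` follows step 2; the rest of the StyleTTS2 training or inference code is omitted.
```python
# Minimal sketch: load the Basque PL-BERT encoder inside a StyleTTS2 checkout.
# Assumes config.yml, step_4000000.t7 and util.py sit in Utils/PLBERT_eu (step 3).
from Utils.PLBERT_eu.util import load_plbert

plbert = load_plbert("Utils/PLBERT_eu")  # reads config.yml and loads the checkpoint from that folder
plbert.eval()                            # StyleTTS2 then consumes this encoder during training/inference
```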
**Note:** If second-stage StyleTTS2 training produces a NaN loss when using a single GPU, see [issue #254](https://github.com/yl4579/StyleTTS2/issues/254) in the original StyleTTS2 repository.
---
## Training Details
### Training data
The model was trained on a Basque corpus phonemized using **Modelo1y2**. It uses a consistent phoneme token set with boundary markers and masking tokens.
- Tokenizer: custom (splits on whitespace)
- Phoneme masking strategy: phoneme-level masking and replacement
- Training steps: 4,000,000
- Precision: mixed-precision (fp16)
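For intuition, the snippet below is a simplified, illustrative rendering of phoneme-level masking and replacement, using the mask token and probabilities listed in the configuration below. It is not the authors' preprocessing code, and the helper `mask_phonemes` is hypothetical.
```python
import random

# Illustrative sketch only: select words, then mask ("M") or randomly replace
# individual phonemes inside them. Probabilities come from the configuration
# listed below; the real PL-BERT-eu preprocessing may differ in detail.
WORD_MASK_PROB = 0.15
PHONEME_MASK_PROB = 0.1
REPLACE_PROB = 0.2
MASK_TOKEN = "M"

def mask_phonemes(words, phoneme_vocab):
    """words: list of words, each given as a list of phoneme tokens."""
    out = []
    for word in words:
        if random.random() >= WORD_MASK_PROB:
            out.append(list(word))          # word left untouched
            continue
        masked = []
        for ph in word:
            r = random.random()
            if r < PHONEME_MASK_PROB:
                masked.append(MASK_TOKEN)   # mask this phoneme
            elif r < PHONEME_MASK_PROB + REPLACE_PROB:
                masked.append(random.choice(phoneme_vocab))  # random replacement
            else:
                masked.append(ph)           # keep as is
        out.append(masked)
    return out
```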
### Training configuration
Model parameters:
- Vocabulary size: 178
- Hidden size: 768
- Attention heads: 12
- Intermediate size: 2048
- Number of layers: 12
- Max position embeddings: 512
- Dropout: 0.1
- Embedding size: 128
- Number of hidden groups: 1
- Number of hidden layers per group: 12
- Inner group number: 1
- Downscale factor: 1
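PL-BERT uses an ALBERT-style encoder, so the parameters above correspond roughly to a `transformers` `AlbertConfig`. The sketch below mirrors the values listed on this card rather than the shipped `config.yml`; not every listed value (e.g. the downscale factor) maps to a config field, and the dropout value is applied to both dropout options as an assumption.
```python
from transformers import AlbertConfig, AlbertModel

# Sketch of the encoder configuration implied by the hyperparameters above.
config = AlbertConfig(
    vocab_size=178,                 # reduced Basque phoneme vocabulary
    embedding_size=128,
    hidden_size=768,
    num_hidden_layers=12,
    num_hidden_groups=1,            # all 12 layers share one parameter group
    inner_group_num=1,
    num_attention_heads=12,
    intermediate_size=2048,
    max_position_embeddings=512,
    hidden_dropout_prob=0.1,
    attention_probs_dropout_prob=0.1,
)
model = AlbertModel(config)         # encoder used for the masked-phoneme objective
```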
Other parameters:
- Batch size: 32
- Max mel length: 512
- Word mask probability: 0.15
- Phoneme mask probability: 0.1
- Replacement probability: 0.2
- Token separator: space
- Token mask: M
- Word separator ID: 2
- Scheduler type: OneCycleLR
- Learning rate: 0.0002
- pct_start: 0.1
- Annealing strategy: cosine annealing
- div_factor: 25
- final_div_factor: 10000
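The scheduler settings above match PyTorch's `OneCycleLR`. The sketch below shows how they combine, reusing `model` from the configuration sketch; the optimizer (AdamW) is an assumption, as the card does not state which one was used, and the rest of the training loop is omitted.
```python
import torch
from torch.optim.lr_scheduler import OneCycleLR

# Sketch of the learning-rate schedule implied by the settings above.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)  # optimizer choice is an assumption
scheduler = OneCycleLR(
    optimizer,
    max_lr=2e-4,             # learning rate 0.0002
    total_steps=4_000_000,   # training steps
    pct_start=0.1,
    anneal_strategy="cos",   # cosine annealing
    div_factor=25,
    final_div_factor=10000,
)
```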
### Evaluation
The model has been successfully integrated into StyleTTS2, where it enables the synthesis of Basque speech.
---
## Citation
If this code contributes to your research, please cite the work:
```
@misc{aarriandiagaplberteu,
  title={PL-BERT-eu},
  author={Ander Arriandiaga and Ibon Saratxaga and Eva Navas and Inma Hernaez},
  organization={Hitz (Aholab) - EHU},
  url={https://huggingface.co/langtech-veu/PL-BERT-wp_es},
  year={2026}
}
```
## Additional Information
### Author
Author: [Ander Arriandiaga](https://huggingface.co/arrandi) — Aholab (Hitz), EHU
### Contact
For further information, please send an email to <inma.hernaez@ehu.eus>.
### Copyright
Copyright (c) 2026 by Aholab, HiTZ.
### License
[Apache-2.0](https://www.apache.org/licenses/LICENSE-2.0)
### Funding
This work is funded by the Ministerio para la Transformación Digital y de la Función Pública and by the EU – NextGenerationEU, within the framework of the project Desarrollo de Modelos ALIA.