# BPEAlbert: Polish Phonetic ALBERT Encoder
BPEAlbert is a phonetic ALBERT-style encoder trained on phonetic word representations from approximately 100 million Polish sentences.
The model was designed primarily as a lightweight linguistic encoder for Polish text-to-speech systems, where phonetic structure, word-level context, and compact hidden representations are useful for downstream acoustic or prosody modeling.
## Model Overview
BPEAlbert is trained on phonetic/BPE-like tokenized Polish text rather than raw orthographic text.
Instead of modeling characters or standard subwords directly, the model learns contextual representations over phonetic word sequences, making it especially suitable for Polish TTS pipelines.
The model can be used as:
- a phonetic text encoder for TTS,
- a contextual representation model for Polish phoneme sequences,
- a pretrained backbone for speech-related NLP tasks,
- an encoder module in custom neural speech synthesis systems.
## Training Objective
The model was pretrained with masking-based objectives over phonetic word/token sequences.
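The masking objective can be illustrated with a minimal sketch in plain Python. The `MASK_ID` value, the 15% masking ratio, and the example token ids are assumptions (the common BERT/ALBERT defaults), not the checkpoint's actual settings:

```python
import random

MASK_ID = 4       # hypothetical id of the mask token
MASK_PROB = 0.15  # standard BERT/ALBERT masking ratio (assumed)

def mask_tokens(token_ids, rng):
    """Replace ~15% of token ids with MASK_ID; return (inputs, labels).

    Labels are -100 at unmasked positions so the loss ignores them.
    """
    inputs, labels = [], []
    for tid in token_ids:
        if rng.random() < MASK_PROB:
            inputs.append(MASK_ID)  # hide the token from the model
            labels.append(tid)      # ask the model to predict it back
        else:
            inputs.append(tid)
            labels.append(-100)     # not scored by the loss
    return inputs, labels

rng = random.Random(1)
inputs, labels = mask_tokens([17, 42, 8, 99, 23, 5], rng)
```

The model is trained to recover the original ids at the masked positions from the surrounding phonetic context.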
Current approximate training losses:
| Loss type | Value |
|---|---|
| Vocabulary loss | ~0.6 |
| Token loss | ~1.5 |
These values are provided as rough training references and may depend on the exact checkpoint and evaluation setup.
## Architecture
BPEAlbert follows an ALBERT-style Transformer encoder configuration.
Main configuration:
| Parameter | Value |
|---|---|
| Hidden size | 768 |
| Attention heads | 12 |
| Hidden layers | 12 |
| Intermediate size | 2048 |
| Max sequence length | 512 |
| Vocabulary size | 178 |
| Dropout | 0.1 |
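The table above maps onto a plain configuration object like the following sketch. The field names follow common ALBERT conventions and are assumptions, not necessarily the exact keys used by this checkpoint:

```python
from dataclasses import dataclass

@dataclass
class BPEAlbertConfig:
    # Values taken from the configuration table; the field names
    # mirror common ALBERT naming and are illustrative only.
    hidden_size: int = 768
    num_attention_heads: int = 12
    num_hidden_layers: int = 12
    intermediate_size: int = 2048
    max_position_embeddings: int = 512
    vocab_size: int = 178
    dropout: float = 0.1

cfg = BPEAlbertConfig()
# The hidden size must split evenly across attention heads.
assert cfg.hidden_size % cfg.num_attention_heads == 0
head_dim = cfg.hidden_size // cfg.num_attention_heads
```

Note the small vocabulary (178 entries), which reflects that the model operates over a compact phonetic symbol inventory rather than a large subword vocabulary.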
## Intended Use
This model is intended to be used as an encoder inside Polish TTS systems.
A typical pipeline:

1. Convert Polish text into a phonetic word/token representation.
2. Tokenize the phonetic sequence using the expected token map.
3. Pass the token sequence through BPEAlbert.
4. Use the resulting hidden states as linguistic conditioning for a TTS model.
The encoder returns contextual hidden representations that can be consumed by downstream acoustic, duration, prosody, or style models.
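The conversion and tokenization steps above can be sketched as follows. The token map and the phonetic string are hypothetical placeholders; in practice the token map shipped with the checkpoint must be used, and the phonetic input comes from a Polish grapheme-to-phoneme front end:

```python
# Hypothetical token map: phonetic symbol -> integer id.
# The real map must match the one used during training.
TOKEN_MAP = {"<pad>": 0, "<unk>": 1, " ": 2, "a": 3, "l": 4, "ɛ": 5, "k": 6}

def tokenize_phonetic(phonetic_text, token_map):
    """Map each phonetic symbol to its id, falling back to <unk>."""
    return [token_map.get(sym, token_map["<unk>"]) for sym in phonetic_text]

# "alɛk" stands in for the output of a Polish g2p front end.
ids = tokenize_phonetic("alɛk", TOKEN_MAP)
# These ids would then be passed through BPEAlbert, and its hidden
# states consumed by the downstream acoustic/prosody model.
```

Symbols missing from the map fall back to the `<unk>` id rather than raising, which keeps the pipeline robust to rare phonetic symbols.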
## Loading the Model
This repository includes a helper utility for loading the checkpoint into a custom ALBERT model.
```python
from util import load_plbert

# Load the pretrained weights into the custom ALBERT model.
model = load_plbert("path/to/checkpoint_directory")
model.eval()  # switch to inference mode (disables dropout)
```