# BPEAlbert: Polish Phonetic ALBERT Encoder
BPEAlbert is a phonetic ALBERT-style encoder trained on phonetic word representations from approximately 100 million Polish sentences.
The model was designed primarily as a lightweight linguistic encoder for Polish text-to-speech systems, where phonetic structure, word-level context, and compact hidden representations are useful for downstream acoustic or prosody modeling.
## Model Overview
BPEAlbert is trained on phonetic/BPE-like tokenized Polish text rather than raw orthographic text.
Instead of modeling characters or standard subwords directly, the model learns contextual representations over phonetic word sequences, making it especially suitable for Polish TTS pipelines.
The model can be used as:
- a phonetic text encoder for TTS,
- a contextual representation model for Polish phoneme sequences,
- a pretrained backbone for speech-related NLP tasks,
- an encoder module in custom neural speech synthesis systems.
## Training Objective
The model was pretrained with masking-based objectives over phonetic word/token sequences.
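The masking objective can be illustrated with a minimal sketch in plain Python. The `MASK_ID` value, the 15% masking ratio, and the example token ids are assumptions (the common BERT/ALBERT defaults), not the checkpoint's actual settings:

```python
import random

MASK_ID = 4       # hypothetical id of the mask token
MASK_PROB = 0.15  # standard BERT/ALBERT masking ratio (assumed)

def mask_tokens(token_ids, rng):
    """Replace ~15% of token ids with MASK_ID; return (inputs, labels).

    Labels are -100 at unmasked positions so the loss ignores them.
    """
    inputs, labels = [], []
    for tid in token_ids:
        if rng.random() < MASK_PROB:
            inputs.append(MASK_ID)  # hide the token from the model
            labels.append(tid)      # ask the model to predict it back
        else:
            inputs.append(tid)
            labels.append(-100)     # not scored by the loss
    return inputs, labels

rng = random.Random(1)
inputs, labels = mask_tokens([17, 42, 8, 99, 23, 5], rng)
```

The model is trained to recover the original ids at the masked positions from the surrounding phonetic context.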
Current approximate training losses:
| Loss type | Value |
|---|---|
| Vocabulary loss | ~0.6 |
| Token loss | ~1.5 |
These values are provided as rough training references and may depend on the exact checkpoint and evaluation setup.
## Architecture
BPEAlbert follows an ALBERT-style Transformer encoder configuration.
Main configuration:
| Parameter | Value |
|---|---|
| Hidden size | 768 |
| Attention heads | 12 |
| Hidden layers | 12 |
| Intermediate size | 2048 |
| Max sequence length | 512 |
| Vocabulary size | 178 |
| Dropout | 0.1 |
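The table above maps onto a plain configuration object like the following sketch. The field names follow common ALBERT conventions and are assumptions, not necessarily the exact keys used by this checkpoint:

```python
from dataclasses import dataclass

@dataclass
class BPEAlbertConfig:
    # Values taken from the configuration table; the field names
    # mirror common ALBERT naming and are illustrative only.
    hidden_size: int = 768
    num_attention_heads: int = 12
    num_hidden_layers: int = 12
    intermediate_size: int = 2048
    max_position_embeddings: int = 512
    vocab_size: int = 178
    dropout: float = 0.1

cfg = BPEAlbertConfig()
# The hidden size must split evenly across attention heads.
assert cfg.hidden_size % cfg.num_attention_heads == 0
head_dim = cfg.hidden_size // cfg.num_attention_heads
```

Note the small vocabulary (178 entries), which reflects that the model operates over a compact phonetic symbol inventory rather than a large subword vocabulary.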
## Intended Use
This model is intended to be used as an encoder inside Polish TTS systems.
A typical pipeline:

1. Convert Polish text into a phonetic word/token representation.
2. Tokenize the phonetic sequence using the expected token map.
3. Pass the token sequence through BPEAlbert.
4. Use the resulting hidden states as linguistic conditioning for a TTS model.
The encoder returns contextual hidden representations that can be consumed by downstream acoustic, duration, prosody, or style models.
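The conversion and tokenization steps above can be sketched as follows. The token map and the phonetic string are hypothetical placeholders; in practice the token map shipped with the checkpoint must be used, and the phonetic input comes from a Polish grapheme-to-phoneme front end:

```python
# Hypothetical token map: phonetic symbol -> integer id.
# The real map must match the one used during training.
TOKEN_MAP = {"<pad>": 0, "<unk>": 1, " ": 2, "a": 3, "l": 4, "ɛ": 5, "k": 6}

def tokenize_phonetic(phonetic_text, token_map):
    """Map each phonetic symbol to its id, falling back to <unk>."""
    return [token_map.get(sym, token_map["<unk>"]) for sym in phonetic_text]

# "alɛk" stands in for the output of a Polish g2p front end.
ids = tokenize_phonetic("alɛk", TOKEN_MAP)
# These ids would then be passed through BPEAlbert, and its hidden
# states consumed by the downstream acoustic/prosody model.
```

Symbols missing from the map fall back to the `<unk>` id rather than raising, which keeps the pipeline robust to rare phonetic symbols.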
## Loading the Model
This repository includes a helper utility for loading the checkpoint into a custom ALBERT model.
```python
from util import load_plbert

# Load the pretrained weights into the custom ALBERT model.
model = load_plbert("path/to/checkpoint_directory")
model.eval()  # switch to inference mode (disables dropout)
```