Model Description

A lightweight, permissively licensed Transformer-based Grapheme-to-Phoneme (G2P) model for Hungarian text-to-speech (TTS) phonemization.

The model converts Hungarian text into IPA phoneme sequences and is intended as a drop-in replacement for eSpeak-NG.

Key Features

⚡ Fast and lightweight — Small Transformer model (~2MB checkpoint)
🧠 End-to-end text → phoneme prediction using CTC loss
📱 Fully offline — Runs on mobile and embedded devices
🔄 Drop-in replacement for eSpeak-NG in Piper-style TTS pipelines
⚖️ MIT licensed — Safe for closed-source and commercial apps (no GPL dependencies)
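Because the model is trained with CTC loss, its per-frame outputs have to be collapsed into a phoneme string at inference time. The following is a minimal sketch of greedy CTC decoding, not the model's actual inference code. The blank index (`BLANK_ID = 0`), the function name `ctc_greedy_decode`, and the toy vocabulary are assumptions made for illustration; the real checkpoint's phoneme table and blank position may differ.

```python
from itertools import groupby

BLANK_ID = 0  # assumption: index 0 is the CTC blank symbol


def ctc_greedy_decode(frame_ids, id_to_phoneme, blank_id=BLANK_ID):
    """Collapse runs of identical frame labels, then drop blank labels."""
    collapsed = [label for label, _ in groupby(frame_ids)]
    return "".join(id_to_phoneme[i] for i in collapsed if i != blank_id)


# Toy vocabulary for illustration only (hypothetical, not the model's table).
toy_vocab = {1: "v", 2: "ˈ", 3: "ɛ", 4: "t", 5: "i"}

# Per-frame argmax labels as a CTC head might emit them.
frames = [1, 1, 0, 2, 3, 3, 0, 0, 4, 4, 0, 5]
decoded = ctc_greedy_decode(frames, toy_vocab)
print(decoded)
```

A blank between two identical labels keeps them as separate symbols, which is how CTC represents doubled phonemes.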

Model Architecture

Parameter	Value
Architecture	Transformer (2 layers, 4 attention heads)
Hidden Size	128
FFN Hidden Size	640
Dropout	0.1
Max Position Embeddings	320
Vocabulary Size	38 graphemes, 105 phonemes
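As a rough sanity check, the table values can be turned into a parameter-count estimate. The sketch below assumes a standard encoder-only Transformer layout (learned position embeddings, biased linear layers, two LayerNorms per layer, a single linear output head over the phoneme vocabulary) and float32 weights. None of these details are confirmed by the model card, and the function name `encoder_param_count` is hypothetical.

```python
def encoder_param_count(hidden=128, ffn=640, layers=2,
                        n_graphemes=38, n_phonemes=105, max_pos=320):
    """Estimate parameters for an assumed standard Transformer encoder."""
    # Q, K, V and output projections, each hidden x hidden plus a bias.
    attention = 4 * (hidden * hidden + hidden)
    # Two linear layers: hidden -> ffn and ffn -> hidden, with biases.
    feed_forward = (hidden * ffn + ffn) + (ffn * hidden + hidden)
    # Two LayerNorms per layer, each with a scale and a bias vector.
    layer_norms = 2 * (2 * hidden)
    per_layer = attention + feed_forward + layer_norms
    # Grapheme embeddings plus learned position embeddings.
    embeddings = n_graphemes * hidden + max_pos * hidden
    # Linear projection onto the phoneme vocabulary.
    output_head = hidden * n_phonemes + n_phonemes
    return layers * per_layer + embeddings + output_head


total_params = encoder_param_count()
size_mb = total_params * 4 / 1e6  # 4 bytes per float32 weight
print(total_params, round(size_mb, 2))
```

Under these assumptions the estimate is 521,705 parameters, or about 2.09 MB in float32, which is consistent with the ~2MB checkpoint size listed under Key Features.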

Training Data

Source: Hungarian text phonemized with eSpeak-NG (hu_HU voice)
Training samples: 450,000 sentences
Validation samples: 25,000 sentences
Test samples: 25,000 sentences
Max sequence length: 200 characters

Performance

Accuracy: ~98.83% match with eSpeak-NG IPA output
Training progress: 25 of 45 epochs completed
Validation loss: 0.134
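The card does not say how the match percentage is computed. One common choice is one minus the symbol error rate, based on edit distance against the eSpeak-NG reference. The sketch below illustrates that option only; it is an assumption, not the author's evaluation script, and the names `edit_distance` and `phoneme_match_rate` are hypothetical. It compares strings one Unicode code point at a time, so stress marks count as symbols.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (row-by-row DP)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phoneme_match_rate(references, hypotheses):
    """1 - (total edit errors / total reference symbols)."""
    errors = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total = sum(len(r) for r in references)
    return 1.0 - errors / total if total else 1.0


# Toy example: one substituted vowel in a five-symbol reference.
rate = phoneme_match_rate(["vˈɛti"], ["vˈeti"])
print(rate)
```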

Input/Output Format

Input: Hungarian text (e.g., "A kezében levő lándzsát a töröknek a szügyébe veti.")
Output: IPA phonemes (e.g., "ˌɑ kˈɛzeːbɛn lˈɛvøː lˈaːndʒaːt ˌɑ tˈørøknɛk ˌɑ sˈyɟeːbɛ vˈɛti")
