Model Description

A lightweight, permissively licensed Transformer-based Grapheme-to-Phoneme (G2P) model for Hungarian text-to-speech (TTS) phonemization.

The model converts Hungarian text into IPA phoneme sequences and is intended as a drop-in replacement for eSpeak-NG.

Key Features

  • Fast and lightweight — Small Transformer model (~2MB checkpoint)
  • 🧠 End-to-end text → phoneme prediction using CTC loss
  • 📱 Fully offline — Runs on mobile and embedded devices
  • 🔄 Drop-in replacement for eSpeak-NG in Piper-style TTS pipelines
  • ⚖️ MIT licensed — Safe for closed-source and commercial apps (no GPL dependencies)
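
Because the model is trained with CTC loss, its raw output is one label per frame, and a decoding step has to collapse repeated labels and drop blanks. The card does not describe its decoder, so the following is only a minimal sketch of standard greedy CTC decoding. The blank index (`BLANK_ID = 0`) and the toy vocabulary are assumptions made for illustration, not the model's real phoneme table.

```python
from itertools import groupby
from typing import Sequence

BLANK_ID = 0  # assumption: index 0 is the CTC blank symbol


def ctc_greedy_decode(
    frame_ids: Sequence[int],
    id_to_phoneme: Sequence[str],
    blank_id: int = BLANK_ID,
) -> str:
    """Collapse consecutive repeats, then remove blanks (greedy CTC decoding)."""
    # groupby merges runs of identical frame labels into a single label.
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    # Blanks only separate labels; they never appear in the final output.
    return "".join(id_to_phoneme[i] for i in collapsed if i != blank_id)


# Hypothetical toy vocabulary, NOT the model's 105-phoneme table.
toy_vocab = ["<blank>", "v", "ˈ", "ɛ", "t", "i"]
# Per-frame argmax labels, as they might come out of the network.
frames = [1, 1, 0, 2, 3, 3, 0, 4, 4, 5, 0]
print(ctc_greedy_decode(frames, toy_vocab))  # prints: vˈɛti
```

A blank between two identical labels keeps them as separate symbols, so frames `[4, 0, 4]` decode to `tt` rather than `t`.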

Model Architecture

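| Parameter               | Value                                  |
|-------------------------|----------------------------------------|
| Architecture            | Transformer (2 layers, 4 attention heads) |
| Hidden Size             | 128                                    |
| FFN Hidden Size         | 640                                    |
| Dropout                 | 0.1                                    |
| Max Position Embeddings | 320                                    |
| Vocabulary Size         | 38 graphemes, 105 phonemes             |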
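
As a rough sanity check on these hyperparameters, the parameter count can be estimated by hand. The sketch below assumes an encoder-only layout with learned position embeddings, bias terms on every linear layer, two layer norms per block, and a single linear output head over the phoneme vocabulary. None of these details are confirmed by the card, and the names `G2PConfig` and `estimate_parameters` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class G2PConfig:
    """Hyperparameters copied from the table above."""
    num_layers: int = 2
    num_heads: int = 4
    hidden_size: int = 128
    ffn_hidden_size: int = 640
    dropout: float = 0.1
    max_position_embeddings: int = 320
    grapheme_vocab_size: int = 38
    phoneme_vocab_size: int = 105


def estimate_parameters(cfg: G2PConfig) -> int:
    """Estimate the parameter count under the layout assumptions stated above."""
    h, f = cfg.hidden_size, cfg.ffn_hidden_size
    attention = 4 * (h * h + h)            # Q, K, V and output projections
    ffn = (h * f + f) + (f * h + h)        # two linear layers with biases
    layer_norms = 2 * 2 * h                # two norms per block, scale + shift
    per_layer = attention + ffn + layer_norms
    embeddings = (cfg.grapheme_vocab_size + cfg.max_position_embeddings) * h
    output_head = h * cfg.phoneme_vocab_size + cfg.phoneme_vocab_size
    return cfg.num_layers * per_layer + embeddings + output_head


cfg = G2PConfig()
n_params = estimate_parameters(cfg)
print(n_params)                       # 521705
print(round(n_params * 4 / 1e6, 2))   # 2.09 (MB at 4 bytes per float32 weight)
```

Under these assumptions the estimate is about 0.52M parameters, or roughly 2.1 MB in float32, which is in line with the ~2MB checkpoint size mentioned above. The real layer layout may still differ.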

Training Data

  • Source: Hungarian text phonemized with eSpeak-NG (hu_HU voice)
  • Training samples: 450,000 sentences
  • Validation samples: 25,000 sentences
  • Test samples: 25,000 sentences
  • Max sequence length: 200 characters
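
The sample counts above correspond to a 90/5/5 split of 500,000 sentences. The card does not say how the split or the length filter was implemented, so the following is only an illustrative sketch. It assumes the 200-character limit applies to the input text and that sentences were shuffled before splitting; `filter_and_split` is a hypothetical helper.

```python
import random
from typing import List, Sequence, Tuple

MAX_CHARS = 200  # maximum sequence length, taken from the list above


def filter_and_split(
    sentences: Sequence[str],
    val_fraction: float = 0.05,
    test_fraction: float = 0.05,
    seed: int = 0,
) -> Tuple[List[str], List[str], List[str]]:
    """Drop empty or over-long sentences, shuffle, and split into train/val/test."""
    kept = [s for s in sentences if 0 < len(s) <= MAX_CHARS]
    # A seeded shuffle keeps the split reproducible (assumption, not from the card).
    random.Random(seed).shuffle(kept)
    n_val = round(len(kept) * val_fraction)
    n_test = round(len(kept) * test_fraction)
    val = kept[:n_val]
    test = kept[n_val:n_val + n_test]
    train = kept[n_val + n_test:]
    return train, val, test


# Toy corpus: 500 short sentences plus one that exceeds the length limit.
corpus = [f"mondat {i}" for i in range(500)] + ["x" * 201]
train, val, test = filter_and_split(corpus)
print(len(train), len(val), len(test))  # 450 25 25
```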

Performance

  • Accuracy: ~98.83% match with eSpeak-NG IPA output
  • Epochs trained: 25 of 45
  • Validation loss: 0.134
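
The card does not state how the match with eSpeak-NG output is measured. One common choice is symbol-level accuracy, computed as one minus the edit distance divided by the reference length. The sketch below shows that metric only as an assumption about what such a figure could mean. It treats each Unicode code point as one symbol, and `symbol_accuracy` is a hypothetical helper.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two symbol sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def symbol_accuracy(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """1 - (total edit distance / total reference length) over paired outputs."""
    total = sum(len(r) for r in references)
    if total == 0:
        raise ValueError("references must contain at least one symbol")
    errors = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    return 1.0 - errors / total


# Reference from eSpeak-NG vs. a hypothetical model output with one wrong vowel.
print(symbol_accuracy(["vˈɛti"], ["vˈeti"]))  # 0.8
```

A stricter alternative would be exact sentence match, which counts an output as correct only when it is identical to the reference.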

Input/Output Format

Input: Hungarian text (e.g., "A kezében levő lándzsát a töröknek a szügyébe veti.")

Output: IPA phonemes (e.g., "ˌɑ kˈɛzeːbɛn lˈɛvøː lˈaːndʒaːt ˌɑ tˈørøknɛk ˌɑ sˈyɟeːbɛ vˈɛti")
