---
license: mit
language:
  - multilingual
tags:
  - g2p
  - grapheme-to-phoneme
  - byt5
  - mlx
  - apple-silicon
  - ipa
  - phonetics
library_name: mlx
base_model: charsiu/g2p_multilingual_byT5_tiny_16_layers_100
pipeline_tag: text-generation
---

# g2p-multilingual-byT5-tiny-mlx

MLX-compatible weights for [charsiu/g2p_multilingual_byT5_tiny_16_layers_100](https://huggingface.co/charsiu/g2p_multilingual_byT5_tiny_16_layers_100), a ByT5-based multilingual grapheme-to-phoneme converter supporting 100 languages.

Converted from PyTorch to SafeTensors format for use with mlx-swift and MLX.

## Model Details

| Property | Value |
|---|---|
| Architecture | T5ForConditionalGeneration (ByT5) |
| Parameters | 20.8M |
| Encoder layers | 12 |
| Decoder layers | 4 |
| d_model | 256 |
| d_ff | 1024 |
| Vocab size | 384 (byte-level) |
| Format | SafeTensors (float32) |
| Size | ~83 MB |
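Because ByT5 operates directly on UTF-8 bytes, the 384-entry vocabulary needs no external tokenizer file: ids 0-2 are reserved (pad, eos, unk), each byte `b` maps to id `b + 3`, and the remaining ids are sentinel tokens. A minimal sketch of this encoding (the function name is illustrative, not part of this repository):

```python
def byt5_encode(text: str) -> list[int]:
    # ByT5 token ids: 0 = pad, 1 = eos, 2 = unk; byte b -> b + 3.
    # Ids above 258 are reserved sentinel/extra tokens.
    return [b + 3 for b in text.encode("utf-8")] + [1]  # append eos

print(byt5_encode("hi"))  # 'h' is byte 104, 'i' is byte 105 -> [107, 108, 1]
```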

## Performance

| Metric | Score |
|---|---|
| PER (Phoneme Error Rate) | 0.096 |
| WER (Word Error Rate) | 0.281 |
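PER is the edit (Levenshtein) distance between the predicted and reference phoneme sequences, normalized by reference length; WER is the fraction of words whose full transcription differs. A rough illustration of the PER computation (a sketch, not the exact evaluation script used for the scores above):

```python
def edit_distance(ref, hyp):
    # Standard dynamic-programming Levenshtein distance over symbols.
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n]

def per(ref, hyp):
    # Phoneme Error Rate: edit distance / reference length.
    return edit_distance(ref, hyp) / len(ref)

print(per(list("hɛloʊ"), list("hɛɫoʊ")))  # one substitution in 5 phonemes -> 0.2
```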

## Usage with CharsiuG2PKit (Swift)

```swift
import CharsiuG2PKit

let g2p = try G2P(modelDirectory: modelURL)
let ipa = g2p.convert("hello", language: "eng-us")
// "ˈhɛɫoʊ"
```

See CharsiuG2PKit for full documentation.

## Usage with Python MLX

```python
import mlx.core as mx
from mlx.utils import tree_unflatten
from safetensors import safe_open

# Load the SafeTensors weights into MLX arrays
tensors = {}
with safe_open("model.safetensors", framework="numpy") as f:
    for key in f.keys():
        tensors[key] = mx.array(f.get_tensor(key))

# Nest the flat "a.b.c" keys into a parameter tree
params = tree_unflatten(list(tensors.items()))
```

## Conversion

Converted from `pytorch_model.bin` using:

```shell
uv run scripts/convert_weights.py --model charsiu/g2p_multilingual_byT5_tiny_16_layers_100
```

Shared tensor aliases (`encoder.embed_tokens.weight`, `decoder.embed_tokens.weight`) are deduplicated; only `shared.weight` is kept.
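Loaders that expect the tied embedding keys can re-create the aliases after reading the file. A minimal sketch using a plain dict as a stand-in for the loaded tensors (the key names are from this model; the surrounding loading code is assumed):

```python
# Stand-in for the dict of arrays loaded from model.safetensors
weights = {"shared.weight": "embedding-matrix"}

# ByT5 ties the encoder/decoder input embeddings to shared.weight;
# restore the deduplicated aliases so downstream code finds them.
for alias in ("encoder.embed_tokens.weight", "decoder.embed_tokens.weight"):
    weights.setdefault(alias, weights["shared.weight"])
```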

## Citation

```bibtex
@inproceedings{zhu2022byt5,
  title={ByT5 model for massively multilingual grapheme-to-phoneme conversion},
  author={Zhu, Jian and Zhang, Cong and Jurgens, David},
  booktitle={Interspeech},
  year={2022}
}
```

## License

MIT