phonemetransformers/IPA-CHILDES
Tokenizers for each language in IPA-CHILDES, used to train the cross-lingual phoneme LLMs in our papers.
Scripts for creating the tokenizers can be found here. Scripts for training models using these tokenizers can be found here.
To load a tokenizer:
```python
from transformers import AutoTokenizer
dutch_tokenizer = AutoTokenizer.from_pretrained('phonemetransformers/ipa-childes-tokenizers', subfolder='Dutch')
```
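To load tokenizers for several languages at once, the same call can be wrapped in a small helper. This is a minimal sketch, not part of the repository: the names `load_tokenizer` and `load_tokenizers` and the `loader` parameter are hypothetical, and it assumes each language's tokenizer lives in a subfolder named after the language, as `Dutch` does above. Subfolder names for other languages are not listed here, so check the repository before passing them.

```python
from typing import Any, Callable, Dict, Iterable

# Repository ID taken from the loading example above.
REPO_ID = "phonemetransformers/ipa-childes-tokenizers"


def load_tokenizer(language: str) -> Any:
    """Load the tokenizer stored in the subfolder named `language`."""
    # Imported lazily so the module can be imported without transformers installed.
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(REPO_ID, subfolder=language)


def load_tokenizers(
    languages: Iterable[str],
    loader: Callable[[str], Any] = load_tokenizer,
) -> Dict[str, Any]:
    """Return a dict mapping each language name to its loaded tokenizer.

    `loader` is a hypothetical hook (an assumption of this sketch) that lets
    the download step be swapped out, for example in tests.
    """
    return {language: loader(language) for language in languages}


# Example (downloads from the Hugging Face Hub, so it needs network access):
# tokenizers = load_tokenizers(["Dutch"])
# dutch_tokenizer = tokenizers["Dutch"]
```

Keeping the loader injectable means the dictionary-building logic can be checked without contacting the Hub.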