phonemetransformers/ipa-childes-tokenizers


codebyzeb committed on Apr 8, 2025

Commit 4172d28 · verified · 1 Parent(s): 447595a

Update README.md

Files changed (1)

README.md +3 -1


@@ -33,7 +33,9 @@ language:
 # CHILDES IPA Tokenizers
-Tokenizers for each language in [IPA-CHILDES](https://huggingface.co/datasets/phonemetransformers/IPA-CHILDES) used to train cross-lingual phoneme LLMs in our paper [IPA-CHILDES & G2P+: Feature-Rich Resources for Cross-Lingual Phonology and Phonemic Language Modeling](https://arxiv.org/abs/2504.03036).
+Tokenizers for each language in [IPA-CHILDES](https://huggingface.co/datasets/phonemetransformers/IPA-CHILDES) used to train cross-lingual phoneme LLMs in our papers:
+- [IPA-CHILDES & G2P+: Feature-Rich Resources for Cross-Lingual Phonology and Phonemic Language Modeling](https://arxiv.org/abs/2504.03036)
+- [BabyLM's First Words: Word Segmentation as a Phonological Probing Task](https://arxiv.org/abs/2504.03338)
 Scripts for creating the tokenizers can be found [here](https://github.com/codebyzeb/childes-processor).
 Scripts for training models using these tokenizers can be found [here](https://github.com/codebyzeb/PhonemeTransformers).