Update README.md
README.md
CHANGED
@@ -33,7 +33,9 @@ language:
 
 # CHILDES IPA Tokenizers
 
-Tokenizers for each language in [IPA-CHILDES](https://huggingface.co/datasets/phonemetransformers/IPA-CHILDES) used to train cross-lingual phoneme LLMs in our
+Tokenizers for each language in [IPA-CHILDES](https://huggingface.co/datasets/phonemetransformers/IPA-CHILDES) used to train cross-lingual phoneme LLMs in our papers:
+- [IPA-CHILDES & G2P+: Feature-Rich Resources for Cross-Lingual Phonology and Phonemic Language Modeling](https://arxiv.org/abs/2504.03036)
+- [BabyLM's First Words: Word Segmentation as a Phonological Probing Task](https://arxiv.org/abs/2504.03338)
 
 Scripts for creating the tokenizers can be found [here](https://github.com/codebyzeb/childes-processor).
 Scripts for training models using these tokenizers can be found [here](https://github.com/codebyzeb/PhonemeTransformers).