---
datasets:
- phonemetransformers/IPA-CHILDES
language:
- en
- eu
- zh
- da
- nl
- hr
- es
- et
- fa
- fr
- de
- hu
- is
- id
- ga
- it
- ja
- ko
- pt
- pl
- qu
- ro
- sr
- sv
- tr
- cy
- 'no'
---

# IPA CHILDES Models: Tiny

Phoneme-based GPT-2 models trained on all 31 sections of the [IPA-CHILDES](https://huggingface.co/datasets/phonemetransformers/IPA-CHILDES) dataset for the paper [BabyLM's First Words: Word Segmentation as a Phonological Probing Task](https://arxiv.org/abs/2504.03338).

The models have 600k non-embedding parameters and were trained on 100k tokens of their language. They were evaluated for phonological knowledge using the *word segmentation* task. See the paper for more details. Training and analysis scripts can be found [here](https://github.com/codebyzeb/PhonemeTransformers).

To load a model:

```python
from transformers import AutoModel

farsi_model = AutoModel.from_pretrained('phonemetransformers/ipa-childes-models-tiny', subfolder='Farsi')
```