Jenny Kunz
jekunz
AI & ML interests
Explainability and interpretability of NLP models, language adaptation, PEFT methods
Recent Activity
updated a dataset 11 days ago: liu-nlp/unimorph-blimp-growup
updated a dataset 11 days ago: liu-nlp/unimorph-blimp-200
published a dataset 11 days ago: liu-nlp/unimorph-blimp-growup
Adaptation of SmolLM to Faroese
All datasets and models created for the paper "Family Matters: Language Transfer and Merging for Adapting Small LLMs to Faroese".
SmolLM CPT LoRA
Idiomatic Language Acquisition
Models associated with the paper "Preferences for Idiomatic Language are Acquired Slowly -- and Forgotten Quickly: A Case Study on Swedish".
- jekunz/smollm-135m-cpt-fineweb-swedish-smol-smoltalk
  Text Generation • 0.1B • Updated • 7
- jekunz/smollm-135m-fineweb-swedish-from-scratch-smol-smoltalk
  Text Generation • 0.1B • Updated • 9
- jekunz/smollm-135m-cpt-fineweb-swedish
  Text Generation • 0.1B • Updated • 6
- jekunz/smollm-135m-fineweb-swedish-from-scratch
  Text Generation • 0.1B • Updated • 8
SmolLM baselines trained from scratch
SmolLM CPT
Continued pre-training of SmolLM models on the FineWeb-2 portions of Scandinavian languages.
- jekunz/smollm-135m-cpt-fineweb-faroese
  Text Generation • 0.1B • Updated • 1
- jekunz/smollm-135m-cpt-fineweb-icelandic
  Text Generation • 0.1B • Updated • 2
- jekunz/smollm-135m-cpt-fineweb-swedish
  Text Generation • 0.1B • Updated • 6
- jekunz/smollm-135m-cpt-fineweb-faroese-transfer-from-icelandic
  Text Generation • 0.1B • Updated • 2
Translationese English Models
Translationese English Goldfish-style models, with training data machine-translated from different source languages