Jenny Kunz
jekunz
AI & ML interests
Explainability and interpretability of NLP models, language adaptation, PEFT methods
Recent Activity
updated a dataset 4 days ago: jekunz/magpie-idiomaticity-eval
published a dataset 4 days ago: jekunz/magpie-idiomaticity-eval
published a dataset 17 days ago: liu-nlp/smol-smoltalk-swedish
Idiomatic Language Acquisition
Models associated with the paper "Preferences for Idiomatic Language are Acquired Slowly -- and Forgotten Quickly: A Case Study on Swedish".
- jekunz/smollm-135m-cpt-fineweb-swedish-smol-smoltalk
  Text Generation • 0.1B • Updated • 28
- jekunz/smollm-135m-fineweb-swedish-from-scratch-smol-smoltalk
  Text Generation • 0.1B • Updated • 4
- jekunz/smollm-135m-cpt-fineweb-swedish
  Text Generation • 0.1B • Updated • 3
- jekunz/smollm-135m-fineweb-swedish-from-scratch
  Text Generation • 0.1B • Updated • 11
SmolLM baselines trained from scratch
SmolLM CPT
Continued pre-training (CPT) of SmolLM models on the FineWeb-2 portions of Scandinavian languages.
- jekunz/smollm-135m-cpt-fineweb-faroese
  Text Generation • 0.1B • Updated • 4
- jekunz/smollm-135m-cpt-fineweb-icelandic
  Text Generation • 0.1B • Updated • 4
- jekunz/smollm-135m-cpt-fineweb-swedish
  Text Generation • 0.1B • Updated • 3
- jekunz/smollm-135m-cpt-fineweb-faroese-transfer-from-icelandic
  Text Generation • 0.1B • Updated • 5
Translationese English Models
Goldfish-style English translationese models, trained on data machine-translated from different source languages.
Adaptation of SmolLM to Faroese
All datasets and models created for the paper "Family Matters: Language Transfer and Merging for Adapting Small LLMs to Faroese".
SmolLM CPT LoRA
Instruction Residuals
Models with CPT and instruction residuals (chat vectors) applied, and models instruction-tuned on machine-translated data, and combinations of both.