L2 Finnish model

Introduction

L2 Finnish model is a text classification model fine-tuned on four datasets (ICLFI, LAS2, CEFLING, and TOPLING) containing fictional and non-fictional texts written by speakers of Finnish as a second language (L2). The model classifies texts into the following CEFR classes: A1, A2, B1, B2, and C.

Using the model

See the Jupyter notebook (.ipynb) on the Files and versions page for a tutorial.
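The notebook is the authoritative tutorial. As a rough sketch only, the following assumes that the checkpoint loads with the Hugging Face transformers class `AutoModelForSequenceClassification` (not confirmed here). `MODEL_ID`, `argmax_label`, `classify`, and `CEFR_LABELS` are hypothetical names introduced for illustration. Label names are read from the checkpoint's `id2label` config. If they appear as generic names such as `LABEL_0`, the order A1, A2, B1, B2, C in `CEFR_LABELS` is an unverified assumption that must be checked against the notebook before use.

```python
from typing import List, Sequence

# Hypothetical placeholder: replace with the repository ID shown on the model page.
MODEL_ID = "your-namespace/l2-finnish-model"

# Assumed label order, for illustration only; verify against the notebook/config.
CEFR_LABELS = ["A1", "A2", "B1", "B2", "C"]


def argmax_label(scores: Sequence[float], labels: Sequence[str]) -> str:
    """Return the label whose score is highest for a single text."""
    if len(scores) != len(labels):
        raise ValueError(f"expected {len(labels)} scores, got {len(scores)}")
    best = max(range(len(scores)), key=lambda i: scores[i])
    return labels[best]


def classify(texts: List[str], model_id: str = MODEL_ID) -> List[str]:
    """Predict one class per text with a sequence-classification checkpoint."""
    # Imported lazily so argmax_label works without torch/transformers installed.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    # Label names as stored in the checkpoint's config (index -> name).
    labels = [model.config.id2label[i] for i in range(model.config.num_labels)]

    # Long essays are truncated to the tokenizer's maximum input length.
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return [argmax_label(row.tolist(), labels) for row in logits]
```

With the real repository ID substituted, a call such as `classify(["Minä asun Helsingissä ja opiskelen suomea."])` would return one label per input text. Truncation means that only the beginning of a very long text is seen by the model.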

Data preprocessing scripts

The preprocessing scripts used for ICLFI, LAS2, CEFLING, and TOPLING have been released on GitHub: https://github.com/idatoivanen/finnish_cefr_preprocessing

References

More information about the training data and model training can be found in the paper referenced below.

APA:

Tarvainen, J., Toivanen, I., & Huhta, A. (2025). Automatic language proficiency assessment of written texts: Training a CEFR classifier in L2 Finnish. Studies in Language Assessment, 14(2), 58-90. https://doi.org/10.58379/YWAV5140

BibTeX:

@article{tarvainen2025automatic,
year = {2025},
author = {Tarvainen, Jenny and Toivanen, Ida and Huhta, Ari},
title = {Automatic language proficiency assessment of written texts: Training a CEFR classifier in L2 Finnish},
journal = {Studies in Language Assessment},
volume = {14},
number = {2},
pages = {58--90},
doi = {10.58379/YWAV5140},
url = {https://doi.org/10.58379/YWAV5140}
}

This repository has been produced as part of the FIN-CLARIAH infrastructure project.
