# Load model directly
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("AiLab-IMCS-UL/lv-deberta-base")
model = AutoModel.from_pretrained("AiLab-IMCS-UL/lv-deberta-base")

Latvian DeBERTaV3 encoder model trained with a replaced token detection (RTD) objective, released with the paper "Pretraining and Benchmarking Modern Encoders for Latvian".
For evaluation code and benchmark results, see: https://github.com/LUMII-AILab/latvian-encoders
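Because the model is a plain encoder, one common way to use it is to turn its token-level outputs into a single vector per sentence. The following is a minimal sketch of that idea, assuming mean pooling over the last hidden state with padding masked out. The pooling strategy and the helper names `mean_pool` and `embed` are illustrative assumptions, not something prescribed by the paper or the evaluation repository.

```python
import torch


def mean_pool(last_hidden_state: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token vectors per sequence, ignoring padding positions."""
    # Expand the mask to (batch, seq_len, 1) so it broadcasts over the hidden dimension.
    mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)
    summed = (last_hidden_state * mask).sum(dim=1)
    # Guard against division by zero for an all-padding row.
    counts = mask.sum(dim=1).clamp(min=1.0)
    return summed / counts


def embed(sentences, model_name="AiLab-IMCS-UL/lv-deberta-base"):
    """Hypothetical helper: encode sentences and mean-pool the encoder output."""
    # Imported lazily so mean_pool stays usable without transformers installed.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()

    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        output = model(**batch)
    return mean_pool(output.last_hidden_state, batch["attention_mask"])
```

Calling `embed(["Rīga ir Latvijas galvaspilsēta.", "Šodien līst lietus."])` should return one row per input sentence. Without task-specific fine-tuning such vectors are only a rough starting point, so treat this as a feature-extraction sketch rather than a tuned sentence-embedding method.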
@inproceedings{znotins-2026-modern_lv_encoders,
title = "Pretraining and Benchmarking Modern Encoders for {L}atvian",
author = "Znotins, Arturs",
booktitle = "Proceedings of the Second Workshop on Language Models for Low-Resource Languages ({LoResLM})",
year = "2026",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2026.loreslm-1.40/",
pages = "461--470"
}
This work was supported by the EU Recovery and Resilience Facility project Language Technology Initiative (2.3.1.1.i.0/1/22/I/CFLA/002).
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("fill-mask", model="AiLab-IMCS-UL/lv-deberta-base")