This is a ModernBERT model aligned to Polish using the Trans-Tokenization technique on a 24 GB corpus of parallel sentences from OpenSubtitles.
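For readers unfamiliar with the technique, here is a rough, toy-sized sketch of the core idea as I understand it from the Trans-Tokenization paper: each target-language token gets its embedding initialised as a weighted average of the embeddings of the source-language tokens it aligns with in a parallel corpus. The function name `init_target_embeddings`, the tiny vocabularies, and the alignment counts are all made up for illustration; this is not the code used to build this model.

```python
from typing import Dict, List

Vector = List[float]


def init_target_embeddings(
    source_embeddings: Dict[str, Vector],
    alignment_counts: Dict[str, Dict[str, float]],
) -> Dict[str, Vector]:
    """Initialise each target token as a weighted average of source-token embeddings.

    alignment_counts[target][source] is how often the two tokens were aligned
    in the parallel corpus (hypothetical numbers in this sketch).
    """
    dim = len(next(iter(source_embeddings.values())))
    target_embeddings: Dict[str, Vector] = {}
    for target_token, counts in alignment_counts.items():
        # Keep only aligned source tokens that actually have an embedding.
        known = {s: c for s, c in counts.items() if s in source_embeddings and c > 0}
        total = sum(known.values())
        if total == 0:
            # No usable alignment; a real pipeline would need some fallback here.
            continue
        vector = [0.0] * dim
        for source_token, count in known.items():
            weight = count / total  # normalise counts into weights that sum to 1
            for i, value in enumerate(source_embeddings[source_token]):
                vector[i] += weight * value
        target_embeddings[target_token] = vector
    return target_embeddings


# Toy data (assumed, not taken from the real model or corpus).
source_embeddings = {"cat": [1.0, 0.0], "kitten": [0.0, 1.0], "dog": [-1.0, 0.0]}
alignment_counts = {"kot": {"cat": 3.0, "kitten": 1.0}, "pies": {"dog": 5.0}}

polish_embeddings = init_target_embeddings(source_embeddings, alignment_counts)
```

In this toy case "kot" ends up three quarters of the way towards "cat" and one quarter towards "kitten", because that is how the made-up alignment counts are split.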

To be honest, I have no idea if it works.
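One quick way to get a feel for it is a fill-mask smoke test. The sketch below assumes a `transformers` release recent enough to support ModernBERT and that the checkpoint loads with the standard `fill-mask` pipeline; the helper names `build_prompt` and `top_predictions`, the `<mask>` placeholder, and the example sentence are my own, and I make no claim about what the model will predict.

```python
from typing import List, Tuple

MODEL_ID = "AleksanderObuchowski/ModernBERT-pl"


def build_prompt(template: str, mask_token: str) -> str:
    """Swap the '<mask>' placeholder for the tokenizer's real mask token."""
    return template.replace("<mask>", mask_token)


def top_predictions(template: str, top_k: int = 5) -> List[Tuple[str, float]]:
    """Return (token, score) pairs for the masked position.

    Calling this downloads the model from the Hugging Face Hub.
    """
    # Imported lazily so the helper above works without transformers installed.
    from transformers import pipeline

    fill = pipeline("fill-mask", model=MODEL_ID)
    # Read the mask token from the tokenizer instead of hard-coding it.
    prompt = build_prompt(template, fill.tokenizer.mask_token)
    return [(p["token_str"], p["score"]) for p in fill(prompt, top_k=top_k)]
```

Something like `top_predictions("Warszawa to stolica <mask>.")` should then list the top candidate tokens; whether they make sense in Polish is a first, very rough signal of whether the alignment worked.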

If it does, please cite:

@misc {aleksander_obuchowski_2025,
    author       = { {Aleksander Obuchowski} },
    title        = { ModernBERT-pl (Revision 477647b) },
    year         = 2025,
    url          = { https://huggingface.co/AleksanderObuchowski/ModernBERT-pl },
    doi          = { 10.57967/hf/5052 },
    publisher    = { Hugging Face }
}

Model size: 0.1B params (Safetensors, tensor type F32)

Base model: answerdotai/ModernBERT-base

Paper: Trans-Tokenization and Cross-lingual Vocabulary Transfers: Language Adaptation of LLMs for Low-Resource NLP (arXiv:2408.04303, published Aug 8, 2024)