---
language:
- lb
license: cc-by-sa-4.0
library_name: transformers
pipeline_tag: fill-mask
tags:
- modernbert
- encoder
- luxembourgish
- multilingual
- masked-language-modeling
---

# LTZ E1 (mini)

A ModernBERT-based masked language model pretrained on Luxembourgish, following the Ettin recipe (see https://huggingface.co/jhu-clsp/ettin-encoder-68m).

## Model Details

- **Architecture:** ModernBERT (encoder)
- **Size:** mini
- **Vocabulary:** 50,368 tokens (BPE, `GPTNeoXTokenizerFast`)
- **Context length:** 1,024 tokens
- **Language:** Luxembourgish (`lb`/`ltz`)
- **License:** CC BY-SA 4.0
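
Inputs longer than the 1,024-token context length have to be truncated or split before they reach the model. The sketch below shows one way to split a list of token IDs into overlapping windows. The helper name `chunk_token_ids`, the `overlap` parameter, and its default of 64 are illustrative assumptions, not part of this model or of `transformers`, and the sketch ignores special tokens, which also count towards the limit.

```python
from typing import List

MAX_CONTEXT = 1024  # context length stated in the model details above


def chunk_token_ids(ids: List[int], max_len: int = MAX_CONTEXT, overlap: int = 64) -> List[List[int]]:
    """Split `ids` into windows of at most `max_len` tokens.

    Consecutive windows share `overlap` tokens so that text cut at a
    boundary still appears with some left context in the next window.
    """
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must satisfy 0 <= overlap < max_len")
    step = max_len - overlap
    chunks = []
    for start in range(0, len(ids), step):
        chunks.append(ids[start:start + max_len])
        if start + max_len >= len(ids):
            break  # the last window already reaches the end of the input
    return chunks


# Hypothetical input: 2,500 placeholder IDs standing in for a long document.
windows = chunk_token_ids(list(range(2500)))
print([len(w) for w in windows])  # prints [1024, 1024, 580]
```

When plain truncation is acceptable, passing `truncation=True, max_length=1024` to the tokenizer call is the simpler route.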

## Usage

Requires `transformers>=4.48.0`.

```python
from transformers import AutoModelForMaskedLM, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("instilux/ltz-e1-mini")
model = AutoModelForMaskedLM.from_pretrained("instilux/ltz-e1-mini")

# Tokenize a sentence containing one [MASK] token and locate its position.
inputs = tokenizer("Wéi spéit [MASK] et?", return_tensors="pt")
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]

# Run the model without tracking gradients.
with torch.no_grad():
    outputs = model(**inputs)

# Print the five highest-scoring candidates (raw logits) for the masked position.
top_tokens = outputs.logits[0, mask_pos].topk(5)
for token_id, score in zip(top_tokens.indices[0], top_tokens.values[0]):
    token = tokenizer.decode(token_id)
    print(f"{token:15s} {score:.3f}")
```
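
The scores printed by the snippet above are raw logits, not probabilities. To read them as probabilities, apply a softmax over the vocabulary before taking the top candidates. The following is a minimal, dependency-free sketch of that step. The helper name `top_k_probs` and the logit values are made up for illustration and are not output of this model.

```python
import math
from typing import Dict, List, Tuple


def top_k_probs(logits: Dict[str, float], k: int = 5) -> List[Tuple[str, float]]:
    """Return the k most probable tokens after a softmax over `logits`."""
    # Subtract the maximum logit before exponentiating for numerical stability.
    max_logit = max(logits.values())
    exps = {tok: math.exp(val - max_logit) for tok, val in logits.items()}
    total = sum(exps.values())
    probs = {tok: e / total for tok, e in exps.items()}
    return sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]


# Made-up logits over a toy four-token vocabulary (not real model output).
toy_logits = {"ass": 4.0, "war": 2.0, "gëtt": 1.0, "huet": 0.5}
for token, prob in top_k_probs(toy_logits, k=3):
    print(f"{token:15s} {prob:.3f}")
```

With the PyTorch tensors from the usage snippet, the equivalent step would be `outputs.logits[0, mask_pos].softmax(dim=-1).topk(5)`.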

## Tokenizer Notes

The tokenizer is BPE-based (`GPTNeoXTokenizerFast`) with BERT-style special tokens (`[CLS]`, `[SEP]`, `[MASK]`, `[PAD]`). A `[CLS]` token is prepended automatically (`add_bos_token: true`).
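
Because `[CLS]` is prepended automatically, token positions in the encoded input are shifted by one relative to the raw text. A small check like the one below can confirm that layout for a given tokenizer. The helper name `starts_with_cls` is hypothetical, and the stand-in class only mimics the two things the helper reads, so the sketch runs without downloading anything. Whether a `[SEP]` token is appended is not stated here, so the sketch does not assume it.

```python
from typing import List


def starts_with_cls(tokenizer, text: str) -> bool:
    """Return True if encoding `text` yields the [CLS] id in first position."""
    ids: List[int] = tokenizer(text)["input_ids"]
    return len(ids) > 0 and ids[0] == tokenizer.cls_token_id


class ToyTokenizer:
    """Stand-in with made-up ids; NOT the real ltz-e1-mini tokenizer."""

    cls_token_id = 1

    def __call__(self, text: str) -> dict:
        # Prepend the [CLS] id, then map each whitespace-separated word to a fake id.
        word_ids = [100 + i for i, _ in enumerate(text.split())]
        return {"input_ids": [self.cls_token_id] + word_ids}


print(starts_with_cls(ToyTokenizer(), "Wéi spéit ass et?"))
```

To run the check against the real tokenizer, pass the object returned by `AutoTokenizer.from_pretrained("instilux/ltz-e1-mini")` in place of `ToyTokenizer()`.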

## Citation

Please cite this paper (preprint, accepted to ACL 2026 Findings) if you use this model in your work.

@misc{plum2026ltzglueluxembourgishgenerallanguage,
  title={ltzGLUE: Luxembourgish General Language Understanding Evaluation},
  author={Alistair Plum and Felicia Körner and Anne-Marie Lutgen and Laura Bernardy and Fred Philippy and Emilia Milano and Nils Rehlinger and Cédric Lothritz and Tharindu Ranasinghe and Barbara Plank and Christoph Purschke},
  year={2026},
  eprint={2604.17976},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2604.17976},
}