---
language:
- lb
license: cc-by-sa-4.0
library_name: transformers
pipeline_tag: fill-mask
tags:
- modernbert
- encoder
- luxembourgish
- multilingual
- masked-language-modeling
---
# LTZ E1 (mini)
A ModernBERT-based masked language model pretrained on Luxembourgish, following the Ettin training recipe ([jhu-clsp/ettin-encoder-68m](https://huggingface.co/jhu-clsp/ettin-encoder-68m)).
## Model Details
- **Architecture:** ModernBERT (encoder)
- **Size:** mini
- **Vocabulary:** 50,368 tokens (BPE, GPTNeoXTokenizerFast)
- **Context length:** 1,024 tokens
- **Language:** Luxembourgish (`lb`/`ltz`)
- **License:** CC BY-SA 4.0
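The details above can be cross-checked against the published config. This is a minimal sketch that assumes network access to the Hugging Face Hub. It also shows how an over-long input would be truncated to the stated 1,024-token context length. The repeated word is only filler to produce a long input. `truncation=True, max_length=1024` is standard tokenizer usage and is not a setting taken from this card.

```python
from transformers import AutoConfig, AutoTokenizer

model_id = "instilux/ltz-e1-mini"
config = AutoConfig.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Config fields to compare against the Model Details list
print(config.model_type)               # architecture family
print(config.vocab_size)               # size of the output vocabulary
print(config.max_position_embeddings)  # maximum number of positions in the config

# Filler text that tokenizes to far more than 1,024 tokens
long_text = " ".join(["Lëtzebuerg"] * 2000)
# Truncate to the context length; special tokens count toward max_length
encoding = tokenizer(long_text, truncation=True, max_length=1024)
print(len(encoding["input_ids"]))
```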
## Usage
Requires `transformers>=4.48.0`.
```python
from transformers import AutoModelForMaskedLM, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("instilux/ltz-e1-mini")
model = AutoModelForMaskedLM.from_pretrained("instilux/ltz-e1-mini")

# "Wéi spéit ass et?" ("What time is it?") with the verb "ass" masked out
inputs = tokenizer("Wéi spéit [MASK] et?", return_tensors="pt")
# Position of the [MASK] token within the single input sequence
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]

with torch.no_grad():
    outputs = model(**inputs)

# Top-5 candidates at the mask position; scores are raw logits, not probabilities
top_tokens = outputs.logits[0, mask_pos].topk(5)
for token_id, score in zip(top_tokens.indices[0], top_tokens.values[0]):
    token = tokenizer.decode(token_id.item())
    print(f"{token:15s} {score:.3f}")
```
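The higher-level `fill-mask` pipeline produces a similar ranking with less code. It handles tokenization and locating the mask, and it applies a softmax, so each `score` is a probability rather than a raw logit. This is a minimal sketch. The example sentence reuses the one above, and `top_k=5` is an illustrative choice.

```python
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="instilux/ltz-e1-mini")

# Use the tokenizer's own mask token instead of hard-coding "[MASK]"
text = f"Wéi spéit {fill_mask.tokenizer.mask_token} et?"
predictions = fill_mask(text, top_k=5)

# With a single mask, the result is a list of dicts sorted by descending score,
# each holding "score", "token", "token_str" and "sequence"
for pred in predictions:
    print(f"{pred['token_str']!r:15} {pred['score']:.3f}")
```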
## Tokenizer Notes
The tokenizer is BPE-based (`GPTNeoXTokenizerFast`) with BERT-style special tokens (`[CLS]`, `[SEP]`, `[MASK]`, `[PAD]`). A `[CLS]` token is prepended automatically (`add_bos_token: true`).
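To check this behaviour, you can tokenize a sample sentence and inspect the tokens. This is a sketch, and the sentence is an arbitrary choice. According to the note above, the first token should be `[CLS]`. The card does not say whether a trailing `[SEP]` is appended, so read the printed output rather than assuming it.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("instilux/ltz-e1-mini")

# Tokenize a short Luxembourgish sentence and map the ids back to token strings
encoding = tokenizer("Moien, wéi geet et?")
tokens = tokenizer.convert_ids_to_tokens(encoding["input_ids"])
print(tokens)

# Special tokens registered on the tokenizer (e.g. bos, mask, pad)
print(tokenizer.special_tokens_map)
```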
## Citation
If you use this model in your work, please cite the following paper (preprint, accepted to Findings of ACL 2026):
@misc{plum2026ltzglueluxembourgishgenerallanguage,
title={ltzGLUE: Luxembourgish General Language Understanding Evaluation},
author={Alistair Plum and Felicia Körner and Anne-Marie Lutgen and Laura Bernardy and Fred Philippy and Emilia Milano and Nils Rehlinger and Cédric Lothritz and Tharindu Ranasinghe and Barbara Plank and Christoph Purschke},
year={2026},
eprint={2604.17976},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2604.17976},
}