	---
	language:
	- lb
	license: cc-by-sa-4.0
	library_name: transformers
	pipeline_tag: fill-mask
	tags:
	- modernbert
	- encoder
	- luxembourgish
	- multilingual
	- masked-language-modeling
	---

	# LTZ E1 (mini)

	A ModernBERT-based masked language model pretrained on Luxembourgish, following the Ettin recipe (see https://huggingface.co/jhu-clsp/ettin-encoder-68m).

	## Model Details

	- Architecture: ModernBERT (encoder)
	- Size: mini
	- Vocabulary: 50,368 tokens (BPE, GPTNeoXTokenizerFast)
	- Context length: 1,024 tokens
	- Language: Luxembourgish (`lb`/`ltz`)
	- License: CC BY-SA 4.0

	## Usage

	Requires `transformers>=4.48.0`.
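
One way to check this requirement before loading the model is sketched below. The helper names `parse_release` and `transformers_is_recent_enough` are hypothetical, and the parser is deliberately simple: it ignores pre-release suffixes such as `rc1`, so `packaging.version` is the more robust choice where it is available.

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum version taken from the requirement stated above.
MIN_TRANSFORMERS = (4, 48, 0)


def parse_release(version_string: str) -> tuple:
    """Turn '4.49.0.dev0' into (4, 49, 0); suffixes such as 'rc1' are ignored."""
    parts = []
    for piece in version_string.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def transformers_is_recent_enough() -> bool:
    """Return True if an installed transformers meets the minimum version."""
    try:
        return parse_release(version("transformers")) >= MIN_TRANSFORMERS
    except PackageNotFoundError:
        # transformers is not installed at all.
        return False
```

Calling `transformers_is_recent_enough()` returns `False` both when the package is missing and when it is too old, so a caller may want to distinguish the two cases.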

	```python
	from transformers import AutoModelForMaskedLM, AutoTokenizer
	import torch

	tokenizer = AutoTokenizer.from_pretrained("instilux/ltz-e1-mini")
	model = AutoModelForMaskedLM.from_pretrained("instilux/ltz-e1-mini")

	inputs = tokenizer("Wéi spéit [MASK] et?", return_tensors="pt")
	mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]

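	# Run the model without tracking gradients (inference only)
	with torch.no_grad():
	    outputs = model(**inputs)

	# Top 5 candidate tokens (raw logits) at the masked position
	top_tokens = outputs.logits[0, mask_pos].topk(5)
	for token_id, score in zip(top_tokens.indices[0], top_tokens.values[0]):
	    token = tokenizer.decode(token_id)
	    print(f"{token:15s} {score:.3f}")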
	```
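
The snippet above prints raw logits. Since the model is tagged `fill-mask`, the same query can probably also be run through the `transformers` fill-mask pipeline, which returns softmax probabilities instead. The following is a minimal sketch under that assumption. The helpers `top_fill_mask` and `format_predictions` are hypothetical names, and the sketch assumes the input contains exactly one `[MASK]` token.

```python
from typing import Dict, List


def top_fill_mask(text: str, model_id: str = "instilux/ltz-e1-mini", top_k: int = 5) -> List[Dict]:
    """Return the top_k fill-mask predictions for `text`.

    The model is downloaded on first use, so this call needs network access.
    """
    # Imported lazily so that defining this helper does not require transformers.
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model=model_id)
    return fill_mask(text, top_k=top_k)


def format_predictions(predictions: List[Dict]) -> List[str]:
    """Format pipeline results (dicts with 'token_str' and 'score') as text lines."""
    return [f"{p['token_str'].strip():15s} {p['score']:.3f}" for p in predictions]
```

For example, `print("\n".join(format_predictions(top_fill_mask("Wéi spéit [MASK] et?"))))` would list five candidates with their probabilities.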

	## Tokenizer Notes

	The tokenizer is BPE-based (`GPTNeoXTokenizerFast`) with BERT-style special tokens (`[CLS]`, `[SEP]`, `[MASK]`, `[PAD]`). A `[CLS]` token is prepended automatically (`add_bos_token: true`).
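
To confirm this behaviour on a given input, the encoded sequence can be inspected directly. The sketch below uses a hypothetical helper, `describe_encoding`, which relies only on the standard tokenizer attributes `cls_token_id` and `convert_ids_to_tokens`. It makes no assumption about whether a `[SEP]` token is appended, since that is not stated here.

```python
from typing import Any, Dict


def describe_encoding(tokenizer: Any, text: str) -> Dict[str, Any]:
    """Encode `text` and report its tokens and whether [CLS] comes first."""
    ids = tokenizer(text)["input_ids"]
    return {
        # Human-readable tokens, including any special tokens that were added.
        "tokens": tokenizer.convert_ids_to_tokens(ids),
        # True when the first id is the tokenizer's [CLS] id.
        "starts_with_cls": bool(ids) and ids[0] == tokenizer.cls_token_id,
    }
```

With the tokenizer loaded as in the Usage section, `describe_encoding(tokenizer, "Wéi spéit ass et?")` should report `starts_with_cls` as `True` if `[CLS]` is prepended as described.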

	## Citation

	Please cite this paper (preprint, accepted to ACL 2026 Findings) if you use this model in your work.

	@misc{plum2026ltzglueluxembourgishgenerallanguage,
	  title={ltzGLUE: Luxembourgish General Language Understanding Evaluation},
	  author={Alistair Plum and Felicia Körner and Anne-Marie Lutgen and Laura Bernardy and Fred Philippy and Emilia Milano and Nils Rehlinger and Cédric Lothritz and Tharindu Ranasinghe and Barbara Plank and Christoph Purschke},
	  year={2026},
	  eprint={2604.17976},
	  archivePrefix={arXiv},
	  primaryClass={cs.CL},
	  url={https://arxiv.org/abs/2604.17976},
	}