---
license: cc-by-nc-sa-4.0
language:
- la
- fr
- en
- pt
- ca
- es
- it
pipeline_tag: token-classification
library_name: transformers
tags:
- medieval-texts
- phrase-segmentation
- multilingual
---
# Aquilign Multilingual Segmenter
**Aquilign Multilingual Segmenter** is a token-classification model for phrase-level segmentation of medieval and historical texts.
The model is designed to detect custom segmentation delimiters in multilingual historical corpora and is used as part of the [Aquilign](https://github.com/ProMeText/Aquilign) alignment workflow.
## Model Description
The segmenter is built on the `BertForTokenClassification` architecture from Hugging Face’s `transformers` library.
It was fine-tuned on historical prose from the [Multilingual Segmentation Dataset](https://github.com/ProMeText/multilingual-segmentation-dataset) to identify phrase-level segmentation boundaries.
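
As a rough illustration of how token-level predictions can be turned into phrase segments, the sketch below loads the model through the standard `transformers` token-classification pipeline and cuts the input text before every token tagged as a boundary. Several things here are assumptions rather than documented behaviour of this model: the helper names `load_segmenter` and `split_at_boundaries` are hypothetical, the boundary label name must be read from the model configuration (`LABEL_1` is only the generic `transformers` default), and treating a tagged token as the *start* of a new segment is a guess that should be checked against the Aquilign code.

```python
from typing import Dict, List

MODEL_ID = "ProMeText/aquilign-multilingual-segmenter"


def load_segmenter():
    """Build the token-classification pipeline (downloads the model on first use)."""
    # Imported lazily so that split_at_boundaries stays dependency-free.
    from transformers import pipeline
    return pipeline("token-classification", model=MODEL_ID)


def split_at_boundaries(text: str, predictions: List[Dict], boundary_label: str) -> List[str]:
    """Cut `text` before every token predicted as `boundary_label`.

    `predictions` is the list of dicts returned by the pipeline (keys such as
    "entity", "word", "start", "end"). ASSUMPTION: a boundary token opens a
    new segment; verify this against the model's actual labelling scheme.
    """
    cut_points = sorted({
        p["start"] for p in predictions
        if p.get("entity") == boundary_label
        and p.get("start") is not None                 # offsets need a fast tokenizer
        and not p.get("word", "").startswith("##")     # never cut inside a word
        and 0 < p["start"] < len(text)
    })
    segments, previous = [], 0
    for cut in cut_points:
        segments.append(text[previous:cut].strip())
        previous = cut
    segments.append(text[previous:].strip())
    return [s for s in segments if s]
```

In practice one would first inspect `load_segmenter().model.config.id2label` to find the real boundary label, then call `split_at_boundaries(text, segmenter(text), boundary_label=...)` with that value.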
## Supported Languages
- Latin
- French
- Castilian
- Portuguese
- Catalan
- English
- Italian
## Intended Use
This model is intended for:
- phrase-level segmentation of **medieval texts**
- preprocessing parallel corpora before alignment
- multilingual medieval text alignment workflows
- digital philology and computational humanities research
It is designed primarily for use with [Aquilign](https://github.com/ProMeText/Aquilign).
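
When preprocessing a full witness outside Aquilign, keep in mind that BERT-style encoders accept a limited number of subword tokens per input (commonly 512; the exact limit for this checkpoint is not stated here and should be checked in its configuration). One possible approach, sketched below, is to cut the text into fixed-size word windows and segment each window separately. The helper names `chunk_words` and `segment_long_text` and the window size of 100 words are illustrative assumptions, not part of Aquilign, and `segment_fn` stands for any callable that maps a string to a list of segments.

```python
from typing import Callable, Iterator, List


def chunk_words(text: str, max_words: int = 100) -> Iterator[str]:
    """Yield consecutive windows of at most `max_words` whitespace-separated words."""
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = text.split()
    for start in range(0, len(words), max_words):
        yield " ".join(words[start:start + max_words])


def segment_long_text(
    text: str,
    segment_fn: Callable[[str], List[str]],
    max_words: int = 100,  # ASSUMPTION: illustrative value, tune to the model's input limit
) -> List[str]:
    """Apply `segment_fn` to each window and concatenate the resulting segments."""
    segments: List[str] = []
    for window in chunk_words(text, max_words):
        segments.extend(segment_fn(window))
    return segments
```

Note that every window border becomes a segment border in this sketch, whether or not a phrase actually ends there, so segments adjacent to a border may need to be merged or re-segmented.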
## Related Resources
- [Aquilign alignment tool](https://github.com/ProMeText/Aquilign)
- [Multilingual Segmentation Dataset](https://github.com/ProMeText/multilingual-segmentation-dataset)
- [ProMeText GitHub organization](https://github.com/ProMeText)
## Citation
If you use this model, please cite the related dataset and publication.
### Dataset
```bibtex
@dataset{ing2025multilingual,
author = {Ing, L. and Gille Levenson, M. and Macedo, C.},
title = {Multilingual Segmentation Dataset for Historical Prose (13th--16th c.)},
year = {2025},
publisher = {Zenodo},
version = {1.0},
doi = {10.5281/zenodo.16992629},
url = {https://doi.org/10.5281/zenodo.16992629},
license = {Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International}
}
```
### Related Publication
```bibtex
@inproceedings{ing-etal-2026-phrase,
title = {Phrase-Level Segmentation on Medieval Corpora for Aligning Multilingual Texts},
author = {Ing, Lucence and Gille Levenson, Matthias and Macedo, Carolina},
booktitle = {Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)},
month = {May},
year = {2026},
pages = {936--946},
address = {Palma, Mallorca, Spain},
publisher = {European Language Resources Association (ELRA)},
doi = {10.63317/32huzuuokpfr}
}
```