How to use ProMeText/aquilign-multilingual-segmenter with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("token-classification", model="ProMeText/aquilign-multilingual-segmenter")

# Load model directly
from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("ProMeText/aquilign-multilingual-segmenter")
model = AutoModelForTokenClassification.from_pretrained("ProMeText/aquilign-multilingual-segmenter")
```
---
license: cc-by-nc-sa-4.0
language:
- la
- fr
- en
- pt
- ca
- es
- it
pipeline_tag: token-classification
library_name: transformers
tags:
- medieval-texts
- phrase-segmentation
- multilingual
---
# Aquilign Multilingual Segmenter

**Aquilign Multilingual Segmenter** is a token-classification model for phrase-level segmentation of medieval and historical texts.

The model is designed to detect custom segmentation delimiters in multilingual historical corpora and is used as part of the [Aquilign](https://github.com/ProMeText/Aquilign) alignment workflow.

## Model Description

The segmenter is based on a trainable `BertForTokenClassification` model from Hugging Face’s `transformers` library.

It was fine-tuned on historical prose from the [Multilingual Segmentation Dataset](https://github.com/ProMeText/multilingual-segmentation-dataset) to identify phrase-level segmentation boundaries.
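
A token-classification head produces one score vector per token, and the predicted label is the highest-scoring entry. The sketch below illustrates that decoding step in plain Python. The two-label scheme (`"O"` / `"BOUNDARY"`) and the score values are assumptions made up for illustration; this card does not list the model's labels, so the real mapping should be read from `model.config.id2label`.

```python
from typing import Dict, List, Sequence


def decode_labels(
    logits: Sequence[Sequence[float]],
    id2label: Dict[int, str],
) -> List[str]:
    """Return the highest-scoring label for each token's score vector."""
    labels = []
    for scores in logits:
        # Index of the largest score (argmax) for this token
        best_id = max(range(len(scores)), key=lambda i: scores[i])
        labels.append(id2label[best_id])
    return labels


# Hypothetical label scheme and made-up scores; not actual model output.
id2label = {0: "O", 1: "BOUNDARY"}
logits = [[2.1, -1.0], [0.3, 1.7], [1.5, 0.2]]
labels = decode_labels(logits, id2label)
```

With the real model, the score vectors would come from the `logits` attribute of the model output, one row per subword token.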
## Supported Languages

- Latin
- French
- Castilian
- Portuguese
- Catalan
- English
- Italian

## Intended Use

This model is intended for:

- phrase-level segmentation of **medieval texts**
- preprocessing parallel corpora before alignment
- multilingual medieval text alignment workflows
- digital philology and computational humanities research

It is designed primarily for use with [Aquilign](https://github.com/ProMeText/Aquilign).
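
For preprocessing outside Aquilign, the predictions of a `transformers` token-classification pipeline can be turned into text segments by cutting the input at the character offsets of the boundary tokens. The following is a minimal sketch under stated assumptions, not the Aquilign implementation: the label name `"BOUNDARY"` is hypothetical, a boundary token is assumed to open a new segment, and the prediction dictionaries are hand-written in the shape the pipeline returns (`entity`, `score`, `index`, `word`, `start`, `end`).

```python
from typing import Dict, List, Sequence


def segment_text(
    text: str,
    predictions: Sequence[Dict],
    boundary_label: str = "BOUNDARY",  # hypothetical label name; check model.config.id2label
) -> List[str]:
    """Cut `text` at the start offset of every token predicted as a boundary."""
    cut_points = sorted(
        {
            p["start"]
            for p in predictions
            if p.get("entity") == boundary_label
            and p.get("start") is not None  # offsets can be missing with slow tokenizers
            and 0 < p["start"] < len(text)
        }
    )
    segments = []
    previous = 0
    for cut in cut_points:
        segments.append(text[previous:cut].strip())
        previous = cut
    segments.append(text[previous:].strip())
    # Drop empty pieces left over after stripping whitespace
    return [s for s in segments if s]


text = "In principio creavit Deus caelum et terram"
start = text.index("et terram")
# Hand-written prediction for illustration; not actual model output.
predictions = [
    {"entity": "BOUNDARY", "score": 0.99, "index": 6, "word": "et",
     "start": start, "end": start + 2},
]
segments = segment_text(text, predictions)
one_per_line = "\n".join(segments)
```

Writing one segment per line is only one possible output layout; the input format Aquilign expects is not described in this card.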
## Related Resources

- [Aquilign alignment tool](https://github.com/ProMeText/Aquilign)
- [Multilingual Segmentation Dataset](https://github.com/ProMeText/multilingual-segmentation-dataset)
- [ProMeTEXT GitHub organization](https://github.com/ProMeText)

## Citation

If you use this model, please cite the related dataset and publication.
### Dataset

```bibtex
@dataset{ing2025multilingual,
  author = {Ing, L. and Gille Levenson, M. and Macedo, C.},
  title = {Multilingual Segmentation Dataset for Historical Prose (13th--16th c.)},
  year = {2025},
  publisher = {Zenodo},
  version = {1.0},
  doi = {10.5281/zenodo.16992629},
  url = {https://doi.org/10.5281/zenodo.16992629},
  license = {Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International}
}
```
### Related Publication

```bibtex
@inproceedings{ing-etal-2026-phrase,
  title = {Phrase-Level Segmentation on Medieval Corpora for Aligning Multilingual Texts},
  author = {Ing, Lucence and Gille Levenson, Matthias and Macedo, Carolina},
  booktitle = {Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)},
  month = {May},
  year = {2026},
  pages = {936--946},
  address = {Palma, Mallorca, Spain},
  publisher = {European Language Resources Association (ELRA)},
  doi = {10.63317/32huzuuokpfr}
}
```