---
license: apache-2.0
base_model:
- answerdotai/ModernBERT-base
---

# Model Summary

QuillIndex is an indexing model developed by the [ETH Library](https://library.ethz.ch/). It is trained on the handwritten [School Board minutes](https://sr.ethz.ch/) (1854-1902) of [ETH Zurich](https://ethz.ch/en.html), using samples created by [ChronoQuill](https://github.com/eth-library/ChronoQuill), a handwritten text recognition (HTR) pipeline. Given an agenda item, QuillIndex assigns it one or more index labels.

Its taxonomy is restricted to a closed set of labels derived from the underlying data: the annual indexes and their corresponding agenda items. Because the model is a classifier over this fixed label set, it cannot hallucinate arbitrary labels.

## Model Architecture & Evaluation

QuillIndex is an encoder-only sequence classifier that uses [ModernBERT](https://huggingface.co/answerdotai/ModernBERT-base) as its pre-trained backbone. The taxonomy can be found in the config file.

A complete technical account of QuillIndex, its architecture, and its evaluation can be found in the corresponding section of [this report](https://www.research-collection.ethz.ch/server/api/core/bitstreams/8053d4d8-51b4-4103-8164-b5068ddb3903/content).

## Environment Setup (Linux x86)

```bash
uv venv quill_env --python 3.12
source quill_env/bin/activate
uv pip install torch torchvision # CUDA 12.8
uv pip install transformers
```

## Python Setup

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Use the GPU when available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model_id = "eth-library/QuillIndex"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id).to(device)

# Excerpt from an agenda item (raw HTR output, left uncorrected)
input_string = """
§ 270 Aufnahme von Schülern.
In Folge Berichtes & Antrages, des Direktors der polytechnischen Beitriefens von Schülern Schule Namens der Gesammtkonferenz und gestützt auf die vom Schulrathe ertheilte Vollmacht werden folgende in Zürich geprüfte Kandidaten als Schüler des Polytechnikums aufgenommen. I. Bauschule I. Jahreskurs: 1. Köch, Johannes von Urner (Wohlfellen) 2. Pulpius, Leon n. Genf 3. Guasquet, Karl Jakob n. Basel 4. Kleffler, Henri n. Genf 5. Mglies, Carl Jakob n. Frankfurt II. Ingenieurschule 1 Jahreskurs. 6. Chialiva, Louis n. Lugano 7. Füsi, Carl n. Zürich 8. Schenker, Viktor n. Dornach (Solothurn)
"""

# Tokenize and run a forward pass without tracking gradients
inputs = tokenizer(input_string, return_tensors="pt").to(device)
with torch.no_grad():
    logits = model(**inputs).logits

# Multi-label decision: each label is scored independently with a sigmoid.
# Raise the 0.5 threshold for more restrictive label assignment.
prediction = (torch.sigmoid(logits) > 0.5).int()

# Map the indices of the selected labels back to their names
id2label = model.config.id2label
predicted_labels = [id2label[i] for i in range(len(prediction[0])) if prediction[0][i] == 1]

print(predicted_labels)
# ['Antrag', 'Aufnahme', 'Bericht', 'Direktor', 'Ingenieurschule', 'Schüler', 'Vollmacht']
```

## Generalization

The taxonomy is derived from 19th-century ETH School Board minutes, and the model is fine-tuned exclusively on 19th-century German. Application to other domains or periods may be unreliable.

# License

We release QuillIndex under the Apache 2.0 license.

# Citation

If you use this model, please cite:

```bibtex
@article{marbach2026closed,
  title={Closed-Vocabulary Multi-Label Indexing Pipeline for Historical Documents},
  author={Marbach, Jeremy},
  year={2026},
  publisher={ETH Zurich},
  url={https://www.research-collection.ethz.ch/server/api/core/bitstreams/8053d4d8-51b4-4103-8164-b5068ddb3903/content}
}
```
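
# Appendix: Adjusting the Decision Threshold

The Python Setup example applies a sigmoid to each logit and keeps every label whose probability exceeds 0.5, noting that the threshold can be adjusted for more restrictive label assignment. The following is a minimal, dependency-free sketch of that idea, not part of the QuillIndex code base. The helper `select_labels`, the toy logits, and the toy label map are hypothetical and chosen only for illustration; with the real model, the logits would come from `model(**inputs).logits[0].tolist()` and the label map from `model.config.id2label`.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def select_labels(logits, id2label, threshold=0.5):
    """Return (label, probability) pairs above the threshold, highest first.

    Hypothetical helper: each label is scored independently, so any
    number of labels (including none) can pass the threshold.
    """
    scored = [(id2label[i], sigmoid(z)) for i, z in enumerate(logits)]
    kept = [(label, p) for label, p in scored if p > threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Toy values for illustration only; these are not real model outputs.
toy_id2label = {0: "Aufnahme", 1: "Schüler", 2: "Bericht", 3: "Vollmacht"}
toy_logits = [2.0, 1.0, 0.2, -1.5]

default_labels = [label for label, _ in select_labels(toy_logits, toy_id2label)]
strict_labels = [
    label for label, _ in select_labels(toy_logits, toy_id2label, threshold=0.7)
]

print(default_labels)  # ['Aufnahme', 'Schüler', 'Bericht']
print(strict_labels)   # ['Aufnahme', 'Schüler']
```

Raising the threshold trades recall for precision: borderline labels such as the toy `Bericht` (probability of about 0.55) drop out, while confident labels remain. A suitable value depends on the use case and would need to be checked against held-out data.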