---
license: apache-2.0
base_model:
- answerdotai/ModernBERT-base
---

# Model Summary

QuillIndex is an indexing model developed by the [ETH Library](https://library.ethz.ch/). It is trained on the handwritten [School Board minutes](https://sr.ethz.ch/) (1854-1902) of [ETH Zurich](https://ethz.ch/en.html), using samples that the handwritten text recognition (HTR) pipeline [ChronoQuill](https://github.com/eth-library/ChronoQuill) produced from these documents. Given the text of an agenda item, QuillIndex assigns it a set of index labels. Its taxonomy is a closed set derived from the underlying data, namely the annual indexes and their corresponding agenda items. Because QuillIndex is a classifier that scores each label in this fixed set rather than a generative model, it cannot hallucinate arbitrary labels.

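To make the closed-vocabulary idea concrete, here is a minimal sketch under stated assumptions: the index entries, the terms, and the helper names `build_vocabulary`, `multi_hot`, and `decode` are hypothetical and not taken from the QuillIndex training code. A fixed vocabulary is derived from annual-index entries, each agenda item becomes a multi-hot target vector, and decoding can only ever return terms from that vocabulary.

```python
# Hypothetical sketch: deriving a closed label vocabulary from index entries
# and encoding agenda items as multi-hot targets. Data and helper names are
# illustrative assumptions, not the actual QuillIndex training pipeline.


def build_vocabulary(index_entries):
    """Map each distinct index term to a fixed output position (sorted for determinism)."""
    terms = sorted({term for terms in index_entries.values() for term in terms})
    return {term: i for i, term in enumerate(terms)}


def multi_hot(terms, label2id):
    """Encode terms as a 0/1 vector; terms outside the vocabulary have no slot and are dropped."""
    vector = [0] * len(label2id)
    for term in terms:
        if term in label2id:
            vector[label2id[term]] = 1
    return vector


def decode(vector, label2id):
    """Turn a 0/1 vector back into labels; only vocabulary terms can ever come out."""
    id2label = {i: term for term, i in label2id.items()}
    return [id2label[i] for i, flag in enumerate(vector) if flag == 1]


# Hypothetical annual-index entries: agenda item -> index terms assigned to it.
index_entries = {
    "§ 270": ["Aufnahme", "Schüler", "Direktor"],
    "§ 271": ["Vollmacht", "Schüler"],
}

label2id = build_vocabulary(index_entries)
print(label2id)
# {'Aufnahme': 0, 'Direktor': 1, 'Schüler': 2, 'Vollmacht': 3}
print(multi_hot(["Schüler", "Bibliothek"], label2id))  # 'Bibliothek' is not in the vocabulary
# [0, 0, 1, 0]
print(decode([1, 0, 1, 0], label2id))
# ['Aufnahme', 'Schüler']
```

The design point is that the output layer has exactly one position per vocabulary term, so the set of possible predictions is fixed before any text is seen.
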
## Model Architecture & Evaluation

QuillIndex is an encoder-only sequence classifier that uses [ModernBERT](https://huggingface.co/answerdotai/ModernBERT-base) as its pre-trained backbone. Its architecture and evaluation are described in detail in the corresponding section of the [technical report](https://www.research-collection.ethz.ch/server/api/core/bitstreams/8053d4d8-51b4-4103-8164-b5068ddb3903/content).

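Multi-label sequence classifiers that score every label independently are commonly trained with one binary cross-entropy term per label on top of the encoder's logits; this is the loss Hugging Face `transformers` applies when `problem_type="multi_label_classification"`. Whether QuillIndex uses exactly this objective is an assumption here (the technical report documents the actual training setup). The pure-Python sketch below computes that loss for a single example in its numerically stable form; the function name `multi_label_bce` and the logit values are hypothetical.

```python
import math

# Hypothetical sketch of a per-label binary cross-entropy objective for
# multi-label classification. Assumption: QuillIndex's actual loss may differ.


def multi_label_bce(logits, targets):
    """Mean binary cross-entropy over labels, computed from raw logits.

    Uses the stable identity
        BCE(x, y) = max(x, 0) - x * y + log(1 + exp(-|x|))
    which equals -[y * log(sigmoid(x)) + (1 - y) * log(1 - sigmoid(x))]
    without overflowing for large |x|.
    """
    if len(logits) != len(targets):
        raise ValueError("logits and targets must have the same length")
    total = 0.0
    for x, y in zip(logits, targets):
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total / len(logits)


# One agenda item scored against a hypothetical 4-label vocabulary.
logits = [3.2, -1.5, 0.4, -4.0]
targets = [1, 0, 1, 0]
print(round(multi_label_bce(logits, targets), 4))
```

Because each label contributes its own term, the model learns an independent yes/no decision per label, which is what makes the per-label sigmoid threshold at inference time meaningful.
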
## Environment Setup (Linux x86)

```bash
uv venv quill_env --python 3.12
source quill_env/bin/activate

uv pip install torch torchvision # CUDA 12.8
uv pip install transformers
```

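After installation, it can help to confirm that the packages the Python example relies on are importable from the activated `quill_env`. The following is a minimal stdlib-only sketch; the helper name `missing_packages` is hypothetical.

```python
import importlib.util


def missing_packages(names=("torch", "transformers")):
    """Return the subset of `names` that cannot be found by the current interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        import torch

        # Report the installed torch version and whether a CUDA device is visible.
        print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
```

`find_spec` only locates a module without importing it, so the check itself stays fast; torch is imported only once it is known to be present.
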
## Python Setup

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model_id = "eth-library/QuillIndex"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id).to(device)
model.eval()  # inference mode: disables dropout

# Excerpt from an agenda item
input_string = """
§ 270 Aufnahme von Schülern.
In Folge Berichtes & Antrages, des Direktors der polytechnischen Beitriefens von Schülern Schule Namens der Gesammtkonferenz und gestützt auf die vom Schulrathe ertheilte Vollmacht werden folgende in Zürich geprüfte Kandidaten als Schüler des Polytechnikums aufgenommen.
I. Bauschule I. Jahreskurs: 1. Köch, Johannes von Urner (Wohlfellen) 2. Pulpius, Leon n. Genf 3. Guasquet, Karl Jakob n. Basel 4. Kleffler, Henri n. Genf 5. Mglies, Carl Jakob n. Frankfurt II. Ingenieurschule 1 Jahreskurs. 6. Chialiva, Louis n. Lugano 7. Füsi, Carl n. Zürich 8. Schenker, Viktor n. Dornach (Solothurn)
"""

# Tokenize; truncation guards against inputs longer than the model's maximum length
inputs = tokenizer(input_string, return_tensors="pt", truncation=True).to(device)
with torch.no_grad():  # no gradients are needed for inference
    logits = model(**inputs).logits

# Multi-label decision: keep every label whose sigmoid probability exceeds the threshold.
# Raise the threshold for more restrictive label assignment.
threshold = 0.5
prediction = (torch.sigmoid(logits) > threshold).int()

id2label = model.config.id2label
predicted_labels = [id2label[i] for i, flag in enumerate(prediction[0].tolist()) if flag == 1]

print(predicted_labels)
# ['Antrag', 'Aufnahme', 'Bericht', 'Direktor', 'Ingenieurschule', 'Schüler', 'Vollmacht']
```

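The 0.5 cut-off in the example above is a per-label decision, so raising it can only remove labels, never add new ones. To choose a threshold, it can help to inspect the sigmoid probabilities directly. The pure-Python sketch below lists the labels above a threshold, highest probability first; the helper name `rank_labels` and the probability values are hypothetical (with the real model they would come from `torch.sigmoid(logits)[0].tolist()` and `model.config.id2label`).

```python
def rank_labels(probabilities, id2label, threshold=0.5):
    """Return (label, probability) pairs above `threshold`, sorted by probability (descending)."""
    ranked = [(id2label[i], p) for i, p in enumerate(probabilities) if p > threshold]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Hypothetical probabilities for a 5-label vocabulary (not real model output).
id2label = {0: "Antrag", 1: "Bibliothek", 2: "Direktor", 3: "Schüler", 4: "Vollmacht"}
probabilities = [0.91, 0.12, 0.64, 0.97, 0.55]

print(rank_labels(probabilities, id2label))
# [('Schüler', 0.97), ('Antrag', 0.91), ('Direktor', 0.64), ('Vollmacht', 0.55)]
print(rank_labels(probabilities, id2label, threshold=0.8))
# [('Schüler', 0.97), ('Antrag', 0.91)]
```

Keeping the probabilities alongside the labels makes it easy to see which labels sit close to the threshold and would change under a stricter setting.
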
# License

We release the QuillIndex model weights under the Apache 2.0 license.

# Citation

If you use this model, please cite:

```bibtex
@article{marbach2026closed,
  title={Closed-Vocabulary Multi-Label Indexing Pipeline for Historical Documents},
  author={Marbach, Jeremy},
  year={2026},
  publisher={ETH Zurich},
  url={https://www.research-collection.ethz.ch/server/api/core/bitstreams/8053d4d8-51b4-4103-8164-b5068ddb3903/content}
}
```