update readMe
README.md
CHANGED
|
@@ -8,7 +8,7 @@ base_model:
|
|
| 8 |
QuillIndex is an indexing model developed by the [ETH Library](https://library.ethz.ch/). It is trained on the handwritten [School Board minutes](https://sr.ethz.ch/) (1854-1902) of [ETH Zurich](https://ethz.ch/en.html), using samples created by [ChronoQuill](https://github.com/eth-library/ChronoQuill), an HTR pipeline. Given an agenda item, QuillIndex assigns index labels to it. Its taxonomy is restricted to a label set derived from the underlying data, namely the annual indexes and their corresponding agenda items. Because the model classifies over this fixed label set, it cannot hallucinate arbitrary labels.
|
| 9 |
|
| 10 |
## Model Architecture & Evaluation
|
| 11 |
-
QuillIndex is an encoder-only sequence classifier that uses [ModernBERT](https://huggingface.co/answerdotai/ModernBERT-base) as its pre-trained backbone. A complete technical account of QuillIndex, its architecture, and its evaluation can be found in the corresponding section of [this report](https://www.research-collection.ethz.ch/server/api/core/bitstreams/8053d4d8-51b4-4103-8164-b5068ddb3903/content).
|
| 12 |
|
| 13 |
## Environment Setup (Linux x86)
|
| 14 |
|
|
@@ -49,8 +49,11 @@ print(predicted_labels)
|
|
| 49 |
# ['Antrag', 'Aufnahme', 'Bericht', 'Direktor', 'Ingenieurschule', 'Schüler', 'Vollmacht']
|
| 50 |
```
|
| 51 |
|
| 52 |
# License
|
| 53 |
-
We release QuillIndex
|
| 54 |
|
| 55 |
# Citation
|
| 56 |
If you use this model, please cite:
|
|
|
|
| 8 |
QuillIndex is an indexing model developed by the [ETH Library](https://library.ethz.ch/). It is trained on the handwritten [School Board minutes](https://sr.ethz.ch/) (1854-1902) of [ETH Zurich](https://ethz.ch/en.html), using samples created by [ChronoQuill](https://github.com/eth-library/ChronoQuill), an HTR pipeline. Given an agenda item, QuillIndex assigns index labels to it. Its taxonomy is restricted to a label set derived from the underlying data, namely the annual indexes and their corresponding agenda items. Because the model classifies over this fixed label set, it cannot hallucinate arbitrary labels.
|
| 9 |
|
| 10 |
## Model Architecture & Evaluation
|
| 11 |
+
QuillIndex is an encoder-only sequence classifier that uses [ModernBERT](https://huggingface.co/answerdotai/ModernBERT-base) as its pre-trained backbone. The taxonomy can be found in the config file. A complete technical account of QuillIndex, its architecture, and its evaluation can be found in the corresponding section of [this report](https://www.research-collection.ethz.ch/server/api/core/bitstreams/8053d4d8-51b4-4103-8164-b5068ddb3903/content).
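
To illustrate how a taxonomy stored in a config file might be read, here is a minimal sketch. It assumes the config follows the common Hugging Face convention of an `id2label` mapping; the excerpt in `config_text`, its three labels, and the helper `load_taxonomy` are hypothetical and not taken from the actual QuillIndex repository.

```python
import json

# Hypothetical excerpt of a Hugging Face-style config.json. The real
# QuillIndex config may use different keys and contains its own labels.
config_text = """
{
  "id2label": {"0": "Antrag", "1": "Aufnahme", "2": "Bericht"},
  "label2id": {"Antrag": 0, "Aufnahme": 1, "Bericht": 2}
}
"""


def load_taxonomy(raw_config: str) -> list[str]:
    """Return the labels ordered by their integer class id."""
    config = json.loads(raw_config)
    id2label = config["id2label"]
    # JSON object keys are strings, so sort numerically rather than lexically.
    return [id2label[key] for key in sorted(id2label, key=int)]


taxonomy = load_taxonomy(config_text)
print(taxonomy)
```

Sorting the keys with `key=int` keeps the label order aligned with the classifier's output positions even when there are more than ten classes.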
|
| 12 |
|
| 13 |
## Environment Setup (Linux x86)
|
| 14 |
|
|
|
|
| 49 |
# ['Antrag', 'Aufnahme', 'Bericht', 'Direktor', 'Ingenieurschule', 'Schüler', 'Vollmacht']
|
| 50 |
```
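
The example output above contains several labels for one agenda item, which suggests multi-label classification. The inference code itself is not shown in this diff, so the following is only a sketch of one common way such output can be produced: apply a sigmoid to each logit and keep the labels whose probability reaches a threshold. The function `decode_labels`, the threshold of 0.5, the logits, and the four-label mapping are all assumptions for illustration, not the QuillIndex implementation.

```python
import math


def decode_labels(logits, id2label, threshold=0.5):
    """Return the sorted labels whose sigmoid probability meets the threshold."""
    selected = []
    for index, logit in enumerate(logits):
        # Sigmoid turns each logit into an independent per-label probability.
        probability = 1.0 / (1.0 + math.exp(-logit))
        if probability >= threshold:
            selected.append(id2label[index])
    return sorted(selected)


# Hypothetical label mapping and logits, chosen only for illustration.
id2label = {0: "Antrag", 1: "Bericht", 2: "Direktor", 3: "Vollmacht"}
logits = [2.1, -1.3, 0.4, -3.0]

predicted_labels = decode_labels(logits, id2label)
print(predicted_labels)
# ['Antrag', 'Direktor']
```

Because every selected label comes from `id2label`, the decoded output can never contain a label outside the fixed taxonomy.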
|
| 51 |
|
| 52 |
+
## Generalization
|
| 53 |
+
The taxonomy is derived from 19th-century ETH School Board minutes. The model is fine-tuned exclusively on 19th-century German. Application to other domains or periods may be unreliable.
|
| 54 |
+
|
| 55 |
# License
|
| 56 |
+
We release QuillIndex under the Apache 2.0 license.
|
| 57 |
|
| 58 |
# Citation
|
| 59 |
If you use this model, please cite:
|