Update README.md
README.md
CHANGED
@@ -10,8 +10,8 @@ model-index:
       name: Historic Text Normalization (type-level)
       type: translation
     dataset:
-      name: DTA
-      type: aehrm/dtaec-
+      name: DTA EvalCorpus Lexicon
+      type: aehrm/dtaec-lexicon
       split: dev
     metrics:
     - name: Word Accuracy
@@ -34,7 +34,7 @@ Note: This model is part of a larger system, which uses an additional GPT-based
 
 ## Training and evaluation data
 
-The model has been trained on the DTA-EC Parallel Corpus Lexicon ([aehrm/dtaec-lexica](https://huggingface.co/datasets/aehrm/dtaec-
+The model has been trained on the DTA-EC Parallel Corpus Lexicon ([aehrm/dtaec-lexica](https://huggingface.co/datasets/aehrm/dtaec-lexicon)), which is from a [parallel corpus](https://kaskade.dwds.de/~moocow/software/dtaec/) of the Deutsche Textarchiv (German Text Archive), who aligned historic prints of documents with their moden editions in contemporary orthography.
 
 Training was done on type-level, where, given the historic form of a type, the model must predict the corresponding normalized type *that appeared most frequent in the parallel corpus*.
 
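The type-level objective described in the README text above (for each historic type, predict the normalized type seen most often in the parallel corpus) can be illustrated with a small sketch. This is not the author's preprocessing code: the function name `build_type_lexicon` and the toy aligned pairs are hypothetical, and the real lexicon may handle ties, casing, or filtering differently.

```python
from collections import Counter, defaultdict


def build_type_lexicon(aligned_pairs):
    """Map each historic type to its most frequent normalized type.

    `aligned_pairs` is an iterable of (historic_token, normalized_token)
    tuples, as one might read them off a token-aligned parallel corpus.
    """
    counts = defaultdict(Counter)
    for historic, normalized in aligned_pairs:
        counts[historic][normalized] += 1
    # most_common(1) yields the top (normalized, count) pair; on a tie,
    # the normalization that was seen first wins.
    return {hist: c.most_common(1)[0][0] for hist, c in counts.items()}


# Toy data, invented for illustration only.
pairs = [
    ("vnd", "und"),
    ("vnd", "und"),
    ("seyn", "sein"),
    ("seyn", "seien"),
    ("seyn", "sein"),
    ("theil", "Teil"),
]
lexicon = build_type_lexicon(pairs)
```

Under this reading, an ambiguous historic form such as the toy `seyn` collapses to a single target, which is why a type-level model cannot use sentence context to pick a rarer normalization.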