aehrm/dtaec-type-normalizer

text2text-generation


aehrm committed on Aug 5, 2024

Commit 7f548cc · verified · 1 Parent(s): 155bb31

Update README.md

Files changed (1)

README.md +3 -3


@@ -10,8 +10,8 @@ model-index:
           name: Historic Text Normalization (type-level)
           type: translation
         dataset:
-          name: DTA-EC Lexicon (dev)
-          type: aehrm/dtaec-lexica
           split: dev
         metrics:
           - name: Word Accuracy
@@ -34,7 +34,7 @@ Note: This model is part of a larger system, which uses an additional GPT-based
 ## Training and evaluation data
-The model has been trained on the DTA-EC Parallel Corpus Lexicon ([aehrm/dtaec-lexica](https://huggingface.co/datasets/aehrm/dtaec-lexica)), which is from a [parallel corpus](https://kaskade.dwds.de/~moocow/software/dtaec/) of the Deutsche Textarchiv (German Text Archive), who aligned historic prints of documents with their moden editions in contemporary orthography.
 Training was done on type-level, where, given the historic form of a type, the model must predict the corresponding normalized type *that appeared most frequent in the parallel corpus*.

           name: Historic Text Normalization (type-level)
           type: translation
         dataset:
+          name: DTA EvalCorpus Lexicon
+          type: aehrm/dtaec-lexicon
           split: dev
         metrics:
           - name: Word Accuracy
 ## Training and evaluation data
+The model has been trained on the DTA-EC Parallel Corpus Lexicon ([aehrm/dtaec-lexica](https://huggingface.co/datasets/aehrm/dtaec-lexicon)), which is from a [parallel corpus](https://kaskade.dwds.de/~moocow/software/dtaec/) of the Deutsche Textarchiv (German Text Archive), who aligned historic prints of documents with their moden editions in contemporary orthography.
 Training was done on type-level, where, given the historic form of a type, the model must predict the corresponding normalized type *that appeared most frequent in the parallel corpus*.
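
The type-level objective described in the README text can be illustrated with a small sketch. This diff does not show how the DTA-EC lexicon was actually built, so the function name `build_type_lexicon` and the sample token pairs below are hypothetical. The sketch only assumes that the parallel corpus yields aligned (historic, normalized) token pairs and that each historic type is mapped to its most frequent normalization.

```python
from collections import Counter, defaultdict


def build_type_lexicon(aligned_pairs):
    """Map each historic type to its most frequent normalized type.

    `aligned_pairs` is an iterable of (historic, normalized) token pairs.
    This is an illustrative helper, not the dataset's actual build script.
    """
    counts = defaultdict(Counter)
    for historic, normalized in aligned_pairs:
        counts[historic][normalized] += 1
    # For every historic type, keep only the normalization seen most often.
    return {
        historic: norm_counts.most_common(1)[0][0]
        for historic, norm_counts in counts.items()
    }


# Made-up token pairs for illustration; not taken from the DTA-EC corpus.
pairs = [
    ("seyn", "sein"),
    ("seyn", "sein"),
    ("seyn", "sein"),
    ("seyn", "seine"),
    ("Theil", "Teil"),
    ("Thür", "Tür"),
]

lexicon = build_type_lexicon(pairs)
```

Under this reading, each lexicon entry would serve as one training example: the historic type is the model input and the selected normalized type is the target.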