mschonhardt committed on
Commit b7ae6f1 · verified · 1 Parent(s): b884029

Upload folder using huggingface_hub

Files changed (5)
  1. .gitattributes +1 -0
  2. README.md +80 -0
  3. loss.txt +0 -0
  4. mhd-forward.pt +3 -0
  5. training.log +3 -0
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ training.log filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,80 @@
+ ---
+ language: gmh
+ library_name: flair
+ license: cc-by-sa-4.0
+ tags:
+ - flair
+ - embeddings
+ - contextual-string-embeddings
+ - middle-high-german
+ - medieval-german
+ - digital-humanities
+ - mhdbdb
+ widget:
+ - text: "von abegescheidenheit ich hân der geschrift vil gelesen"
+ ---
+
+ # Middle High German Contextual String Embeddings (Forward)
+
+ This model provides contextual string embeddings for Middle High German (MHG). It was trained on literary and historical texts as part of digital humanities research on medieval Germanic languages.
+
+ It is optimized for **Middle High German corpora** and is intended as a general-purpose embedding for downstream NLP tasks involving medieval German.
+
+ ## Data provenance and acknowledgment
+ The model was trained on open data prepared by the Mittelhochdeutsche Begriffsdatenbank (MHDBDB), Universität Salzburg. Coordination: Katharina Zeppezauer-Wachauer. Since 1992. URL: [http://www.mhdbdb.plus.ac.at/](http://www.mhdbdb.plus.ac.at/). DOI: 10.60646/MHDBDB.
+
+ ## Model Description
+ - **Architecture:** Character-level LSTM (Flair Language Model)
+ - **Direction:** Forward
+ - **Data Source:** TEI-encoded texts from the [Middle High German Conceptual Database](https://github.com/Middle-High-German-Conceptual-Database/TEI-Texte) (MHDBDB).
+ - **Training Epochs:** 30
+ - **Final Perplexity:** 7.34
+ - **Final Validation Loss:** 1.9934
+
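The reported perplexity and validation loss appear to be linked by the usual relation perplexity = exp(loss), assuming the loss is a per-character cross-entropy measured in nats. The sketch below checks that relation for the two numbers listed above. The function names are illustrative and are not part of Flair.

```python
import math


def perplexity_from_loss(loss_nats: float) -> float:
    """Convert a cross-entropy loss in nats to perplexity."""
    return math.exp(loss_nats)


def bits_per_character(loss_nats: float) -> float:
    """Express the same loss in bits per character (assumes a character-level model)."""
    return loss_nats / math.log(2)


# Value taken from the model description above.
final_validation_loss = 1.9934

print(f"perplexity: {perplexity_from_loss(final_validation_loss):.2f}")
print(f"bits per character: {bits_per_character(final_validation_loss):.2f}")
```

Under that assumption the loss of 1.9934 rounds to the listed perplexity of 7.34, so the two figures describe the same measurement.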
+ ## Usage
+ To use this model in Flair, install the library (`pip install flair`) and load the model directly from the Hub. For best results in downstream tasks such as NER or part-of-speech tagging, combine this forward model with the corresponding backward model (`mschonhardt/mdh-mhdbdb-backward`).
+
+ ```python
+ from flair.data import Sentence
+ from flair.embeddings import FlairEmbeddings, StackedEmbeddings
+
+ # Load the forward model
+ forward_embeddings = FlairEmbeddings('mschonhardt/mdh-mhdbdb-forward')
+
+ # Load the backward model
+ backward_embeddings = FlairEmbeddings('mschonhardt/mdh-mhdbdb-backward')
+
+ # Stack them for best performance
+ stacked_embeddings = StackedEmbeddings([forward_embeddings, backward_embeddings])
+
+ # Example usage: embed a sentence with the stacked embeddings
+ sentence = Sentence("von abegescheidenheit ich hân der geschrift vil gelesen")
+ stacked_embeddings.embed(sentence)
+
+ # Each token now carries the concatenated forward and backward vectors
+ for token in sentence:
+     print(token.text, token.embedding.shape)
+ ```
+
+ ## Citation
+ If you use this model, please cite both the model and the original contextual string embeddings paper.
+
+ ```bibtex
+ @software{schonhardt_michael_2026_mhg_flair,
+   author = "Schonhardt, Michael",
+   title = "Middle High German Contextual String Embeddings (Forward): Trained on the MHDBDB TEI-Texte Corpus",
+   year = 2026,
+   publisher = "Zenodo",
+   doi = "10.5281/zenodo.18657493",
+   url = "https://doi.org/10.5281/zenodo.18657493"
+ }
+ ```
+
+ ```bibtex
+ @inproceedings{akbik-etal-2018-contextual,
+   title = "Contextual String Embeddings for Sequence Labeling",
+   author = "Akbik, Alan and
+     Blythe, Duncan and
+     Vollgraf, Roland",
+   booktitle = "Proceedings of the 27th International Conference on Computational Linguistics",
+   year = "2018",
+   url = "https://aclanthology.org/C18-1139/",
+   pages = "1638--1649"
+ }
+ ```
loss.txt ADDED
The diff for this file is too large to render. See raw diff
 
mhd-forward.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e3dac092f29b1ae2e96321e08be0347d7e73f8be17436137b825a1393bcada45
+ size 210518109
training.log ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:94abe2ffdcb43dd1343cbcf5f84fdc93dbbedcc2d55ef02fbcdad869cee00cbb
+ size 42643595
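Both `mhd-forward.pt` and `training.log` are committed as Git LFS pointer files: three `key value` lines giving the spec version, the SHA-256 object id, and the size in bytes. The following is a minimal sketch of how a downloaded file could be checked against such a pointer. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and not part of Git LFS or `huggingface_hub`.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer: dict) -> bool:
    """Return True if the file at `path` has the size and hash named in the pointer."""
    path = Path(path)
    if path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as handle:
        # Read in 1 MiB chunks so large model files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# Pointer contents copied from the mhd-forward.pt diff above.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:e3dac092f29b1ae2e96321e08be0347d7e73f8be17436137b825a1393bcada45
size 210518109
"""

pointer = parse_lfs_pointer(POINTER_TEXT)
print(f"{pointer['algo']} digest, {pointer['size'] / 2**20:.1f} MiB")
```

After downloading the model, `matches_pointer("mhd-forward.pt", pointer)` would confirm that the local copy is the object this commit refers to, assuming the file was saved under that name.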