Update README.md
README.md
CHANGED
@@ -1,12 +1,14 @@
 ---
-language:
+language:
+- orv
+- cu
 tags:
-
-
-
-
-
-
+- masked-language-modeling
+- old-slavonic
+- old-russian
+- birchbark
+- historical-nlp
+- dual-embeddings
 license: apache-2.0
 ---
 
@@ -29,9 +31,10 @@ hidden space before being passed to the transformer encoder.
 
 The model was trained on a corpus of Old Russian and Church Slavonic texts assembled from the following sources:
 
-| Source | Language | Tokens | Link |
+| Source | Language | Word Tokens | Link |
 |--------|----------|--------|------|
-| Birchbark manuscripts | Old Novgorodian |
+| Birchbark manuscripts | Old Novgorodian (mostly) | 21,464 | [gramoty.ru](https://gramoty.ru) |
+| Epigraphy | Old Church Slavonic (mostly) | 8,102 | [epigraphica.ru](https://epigraphica.ru) |
 | DIACU | Old Church Slavonic; Church Slavonic (Old Russian, Middle Bulgarian, Serbian, Resava recensions); Middle Russian | 1,683,307 | [ACL Anthology](https://aclanthology.org/2025.bsnlp-1.12/) |
 | TOROT | Old Russian; Church Slavonic | 682,430 | [torottreebank.github.io](https://torottreebank.github.io) |
 | Bible (Ponomar) | Church Slavonic | 603,047 | [GitHub](https://github.com/typiconman/ponomar/tree/master/Ponomar/languages/cu/bible/elis) |
@@ -67,4 +70,4 @@ If you use this model, please cite:
 author = {Maxim Eremeev},
 year = {2026},
 }
-```
+```