MaximEremeev committed · Commit 70cf647 · verified · 1 Parent(s): 8aff420

Update README.md

Files changed (1)
  1. README.md +13 -10
README.md CHANGED
@@ -1,12 +1,14 @@
  ---
- language: orv
+ language:
+ - orv
+ - cu
  tags:
- - masked-language-modeling
- - old-slavonic
- - old-russian
- - birchbark
- - historical-nlp
- - dual-embeddings
+ - masked-language-modeling
+ - old-slavonic
+ - old-russian
+ - birchbark
+ - historical-nlp
+ - dual-embeddings
  license: apache-2.0
  ---
  
@@ -29,9 +31,10 @@ hidden space before being passed to the transformer encoder.
  
  The model was trained on a corpus of Old Russian and Church Slavonic texts assembled from the following sources:
  
- | Source | Language | Tokens | Link |
+ | Source | Language | Word Tokens | Link |
  |--------|----------|--------|------|
- | Birchbark manuscripts | Old Novgorodian | | [gramoty.ru](https://gramoty.ru), [epigraphica.ru](https://epigraphica.ru) |
+ | Birchbark manuscripts | Old Novgorodian (mostly) | 21,464 | [gramoty.ru](https://gramoty.ru) |
+ | Epigraphy | Old Church Slavonic (mostly) | 8,102 | [epigraphica.ru](https://epigraphica.ru) |
  | DIACU | Old Church Slavonic; Church Slavonic (Old Russian, Middle Bulgarian, Serbian, Resava recensions); Middle Russian | 1,683,307 | [ACL Anthology](https://aclanthology.org/2025.bsnlp-1.12/) |
  | TOROT | Old Russian; Church Slavonic | 682,430 | [torottreebank.github.io](https://torottreebank.github.io) |
  | Bible (Ponomar) | Church Slavonic | 603,047 | [GitHub](https://github.com/typiconman/ponomar/tree/master/Ponomar/languages/cu/bible/elis) |
@@ -67,4 +70,4 @@ If you use this model, please cite:
  author = {Maxim Eremeev},
  year = {2026},
  }
- ```
+ ```