Update README.md
README.md
## Training
The model was trained on a corpus of Old Russian and Church Slavonic texts assembled from the following sources:

| Source | Language | Tokens | Link |
|--------|----------|-------:|------|
| Birchbark manuscripts | Old Novgorodian | — | [gramoty.ru](https://gramoty.ru), [epigraphica.ru](https://epigraphica.ru) |
| DIACU | Old Church Slavonic; Church Slavonic (Old Russian, Middle Bulgarian, Serbian, Resava recensions); Middle Russian | 1,683,307 | [ACL Anthology](https://aclanthology.org/2025.bsnlp-1.12/) |
| TOROT | Old Russian; Church Slavonic | 682,430 | [torottreebank.github.io](https://torottreebank.github.io) |
| Bible (Ponomar) | Church Slavonic | 603,047 | [GitHub](https://github.com/typiconman/ponomar/tree/master/Ponomar/languages/cu/bible/elis) |
| Byliny | Old Russian (XI–XVII c.) | 430,103 | [rusneb.ru](https://rusneb.ru/catalog/000199_000009_003636356/) |
| Pushkin House | Old Russian | 256,503 | [lib2.pushkinskijdom.ru](https://lib2.pushkinskijdom.ru) |
| Military Statute (Part 2) | Old Russian | 49,787 | [rusneb.ru](https://rusneb.ru/catalog/000199_000009_004093983/) |
| NKRYA (Russian National Corpus, historical subcorpora) | Old Russian; Old Rus (XI–XVIII c.) | 42,412 | [ruscorpora.ru](https://ruscorpora.ru) |
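
A quick way to check the corpus composition is to total the per-source counts. The sketch below copies the figures from the table; it leaves out the birchbark manuscripts because no count is given for them, and it assumes every source was counted with the same tokenization, which the table does not state.

```python
# Token counts per source, copied from the training-corpus table.
# Birchbark manuscripts are omitted: the table gives no count for them.
# Assumption: all sources were counted with the same tokenization.
TOKENS = {
    "DIACU": 1_683_307,
    "TOROT": 682_430,
    "Bible (Ponomar)": 603_047,
    "Byliny": 430_103,
    "Pushkin House": 256_503,
    "Military Statute (Part 2)": 49_787,
    "NKRYA (historical)": 42_412,
}


def corpus_shares(counts):
    """Return each source's fraction of the total token count."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


if __name__ == "__main__":
    total = sum(TOKENS.values())
    print(f"Total (excluding birchbark): {total:,} tokens")
    # Largest sources first.
    for name, share in sorted(corpus_shares(TOKENS).items(), key=lambda kv: -kv[1]):
        print(f"{name:<28} {share:6.1%}")
```

Run as a script, it reports 3,747,589 tokens across the seven counted sources, with DIACU alone contributing about 45%.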

Masking details: MLM (masked language modeling) probability of 8%, with span masking, edge masking, and random gap augmentation.
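
The README does not define these masking schemes, so the following is only a minimal sketch of one plausible reading, not the training code. It assumes SpanBERT-style span masking (span lengths drawn from a clipped geometric distribution) that masks about 8% of the tokens, and it treats "edge masking" as forcing one span to touch the start or end of the sequence, as a damaged manuscript edge might; that interpretation is hypothetical. The helper names, the parameters `mean_span`, `max_span`, and `edge_prob`, and the choice to replace every selected token with the mask ID (no BERT-style 80/10/10 split) are illustrative assumptions. Random gap augmentation is not sketched.

```python
from __future__ import annotations

import random

# Label value ignored by PyTorch's CrossEntropyLoss (its default ignore_index).
IGNORE_INDEX = -100


def sample_span_length(rng: random.Random, mean_span: float, max_span: int) -> int:
    """Draw a span length from a geometric distribution with mean `mean_span`
    (before clipping), clipped to the range [1, max_span]."""
    p = 1.0 / mean_span
    length = 1
    while rng.random() > p and length < max_span:
        length += 1
    return length


def span_mask(token_ids: list[int], mask_id: int, mlm_prob: float = 0.08,
              mean_span: float = 3.0, max_span: int = 10, edge_prob: float = 0.0,
              rng: random.Random | None = None) -> tuple[list[int], list[int]]:
    """Mask about `mlm_prob` of the positions in contiguous spans.

    Returns (input_ids, labels): masked positions hold `mask_id` in input_ids
    and the original token in labels; all other labels are IGNORE_INDEX.
    """
    rng = rng or random.Random()
    n = len(token_ids)
    # Number of positions to mask: at least one for non-empty input.
    budget = min(n, max(1, round(mlm_prob * n))) if n else 0
    masked: set[int] = set()

    # Hypothetical "edge masking": with probability edge_prob, place the first
    # span at the very start or the very end of the sequence.
    if budget and rng.random() < edge_prob:
        length = min(sample_span_length(rng, mean_span, max_span), budget)
        start = 0 if rng.random() < 0.5 else n - length
        masked.update(range(start, start + length))

    # Random spans until the budget is met; spans may overlap, and each span is
    # trimmed so the total never exceeds the budget.
    while len(masked) < budget:
        length = min(sample_span_length(rng, mean_span, max_span), budget - len(masked))
        start = rng.randrange(n - length + 1)
        masked.update(range(start, start + length))

    input_ids = list(token_ids)
    labels = [IGNORE_INDEX] * n
    for i in masked:
        labels[i] = token_ids[i]
        input_ids[i] = mask_id
    return input_ids, labels


if __name__ == "__main__":
    ids = list(range(1000, 1100))  # 100 dummy token IDs; mask_id=4 is hypothetical
    inp, lab = span_mask(ids, mask_id=4, edge_prob=0.5, rng=random.Random(0))
    print(sum(label != IGNORE_INDEX for label in lab))  # 8: 8% of 100 positions
```

Unmasked positions are labelled -100, the value PyTorch's `CrossEntropyLoss` skips by default, so after conversion to tensors the output has the shape a Hugging Face masked-LM head expects.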
## Usage

…

If you use this model, please cite:
```bibtex
@thesis{eremeev2026,
  title = {Automatic Restoration and Analysis of Birchbark Manuscripts},
  author = {Maxim Eremeev},
  year = {2026},