SlovakBERT based Named Entity Recognition
A deep learning model for Named Entity Recognition (NER) in Slovak. The model is based on Gerulata/SlovakBERT and fine-tuned on web-scraped Slovak news articles. The finished model supports the following IOB-tagged entity categories: PERSON, ORGANIZATION, LOCATION, DATE, TIME, MONEY and PERCENTAGE.
Related Work

Model usage
Simple Named Entity Recognition (NER)
```python
from transformers import pipeline

# Load the fine-tuned NER pipeline from the Hugging Face Hub
ner_pipeline = pipeline(task='ner', model='Raychani1/slovakbert-ner-v2')

input_sentence = 'Hoci podľa ostatných údajov NBS pre Bratislavský kraj je aktuálna priemerná cena nehnuteľností na úrovni 2 072 eur za štvorcový meter, ceny bytov v hlavnom meste sú podstatne vyššie.'

classifications = ner_pipeline(input_sentence)
```
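The pipeline returns token-level IOB predictions. A minimal sketch of how such predictions can be merged into entity spans — the sample tags and words below are illustrative, not actual model output:

```python
def group_iob(predictions):
    """Merge token-level IOB predictions into (entity_type, text) spans."""
    entities = []
    current_type, current_tokens = None, []
    for pred in predictions:
        tag, word = pred['entity'], pred['word']
        if tag.startswith('B-'):
            # A B- tag always opens a new entity; close any open one first
            if current_type:
                entities.append((current_type, ' '.join(current_tokens)))
            current_type, current_tokens = tag[2:], [word]
        elif tag.startswith('I-') and current_type == tag[2:]:
            # An I- tag continues the currently open entity of the same type
            current_tokens.append(word)
        else:
            # 'O' or a mismatched I- tag closes the open entity
            if current_type:
                entities.append((current_type, ' '.join(current_tokens)))
            current_type, current_tokens = None, []
    if current_type:
        entities.append((current_type, ' '.join(current_tokens)))
    return entities

# Illustrative token-level output (same dict shape as the pipeline's)
sample = [
    {'entity': 'B-Organization', 'word': 'NBS'},
    {'entity': 'O', 'word': 'pre'},
    {'entity': 'B-Location', 'word': 'Bratislavský'},
    {'entity': 'I-Location', 'word': 'kraj'},
]
print(group_iob(sample))  # → [('Organization', 'NBS'), ('Location', 'Bratislavský kraj')]
```

Alternatively, recent versions of `transformers` can do this grouping for you via `pipeline(task='ner', ..., aggregation_strategy='simple')`.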
Named Entity Recognition (NER) with Visualization
For a visualization example, please refer to the following Gist.
Model Prediction Output Example

Model Training
Training Hyperparameters
| Parameter | Value |
|---|---|
| per_device_train_batch_size | 4 |
| per_device_eval_batch_size | 4 |
| learning_rate | 5e-05 |
| adam_beta1 | 0.9 |
| adam_beta2 | 0.999 |
| adam_epsilon | 1e-08 |
| num_train_epochs | 15 |
| lr_scheduler_type | linear |
| seed | 42 |
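The hyperparameters above map directly onto Hugging Face `TrainingArguments`; a minimal sketch (the output directory name is an assumption, not part of the original training setup):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir='slovakbert-ner',          # hypothetical directory name
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    learning_rate=5e-05,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    num_train_epochs=15,
    lr_scheduler_type='linear',
    seed=42,
)
```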
Training results
The best results are reached in the 8th training epoch.
| Training Loss | Epoch | Step | Validation Loss | Precision | Recall | F1 | Accuracy |
|---|---|---|---|---|---|---|---|
| 0.6721 | 1.0 | 70 | 0.2214 | 0.6972 | 0.7308 | 0.7136 | 0.9324 |
| 0.1849 | 2.0 | 140 | 0.1697 | 0.8056 | 0.8365 | 0.8208 | 0.952 |
| 0.0968 | 3.0 | 210 | 0.1213 | 0.882 | 0.8622 | 0.872 | 0.9728 |
| 0.0468 | 4.0 | 280 | 0.1107 | 0.8372 | 0.907 | 0.8708 | 0.9684 |
| 0.0415 | 5.0 | 350 | 0.1644 | 0.8059 | 0.8782 | 0.8405 | 0.9615 |
| 0.0233 | 6.0 | 420 | 0.1255 | 0.8576 | 0.8878 | 0.8724 | 0.9716 |
| 0.0198 | 7.0 | 490 | 0.1383 | 0.8545 | 0.8846 | 0.8693 | 0.9703 |
| 0.0133 | 8.0 | 560 | 0.1241 | 0.884 | 0.9038 | 0.8938 | 0.9735 |
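The F1 column is the harmonic mean of the Precision and Recall columns; the final epoch's value can be verified directly:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Epoch 8 values from the table above
print(round(f1_score(0.884, 0.9038), 4))  # → 0.8938
```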
Model Evaluation
Evaluation Dataset Distribution
| NER Tag | Number of Tokens |
|---|---|
| O | 6568 |
| B-Person | 96 |
| I-Person | 83 |
| B-Organization | 583 |
| I-Organization | 585 |
| B-Location | 59 |
| I-Location | 15 |
| B-Date | 113 |
| I-Date | 87 |
| Time | 5 |
| B-Money | 44 |
| I-Money | 74 |
| B-Percentage | 57 |
| I-Percentage | 54 |
Evaluation Confusion Matrix

Evaluation Model Metrics
| Precision | Macro-Precision | Recall | Macro-Recall | F1 | Macro-F1 | Accuracy |
|---|---|---|---|---|---|---|
| 0.9897 | 0.9715 | 0.9897 | 0.9433 | 0.9895 | 0.9547 | 0.9897 |
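Note that the micro-averaged Precision, Recall and Accuracy coincide (0.9897): in token classification every token receives exactly one predicted tag, so all three reduce to the fraction of correctly tagged tokens, while the macro variants average per-class scores and are pulled down by rare tags. A small illustration with made-up labels:

```python
def micro_accuracy(true, pred):
    """Fraction of tokens tagged correctly (equals micro precision and
    micro recall when every token gets exactly one prediction)."""
    return sum(t == p for t, p in zip(true, pred)) / len(true)

def macro_recall(true, pred):
    """Unweighted mean of per-class recall: rare classes weigh as much
    as frequent ones."""
    classes = sorted(set(true))
    per_class = []
    for c in classes:
        support = sum(t == c for t in true)
        hits = sum(t == p == c for t, p in zip(true, pred))
        per_class.append(hits / support)
    return sum(per_class) / len(classes)

# Made-up tags: the frequent 'O' class is always right,
# the rare 'B-Time' class is wrong half the time.
true = ['O'] * 8 + ['B-Time'] * 2
pred = ['O'] * 8 + ['B-Time', 'O']

print(micro_accuracy(true, pred))  # → 0.9
print(macro_recall(true, pred))    # → 0.75
```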
Framework Versions
- Transformers 4.26.1
- PyTorch 1.13.1
- Tokenizers 0.13.2