---
license: mit
---

# Model description

This space contains static CBOW word2vec models, along with their embedding matrices, trained on:

- 20,000 Greek news articles from the [GreekNews-20k dataset](https://huggingface.co/datasets/katrjohn/GreekNews-20k)
- 70,000 Greek news articles from the [News Articles in Greek dataset](https://www.kaggle.com/datasets/kpittos/news-articles)
- 93,000 Greek Wikipedia articles from the [IMISLab GreekWikipedia dataset](https://huggingface.co/datasets/IMISLab/GreekWikipedia)
- 9,000 articles from the [CGL Modern Greek Texts Corpora: newspaper corpus "Ta Nea"](https://inventory.clarin.gr/corpus/910)

## Hyperparameters

The following hyperparameters were used to train the word2vec models:

- window=5
- sg=0 (CBOW mode)
- cbow_mean=1
- workers=8
- negative=10
- sample=1e-4
- epochs=50
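
The parameter names listed above match the keyword arguments of gensim's `Word2Vec` class, so a training run might look like the sketch below. This assumes gensim 4.x; the model card does not name the library that was actually used. `W2V_PARAMS` and `train_cbow` are hypothetical names introduced for illustration, and `vector_size` and `min_count` are left as arguments because they differ between the two released models.

```python
# Hyperparameters copied from the list above.
W2V_PARAMS = {
    "window": 5,      # context window size
    "sg": 0,          # 0 selects CBOW rather than skip-gram
    "cbow_mean": 1,   # average (not sum) the context vectors
    "workers": 8,     # training threads
    "negative": 10,   # negative samples per positive example
    "sample": 1e-4,   # downsampling threshold for frequent words
    "epochs": 50,     # passes over the corpus
}


def train_cbow(sentences, vector_size, min_count):
    """Train a CBOW model on an iterable of token lists.

    Assumption: gensim >= 4.0 is installed. The import is local so the
    hyperparameter dictionary can be inspected without gensim.
    """
    from gensim.models import Word2Vec

    return Word2Vec(
        sentences=sentences,
        vector_size=vector_size,
        min_count=min_count,
        **W2V_PARAMS,
    )
```

Under these assumptions, `train_cbow(sentences, vector_size=128, min_count=44)` would correspond to the larger of the two models, and `train_cbow(sentences, vector_size=72, min_count=27)` to the smaller one. `sentences` must be re-iterable (for example a list of token lists), since training makes several passes over it.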

### Model performance

To benchmark these embeddings, we report our BiLSTM's performance on joint NER and classification on the [GreekNews-20k dataset](https://huggingface.co/datasets/katrjohn/GreekNews-20k), along with Pearson and Spearman correlations on WordSim353.

|Sentences|Vocabulary|Dimension|min_count|OOV|WS-353 Pearson|WS-353 Similarity|NER MicroF1%|Class Acc%|Total model parameters (M)|
|---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|
|4564417|94865|128|44|42.4|0.42|0.40|85|76|13.7|
|4564417|140631|72|27|36.2|0.39|0.39|84|76|11.9|
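
For readers unfamiliar with word-similarity benchmarks, the sketch below shows one common way to compute Pearson and Spearman correlations between human similarity scores and cosine similarities of embedding pairs. It is not the authors' evaluation script: the function names, the toy vectors, and the choice to skip pairs containing out-of-vocabulary words are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


def evaluate_word_pairs(vectors, pairs):
    """Return (pearson, spearman, skipped) for (word1, word2, score) triples.

    Assumption: pairs with an out-of-vocabulary word are skipped and counted.
    """
    gold, predicted = [], []
    for w1, w2, score in pairs:
        if w1 in vectors and w2 in vectors:
            gold.append(score)
            predicted.append(cosine(vectors[w1], vectors[w2]))
    return pearson(gold, predicted), spearman(gold, predicted), len(pairs) - len(gold)


# Hypothetical 2-dimensional toy vectors and scores, not real model output.
toy_vectors = {
    "γάτα": [1.0, 0.0],
    "σκύλος": [1.0, 0.1],
    "ψάρι": [1.0, 1.0],
    "αυτοκίνητο": [0.0, 1.0],
}
toy_pairs = [
    ("γάτα", "σκύλος", 9.0),
    ("γάτα", "ψάρι", 6.0),
    ("γάτα", "αυτοκίνητο", 1.0),
    ("γάτα", "λείπει", 5.0),  # second word is out of vocabulary
]
toy_pearson, toy_spearman, toy_skipped = evaluate_word_pairs(toy_vectors, toy_pairs)
```

With real embeddings, `vectors` would be a mapping from each vocabulary word to its row in the embedding matrix, and `pairs` would come from a Greek version of WordSim353.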

#### Author

This model was released alongside the article *Named Entity Recognition and News Article Classification: A Lightweight Approach*.

If you use this model, please cite the following:

```
@ARTICLE{11148234,
  author={Katranis, Ioannis and Troussas, Christos and Krouska, Akrivi and Mylonas, Phivos and Sgouropoulou, Cleo},
  journal={IEEE Access},
  title={Named Entity Recognition and News Article Classification: A Lightweight Approach},
  year={2025},
  volume={13},
  number={},
  pages={155031-155046},
  keywords={Accuracy;Transformers;Pipelines;Named entity recognition;Computational modeling;Vocabulary;Tagging;Real-time systems;Benchmark testing;Training;Distilled transformer;edge-deployable model;multiclass news-topic classification;named entity recognition},
  doi={10.1109/ACCESS.2025.3605709}}
```