---
license: mit
---
# Model description
This repository contains static CBOW word2vec models, together with their embedding matrices, trained on:
- 20,000 Greek news articles from the [GreekNews-20k dataset](https://huggingface.co/datasets/katrjohn/GreekNews-20k)
- 70,000 Greek news articles from the [News Articles in Greek dataset](https://www.kaggle.com/datasets/kpittos/news-articles)
- 93,000 Greek Wikipedia articles from the [IMISLab GreekWikipedia dataset](https://huggingface.co/datasets/IMISLab/GreekWikipedia)
- 9,000 articles from the [CGL Modern Greek Texts Corpora: newspaper corpus "Ta Nea"](https://inventory.clarin.gr/corpus/910)
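Because the embeddings are static, every vocabulary word maps to exactly one fixed row of the embedding matrix, whatever sentence it appears in. The card does not document the file format of the released matrices, so the sketch below assumes the vectors are available as a plain word-to-vector mapping. The `<pad>`/`<unk>` rows and the helper names `build_embedding_matrix` and `encode` are hypothetical conventions for feeding a downstream model such as the BiLSTM benchmarked below; they are not part of the release.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical special tokens for a downstream model; not part of the released embeddings.
PAD, UNK = "<pad>", "<unk>"


def build_embedding_matrix(
    vectors: Dict[str, List[float]], dim: int
) -> Tuple[Dict[str, int], List[List[float]]]:
    """Give each word one fixed row; rows 0 and 1 are zero vectors for PAD and UNK."""
    word_to_index = {PAD: 0, UNK: 1}
    matrix = [[0.0] * dim, [0.0] * dim]
    for word, vec in vectors.items():
        if len(vec) != dim:
            raise ValueError(f"vector for {word!r} has length {len(vec)}, expected {dim}")
        word_to_index[word] = len(matrix)
        matrix.append(list(vec))
    return word_to_index, matrix


def encode(tokens: Sequence[str], word_to_index: Dict[str, int]) -> List[int]:
    """Map tokens to row indices; out-of-vocabulary tokens fall back to the UNK row."""
    return [word_to_index.get(tok, word_to_index[UNK]) for tok in tokens]


if __name__ == "__main__":
    toy = {"αθήνα": [0.1, 0.2], "κυβέρνηση": [0.3, -0.1]}  # made-up 2-d vectors
    w2i, emb = build_embedding_matrix(toy, dim=2)
    ids = encode(["αθήνα", "άγνωστη", "κυβέρνηση"], w2i)
    print(ids)          # [2, 1, 3]
    print(emb[ids[0]])  # [0.1, 0.2]
```

Mapping unseen words to a shared zero row is only one option; a downstream model could instead learn the UNK row during training.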
## Hyperparameters
The following hyperparameters were used to train the word2vec models:
- window=5
- sg=0 (CBOW mode)
- cbow_mean=1
- workers=8
- negative=10
- sample=1e-4
- epochs=50
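The parameter names above match those of gensim's `Word2Vec` class, so the models were presumably trained with gensim, although the card does not say so explicitly. Under that assumption (gensim 4.x, which uses `vector_size` and `epochs`), a minimal sketch of the training call might look like this. `W2V_PARAMS` and `train_cbow` are hypothetical names, not the authors' script; `vector_size` and `min_count` differ per model and are taken from the performance table. Tokenization and sentence splitting are not described in the card and are left to the caller.

```python
from typing import Iterable, List

# Hyperparameters listed in this card (assumed to be gensim Word2Vec arguments).
W2V_PARAMS = dict(
    window=5,
    sg=0,          # 0 = CBOW, 1 = skip-gram
    cbow_mean=1,   # average (rather than sum) the context word vectors
    workers=8,
    negative=10,   # number of negative (noise) words sampled per example
    sample=1e-4,   # threshold for randomly downsampling very frequent words
    epochs=50,
)


def train_cbow(sentences: Iterable[List[str]], vector_size: int, min_count: int):
    """Hypothetical helper: train a CBOW model with the card's hyperparameters (needs gensim>=4)."""
    # Imported lazily so the configuration above can be inspected without gensim installed.
    from gensim.models import Word2Vec

    return Word2Vec(
        sentences=sentences,
        vector_size=vector_size,
        min_count=min_count,
        **W2V_PARAMS,
    )
```

For example, `train_cbow(tokenized_sentences, vector_size=128, min_count=44)` would correspond to the first table row. `sentences` should be a re-iterable sequence, such as a list of token lists, because gensim scans the corpus once to build the vocabulary and then once per epoch.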
## Model performance
To benchmark these embeddings, we report the performance of our BiLSTM model on joint named entity recognition (NER) and news article classification on the [GreekNews-20k dataset](https://huggingface.co/datasets/katrjohn/GreekNews-20k), along with Pearson and Spearman correlations on WordSim-353 (WS-353).
|Training sentences|Vocabulary size|Dimension|min_count|OOV|WS-353 Pearson|WS-353 Similarity|NER micro-F1 (%)|Classification accuracy (%)|Total model parameters (M)|
|---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|
|4564417|94865|128|44|42.4|0.42|0.40|85|76|13.7|
|4564417|140631|72|27|36.2|0.39|0.39|84|76|11.9|
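The WS-353 scores compare cosine similarities between word vectors with human similarity judgments for word pairs. The card does not state which evaluation tool or Greek version of WS-353 was used, nor exactly how the OOV column is computed. The stdlib-only sketch below assumes pairs containing an out-of-vocabulary word are skipped and reports their share as a percentage; the function names are hypothetical.

```python
import math
from typing import List, Mapping, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    # Spearman's rho is Pearson's r computed on the ranks.
    return pearson(ranks(x), ranks(y))


def evaluate_ws353(
    pairs: Sequence[Tuple[str, str, float]], vectors: Mapping[str, Sequence[float]]
) -> Tuple[float, float, float]:
    """Return (Pearson, Spearman, OOV %); pairs with an out-of-vocabulary word are skipped."""
    model_scores: List[float] = []
    human_scores: List[float] = []
    for w1, w2, human in pairs:
        if w1 in vectors and w2 in vectors:
            model_scores.append(cosine(vectors[w1], vectors[w2]))
            human_scores.append(human)
    if len(model_scores) < 2:
        raise ValueError("need at least two in-vocabulary pairs")
    oov_pct = 100.0 * (len(pairs) - len(model_scores)) / len(pairs)
    return pearson(model_scores, human_scores), spearman(model_scores, human_scores), oov_pct
```

If the models are loaded with gensim, its `KeyedVectors.evaluate_word_pairs` method offers a comparable built-in that also reports an OOV ratio.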
## Author
This model was released alongside the article *Named Entity Recognition and News Article Classification: A Lightweight Approach*.
If you use this model, please cite:
```
@ARTICLE{11148234,
author={Katranis, Ioannis and Troussas, Christos and Krouska, Akrivi and Mylonas, Phivos and Sgouropoulou, Cleo},
journal={IEEE Access},
title={Named Entity Recognition and News Article Classification: A Lightweight Approach},
year={2025},
volume={13},
number={},
pages={155031-155046},
keywords={Accuracy;Transformers;Pipelines;Named entity recognition;Computational modeling;Vocabulary;Tagging;Real-time systems;Benchmark testing;Training;Distilled transformer;edge-deployable model;multiclass news-topic classification;named entity recognition},
doi={10.1109/ACCESS.2025.3605709}}
```