Model description
This repository contains the static CBOW word2vec models, along with their embedding matrices, trained on the following corpora:
- 20,000 Greek news articles from the GreekNews-20k dataset
- 70,000 Greek news articles from the News Articles in Greek dataset
- 93,000 Greek Wikipedia articles from the IMISLab GreekWikipedia dataset
- 9,000 articles from the newspaper corpus "Ta Nea" in the CGL Modern Greek Texts Corpora
Hyperparameters
The following hyperparameters were shared by both word2vec models (the embedding dimension and min_count differ per model and are listed in the table below):
- window=5
- sg=0 (CBOW mode)
- cbow_mean=1
- workers=8
- negative=10
- sample=1e-4
- epochs=50
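The hyperparameter names above match the arguments of gensim's `Word2Vec` class, so the models were presumably trained with gensim (an assumption; the card does not name the library). In gensim, sg=0 selects CBOW, cbow_mean=1 averages rather than sums the context vectors, and negative=10 draws 10 noise words per positive example. The least intuitive setting is sample=1e-4, which randomly discards occurrences of very frequent words. The minimal sketch below reproduces the keep-probability formula gensim uses for this; the function name `keep_probability` is hypothetical.

```python
import math


def keep_probability(word_fraction: float, sample: float = 1e-4) -> float:
    """Probability that one occurrence of a word survives subsampling.

    word_fraction: the word's count divided by the total number of
    training tokens. The formula follows gensim's Word2Vec; that this
    is how these models were trained is an assumption.
    """
    if word_fraction <= 0:
        raise ValueError("word_fraction must be positive")
    p = (math.sqrt(word_fraction / sample) + 1) * (sample / word_fraction)
    # Words at or below roughly the threshold frequency are always kept.
    return min(p, 1.0)


if __name__ == "__main__":
    for frac in (1e-2, 1e-3, 1e-4, 1e-5):
        print(f"word fraction {frac:g}: keep {keep_probability(frac):.3f}")
```

With sample=1e-4, a word that makes up 1% of all tokens is kept only about 11% of the time, while rare words are never discarded. This shifts training effort away from function words.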
Model performance
To benchmark these embeddings, we report the performance of our BiLSTM on joint named entity recognition (NER) and news article classification on the GreekNews-20k dataset, along with Pearson/Spearman correlations on the WordSim353 (WS-353) word-similarity benchmark.
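WS-353 scores compare the cosine similarity of each word pair's vectors with human similarity judgements. The pure-Python sketch below illustrates this under our own assumptions. Pairs that contain an out-of-vocabulary word are skipped and counted toward an OOV percentage, which mirrors gensim's `KeyedVectors.evaluate_word_pairs` (it returns Pearson, Spearman and an OOV ratio). Whether the table's OOV column was produced this way is not stated. The function `score_word_pairs` and the toy vectors and scores are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: the Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


def score_word_pairs(vectors, pairs):
    """Score (word1, word2, human_score) pairs, skipping OOV pairs.

    Returns (pearson, spearman, oov_percent)."""
    model_scores, human_scores = [], []
    oov = 0
    for w1, w2, gold in pairs:
        if w1 not in vectors or w2 not in vectors:
            oov += 1
            continue
        model_scores.append(cosine(vectors[w1], vectors[w2]))
        human_scores.append(gold)
    if len(model_scores) < 2:
        raise ValueError("need at least two in-vocabulary pairs")
    oov_percent = 100.0 * oov / len(pairs)
    return (pearson(model_scores, human_scores),
            spearman(model_scores, human_scores),
            oov_percent)


# Toy vectors and human scores, invented for illustration only.
toy_vectors = {
    "γάτα": [1.0, 0.2],
    "σκύλος": [0.9, 0.4],
    "αυτοκίνητο": [0.1, 1.0],
}
toy_pairs = [
    ("γάτα", "σκύλος", 7.35),
    ("γάτα", "αυτοκίνητο", 1.2),
    ("σκύλος", "αυτοκίνητο", 2.0),
    ("γάτα", "τίγρη", 7.0),  # "τίγρη" is out of vocabulary
]
print(score_word_pairs(toy_vectors, toy_pairs))
```

Skipping OOV pairs means the two correlations are computed only over the pairs that the vocabulary actually covers. For that reason, correlations should be read together with the OOV figure.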
| Sentences | Vocabulary | Dimension | min_count | OOV | WS-353 Pearson | WS-353 Similarity | NER Micro-F1 (%) | Classification Acc. (%) | Total model parameters (M) |
|---|---|---|---|---|---|---|---|---|---|
| 4564417 | 94865 | 128 | 44 | 42.4 | 0.42 | 0.40 | 85 | 76 | 13.7 |
| 4564417 | 140631 | 72 | 27 | 36.2 | 0.39 | 0.39 | 84 | 76 | 11.9 |
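The Vocabulary and Dimension columns explain most of the parameter totals. Assuming the embedding layer is a dense Vocabulary × Dimension matrix included in the table's total (the card does not break the total down), the embedding matrix holds about 12.1M of the 13.7M parameters in the 128-dimensional model and about 10.1M of the 11.9M in the 72-dimensional one. The short check below reproduces this arithmetic; the names `MODELS` and `embedding_params` are hypothetical.

```python
# Vocabulary size, embedding dimension and reported total parameters
# (in millions) for the two released models, copied from the table above.
MODELS = {
    "128-d": {"vocab": 94_865, "dim": 128, "total_m": 13.7},
    "72-d": {"vocab": 140_631, "dim": 72, "total_m": 11.9},
}


def embedding_params(vocab: int, dim: int) -> int:
    """Number of weights in a dense vocab x dim embedding matrix."""
    return vocab * dim


for name, m in MODELS.items():
    emb = embedding_params(m["vocab"], m["dim"])
    # Fraction of the reported total taken up by the embedding matrix
    # (assumes the total includes the embeddings).
    share = emb / (m["total_m"] * 1e6)
    print(f"{name}: {emb / 1e6:.2f}M embedding weights, "
          f"~{share:.0%} of {m['total_m']}M total")
```

This is why the 72-dimensional model is smaller overall despite its larger vocabulary: shrinking the dimension cuts every row of the dominant matrix.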
Author
This model was released alongside the article "Named Entity Recognition and News Article Classification: A Lightweight Approach" (IEEE Access, 2025).
If you use this model, please cite:
@ARTICLE{11148234,
author={Katranis, Ioannis and Troussas, Christos and Krouska, Akrivi and Mylonas, Phivos and Sgouropoulou, Cleo},
journal={IEEE Access},
title={Named Entity Recognition and News Article Classification: A Lightweight Approach},
year={2025},
volume={13},
number={},
pages={155031-155046},
keywords={Accuracy;Transformers;Pipelines;Named entity recognition;Computational modeling;Vocabulary;Tagging;Real-time systems;Benchmark testing;Training;Distilled transformer;edge-deployable model;multiclass news-topic classification;named entity recognition},
doi={10.1109/ACCESS.2025.3605709}}