Model description

This space contains static CBOW word2vec models, along with their embedding matrices, trained on:

20,000 Greek news articles from the GreekNews-20k dataset
70,000 Greek news articles from the News Articles in Greek dataset
93,000 Greek Wikipedia articles from the IMISLab GreekWikipedia dataset
9,000 articles from the CGL Modern Greek Texts Corpora: newspaper corpus "Ta Nea"

Hyperparameters

The following hyperparameters were used to train the word2vec models:

window=5
sg=0 (CBOW mode)
cbow_mean=1
workers=8
negative=10
sample=1e-4
epochs=50
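
The parameter names above match the keyword arguments of gensim's Word2Vec class, so a training call might look like the minimal sketch below. This card does not state which library was used, so gensim (version 4.0 or later) is an assumption, and train_word2vec is a hypothetical helper introduced only for illustration. The vector size and min_count are passed separately because they differ between the two released models.

```python
# Hyperparameters from the list above, expressed as keyword arguments.
# Assumption: the names map one-to-one onto gensim's Word2Vec constructor.
W2V_PARAMS = {
    "window": 5,
    "sg": 0,         # 0 selects CBOW; 1 would select skip-gram
    "cbow_mean": 1,  # average the context vectors instead of summing them
    "workers": 8,
    "negative": 10,
    "sample": 1e-4,
    "epochs": 50,
}


def train_word2vec(sentences, vector_size, min_count, params=W2V_PARAMS):
    """Train a CBOW model on an iterable of token lists (hypothetical helper)."""
    # Imported lazily so the configuration can be inspected without gensim.
    from gensim.models import Word2Vec

    return Word2Vec(
        sentences=sentences,
        vector_size=vector_size,
        min_count=min_count,
        **params,
    )
```

With a tokenized corpus at hand, the 128-dimensional variant would then correspond to something like train_word2vec(tokenized_sentences, vector_size=128, min_count=44).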

Model performance

To benchmark these embeddings, we report our BiLSTM's performance on joint NER and classification on the GreekNews-20k dataset, along with the Pearson and Spearman correlations on WordSim-353.

Sentences	Vocabulary	Dimension	min_count	OOV	WS-353 Pearson	WS-353 Spearman	NER MicroF1%	Class Acc%	Total model parameters (M)
4564417	94865	128	44	42.4	0.42	0.40	85	76	13.7
4564417	140631	72	27	36.2	0.39	0.39	84	76	11.9
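
As a rough illustration of how Pearson and Spearman correlations on a word-similarity set are typically obtained, the sketch below scores each word pair by cosine similarity and correlates those scores with the human ratings. It is not the authors' evaluation script: the function names, the dictionary-based vector lookup, and the choice to skip pairs with a missing word are all assumptions, and the skipped-pair share it returns is not necessarily how the OOV column was computed.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def cosine(u, v):
    """Cosine similarity of two vectors given as sequences of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def evaluate_wordsim(pairs, vectors):
    """Correlate model similarities with human scores.

    pairs:   iterable of (word1, word2, human_score)
    vectors: dict mapping word -> embedding (hypothetical lookup)
    Returns (pearson, spearman, percentage of pairs skipped).
    """
    pairs = list(pairs)
    gold, pred, skipped = [], [], 0
    for w1, w2, score in pairs:
        if w1 in vectors and w2 in vectors:
            gold.append(score)
            pred.append(cosine(vectors[w1], vectors[w2]))
        else:
            skipped += 1  # at least one word is out of vocabulary
    skipped_pct = 100.0 * skipped / len(pairs)
    return pearson(gold, pred), spearman(gold, pred), skipped_pct
```

Skipping pairs with a missing word keeps the correlation defined on the covered subset only, so scores from models with different vocabularies are computed on different pairs and should be compared with that in mind.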

Author

This model was released alongside the article: Named Entity Recognition and News Article Classification: A Lightweight Approach.

If you use this model, please cite the following:

@ARTICLE{11148234,
  author={Katranis, Ioannis and Troussas, Christos and Krouska, Akrivi and Mylonas, Phivos and Sgouropoulou, Cleo},
  journal={IEEE Access}, 
  title={Named Entity Recognition and News Article Classification: A Lightweight Approach}, 
  year={2025},
  volume={13},
  number={},
  pages={155031-155046},
  keywords={Accuracy;Transformers;Pipelines;Named entity recognition;Computational modeling;Vocabulary;Tagging;Real-time systems;Benchmark testing;Training;Distilled transformer;edge-deployable model;multiclass news-topic classification;named entity recognition},
  doi={10.1109/ACCESS.2025.3605709}}
