---
license: mit
datasets:
- katrjohn/Greek-News-NER-Classif
language:
- el
base_model:
- nlpaueb/bert-base-greek-uncased-v1
tags:
- classification
- NER
- NewsArticle
---

# Model Description

This model is a fine-tuned version of [GreekBERT](https://huggingface.co/nlpaueb/bert-base-greek-uncased-v1) with two task heads: news-topic classification and named entity recognition (NER).

## Dataset

The model was fine-tuned on the [GreekNews-20k](https://huggingface.co/datasets/katrjohn/GreekNews-20k) dataset.

### Results

Performance on the [GreekNews-20k dataset](https://huggingface.co/datasets/katrjohn/GreekNews-20k):

Classification:

| Class | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| Αυτοκίνητο | 0.94 | 0.95 | 0.94 | 201 |
| Επιχειρήσεις και βιομηχανία | 0.73 | 0.78 | 0.75 | 369 |
| Έγκλημα και δικαιοσύνη | 0.93 | 0.89 | 0.91 | 314 |
| Ειδήσεις για καταστροφές και έκτακτες ανάγκες | 0.83 | 0.79 | 0.81 | 272 |
| Οικονομικά και χρηματοοικονομικά | 0.78 | 0.74 | 0.76 | 495 |
| Εκπαίδευση | 0.85 | 0.92 | 0.88 | 259 |
| Ψυχαγωγία και πολιτισμός | 0.81 | 0.85 | 0.83 | 251 |
| Περιβάλλον και κλίμα | 0.81 | 0.75 | 0.78 | 292 |
| Οικογένεια και σχέσεις | 0.87 | 0.89 | 0.88 | 294 |
| Μόδα | 0.96 | 0.93 | 0.94 | 259 |
| Τρόφιμα και ποτά | 0.69 | 0.90 | 0.78 | 262 |
| Υγεία και ιατρική | 0.76 | 0.71 | 0.73 | 346 |
| Μεταφορές και υποδομές | 0.78 | 0.86 | 0.82 | 321 |
| Ψυχική υγεία και ευεξία | 0.84 | 0.79 | 0.81 | 348 |
| Πολιτική και κυβέρνηση | 0.89 | 0.69 | 0.78 | 339 |
| Θρησκεία | 0.89 | 0.95 | 0.92 | 271 |
| Αθλητισμός | 1.00 | 0.98 | 0.99 | 212 |
| Ταξίδια και αναψυχή | 0.88 | 0.88 | 0.88 | 424 |
| Τεχνολογία και επιστήμη | 0.77 | 0.78 | 0.78 | 308 |
| **accuracy** | | | 0.83 | 5837 |
| **macro avg** | 0.84 | 0.84 | 0.84 | 5837 |
| **weighted avg** | 0.83 | 0.83 | 0.83 | 5837 |

NER:

| Entity | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| CARDINAL | 0.87 | 0.97 | 0.91 | 25656 |
| DATE | 0.89 | 0.92 | 0.91 | 15469 |
| EVENT | 0.71 | 0.73 | 0.72 | 1720 |
| FAC | 0.53 | 0.60 | 0.56 | 2118 |
| GPE | 0.88 | 0.95 | 0.91 | 16010 |
| LOC | 0.82 | 0.70 | 0.75 | 3547 |
| MONEY | 0.78 | 0.83 | 0.80 | 3882 |
| NORP | 0.91 | 0.92 | 0.91 | 1926 |
| ORDINAL | 0.92 | 0.98 | 0.95 | 3891 |
| ORG | 0.78 | 0.85 | 0.82 | 22184 |
| PERCENT | 0.73 | 0.86 | 0.79 | 7286 |
| PERSON | 0.89 | 0.93 | 0.91 | 16524 |
| PRODUCT | 0.70 | 0.56 | 0.63 | 2071 |
| QUANTITY | 0.74 | 0.76 | 0.75 | 2588 |
| TIME | 0.74 | 0.90 | 0.81 | 2390 |
| **micro avg** | 0.83 | 0.90 | 0.86 | 127262 |
| **macro avg** | 0.79 | 0.83 | 0.81 | 127262 |
| **weighted avg** | 0.84 | 0.90 | 0.86 | 127262 |

Performance on the [elNER dataset](https://github.com/nmpartzio/elNER):

| Entity | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| CARDINAL | 0.91 | 0.97 | 0.94 | 911 |
| DATE | 0.92 | 0.92 | 0.92 | 838 |
| EVENT | 0.57 | 0.57 | 0.57 | 130 |
| FAC | 0.49 | 0.44 | 0.47 | 77 |
| GPE | 0.84 | 0.95 | 0.89 | 826 |
| LOC | 0.80 | 0.64 | 0.71 | 178 |
| MONEY | 0.98 | 0.98 | 0.98 | 111 |
| NORP | 0.89 | 0.92 | 0.91 | 141 |
| ORDINAL | 0.95 | 0.93 | 0.94 | 172 |
| ORG | 0.81 | 0.79 | 0.80 | 1388 |
| PERCENT | 0.96 | 1.00 | 0.98 | 206 |
| PERSON | 0.93 | 0.95 | 0.94 | 1051 |
| PRODUCT | 0.61 | 0.37 | 0.46 | 83 |
| QUANTITY | 0.76 | 0.78 | 0.77 | 65 |
| TIME | 0.90 | 0.92 | 0.91 | 137 |
| **micro avg** | 0.87 | 0.88 | 0.87 | 6314 |
| **macro avg** | 0.82 | 0.81 | 0.81 | 6314 |
| **weighted avg** | 0.87 | 0.88 | 0.87 | 6314 |

#### To use this model

```
pip install transformers torch
```

```python
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained("katrjohn/GreekNewsBERT", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("nlpaueb/bert-base-greek-uncased-v1")
```

##### Example usage

```python
import torch

# Index-to-label mapping for the 19 news-topic classes
classification_label_dict_reverse = {
    0: "Αυτοκίνητο",
    1: "Επιχειρήσεις και βιομηχανία",
    2: "Έγκλημα και δικαιοσύνη",
    3: "Ειδήσεις για καταστροφές και έκτακτες ανάγκες",
    4: "Οικονομικά και χρηματοοικονομικά",
    5: "Εκπαίδευση",
    6: "Ψυχαγωγία και πολιτισμός",
    7: "Περιβάλλον και κλίμα",
    8: "Οικογένεια και σχέσεις",
    9: "Μόδα",
    10: "Τρόφιμα και ποτά",
    11: "Υγεία και ιατρική",
    12: "Μεταφορές και υποδομές",
    13: "Ψυχική υγεία και ευεξία",
    14: "Πολιτική και κυβέρνηση",
    15: "Θρησκεία",
    16: "Αθλητισμός",
    17: "Ταξίδια και αναψυχή",
    18: "Τεχνολογία και επιστήμη"
}

# BIO tag set for the NER head
ner_label_set = [
    "PAD", "O",
    "B-ORG", "I-ORG", "B-PERSON", "I-PERSON", "B-CARDINAL", "I-CARDINAL",
    "B-GPE", "I-GPE", "B-DATE", "I-DATE", "B-ORDINAL", "I-ORDINAL",
    "B-PERCENT", "I-PERCENT", "B-LOC", "I-LOC", "B-NORP", "I-NORP",
    "B-MONEY", "I-MONEY", "B-TIME", "I-TIME", "B-EVENT", "I-EVENT",
    "B-PRODUCT", "I-PRODUCT", "B-FAC", "I-FAC", "B-QUANTITY", "I-QUANTITY"
]
tag2idx = {t: i for i, t in enumerate(ner_label_set)}
idx2tag = {i: t for t, i in tag2idx.items()}

sentence = "Ο Κυριάκος Μητσοτάκης επισκέφθηκε τη Θεσσαλονίκη για τα εγκαίνια της ΔΕΘ."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    classification_logits, ner_logits = model(**inputs)

# Classification
classification_probs = torch.softmax(classification_logits, dim=-1)
predicted_class = torch.argmax(classification_probs, dim=-1).item()
predicted_class_label = classification_label_dict_reverse.get(predicted_class, "Unknown")
print(f"Predicted class index: {predicted_class}")
print(f"Predicted class label: {predicted_class_label}")

# NER
ner_predictions = torch.argmax(ner_logits, dim=-1).squeeze().tolist()
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"].squeeze())
for token, pred_idx in zip(tokens, ner_predictions):
    tag = idx2tag.get(pred_idx, "O")
    if token in ["[CLS]", "[SEP]"]:
        tag = "O"
    print(f"{token}: {tag}")
```

Output:

```
Predicted class index: 14
Predicted class label: Πολιτική και κυβέρνηση
[CLS]: O
ο: O
κυριακος: B-PERSON
μητσοτακης: I-PERSON
επισκεφθηκε: O
τη: O
θεσσαλονικη: B-GPE
για: O
τα: O
εγκαινια: O
της: O
δεθ: B-EVENT
.: O
[SEP]: O
```

#### Author

This model is released alongside the article *Named Entity Recognition and News Article Classification: A Lightweight Approach*. If you use this model, please cite:

```
@ARTICLE{11148234,
  author={Katranis, Ioannis and Troussas, Christos and Krouska, Akrivi and Mylonas, Phivos and Sgouropoulou, Cleo},
  journal={IEEE Access},
  title={Named Entity Recognition and News Article Classification: A Lightweight Approach},
  year={2025},
  volume={13},
  number={},
  pages={155031-155046},
  keywords={Accuracy;Transformers;Pipelines;Named entity recognition;Computational modeling;Vocabulary;Tagging;Real-time systems;Benchmark testing;Training;Distilled transformer;edge-deployable model;multiclass news-topic classification;named entity recognition},
  doi={10.1109/ACCESS.2025.3605709}}
```
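The example usage prints one BIO tag per wordpiece. To recover whole entity mentions, the `##`-prefixed subword pieces need to be merged back into words and consecutive `B-`/`I-` tags grouped into spans. A minimal decoding sketch in pure Python, assuming the standard BERT WordPiece convention and the BIO tag set above (`decode_entities` is an illustrative helper, not part of the released model):

```python
def decode_entities(tokens, tags):
    """Group wordpiece tokens with BIO tags into (text, label) entity spans."""
    entities = []
    current_tokens, current_label = [], None

    def flush():
        nonlocal current_tokens, current_label
        if current_tokens:
            text = ""
            for i, tok in enumerate(current_tokens):
                if tok.startswith("##"):
                    text += tok[2:]          # subword piece: attach without a space
                elif i == 0:
                    text = tok
                else:
                    text += " " + tok
            entities.append((text, current_label))
        current_tokens, current_label = [], None

    for tok, tag in zip(tokens, tags):
        if tok in ("[CLS]", "[SEP]", "[PAD]"):
            continue
        if tok.startswith("##"):
            if current_tokens:               # subword continues the open entity
                current_tokens.append(tok)
            continue
        if tag.startswith("B-"):             # a new entity starts
            flush()
            current_tokens, current_label = [tok], tag[2:]
        elif tag.startswith("I-") and current_label == tag[2:]:
            current_tokens.append(tok)       # same entity continues
        else:
            flush()                          # "O" or inconsistent tag ends the span
    flush()
    return entities

# Tokens and tags taken from the example output above
tokens = ["[CLS]", "ο", "κυριακος", "μητσοτακης", "επισκεφθηκε", "τη",
          "θεσσαλονικη", "για", "τα", "εγκαινια", "της", "δεθ", ".", "[SEP]"]
tags = ["O", "O", "B-PERSON", "I-PERSON", "O", "O",
        "B-GPE", "O", "O", "O", "O", "B-EVENT", "O", "O"]
entities = decode_entities(tokens, tags)
print(entities)
# [('κυριακος μητσοτακης', 'PERSON'), ('θεσσαλονικη', 'GPE'), ('δεθ', 'EVENT')]
```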
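Since the classification head returns raw logits over the 19 topics, reporting the top-k topics with probabilities can be more informative than the argmax alone. A standalone sketch in pure Python (`topk_topics` and the demo logits are illustrative; in practice the logits come from `classification_logits[0].tolist()`):

```python
import math

# Class labels, index-aligned with classification_label_dict_reverse above
labels = [
    "Αυτοκίνητο", "Επιχειρήσεις και βιομηχανία", "Έγκλημα και δικαιοσύνη",
    "Ειδήσεις για καταστροφές και έκτακτες ανάγκες", "Οικονομικά και χρηματοοικονομικά",
    "Εκπαίδευση", "Ψυχαγωγία και πολιτισμός", "Περιβάλλον και κλίμα",
    "Οικογένεια και σχέσεις", "Μόδα", "Τρόφιμα και ποτά", "Υγεία και ιατρική",
    "Μεταφορές και υποδομές", "Ψυχική υγεία και ευεξία", "Πολιτική και κυβέρνηση",
    "Θρησκεία", "Αθλητισμός", "Ταξίδια και αναψυχή", "Τεχνολογία και επιστήμη"
]

def topk_topics(logits, k=3):
    """Softmax the raw logits and return the k most probable (label, prob) pairs."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]   # subtract max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]
    order = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    return [(labels[i], probs[i]) for i in order[:k]]

# Made-up logits for illustration only
demo_logits = [0.1] * 19
demo_logits[14] = 6.0   # Πολιτική και κυβέρνηση
demo_logits[4] = 2.5    # Οικονομικά και χρηματοοικονομικά
for label, prob in topk_topics(demo_logits):
    print(f"{label}: {prob:.3f}")
```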