---
license: mit
datasets:
- katrjohn/Greek-News-NER-Classif
language:
- el
base_model:
- EftychiaKarav/DistilGREEK-BERT
- nlpaueb/bert-base-greek-uncased-v1
tags:
- NewsArticle
- Classification
- NER
---


# Model Description
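This model is a fine-tuned version of [DistilGREEK-BERT](https://huggingface.co/EftychiaKarav/DistilGREEK-BERT) that jointly performs news-topic classification and named entity recognition (NER) on Greek news text.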

## Dataset
The model was fine-tuned on the [GreekNews-20k](https://huggingface.co/datasets/katrjohn/GreekNews-20k) dataset.

### Results

Performance on the [GreekNews-20k dataset](https://huggingface.co/datasets/katrjohn/GreekNews-20k) (topic classification first, then NER):

| Class                                           | Precision | Recall | F1-Score | Support |
|-------------------------------------------------|-----------|--------|----------|---------|
| Αυτοκίνητο                                      | 0.96      | 0.97   | 0.96     | 201     |
| Επιχειρήσεις και βιομηχανία                     | 0.75      | 0.72   | 0.74     | 369     |
| Έγκλημα και δικαιοσύνη                          | 0.87      | 0.91   | 0.89     | 314     |
| Ειδήσεις για καταστροφές και έκτακτες ανάγκες   | 0.84      | 0.76   | 0.80     | 272     |
| Οικονομικά και χρηματοοικονομικά                | 0.77      | 0.78   | 0.77     | 495     |
| Εκπαίδευση                                      | 0.92      | 0.90   | 0.91     | 259     |
| Ψυχαγωγία και πολιτισμός                       | 0.86      | 0.80   | 0.83     | 251     |
| Περιβάλλον και κλίμα                           | 0.69      | 0.84   | 0.76     | 292     |
| Οικογένεια και σχέσεις                         | 0.85      | 0.82   | 0.83     | 294     |
| Μόδα                                            | 0.93      | 0.92   | 0.92     | 259     |
| Τρόφιμα και ποτά                               | 0.69      | 0.86   | 0.77     | 262     |
| Υγεία και ιατρική                              | 0.78      | 0.66   | 0.72     | 346     |
| Μεταφορές και υποδομές                         | 0.80      | 0.88   | 0.84     | 321     |
| Ψυχική υγεία και ευεξία                        | 0.80      | 0.75   | 0.78     | 348     |
| Πολιτική και κυβέρνηση                         | 0.86      | 0.75   | 0.80     | 339     |
| Θρησκεία                                        | 0.95      | 0.91   | 0.93     | 271     |
| Αθλητισμός                                      | 0.98      | 0.98   | 0.98     | 212     |
| Ταξίδια και αναψυχή                            | 0.84      | 0.89   | 0.86     | 424     |
| Τεχνολογία και επιστήμη                        | 0.75      | 0.73   | 0.74     | 308     |
| **accuracy**                                      |           |        | 0.82     | 5837    |
| **macro avg**                                     | 0.84      | 0.83   | 0.83     | 5837    |
| **weighted avg**                                  | 0.83      | 0.82   | 0.82     | 5837    |

| Entity    | Precision | Recall | F1-Score | Support |
|-----------|-----------|--------|----------|---------|
| CARDINAL  | 0.85      | 0.92   | 0.88     | 25656   |
| DATE      | 0.84      | 0.91   | 0.88     | 15469   |
| EVENT     | 0.52      | 0.68   | 0.59     | 1720    |
| FAC       | 0.42      | 0.56   | 0.48     | 2118    |
| GPE       | 0.86      | 0.93   | 0.89     | 16010   |
| LOC       | 0.66      | 0.68   | 0.67     | 3547    |
| MONEY     | 0.70      | 0.74   | 0.72     | 3882    |
| NORP      | 0.82      | 0.93   | 0.87     | 1926    |
| ORDINAL   | 0.90      | 0.98   | 0.94     | 3891    |
| ORG       | 0.69      | 0.81   | 0.75     | 22184   |
| PERCENT   | 0.71      | 0.76   | 0.73     | 7286    |
| PERSON    | 0.86      | 0.92   | 0.89     | 16524   |
| PRODUCT   | 0.50      | 0.58   | 0.54     | 2071    |
| QUANTITY  | 0.61      | 0.68   | 0.64     | 2588    |
| TIME      | 0.68      | 0.76   | 0.72     | 2390    |
| **micro avg** | 0.78  | 0.86    | 0.82     | 127262  |
| **macro avg** | 0.71  | 0.79    | 0.75     | 127262  |
| **weighted avg** | 0.78 | 0.86 | 0.82     | 127262  |

Performance on the [elNER dataset](https://github.com/nmpartzio/elNER) (NER only):

| Entity    | Precision | Recall | F1-Score | Support |
|-----------|-----------|--------|----------|---------|
| CARDINAL  | 0.91      | 0.84   | 0.87     | 911     |
| DATE      | 0.90      | 0.89   | 0.90     | 838     |
| EVENT     | 0.36      | 0.46   | 0.41     | 130     |
| FAC       | 0.21      | 0.18   | 0.19     | 77      |
| GPE       | 0.80      | 0.93   | 0.86     | 826     |
| LOC       | 0.48      | 0.64   | 0.55     | 178     |
| MONEY     | 0.90      | 0.94   | 0.92     | 111     |
| NORP      | 0.88      | 0.84   | 0.86     | 141     |
| ORDINAL   | 0.95      | 0.92   | 0.93     | 172     |
| ORG       | 0.72      | 0.73   | 0.73     | 1388    |
| PERCENT   | 0.93      | 0.91   | 0.92     | 206     |
| PERSON    | 0.89      | 0.92   | 0.90     | 1051    |
| PRODUCT   | 0.54      | 0.49   | 0.52     | 83      |
| QUANTITY  | 0.67      | 0.72   | 0.70     | 65      |
| TIME      | 0.79      | 0.75   | 0.77     | 137     |
| **micro avg** | 0.80  | 0.83    | 0.81     | 6314    |
| **macro avg** | 0.73  | 0.75    | 0.74     | 6314    |
| **weighted avg** | 0.81 | 0.83 | 0.82     | 6314    |
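
The summary rows can be sanity-checked from the per-entity rows. The sketch below assumes the report follows the usual convention: the macro average is the unweighted mean of the per-entity scores, and the weighted average weights each score by its support. It uses the rounded F1 values from the elNER table above, so the last digit could differ from a computation on unrounded scores. The helper names `macro_avg` and `weighted_avg` are illustrative only.

```python
# (F1, support) per entity, copied from the rounded elNER table above.
rows = {
    "CARDINAL": (0.87, 911), "DATE": (0.90, 838), "EVENT": (0.41, 130),
    "FAC": (0.19, 77), "GPE": (0.86, 826), "LOC": (0.55, 178),
    "MONEY": (0.92, 111), "NORP": (0.86, 141), "ORDINAL": (0.93, 172),
    "ORG": (0.73, 1388), "PERCENT": (0.92, 206), "PERSON": (0.90, 1051),
    "PRODUCT": (0.52, 83), "QUANTITY": (0.70, 65), "TIME": (0.77, 137),
}

def macro_avg(rows):
    """Unweighted mean: every entity type counts equally."""
    return sum(f1 for f1, _ in rows.values()) / len(rows)

def weighted_avg(rows):
    """Support-weighted mean: frequent entity types count more."""
    total = sum(n for _, n in rows.values())
    return sum(f1 * n for f1, n in rows.values()) / total

print(f"macro F1:    {macro_avg(rows):.2f}")     # 0.74
print(f"weighted F1: {weighted_avg(rows):.2f}")  # 0.82
```

The gap between the two averages comes from low-support types such as FAC and EVENT, which pull the macro average down but carry little weight in the weighted average.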



#### To use this model 
```
pip install transformers torch
```

```python
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained("katrjohn/DistilGreekNewsBert", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("EftychiaKarav/DistilGREEK-BERT")
```

##### Example usage 
```python
import torch

# Classification label dictionary (reverse)
classification_label_dict_reverse = {
    0: "Αυτοκίνητο", 1: "Επιχειρήσεις και βιομηχανία", 2: "Έγκλημα και δικαιοσύνη",
    3: "Ειδήσεις για καταστροφές και έκτακτες ανάγκες", 4: "Οικονομικά και χρηματοοικονομικά", 5: "Εκπαίδευση",
    6: "Ψυχαγωγία και πολιτισμός", 7: "Περιβάλλον και κλίμα", 8: "Οικογένεια και σχέσεις",
    9: "Μόδα", 10: "Τρόφιμα και ποτά", 11: "Υγεία και ιατρική", 12: "Μεταφορές και υποδομές",
    13: "Ψυχική υγεία και ευεξία", 14: "Πολιτική και κυβέρνηση", 15: "Θρησκεία",
    16: "Αθλητισμός", 17: "Ταξίδια και αναψυχή", 18: "Τεχνολογία και επιστήμη"
}

ner_label_set = ["PAD", "O",
    "B-ORG", "I-ORG", "B-PERSON", "I-PERSON", "B-CARDINAL", "I-CARDINAL",
    "B-GPE", "I-GPE", "B-DATE", "I-DATE", "B-ORDINAL", "I-ORDINAL",
    "B-PERCENT", "I-PERCENT", "B-LOC", "I-LOC", "B-NORP", "I-NORP",
    "B-MONEY", "I-MONEY", "B-TIME", "I-TIME", "B-EVENT", "I-EVENT",
    "B-PRODUCT", "I-PRODUCT", "B-FAC", "I-FAC", "B-QUANTITY", "I-QUANTITY"
]
tag2idx = {t:i for i,t in enumerate(ner_label_set)}
idx2tag = {i:t for t,i in tag2idx.items()}

sentence = "Ο Κυριάκος Μητσοτάκης επισκέφθηκε τη Θεσσαλονίκη για τα εγκαίνια της ΔΕΘ."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    classification_logits, ner_logits = model(**inputs)

# Classification
classification_probs = torch.softmax(classification_logits, dim=-1)
predicted_class = torch.argmax(classification_probs, dim=-1).item()
predicted_class_label = classification_label_dict_reverse.get(predicted_class, "Unknown")

print(f"Predicted class index: {predicted_class}")
print(f"Predicted class label: {predicted_class_label}")

# NER
ner_predictions = torch.argmax(ner_logits, dim=-1).squeeze().tolist()
tokens = tokenizer.convert_ids_to_tokens(inputs['input_ids'].squeeze())

for token, pred_idx in zip(tokens, ner_predictions):
    tag = idx2tag.get(pred_idx, "O")
    if token in ["[CLS]", "[SEP]"]:
        tag = "O"
    print(f"{token}: {tag}")


```

Output:
```
Predicted class index: 14
Predicted class label: Πολιτική και κυβέρνηση
[CLS]: O
ο: O
κυριακος: B-PERSON
μητσοτακης: I-PERSON
επισκεφθηκε: O
τη: O
θεσσαλονικη: B-GPE
για: O
τα: O
εγκαινια: O
της: O
δεθ: B-EVENT
.: O
[SEP]: O

```
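
The token-level tags above can be collapsed into entity spans. The following is a minimal sketch, assuming BIO tags as in `ner_label_set` and WordPiece continuation pieces prefixed with `##`. The helper `group_entities` is hypothetical and not part of the model's API. An `I-` tag that does not continue a span of the same type is treated as `O`.

```python
def group_entities(tokens, tags):
    """Merge BIO-tagged WordPiece tokens into (text, label) spans."""
    entities = []
    current_words, current_label = [], None

    def flush():
        # Close the open span, if any, and reset the state.
        nonlocal current_words, current_label
        if current_words:
            entities.append((" ".join(current_words), current_label))
        current_words, current_label = [], None

    for token, tag in zip(tokens, tags):
        if token in ("[CLS]", "[SEP]", "[PAD]"):
            flush()
        elif token.startswith("##"):
            # Continuation piece: glue it to the previous word of an open span.
            if current_words:
                current_words[-1] += token[2:]
        elif tag.startswith("B-"):
            flush()
            current_words, current_label = [token], tag[2:]
        elif tag.startswith("I-") and current_label == tag[2:]:
            current_words.append(token)
        else:
            flush()
    flush()
    return entities


# Tokens and tags taken from the example output above.
tokens = ["[CLS]", "ο", "κυριακος", "μητσοτακης", "επισκεφθηκε", "τη",
          "θεσσαλονικη", "για", "τα", "εγκαινια", "της", "δεθ", ".", "[SEP]"]
tags = ["O", "O", "B-PERSON", "I-PERSON", "O", "O",
        "B-GPE", "O", "O", "O", "O", "B-EVENT", "O", "O"]

entities = group_entities(tokens, tags)
print(entities)
# [('κυριακος μητσοτακης', 'PERSON'), ('θεσσαλονικη', 'GPE'), ('δεθ', 'EVENT')]
```

Because the tokenizer lowercases and strips accents, the span text is normalized. To recover the original surface form, one option is to map tokens back to character offsets in the input sentence.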


#### Author
This model was released alongside the article "Named Entity Recognition and News Article Classification: A Lightweight Approach".

If you use this model, please cite the following:
```
@ARTICLE{11148234,
  author={Katranis, Ioannis and Troussas, Christos and Krouska, Akrivi and Mylonas, Phivos and Sgouropoulou, Cleo},
  journal={IEEE Access}, 
  title={Named Entity Recognition and News Article Classification: A Lightweight Approach}, 
  year={2025},
  volume={13},
  number={},
  pages={155031-155046},
  keywords={Accuracy;Transformers;Pipelines;Named entity recognition;Computational modeling;Vocabulary;Tagging;Real-time systems;Benchmark testing;Training;Distilled transformer;edge-deployable model;multiclass news-topic classification;named entity recognition},
  doi={10.1109/ACCESS.2025.3605709}}

```