|
|
--- |
|
|
license: mit |
|
|
datasets: |
|
|
- katrjohn/Greek-News-NER-Classif |
|
|
language: |
|
|
- el |
|
|
base_model: |
|
|
- nlpaueb/bert-base-greek-uncased-v1 |
|
|
tags: |
|
|
- classification |
|
|
- NER |
|
|
- NewsArticle |
|
|
--- |
|
|
|
|
|
# Model Description |
|
|
This model is a fine-tuned version of [GreekBERT](https://huggingface.co/nlpaueb/bert-base-greek-uncased-v1) that jointly performs news-article topic classification and named entity recognition (NER).
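As the usage example below shows, a single forward pass returns both topic logits and per-token NER logits. A minimal sketch of such a dual-head setup on top of a shared encoder (the `DualHead` module and its names are illustrative, not the released model's internals; the head sizes assume the 19 topic classes and 32 NER tags listed in this card):

```python
import torch
import torch.nn as nn

class DualHead(nn.Module):
    """Sketch: two task heads over one shared BERT-style encoder output."""

    def __init__(self, hidden_size=768, num_classes=19, num_tags=32):
        super().__init__()
        self.cls_head = nn.Linear(hidden_size, num_classes)  # topic head
        self.ner_head = nn.Linear(hidden_size, num_tags)     # token head

    def forward(self, hidden_states):
        # hidden_states: (batch, seq_len, hidden) from the encoder
        classification_logits = self.cls_head(hidden_states[:, 0])  # [CLS] vector
        ner_logits = self.ner_head(hidden_states)                   # every token
        return classification_logits, ner_logits
```

Sharing the encoder lets both tasks reuse one forward pass, which is what makes the approach lightweight.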
|
|
|
|
|
## Dataset |
|
|
The model was fine-tuned on the [GreekNews-20k](https://huggingface.co/datasets/katrjohn/GreekNews-20k) dataset.
|
|
|
|
|
### Results |
|
|
|
|
|
Performance on the [GreekNews-20k dataset](https://huggingface.co/datasets/katrjohn/GreekNews-20k):
|
|
|
|
|
| Class | Precision | Recall | F1-score | Support | |
|
|
|----------------------------------------------------|-----------|--------|----------|---------| |
|
|
| Αυτοκίνητο | 0.94 | 0.95 | 0.94 | 201 | |
|
|
| Επιχειρήσεις και βιομηχανία | 0.73 | 0.78 | 0.75 | 369 | |
|
|
| Έγκλημα και δικαιοσύνη | 0.93 | 0.89 | 0.91 | 314 | |
|
|
| Ειδήσεις για καταστροφές και έκτακτες ανάγκες | 0.83 | 0.79 | 0.81 | 272 | |
|
|
| Οικονομικά και χρηματοοικονομικά | 0.78 | 0.74 | 0.76 | 495 | |
|
|
| Εκπαίδευση | 0.85 | 0.92 | 0.88 | 259 | |
|
|
| Ψυχαγωγία και πολιτισμός | 0.81 | 0.85 | 0.83 | 251 | |
|
|
| Περιβάλλον και κλίμα | 0.81 | 0.75 | 0.78 | 292 | |
|
|
| Οικογένεια και σχέσεις | 0.87 | 0.89 | 0.88 | 294 | |
|
|
| Μόδα | 0.96 | 0.93 | 0.94 | 259 | |
|
|
| Τρόφιμα και ποτά | 0.69 | 0.90 | 0.78 | 262 | |
|
|
| Υγεία και ιατρική | 0.76 | 0.71 | 0.73 | 346 | |
|
|
| Μεταφορές και υποδομές | 0.78 | 0.86 | 0.82 | 321 | |
|
|
| Ψυχική υγεία και ευεξία | 0.84 | 0.79 | 0.81 | 348 | |
|
|
| Πολιτική και κυβέρνηση | 0.89 | 0.69 | 0.78 | 339 | |
|
|
| Θρησκεία | 0.89 | 0.95 | 0.92 | 271 | |
|
|
| Αθλητισμός | 1.00 | 0.98 | 0.99 | 212 | |
|
|
| Ταξίδια και αναψυχή | 0.88 | 0.88 | 0.88 | 424 | |
|
|
| Τεχνολογία και επιστήμη | 0.77 | 0.78 | 0.78 | 308 | |
|
|
| **accuracy** | | | 0.83 | 5837 | |
|
|
| **macro avg** | 0.84 | 0.84 | 0.84 | 5837 | |
|
|
| **weighted avg** | 0.83 | 0.83 | 0.83 | 5837 | |
|
|
|
|
|
| Entity | Precision | Recall | F1-score | Support | |
|
|
|----------|-----------|--------|----------|---------| |
|
|
| CARDINAL | 0.87 | 0.97 | 0.91 | 25656 | |
|
|
| DATE | 0.89 | 0.92 | 0.91 | 15469 | |
|
|
| EVENT | 0.71 | 0.73 | 0.72 | 1720 | |
|
|
| FAC | 0.53 | 0.60 | 0.56 | 2118 | |
|
|
| GPE | 0.88 | 0.95 | 0.91 | 16010 | |
|
|
| LOC | 0.82 | 0.70 | 0.75 | 3547 | |
|
|
| MONEY | 0.78 | 0.83 | 0.80 | 3882 | |
|
|
| NORP | 0.91 | 0.92 | 0.91 | 1926 | |
|
|
| ORDINAL | 0.92 | 0.98 | 0.95 | 3891 | |
|
|
| ORG | 0.78 | 0.85 | 0.82 | 22184 | |
|
|
| PERCENT | 0.73 | 0.86 | 0.79 | 7286 | |
|
|
| PERSON | 0.89 | 0.93 | 0.91 | 16524 | |
|
|
| PRODUCT | 0.70 | 0.56 | 0.63 | 2071 | |
|
|
| QUANTITY | 0.74 | 0.76 | 0.75 | 2588 | |
|
|
| TIME | 0.74 | 0.90 | 0.81 | 2390 | |
|
|
| **micro avg** | 0.83 | 0.90 | 0.86 | 127262 | |
|
|
| **macro avg** | 0.79 | 0.83 | 0.81 | 127262 | |
|
|
| **weighted avg** | 0.84 | 0.90 | 0.86 | 127262 | |
|
|
|
|
|
Performance on the [elNER dataset](https://github.com/nmpartzio/elNER):
|
|
|
|
|
| Entity | Precision | Recall | F1-score | Support | |
|
|
|----------|-----------|--------|----------|---------| |
|
|
| CARDINAL | 0.91 | 0.97 | 0.94 | 911 | |
|
|
| DATE | 0.92 | 0.92 | 0.92 | 838 | |
|
|
| EVENT | 0.57 | 0.57 | 0.57 | 130 | |
|
|
| FAC | 0.49 | 0.44 | 0.47 | 77 | |
|
|
| GPE | 0.84 | 0.95 | 0.89 | 826 | |
|
|
| LOC | 0.80 | 0.64 | 0.71 | 178 | |
|
|
| MONEY | 0.98 | 0.98 | 0.98 | 111 | |
|
|
| NORP | 0.89 | 0.92 | 0.91 | 141 | |
|
|
| ORDINAL | 0.95 | 0.93 | 0.94 | 172 | |
|
|
| ORG | 0.81 | 0.79 | 0.80 | 1388 | |
|
|
| PERCENT | 0.96 | 1.00 | 0.98 | 206 | |
|
|
| PERSON | 0.93 | 0.95 | 0.94 | 1051 | |
|
|
| PRODUCT | 0.61 | 0.37 | 0.46 | 83 | |
|
|
| QUANTITY | 0.76 | 0.78 | 0.77 | 65 | |
|
|
| TIME | 0.90 | 0.92 | 0.91 | 137 | |
|
|
| **micro avg** | 0.87 | 0.88 | 0.87 | 6314 | |
|
|
| **macro avg** | 0.82 | 0.81 | 0.81 | 6314 | |
|
|
| **weighted avg** | 0.87 | 0.88 | 0.87 | 6314 | |
|
|
|
|
|
|
|
|
|
|
|
#### Usage
|
|
``` |
|
|
pip install transformers torch
|
|
``` |
|
|
|
|
|
```python |
|
|
from transformers import AutoModel, AutoTokenizer
|
|
|
|
|
model = AutoModel.from_pretrained("katrjohn/GreekNewsBERT", trust_remote_code=True) |
|
|
tokenizer = AutoTokenizer.from_pretrained("nlpaueb/bert-base-greek-uncased-v1") |
|
|
``` |
|
|
|
|
|
##### Example usage |
|
|
```python |
|
|
import torch |
|
|
|
|
|
# Classification label dictionary (reverse) |
|
|
classification_label_dict_reverse = { |
|
|
0: "Αυτοκίνητο", 1: "Επιχειρήσεις και βιομηχανία", 2: "Έγκλημα και δικαιοσύνη", |
|
|
3: "Ειδήσεις για καταστροφές και έκτακτες ανάγκες", 4: "Οικονομικά και χρηματοοικονομικά", 5: "Εκπαίδευση", |
|
|
6: "Ψυχαγωγία και πολιτισμός", 7: "Περιβάλλον και κλίμα", 8: "Οικογένεια και σχέσεις", |
|
|
9: "Μόδα", 10: "Τρόφιμα και ποτά", 11: "Υγεία και ιατρική", 12: "Μεταφορές και υποδομές", |
|
|
13: "Ψυχική υγεία και ευεξία", 14: "Πολιτική και κυβέρνηση", 15: "Θρησκεία", |
|
|
16: "Αθλητισμός", 17: "Ταξίδια και αναψυχή", 18: "Τεχνολογία και επιστήμη" |
|
|
} |
|
|
|
|
|
ner_label_set = ["PAD", "O", |
|
|
"B-ORG", "I-ORG", "B-PERSON", "I-PERSON", "B-CARDINAL", "I-CARDINAL", |
|
|
"B-GPE", "I-GPE", "B-DATE", "I-DATE", "B-ORDINAL", "I-ORDINAL", |
|
|
"B-PERCENT", "I-PERCENT", "B-LOC", "I-LOC", "B-NORP", "I-NORP", |
|
|
"B-MONEY", "I-MONEY", "B-TIME", "I-TIME", "B-EVENT", "I-EVENT", |
|
|
"B-PRODUCT", "I-PRODUCT", "B-FAC", "I-FAC", "B-QUANTITY", "I-QUANTITY" |
|
|
] |
|
|
tag2idx = {t:i for i,t in enumerate(ner_label_set)} |
|
|
idx2tag = {i:t for t,i in tag2idx.items()} |
|
|
|
|
|
sentence = "Ο Κυριάκος Μητσοτάκης επισκέφθηκε τη Θεσσαλονίκη για τα εγκαίνια της ΔΕΘ." |
|
|
inputs = tokenizer(sentence, return_tensors="pt") |
|
|
|
|
|
with torch.no_grad(): |
|
|
classification_logits, ner_logits = model(**inputs) |
|
|
|
|
|
# Classification |
|
|
classification_probs = torch.softmax(classification_logits, dim=-1) |
|
|
predicted_class = torch.argmax(classification_probs, dim=-1).item() |
|
|
predicted_class_label = classification_label_dict_reverse.get(predicted_class, "Unknown") |
|
|
|
|
|
print(f"Predicted class index: {predicted_class}") |
|
|
print(f"Predicted class label: {predicted_class_label}") |
|
|
|
|
|
# NER |
|
|
ner_predictions = torch.argmax(ner_logits, dim=-1).squeeze().tolist() |
|
|
tokens = tokenizer.convert_ids_to_tokens(inputs['input_ids'].squeeze()) |
|
|
|
|
|
for token, pred_idx in zip(tokens, ner_predictions): |
|
|
tag = idx2tag.get(pred_idx, "O") |
|
|
if token in ["[CLS]", "[SEP]"]: |
|
|
tag = "O" |
|
|
print(f"{token}: {tag}") |
|
|
|
|
|
|
|
|
``` |
|
|
|
|
|
Output: |
|
|
``` |
|
|
Predicted class index: 14 |
|
|
Predicted class label: Πολιτική και κυβέρνηση |
|
|
[CLS]: O |
|
|
ο: O |
|
|
κυριακος: B-PERSON |
|
|
μητσοτακης: I-PERSON |
|
|
επισκεφθηκε: O |
|
|
τη: O |
|
|
θεσσαλονικη: B-GPE |
|
|
για: O |
|
|
τα: O |
|
|
εγκαινια: O |
|
|
της: O |
|
|
δεθ: B-EVENT |
|
|
.: O |
|
|
[SEP]: O |
|
|
|
|
|
``` |
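The token-level tags printed above can be merged into word-level entity spans. A minimal post-processing sketch, assuming WordPiece `##` continuation pieces and the BIO tag scheme defined earlier (`merge_entities` is an illustrative helper, not part of the released model):

```python
def merge_entities(tokens, tags):
    """Collapse BIO-tagged WordPiece tokens into (entity_text, label) spans."""
    # 1) Glue "##" continuation pieces back onto the previous word,
    #    keeping each word's first sub-token tag; skip special tokens.
    words, word_tags = [], []
    for token, tag in zip(tokens, tags):
        if token in ("[CLS]", "[SEP]", "[PAD]"):
            continue
        if token.startswith("##") and words:
            words[-1] += token[2:]
        else:
            words.append(token)
            word_tags.append(tag)

    # 2) Merge word-level BIO runs into entity spans.
    entities, span, label = [], [], None
    for word, tag in zip(words, word_tags):
        if tag.startswith("B-"):
            if span:
                entities.append((" ".join(span), label))
            span, label = [word], tag[2:]
        elif tag.startswith("I-") and label == tag[2:]:
            span.append(word)
        else:
            if span:
                entities.append((" ".join(span), label))
            span, label = [], None
    if span:
        entities.append((" ".join(span), label))
    return entities
```

Applied to the tags from the example output, this yields spans such as `("κυριακος μητσοτακης", "PERSON")` and `("θεσσαλονικη", "GPE")`.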
|
|
|
|
|
#### Citation
|
|
This model was released alongside the article *Named Entity Recognition and News Article Classification: A Lightweight Approach*.
|
|
|
|
|
If you use this model, please cite the following:
|
|
``` |
|
|
@ARTICLE{11148234, |
|
|
author={Katranis, Ioannis and Troussas, Christos and Krouska, Akrivi and Mylonas, Phivos and Sgouropoulou, Cleo}, |
|
|
journal={IEEE Access}, |
|
|
title={Named Entity Recognition and News Article Classification: A Lightweight Approach}, |
|
|
year={2025}, |
|
|
volume={13}, |
|
|
number={}, |
|
|
pages={155031-155046}, |
|
|
keywords={Accuracy;Transformers;Pipelines;Named entity recognition;Computational modeling;Vocabulary;Tagging;Real-time systems;Benchmark testing;Training;Distilled transformer;edge-deployable model;multiclass news-topic classification;named entity recognition}, |
|
|
doi={10.1109/ACCESS.2025.3605709}} |
|
|
|
|
|
|
|
|
``` |
|
|
|