Model Description

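This model is a fine-tuned version of XLM-RoBERTa (FacebookAI/xlm-roberta-base) for Greek news. A single forward pass returns both a news-topic classification and named entity recognition (NER) tags.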

Dataset

The model was fine-tuned on the GreekNews-20k dataset.

Results

Performance on the GreekNews-20k dataset, first for news-topic classification and then for named entity recognition:

Class	Precision	Recall	F1-score	Support
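Αυτοκίνητο (Automotive)	0.94	0.95	0.94	201
Επιχειρήσεις και βιομηχανία (Business and industry)	0.67	0.78	0.72	369
Έγκλημα και δικαιοσύνη (Crime and justice)	0.90	0.86	0.88	314
Ειδήσεις για καταστροφές και έκτακτες ανάγκες (Disaster and emergency news)	0.86	0.81	0.83	272
Οικονομικά και χρηματοοικονομικά (Economy and finance)	0.77	0.69	0.73	495
Εκπαίδευση (Education)	0.86	0.92	0.89	259
Ψυχαγωγία και πολιτισμός (Entertainment and culture)	0.83	0.76	0.80	251
Περιβάλλον και κλίμα (Environment and climate)	0.65	0.82	0.73	292
Οικογένεια και σχέσεις (Family and relationships)	0.87	0.84	0.85	294
Μόδα (Fashion)	0.93	0.92	0.93	259
Τρόφιμα και ποτά (Food and drink)	0.67	0.89	0.77	262
Υγεία και ιατρική (Health and medicine)	0.79	0.73	0.76	346
Μεταφορές και υποδομές (Transportation and infrastructure)	0.87	0.80	0.83	321
Ψυχική υγεία και ευεξία (Mental health and wellness)	0.70	0.86	0.77	348
Πολιτική και κυβέρνηση (Politics and government)	0.88	0.66	0.76	339
Θρησκεία (Religion)	0.91	0.91	0.91	271
Αθλητισμός (Sports)	1.00	0.97	0.98	212
Ταξίδια και αναψυχή (Travel and leisure)	0.89	0.84	0.87	424
Τεχνολογία και επιστήμη (Technology and science)	0.82	0.70	0.76	308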
accuracy			0.82	5837
macro avg	0.83	0.83	0.83	5837
weighted avg	0.82	0.82	0.82	5837
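The macro and weighted rows differ only in how the per-class scores are combined: the macro average gives each of the 19 topics equal weight, while the weighted average weights each topic by its support (its number of true instances). A small sketch recomputing both F1 averages from the rounded per-class values in the classification table; since the reported averages were presumably computed from unrounded scores, a check like this can in general be off in the last digit, although here both values agree.

```python
# Per-class (F1, support) pairs copied from the GreekNews-20k classification table
f1_and_support = [
    (0.94, 201), (0.72, 369), (0.88, 314), (0.83, 272), (0.73, 495),
    (0.89, 259), (0.80, 251), (0.73, 292), (0.85, 294), (0.93, 259),
    (0.77, 262), (0.76, 346), (0.83, 321), (0.77, 348), (0.76, 339),
    (0.91, 271), (0.98, 212), (0.87, 424), (0.76, 308),
]


def macro_average(scores):
    """Unweighted mean: every class counts equally, regardless of its size."""
    return sum(scores) / len(scores)


def weighted_average(scores, supports):
    """Support-weighted mean: larger classes pull the average toward their score."""
    return sum(s * n for s, n in zip(scores, supports)) / sum(supports)


f1_scores = [f1 for f1, _ in f1_and_support]
supports = [n for _, n in f1_and_support]

macro_f1 = macro_average(f1_scores)
weighted_f1 = weighted_average(f1_scores, supports)

print(f"total support: {sum(supports)}")  # 5837
print(f"macro F1:    {macro_f1:.2f}")     # 0.83
print(f"weighted F1: {weighted_f1:.2f}")  # 0.82
```

The weighted F1 comes out slightly lower than the macro F1 because some of the largest topics, such as Οικονομικά και χρηματοοικονομικά (495 examples, F1 0.73), score below the unweighted mean.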

Entity	Precision	Recall	F1-score	Support
CARDINAL	0.88	0.95	0.91	20881
DATE	0.87	0.93	0.90	12924
EVENT	0.64	0.68	0.66	1510
FAC	0.54	0.42	0.47	1680
GPE	0.86	0.92	0.89	12987
LOC	0.71	0.66	0.69	2783
MONEY	0.78	0.80	0.79	3158
NORP	0.89	0.88	0.89	1577
ORDINAL	0.92	0.97	0.94	3270
ORG	0.75	0.82	0.78	18385
PERCENT	0.80	0.77	0.78	5818
PERSON	0.88	0.91	0.90	13481
PRODUCT	0.64	0.53	0.58	1704
QUANTITY	0.72	0.72	0.72	2086
TIME	0.78	0.87	0.82	1932
micro avg	0.83	0.87	0.85	104176
macro avg	0.78	0.79	0.78	104176
weighted avg	0.82	0.87	0.84	104176
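In the NER tables, the micro average pools true positives, false positives and false negatives across all entity types before computing precision and recall, so the micro F1 is the harmonic mean of the micro precision and recall. Checking the GreekNews-20k row with the rounded values from the table (because P and R are rounded, this only confirms agreement to about two decimals):

```latex
F_1^{\mathrm{micro}}
  = \frac{2\,P_{\mathrm{micro}}\,R_{\mathrm{micro}}}{P_{\mathrm{micro}} + R_{\mathrm{micro}}}
  = \frac{2 \times 0.83 \times 0.87}{0.83 + 0.87}
  = \frac{1.4442}{1.70}
  \approx 0.85
```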

Named entity recognition performance on the elNER dataset:

Entity	Precision	Recall	F1-score	Support
CARDINAL	0.92	0.95	0.94	911
DATE	0.91	0.93	0.92	838
EVENT	0.44	0.58	0.50	130
FAC	0.56	0.38	0.45	77
GPE	0.79	0.91	0.85	826
LOC	0.76	0.65	0.70	178
MONEY	0.96	0.98	0.97	111
NORP	0.86	0.88	0.87	141
ORDINAL	0.96	0.92	0.94	172
ORG	0.78	0.75	0.76	1388
PERCENT	0.96	0.81	0.88	206
PERSON	0.92	0.96	0.94	1051
PRODUCT	0.56	0.43	0.49	83
QUANTITY	0.81	0.85	0.83	65
TIME	0.87	0.85	0.86	137
micro avg	0.85	0.86	0.85	6314
macro avg	0.80	0.79	0.79	6314
weighted avg	0.85	0.86	0.85	6314

To use this model

pip install transformers==4.46.1 torch

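from transformers import AutoModel, AutoTokenizer

# trust_remote_code=True is needed because the repository ships its own model class,
# which returns classification and NER logits together
model = AutoModel.from_pretrained("katrjohn/XLMRobertaGreekNews", trust_remote_code=True)
# The model uses the tokenizer of its base model, XLM-RoBERTa base
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")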

Example usage

import torch

# Maps each classification index to its topic label (19 topics)
classification_label_dict_reverse = {
    0: "Αυτοκίνητο", 1: "Επιχειρήσεις και βιομηχανία", 2: "Έγκλημα και δικαιοσύνη",
    3: "Ειδήσεις για καταστροφές και έκτακτες ανάγκες", 4: "Οικονομικά και χρηματοοικονομικά", 5: "Εκπαίδευση",
    6: "Ψυχαγωγία και πολιτισμός", 7: "Περιβάλλον και κλίμα", 8: "Οικογένεια και σχέσεις",
    9: "Μόδα", 10: "Τρόφιμα και ποτά", 11: "Υγεία και ιατρική", 12: "Μεταφορές και υποδομές",
    13: "Ψυχική υγεία και ευεξία", 14: "Πολιτική και κυβέρνηση", 15: "Θρησκεία",
    16: "Αθλητισμός", 17: "Ταξίδια και αναψυχή", 18: "Τεχνολογία και επιστήμη"
}

# NER tag set in index order: "PAD" (index 0), "O", then B-/I- tags for 15 entity types
ner_label_set = ["PAD", "O",
    "B-ORG", "I-ORG", "B-PERSON", "I-PERSON", "B-CARDINAL", "I-CARDINAL",
    "B-GPE", "I-GPE", "B-DATE", "I-DATE", "B-ORDINAL", "I-ORDINAL",
    "B-PERCENT", "I-PERCENT", "B-LOC", "I-LOC", "B-NORP", "I-NORP",
    "B-MONEY", "I-MONEY", "B-TIME", "I-TIME", "B-EVENT", "I-EVENT",
    "B-PRODUCT", "I-PRODUCT", "B-FAC", "I-FAC", "B-QUANTITY", "I-QUANTITY"
]
tag2idx = {t: i for i, t in enumerate(ner_label_set)}
idx2tag = {i: t for t, i in tag2idx.items()}

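# "Kyriakos Mitsotakis visited Thessaloniki for the opening of the ΔΕΘ (Thessaloniki International Fair)."
sentence = "Ο Κυριάκος Μητσοτάκης επισκέφθηκε τη Θεσσαλονίκη για τα εγκαίνια της ΔΕΘ."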
inputs = tokenizer(sentence, return_tensors="pt")

# The model returns a pair: topic-classification logits and per-token NER logits
with torch.no_grad():
    classification_logits, ner_logits = model(**inputs)

# Classification: softmax over the 19 topics, then pick the most probable one
classification_probs = torch.softmax(classification_logits, dim=-1)
predicted_class = torch.argmax(classification_probs, dim=-1).item()
predicted_class_label = classification_label_dict_reverse.get(predicted_class, "Unknown")

print(f"Predicted class index: {predicted_class}")
print(f"Predicted class label: {predicted_class_label}")

# NER: one predicted tag per subword token, including the special tokens <s> and </s>
ner_predictions = torch.argmax(ner_logits, dim=-1).squeeze().tolist()
tokens = tokenizer.convert_ids_to_tokens(inputs['input_ids'].squeeze())

for token, pred_idx in zip(tokens, ner_predictions):
    tag = idx2tag.get(pred_idx, "O")
    print(f"{token}: {tag}")

Output:

Predicted class index: 6
Predicted class label: Ψυχαγωγία και πολιτισμός
<s>: O
▁Ο: O
▁Κυρι: B-PERSON
άκος: B-PERSON
▁Μητσοτάκη: I-PERSON
ς: O
▁επι: O
σκέ: O
φ: O
θηκε: O
▁τη: O
▁Θεσσαλονίκη: B-GPE
▁για: O
▁τα: O
▁εγκ: O
αί: O
νια: O
▁της: O
▁Δ: B-EVENT
ΕΘ: I-EVENT
.: O
</s>: O
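In this output, tags are assigned per subword piece rather than per word: the second piece of Κυριάκος carries its own B-PERSON tag, and the final ς of Μητσοτάκης is tagged O. The sketch below shows one way to turn such output into word-level entities. It assumes the common first-subword convention (each word takes the tag of its first piece), infers word boundaries from SentencePiece's ▁ marker, splits punctuation-only pieces into their own words, and decodes BIO leniently (an I- tag that does not continue an entity of the same type starts a new one). The helper names `subwords_to_words` and `words_to_entities` are hypothetical and not part of the model repository.

```python
from typing import List, Optional, Tuple

SP = "▁"  # SentencePiece word-boundary marker (U+2581)
SPECIAL_TOKENS = {"<s>", "</s>", "<pad>"}


def is_punct(piece: str) -> bool:
    """True for pieces made only of non-alphanumeric characters, such as '.'."""
    return all(not ch.isalnum() for ch in piece)


def subwords_to_words(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge subword pieces into words; each word keeps the tag of its first piece."""
    words: List[List[str]] = []  # each entry is [word_text, tag]
    pending_boundary = False
    for token, tag in zip(tokens, tags):
        if token in SPECIAL_TOKENS:
            continue
        starts_word = token.startswith(SP) or pending_boundary
        piece = token[len(SP):] if token.startswith(SP) else token
        if not piece:
            # A bare "▁" piece carries no text: the next piece opens a new word
            pending_boundary = True
            continue
        pending_boundary = False
        if starts_word or not words or is_punct(piece):
            words.append([piece, tag])
        else:
            words[-1][0] += piece  # continuation piece: extend the word, ignore its tag
    return [(text, tag) for text, tag in words]


def words_to_entities(words: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Group word-level BIO tags into (entity_text, entity_type) spans."""
    entities: List[Tuple[str, str]] = []
    current: List[str] = []
    current_type: Optional[str] = None
    for word, tag in words:
        opens = tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_type)
        if opens or not tag.startswith("I-"):
            # Close any open entity before starting a new one or on "O"/"PAD"
            if current_type is not None:
                entities.append((" ".join(current), current_type))
            current, current_type = ([word], tag[2:]) if opens else ([], None)
        else:
            current.append(word)  # I- tag continuing an entity of the same type
    if current_type is not None:
        entities.append((" ".join(current), current_type))
    return entities


# Tokens and tags copied from the example output above
tokens = ["<s>", "▁Ο", "▁Κυρι", "άκος", "▁Μητσοτάκη", "ς", "▁επι", "σκέ", "φ", "θηκε",
          "▁τη", "▁Θεσσαλονίκη", "▁για", "▁τα", "▁εγκ", "αί", "νια", "▁της",
          "▁Δ", "ΕΘ", ".", "</s>"]
tags = ["O", "O", "B-PERSON", "B-PERSON", "I-PERSON", "O", "O", "O", "O", "O",
        "O", "B-GPE", "O", "O", "O", "O", "O", "O",
        "B-EVENT", "I-EVENT", "O", "O"]

words = subwords_to_words(tokens, tags)
entities = words_to_entities(words)
for text, label in entities:
    print(f"{label}: {text}")
# PERSON: Κυριάκος Μητσοτάκης
# GPE: Θεσσαλονίκη
# EVENT: ΔΕΘ
```

With the model loaded, the same helpers could be applied to live predictions, e.g. `words_to_entities(subwords_to_words(tokens, [idx2tag[i] for i in ner_predictions]))`.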

Author

This model was released alongside the article "Named Entity Recognition and News Article Classification: A Lightweight Approach" (IEEE Access, 2025).

If you use this model, please cite:

@ARTICLE{11148234,
  author={Katranis, Ioannis and Troussas, Christos and Krouska, Akrivi and Mylonas, Phivos and Sgouropoulou, Cleo},
  journal={IEEE Access}, 
  title={Named Entity Recognition and News Article Classification: A Lightweight Approach}, 
  year={2025},
  volume={13},
  number={},
  pages={155031-155046},
  keywords={Accuracy;Transformers;Pipelines;Named entity recognition;Computational modeling;Vocabulary;Tagging;Real-time systems;Benchmark testing;Training;Distilled transformer;edge-deployable model;multiclass news-topic classification;named entity recognition},
  doi={10.1109/ACCESS.2025.3605709}}

Model details

Model size: 0.3B params
Tensor type: F32 (Safetensors)

Base model: FacebookAI/xlm-roberta-base