---
license: mit
datasets:
- katrjohn/Greek-News-NER-Classif
language:
- el
base_model:
- nlpaueb/bert-base-greek-uncased-v1
tags:
- classification
- NER
- NewsArticle
---

# Model Description

This model is a fine-tuned version of [GreekBERT](https://huggingface.co/nlpaueb/bert-base-greek-uncased-v1) with two task heads: news-topic classification and named entity recognition (NER).

## Dataset

The model was fine-tuned on the [GreekNews-20k](https://huggingface.co/datasets/katrjohn/GreekNews-20k) dataset.

### Results

Performance on the [GreekNews-20k dataset](https://huggingface.co/datasets/katrjohn/GreekNews-20k):

Classification:

| Class | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| Αυτοκίνητο | 0.94 | 0.95 | 0.94 | 201 |
| Επιχειρήσεις και βιομηχανία | 0.73 | 0.78 | 0.75 | 369 |
| Έγκλημα και δικαιοσύνη | 0.93 | 0.89 | 0.91 | 314 |
| Ειδήσεις για καταστροφές και έκτακτες ανάγκες | 0.83 | 0.79 | 0.81 | 272 |
| Οικονομικά και χρηματοοικονομικά | 0.78 | 0.74 | 0.76 | 495 |
| Εκπαίδευση | 0.85 | 0.92 | 0.88 | 259 |
| Ψυχαγωγία και πολιτισμός | 0.81 | 0.85 | 0.83 | 251 |
| Περιβάλλον και κλίμα | 0.81 | 0.75 | 0.78 | 292 |
| Οικογένεια και σχέσεις | 0.87 | 0.89 | 0.88 | 294 |
| Μόδα | 0.96 | 0.93 | 0.94 | 259 |
| Τρόφιμα και ποτά | 0.69 | 0.90 | 0.78 | 262 |
| Υγεία και ιατρική | 0.76 | 0.71 | 0.73 | 346 |
| Μεταφορές και υποδομές | 0.78 | 0.86 | 0.82 | 321 |
| Ψυχική υγεία και ευεξία | 0.84 | 0.79 | 0.81 | 348 |
| Πολιτική και κυβέρνηση | 0.89 | 0.69 | 0.78 | 339 |
| Θρησκεία | 0.89 | 0.95 | 0.92 | 271 |
| Αθλητισμός | 1.00 | 0.98 | 0.99 | 212 |
| Ταξίδια και αναψυχή | 0.88 | 0.88 | 0.88 | 424 |
| Τεχνολογία και επιστήμη | 0.77 | 0.78 | 0.78 | 308 |
| **accuracy** | | | 0.83 | 5837 |
| **macro avg** | 0.84 | 0.84 | 0.84 | 5837 |
| **weighted avg** | 0.83 | 0.83 | 0.83 | 5837 |

NER:

| Entity | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| CARDINAL | 0.87 | 0.97 | 0.91 | 25656 |
| DATE | 0.89 | 0.92 | 0.91 | 15469 |
| EVENT | 0.71 | 0.73 | 0.72 | 1720 |
| FAC | 0.53 | 0.60 | 0.56 | 2118 |
| GPE | 0.88 | 0.95 | 0.91 | 16010 |
| LOC | 0.82 | 0.70 | 0.75 | 3547 |
| MONEY | 0.78 | 0.83 | 0.80 | 3882 |
| NORP | 0.91 | 0.92 | 0.91 | 1926 |
| ORDINAL | 0.92 | 0.98 | 0.95 | 3891 |
| ORG | 0.78 | 0.85 | 0.82 | 22184 |
| PERCENT | 0.73 | 0.86 | 0.79 | 7286 |
| PERSON | 0.89 | 0.93 | 0.91 | 16524 |
| PRODUCT | 0.70 | 0.56 | 0.63 | 2071 |
| QUANTITY | 0.74 | 0.76 | 0.75 | 2588 |
| TIME | 0.74 | 0.90 | 0.81 | 2390 |
| **micro avg** | 0.83 | 0.90 | 0.86 | 127262 |
| **macro avg** | 0.79 | 0.83 | 0.81 | 127262 |
| **weighted avg** | 0.84 | 0.90 | 0.86 | 127262 |

Performance on the [elNER dataset](https://github.com/nmpartzio/elNER):

| Entity | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| CARDINAL | 0.91 | 0.97 | 0.94 | 911 |
| DATE | 0.92 | 0.92 | 0.92 | 838 |
| EVENT | 0.57 | 0.57 | 0.57 | 130 |
| FAC | 0.49 | 0.44 | 0.47 | 77 |
| GPE | 0.84 | 0.95 | 0.89 | 826 |
| LOC | 0.80 | 0.64 | 0.71 | 178 |
| MONEY | 0.98 | 0.98 | 0.98 | 111 |
| NORP | 0.89 | 0.92 | 0.91 | 141 |
| ORDINAL | 0.95 | 0.93 | 0.94 | 172 |
| ORG | 0.81 | 0.79 | 0.80 | 1388 |
| PERCENT | 0.96 | 1.00 | 0.98 | 206 |
| PERSON | 0.93 | 0.95 | 0.94 | 1051 |
| PRODUCT | 0.61 | 0.37 | 0.46 | 83 |
| QUANTITY | 0.76 | 0.78 | 0.77 | 65 |
| TIME | 0.90 | 0.92 | 0.91 | 137 |
| **micro avg** | 0.87 | 0.88 | 0.87 | 6314 |
| **macro avg** | 0.82 | 0.81 | 0.81 | 6314 |
| **weighted avg** | 0.87 | 0.88 | 0.87 | 6314 |

#### To use this model

```
pip install transformers torch
```

```python
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained("katrjohn/GreekNewsBERT", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("nlpaueb/bert-base-greek-uncased-v1")
```

##### Example usage

```python
import torch

# Index-to-label mapping for the 19 news-topic classes
classification_label_dict_reverse = {
    0: "Αυτοκίνητο",
    1: "Επιχειρήσεις και βιομηχανία",
    2: "Έγκλημα και δικαιοσύνη",
    3: "Ειδήσεις για καταστροφές και έκτακτες ανάγκες",
    4: "Οικονομικά και χρηματοοικονομικά",
    5: "Εκπαίδευση",
    6: "Ψυχαγωγία και πολιτισμός",
    7: "Περιβάλλον και κλίμα",
    8: "Οικογένεια και σχέσεις",
    9: "Μόδα",
    10: "Τρόφιμα και ποτά",
    11: "Υγεία και ιατρική",
    12: "Μεταφορές και υποδομές",
    13: "Ψυχική υγεία και ευεξία",
    14: "Πολιτική και κυβέρνηση",
    15: "Θρησκεία",
    16: "Αθλητισμός",
    17: "Ταξίδια και αναψυχή",
    18: "Τεχνολογία και επιστήμη"
}

# BIO tag set for the NER head
ner_label_set = [
    "PAD", "O",
    "B-ORG", "I-ORG", "B-PERSON", "I-PERSON", "B-CARDINAL", "I-CARDINAL",
    "B-GPE", "I-GPE", "B-DATE", "I-DATE", "B-ORDINAL", "I-ORDINAL",
    "B-PERCENT", "I-PERCENT", "B-LOC", "I-LOC", "B-NORP", "I-NORP",
    "B-MONEY", "I-MONEY", "B-TIME", "I-TIME", "B-EVENT", "I-EVENT",
    "B-PRODUCT", "I-PRODUCT", "B-FAC", "I-FAC", "B-QUANTITY", "I-QUANTITY"
]
tag2idx = {t: i for i, t in enumerate(ner_label_set)}
idx2tag = {i: t for t, i in tag2idx.items()}

sentence = "Ο Κυριάκος Μητσοτάκης επισκέφθηκε τη Θεσσαλονίκη για τα εγκαίνια της ΔΕΘ."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    classification_logits, ner_logits = model(**inputs)

# Classification
classification_probs = torch.softmax(classification_logits, dim=-1)
predicted_class = torch.argmax(classification_probs, dim=-1).item()
predicted_class_label = classification_label_dict_reverse.get(predicted_class, "Unknown")
print(f"Predicted class index: {predicted_class}")
print(f"Predicted class label: {predicted_class_label}")

# NER
ner_predictions = torch.argmax(ner_logits, dim=-1).squeeze().tolist()
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"].squeeze())
for token, pred_idx in zip(tokens, ner_predictions):
    tag = idx2tag.get(pred_idx, "O")
    if token in ["[CLS]", "[SEP]"]:
        tag = "O"
    print(f"{token}: {tag}")
```

Output:

```
Predicted class index: 14
Predicted class label: Πολιτική και κυβέρνηση
[CLS]: O
ο: O
κυριακος: B-PERSON
μητσοτακης: I-PERSON
επισκεφθηκε: O
τη: O
θεσσαλονικη: B-GPE
για: O
τα: O
εγκαινια: O
της: O
δεθ: B-EVENT
.: O
[SEP]: O
```

#### Author

This model is released alongside the article *Named Entity Recognition and News Article Classification: A Lightweight Approach*. If you use this model, please cite:

```
@ARTICLE{11148234,
  author={Katranis, Ioannis and Troussas, Christos and Krouska, Akrivi and Mylonas, Phivos and Sgouropoulou, Cleo},
  journal={IEEE Access},
  title={Named Entity Recognition and News Article Classification: A Lightweight Approach},
  year={2025},
  volume={13},
  number={},
  pages={155031-155046},
  keywords={Accuracy;Transformers;Pipelines;Named entity recognition;Computational modeling;Vocabulary;Tagging;Real-time systems;Benchmark testing;Training;Distilled transformer;edge-deployable model;multiclass news-topic classification;named entity recognition},
  doi={10.1109/ACCESS.2025.3605709}}
```
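The example usage prints one BIO tag per wordpiece. To recover whole entity mentions, the `##`-prefixed subword pieces need to be merged back into words and consecutive `B-`/`I-` tags grouped into spans. A minimal decoding sketch in pure Python, assuming the standard BERT WordPiece convention and the BIO tag set above (`decode_entities` is an illustrative helper, not part of the released model):

```python
def decode_entities(tokens, tags):
    """Group wordpiece tokens with BIO tags into (text, label) entity spans."""
    entities = []
    current_tokens, current_label = [], None

    def flush():
        nonlocal current_tokens, current_label
        if current_tokens:
            text = ""
            for i, tok in enumerate(current_tokens):
                if tok.startswith("##"):
                    text += tok[2:]          # subword piece: attach without a space
                elif i == 0:
                    text = tok
                else:
                    text += " " + tok
            entities.append((text, current_label))
        current_tokens, current_label = [], None

    for tok, tag in zip(tokens, tags):
        if tok in ("[CLS]", "[SEP]", "[PAD]"):
            continue
        if tok.startswith("##"):
            if current_tokens:               # subword continues the open entity
                current_tokens.append(tok)
            continue
        if tag.startswith("B-"):             # a new entity starts
            flush()
            current_tokens, current_label = [tok], tag[2:]
        elif tag.startswith("I-") and current_label == tag[2:]:
            current_tokens.append(tok)       # same entity continues
        else:
            flush()                          # "O" or inconsistent tag ends the span
    flush()
    return entities

# Tokens and tags taken from the example output above
tokens = ["[CLS]", "ο", "κυριακος", "μητσοτακης", "επισκεφθηκε", "τη",
          "θεσσαλονικη", "για", "τα", "εγκαινια", "της", "δεθ", ".", "[SEP]"]
tags = ["O", "O", "B-PERSON", "I-PERSON", "O", "O",
        "B-GPE", "O", "O", "O", "O", "B-EVENT", "O", "O"]
entities = decode_entities(tokens, tags)
print(entities)
# [('κυριακος μητσοτακης', 'PERSON'), ('θεσσαλονικη', 'GPE'), ('δεθ', 'EVENT')]
```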
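Since the classification head returns raw logits over the 19 topics, reporting the top-k topics with probabilities can be more informative than the argmax alone. A standalone sketch in pure Python (`topk_topics` and the demo logits are illustrative; in practice the logits come from `classification_logits[0].tolist()`):

```python
import math

# Class labels, index-aligned with classification_label_dict_reverse above
labels = [
    "Αυτοκίνητο", "Επιχειρήσεις και βιομηχανία", "Έγκλημα και δικαιοσύνη",
    "Ειδήσεις για καταστροφές και έκτακτες ανάγκες", "Οικονομικά και χρηματοοικονομικά",
    "Εκπαίδευση", "Ψυχαγωγία και πολιτισμός", "Περιβάλλον και κλίμα",
    "Οικογένεια και σχέσεις", "Μόδα", "Τρόφιμα και ποτά", "Υγεία και ιατρική",
    "Μεταφορές και υποδομές", "Ψυχική υγεία και ευεξία", "Πολιτική και κυβέρνηση",
    "Θρησκεία", "Αθλητισμός", "Ταξίδια και αναψυχή", "Τεχνολογία και επιστήμη"
]

def topk_topics(logits, k=3):
    """Softmax the raw logits and return the k most probable (label, prob) pairs."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]   # subtract max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]
    order = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    return [(labels[i], probs[i]) for i in order[:k]]

# Made-up logits for illustration only
demo_logits = [0.1] * 19
demo_logits[14] = 6.0   # Πολιτική και κυβέρνηση
demo_logits[4] = 2.5    # Οικονομικά και χρηματοοικονομικά
for label, prob in topk_topics(demo_logits):
    print(f"{label}: {prob:.3f}")
```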