GaborMadarasz/ModernBERT-base-hungarian-POS_long

Token Classification

Part-of-speech tagging

GaborMadarasz committed on Oct 14, 2025

Commit 02cf25f · verified · 1 Parent(s): e624e5c

Update README.md

Files changed (1)

README.md +33 -12

README.md CHANGED

@@ -1,29 +1,39 @@
 ---
 library_name: transformers
-tags: []
 ---
 # Model Card for Model ID
-<!-- Provide a quick summary of what the model is/does. -->
-## Model Details
-### Model Description
-<!-- Provide a longer summary of what this model is. -->
 This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
-- **Developed by:** [More Information Needed]
 - **Funded by [optional]:** [More Information Needed]
 - **Shared by [optional]:** [More Information Needed]
 - **Model type:** [More Information Needed]
 - **Language(s) (NLP):** [More Information Needed]
 - **License:** [More Information Needed]
-- **Finetuned from model [optional]:** [More Information Needed]
 ### Model Sources [optional]
@@ -77,9 +87,21 @@ Use the code below to get started with the model.
 ### Training Data
-<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-[More Information Needed]
 ### Training Procedure
@@ -120,9 +142,8 @@ Use the code below to get started with the model.
 #### Metrics
-<!-- These are the evaluation metrics being used, ideally with a description of why. -->
-[More Information Needed]
 ### Results
@@ -196,4 +217,4 @@ Carbon emissions can be estimated using the [Machine Learning Impact calculator]
 ## Model Card Contact
-[More Information Needed]

 ---
 library_name: transformers
+tags:
+- POS
+- Part-of-speech tagging
+license: apache-2.0
+language:
+- hu
+base_model:
+- GaborMadarasz/ModernBERT-base-hungarian
+pipeline_tag: token-classification
 ---
 # Model Card for Model ID
+Hungarian long-context Part-of-speech tagger ModernBERT-base.
+### Model Description
+The model performs POS tagging on long Hungarian texts (8k context window).
+labels: ['ADJ', 'ADP', 'ADV', 'AUX', 'CCONJ', 'DET', 'INTJ', 'NOUN', 'NUM', 'PART', 'PRON', 'PROPN', 'PUNCT', 'SCONJ', 'VERB', 'X']
+Accuracy on hu_szeged-ud-test (token-level): 88.12%
 This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
+- **Developed by:** Gábor Madarász
 - **Funded by [optional]:** [More Information Needed]
 - **Shared by [optional]:** [More Information Needed]
 - **Model type:** [More Information Needed]
 - **Language(s) (NLP):** [More Information Needed]
 - **License:** [More Information Needed]
+- **Finetuned from model [optional]:** GaborMadarasz/ModernBERT-base-hungarian
 ### Model Sources [optional]
 ### Training Data
+#### Phase-1 finetune
+UD Hungarian Szeged:
+https://universaldependencies.org/treebanks/hu_szeged/index.html
+POS tagging performed with HuSpaCy (hu_core_news_lg) on Hungarian Wikipedia.
+#### Phase-2 finetune
+POS tagging performed with phase-1 fine-tuned ModernBERT on a subset of opensubtitles.
+#### Phase-3 finetune
+POS tagging performed with Stanza on long texts (6k-8k tokens).
 ### Training Procedure
 #### Metrics
+Accuracy:
 ### Results
 ## Model Card Contact
+gabor.madarasz@gmail.com
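
The updated README reports token-level accuracy on hu_szeged-ud-test but does not show how the figure is computed. The following is a minimal sketch of that metric, assuming gold and predicted tags are already aligned one-to-one per token. The function name `token_accuracy` and the sample sentences are hypothetical illustrations, not the author's evaluation code; only the label list is taken from the model card.

```python
from typing import Sequence

# Label set as listed in the model card.
LABELS = ['ADJ', 'ADP', 'ADV', 'AUX', 'CCONJ', 'DET', 'INTJ', 'NOUN',
          'NUM', 'PART', 'PRON', 'PROPN', 'PUNCT', 'SCONJ', 'VERB', 'X']


def token_accuracy(gold: Sequence[Sequence[str]],
                   pred: Sequence[Sequence[str]]) -> float:
    """Fraction of tokens whose predicted tag equals the gold tag.

    Hypothetical helper: assumes both inputs hold one tag list per
    sentence and that the tag lists are aligned token by token.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must contain the same number of sentences")

    correct = 0
    total = 0
    for gold_sent, pred_sent in zip(gold, pred):
        if len(gold_sent) != len(pred_sent):
            raise ValueError("sentence lengths differ; tags are not aligned")
        for gold_tag, pred_tag in zip(gold_sent, pred_sent):
            total += 1
            if gold_tag == pred_tag:
                correct += 1

    if total == 0:
        raise ValueError("no tokens to evaluate")
    return correct / total


# Made-up example data: six tokens, one of them tagged incorrectly.
example_gold = [["DET", "NOUN", "VERB", "PUNCT"], ["PRON", "VERB"]]
example_pred = [["DET", "NOUN", "VERB", "PUNCT"], ["PRON", "NOUN"]]
example_score = token_accuracy(example_gold, example_pred)
```

Because the model tags subword tokens while the treebank is annotated per word, a real evaluation would also need a step that maps subword predictions back to words before calling a function like this; that step is not described in the README and is left out here.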