GaborMadarasz committed
Commit 02cf25f · verified · 1 Parent(s): e624e5c

Update README.md

Files changed (1)
  1. README.md +33 -12
README.md CHANGED
@@ -1,29 +1,39 @@
  ---
  library_name: transformers
- tags: []
  ---

  # Model Card for Model ID

- <!-- Provide a quick summary of what the model is/does. -->



- ## Model Details

- ### Model Description

- <!-- Provide a longer summary of what this model is. -->

  This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.

- - **Developed by:** [More Information Needed]
  - **Funded by [optional]:** [More Information Needed]
  - **Shared by [optional]:** [More Information Needed]
  - **Model type:** [More Information Needed]
  - **Language(s) (NLP):** [More Information Needed]
  - **License:** [More Information Needed]
- - **Finetuned from model [optional]:** [More Information Needed]

  ### Model Sources [optional]

@@ -77,9 +87,21 @@ Use the code below to get started with the model.

  ### Training Data

- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

- [More Information Needed]

  ### Training Procedure

@@ -120,9 +142,8 @@ Use the code below to get started with the model.

  #### Metrics

- <!-- These are the evaluation metrics being used, ideally with a description of why. -->

- [More Information Needed]

  ### Results

@@ -196,4 +217,4 @@ Carbon emissions can be estimated using the [Machine Learning Impact calculator]

  ## Model Card Contact

- [More Information Needed]
 
  ---
  library_name: transformers
+ tags:
+ - POS
+ - Part-of-speech tagging
+ license: apache-2.0
+ language:
+ - hu
+ base_model:
+ - GaborMadarasz/ModernBERT-base-hungarian
+ pipeline_tag: token-classification
  ---

  # Model Card for Model ID

+ A long-context part-of-speech (POS) tagger for Hungarian, based on ModernBERT-base.


+ ### Model Description

+ The model performs POS tagging on long Hungarian texts (8k-token context window).

+ Labels: ['ADJ', 'ADP', 'ADV', 'AUX', 'CCONJ', 'DET', 'INTJ', 'NOUN', 'NUM', 'PART', 'PRON', 'PROPN', 'PUNCT', 'SCONJ', 'VERB', 'X']
+
+ Accuracy on hu_szeged-ud-test (token-level): 88.12%
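
For readers who want to try the tagger, the following is a minimal sketch and not the author's code. It assumes a placeholder repository name (`MODEL_ID` is hypothetical, since the card does not state the published model ID), that the label list above is given in index order, and that each word takes the tag predicted for its first subword. The helper names `first_subword_tags` and `tag_words` are illustrative.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical placeholder: the card does not name the published repository.
MODEL_ID = "your-namespace/your-hungarian-pos-model"

# Label set copied from the model card; treating it as index-ordered is an assumption.
LABELS = ['ADJ', 'ADP', 'ADV', 'AUX', 'CCONJ', 'DET', 'INTJ', 'NOUN',
          'NUM', 'PART', 'PRON', 'PROPN', 'PUNCT', 'SCONJ', 'VERB', 'X']


def first_subword_tags(word_ids: Sequence[Optional[int]],
                       pred_ids: Sequence[int],
                       id2label: Dict[int, str]) -> List[str]:
    """Return one tag per word, using the prediction of its first subword."""
    tags: List[str] = []
    seen = set()
    for wid, pid in zip(word_ids, pred_ids):
        # Special tokens have no word id; later subwords of a word are skipped.
        if wid is None or wid in seen:
            continue
        seen.add(wid)
        tags.append(id2label[pid])
    return tags


def tag_words(words: Sequence[str], model_id: str = MODEL_ID) -> List[Tuple[str, str]]:
    """Tag pre-split words with a token-classification checkpoint."""
    # Imported lazily so the helper above works without these packages installed.
    import torch
    from transformers import AutoModelForTokenClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForTokenClassification.from_pretrained(model_id)
    enc = tokenizer(list(words), is_split_into_words=True, return_tensors="pt",
                    truncation=True, max_length=8192)
    with torch.no_grad():
        logits = model(**enc).logits
    pred_ids = logits.argmax(dim=-1)[0].tolist()
    tags = first_subword_tags(enc.word_ids(0), pred_ids, model.config.id2label)
    # zip stops early if the input was truncated to the context window.
    return list(zip(words, tags))
```

Once `MODEL_ID` points at the real repository, `tag_words(["Szép", "idő", "van", "."])` would return a list of (word, tag) pairs. The first-subword rule is only one common way to map subword predictions back to words; the card does not say which rule was used in training or evaluation.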


  This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.

+ - **Developed by:** Gábor Madarász
  - **Funded by [optional]:** [More Information Needed]
  - **Shared by [optional]:** [More Information Needed]
  - **Model type:** [More Information Needed]
  - **Language(s) (NLP):** [More Information Needed]
  - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** GaborMadarasz/ModernBERT-base-hungarian

  ### Model Sources [optional]

 

  ### Training Data

+ #### Phase-1 fine-tuning
+
+ UD Hungarian Szeged:
+ https://universaldependencies.org/treebanks/hu_szeged/index.html
+
+ Hungarian Wikipedia text, POS-tagged with HuSpaCy (hu_core_news_lg).
+
+ #### Phase-2 fine-tuning
+
+ A subset of OpenSubtitles, POS-tagged with the phase-1 fine-tuned ModernBERT.
+
+ #### Phase-3 fine-tuning
+
+ Long texts (6k-8k tokens), POS-tagged with Stanza.
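
The card does not say how the 6k-8k-token samples were assembled. One plausible approach, shown here purely as an assumption, is to greedily pack consecutive sentences until a token budget is reached. `pack_sentences` and `count_tokens` are hypothetical names; in practice `count_tokens` would wrap the model's tokenizer rather than the whitespace counter used in the usage note.

```python
from typing import Callable, List, Sequence


def pack_sentences(sentences: Sequence[str],
                   count_tokens: Callable[[str], int],
                   max_tokens: int = 8192) -> List[List[str]]:
    """Greedily group consecutive sentences into chunks of at most max_tokens.

    A single sentence longer than max_tokens is kept as its own chunk,
    so the caller still has to truncate or split such outliers.
    """
    chunks: List[List[str]] = []
    current: List[str] = []
    current_len = 0
    for sentence in sentences:
        n = count_tokens(sentence)
        # Close the current chunk when adding this sentence would exceed the budget.
        if current and current_len + n > max_tokens:
            chunks.append(current)
            current, current_len = [], 0
        current.append(sentence)
        current_len += n
    if current:
        chunks.append(current)
    return chunks
```

For a quick check without a tokenizer, `pack_sentences(["a b", "c d e", "f"], lambda s: len(s.split()), max_tokens=4)` groups the last two sentences together and leaves the first on its own.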


  ### Training Procedure

 

  #### Metrics

+ Token-level accuracy.
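
As an illustration of the metric, token-level accuracy is the share of tokens whose predicted tag equals the gold tag. The sketch below assumes gold and predicted tags are already aligned sentence by sentence; the function name `token_accuracy` is illustrative and is not taken from the author's evaluation code.

```python
from typing import List, Sequence


def token_accuracy(gold: Sequence[Sequence[str]],
                   pred: Sequence[Sequence[str]]) -> float:
    """Fraction of tokens whose predicted tag matches the gold tag."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must contain the same number of sentences")
    total = 0
    correct = 0
    for gold_sent, pred_sent in zip(gold, pred):
        if len(gold_sent) != len(pred_sent):
            raise ValueError("each sentence must have the same number of tags")
        for g, p in zip(gold_sent, pred_sent):
            total += 1
            correct += int(g == p)
    # An empty evaluation set is reported as 0.0 rather than raising.
    return correct / total if total else 0.0


gold_tags: List[List[str]] = [["DET", "NOUN", "VERB"], ["PUNCT"]]
pred_tags: List[List[str]] = [["DET", "NOUN", "NOUN"], ["PUNCT"]]
print(f"{token_accuracy(gold_tags, pred_tags):.2%}")
```

Because every token counts equally, long sentences weigh more than short ones in this score.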


  ### Results

 

  ## Model Card Contact

+ gabor.madarasz@gmail.com