DeBERTa POS Tagger
DeBERTa (Decoding-enhanced BERT with Disentangled Attention) is a language model from Microsoft that refines how words and their relationships are represented and processed, often outperforming comparable models on tasks like text classification and question answering. It builds on the original BERT architecture by disentangling a token's "content" (what the word means) from its "position" (where it appears in the sentence) during self-attention, which reduces confusion about word context and improves the model's grasp of nuance.
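Concretely, the DeBERTa paper decomposes the attention score between positions i and j into content and relative-position terms (sketched here from the paper; Q^c/K^c are content projections, Q^r/K^r are relative-position projections, and δ(i,j) is the bucketed relative distance):

```latex
A_{i,j} = \underbrace{Q^{c}_{i}\,{K^{c}_{j}}^{\top}}_{\text{content-to-content}}
        + \underbrace{Q^{c}_{i}\,{K^{r}_{\delta(i,j)}}^{\top}}_{\text{content-to-position}}
        + \underbrace{K^{c}_{j}\,{Q^{r}_{\delta(j,i)}}^{\top}}_{\text{position-to-content}}
```

The first term is standard BERT-style attention; the two extra terms are what "disentangled attention" refers to.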
Part-of-speech (POS) tagging is the process of automatically assigning each word in a sentence a specific grammatical role—such as noun, verb, or adjective—so that a machine can better interpret how words fit together. It’s one of the earliest steps in many natural language processing pipelines because knowing whether “run” is used as a noun (“a morning run”) or a verb (“to run quickly”) can make downstream tasks like parsing or text classification more accurate.
Training data: Universal Dependencies en_ewt
A Gold Standard Universal Dependencies Corpus for English, built over the source material of the English Web Treebank LDC2012T13 (https://catalog.ldc.upenn.edu/LDC2012T13).
See: universaldependencies.org
How to use this model
Check out the example pipeline in pos_pipeline.py in this repo. The code below shows how the pipeline can be used.
from transformers import pipeline
from transformers.pipelines import PIPELINE_REGISTRY
from pos_pipeline import PosTaggingPipeline
task_name = "deberta-pos-ud_en_ewt"
PIPELINE_REGISTRY.register_pipeline(
task=task_name,
pipeline_class=PosTaggingPipeline
)
tagging_pipeline = pipeline(task=task_name, model=f"veryfansome/{task_name}")
predictions = tagging_pipeline("This is a sentence!")
print(predictions)
This prints [('This', 'DT'), ('is', 'VBZ'), ('a', 'DT'), ('sentence', 'NN'), ('!', '.')]. If you clone this repo, you can also run pos_pipeline.py directly:
$ python -m pos_pipeline --text "This is another sentence!"
Device set to use mps:0
Input: This is another sentence!
Predictions: [('This', 'DT'), ('is', 'VBZ'), ('another', 'DT'), ('sentence', 'NN'), ('!', '.')]
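Because DeBERTa's tokenizer splits words into subword pieces, a POS pipeline has to map token-level predictions back to word-level tags. The sketch below shows one common way to do this (keep the tag of each word's first subword); `collapse_to_words` is a hypothetical helper, not the repo's actual implementation, and `word_ids` mimics the list a fast Hugging Face tokenizer returns from `BatchEncoding.word_ids()`.

```python
# Sketch (assumption, not from pos_pipeline.py): collapsing subword-level
# tag predictions back to word-level (word, tag) pairs.
def collapse_to_words(words, word_ids, subword_tags):
    """Keep the tag predicted for the first subword of each word.

    word_ids maps each subword position to its word index, with None for
    special tokens like [CLS]/[SEP], as returned by fast tokenizers.
    """
    tags = {}
    for idx, tag in zip(word_ids, subword_tags):
        if idx is not None and idx not in tags:
            tags[idx] = tag  # first subword of this word wins
    return [(w, tags[i]) for i, w in enumerate(words)]

words = ["This", "is", "a", "sentence", "!"]
# e.g. [CLS] This is a sent ##ence ! [SEP]
word_ids = [None, 0, 1, 2, 3, 3, 4, None]
subword_tags = ["X", "DT", "VBZ", "DT", "NN", "NN", ".", "X"]
print(collapse_to_words(words, word_ids, subword_tags))
# → [('This', 'DT'), ('is', 'VBZ'), ('a', 'DT'), ('sentence', 'NN'), ('!', '.')]
```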
Tune this model from scratch
Check out pos_tuner_ud.py in this repo. The Training logs section below includes the actual invocation of this script that produced this version of the model. Feel free to adjust the mostly generic hyperparameters in the tuning script, and submit a PR if you achieve better results.
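Training a subword model on word-level UD tags requires aligning labels to subwords. A standard approach (sketched below under the assumption that pos_tuner_ud.py does something similar; `align_labels` is a hypothetical name) labels only each word's first subword and masks the rest with -100, the index PyTorch's cross-entropy loss ignores by default.

```python
# Sketch (assumption, not necessarily the repo's exact code): aligning
# word-level labels to subword tokens for token-classification training.
def align_labels(word_ids, word_labels, ignore_index=-100):
    """Label the first subword of each word; mask special tokens and
    subword continuations with ignore_index (-100) so the loss skips them."""
    labels, prev = [], None
    for idx in word_ids:
        if idx is None or idx == prev:
            labels.append(ignore_index)
        else:
            labels.append(word_labels[idx])
        prev = idx
    return labels

# Word labels are tag ids; word_ids comes from a fast tokenizer's word_ids().
print(align_labels([None, 0, 1, 1, 2, None], [5, 7, 3]))
# → [-100, 5, 7, -100, 3, -100]
```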
Test set performance
You should be able to replicate the results below by running:
$ python -m pos_tuner_ud -d en_ewt --test --auto-tokenizer --save-path .
precision recall f1-score support
$ 0.97 1.00 0.98 30
'' 0.99 0.99 0.99 88
, 0.97 0.99 0.98 981
-LRB- 1.00 1.00 1.00 114
-RRB- 1.00 1.00 1.00 114
. 1.00 1.00 1.00 1492
: 0.95 0.89 0.92 100
ADD 1.00 0.99 1.00 1648
AFX 0.67 0.33 0.44 6
CC 1.00 0.99 0.99 742
CD 0.92 0.99 0.96 1144
DT 1.00 1.00 1.00 1952
EX 1.00 0.98 0.99 48
FW 0.86 0.25 0.39 24
GW 0.75 0.80 0.77 30
HYPH 0.98 0.97 0.97 98
IN 0.99 0.99 0.99 2325
JJ 0.95 0.95 0.95 1717
JJR 0.90 1.00 0.95 47
JJS 0.90 0.97 0.93 88
LS 1.00 1.00 1.00 3
MD 1.00 1.00 1.00 423
NFP 0.97 0.98 0.98 130
NN 0.93 0.93 0.93 3898
NNP 0.94 0.92 0.93 3002
NNPS 0.76 0.83 0.79 117
NNS 0.96 0.96 0.96 1102
PDT 0.95 1.00 0.97 19
POS 0.99 1.00 1.00 129
PRP 1.00 1.00 1.00 1426
PRP$ 0.99 0.99 0.99 335
RB 0.97 0.97 0.97 1407
RBR 0.85 0.88 0.87 26
RBS 0.82 0.70 0.76 20
RP 0.88 0.90 0.89 92
SYM 0.78 0.67 0.72 21
TO 1.00 1.00 1.00 375
UH 0.94 0.90 0.92 135
VB 0.97 0.99 0.98 1151
VBD 1.00 0.99 0.99 551
VBG 0.98 0.97 0.97 370
VBN 0.96 0.98 0.97 486
VBP 0.99 0.98 0.98 801
VBZ 1.00 1.00 1.00 638
WDT 0.97 0.99 0.98 111
WP 0.99 0.97 0.98 94
WRB 0.99 1.00 0.99 90
XX 0.00 0.00 0.00 1
`` 0.99 0.99 0.99 89
accuracy 0.97 29830
macro avg 0.93 0.91 0.91 29830
weighted avg 0.97 0.97 0.97 29830
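The gap between the macro average (0.91 F1) and the weighted average (0.97 F1) above comes from how the two are computed: macro averages F1 equally across tags, while weighted averages by support, so rare tags like AFX (6 tokens) or XX (1 token) drag macro down but barely move the weighted figure. A minimal illustration (the numbers below are made up, not taken from the report):

```python
# Illustrative only: why macro and weighted F1 diverge when rare tags
# score poorly.
def macro_f1(f1s):
    """Unweighted mean: every tag counts equally."""
    return sum(f1s) / len(f1s)

def weighted_f1(f1s, supports):
    """Support-weighted mean: common tags dominate."""
    return sum(f * s for f, s in zip(f1s, supports)) / sum(supports)

f1s = [0.98, 0.95, 0.44]      # two common tags, one rare tag
supports = [1500, 1000, 6]
print(round(macro_f1(f1s), 2))               # 0.79 — pulled down by the rare tag
print(round(weighted_f1(f1s, supports), 2))  # 0.97 — barely affected
```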
Training logs
The output below was captured from the training of this model.
$ python -m pos_tuner_ud --from-base microsoft/deberta-base -d en_ewt --train
2025-02-07 02:04:47,167 - __main__ - INFO - Total test dataset size: 2077
2025-02-07 02:04:47,167 - __main__ - INFO - Total train dataset size: 12543
2025-02-07 02:04:47,167 - __main__ - INFO - Total validation dataset size: 2002
2025-02-07 02:04:47,238 - __main__ - INFO - Unique tag count: 50
2025-02-07 02:04:47,238 - __main__ - INFO - Unique tags: ['$', "''", ',', '-LRB-', '-RRB-', '.', ':', 'ADD', 'AFX', 'CC', 'CD', 'DT', 'EX', 'FW', 'GW', 'HYPH', 'IN', 'JJ', 'JJR', 'JJS', 'LS', 'MD', 'NFP', 'NN', 'NNP', 'NNPS', 'NNS', 'PDT', 'POS', 'PRP', 'PRP$', 'RB', 'RBR', 'RBS', 'RP', 'SYM', 'TO', 'UH', 'VB', 'VBD', 'VBG', 'VBN', 'VBP', 'VBZ', 'WDT', 'WP', 'WP$', 'WRB', 'XX', '``']
Some weights of DebertaForTokenClassification were not initialized from the model checkpoint at microsoft/deberta-base and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
{'loss': 1.9114, 'grad_norm': 9.790359497070312, 'learning_rate': 4.9468537414965985e-05, 'epoch': 0.03}
{'loss': 0.4223, 'grad_norm': 3.621584415435791, 'learning_rate': 4.8937074829931974e-05, 'epoch': 0.06}
{'loss': 0.2691, 'grad_norm': 3.539963960647583, 'learning_rate': 4.840561224489796e-05, 'epoch': 0.1}
{'loss': 0.214, 'grad_norm': 7.832743167877197, 'learning_rate': 4.7874149659863945e-05, 'epoch': 0.13}
{'loss': 0.1841, 'grad_norm': 3.4455857276916504, 'learning_rate': 4.7342687074829934e-05, 'epoch': 0.16}
{'loss': 0.2053, 'grad_norm': 4.7766523361206055, 'learning_rate': 4.6811224489795916e-05, 'epoch': 0.19}
{'loss': 0.1588, 'grad_norm': 0.8650943636894226, 'learning_rate': 4.627976190476191e-05, 'epoch': 0.22}
{'loss': 0.1992, 'grad_norm': 1.5485981702804565, 'learning_rate': 4.5748299319727895e-05, 'epoch': 0.26}
{'loss': 0.1661, 'grad_norm': 2.333674192428589, 'learning_rate': 4.521683673469388e-05, 'epoch': 0.29}
{'loss': 0.1518, 'grad_norm': 1.3744149208068848, 'learning_rate': 4.4685374149659866e-05, 'epoch': 0.32}
{'loss': 0.1411, 'grad_norm': 2.857898473739624, 'learning_rate': 4.4153911564625855e-05, 'epoch': 0.35}
{'loss': 0.1445, 'grad_norm': 2.07376766204834, 'learning_rate': 4.362244897959184e-05, 'epoch': 0.38}
{'loss': 0.1843, 'grad_norm': 2.962191343307495, 'learning_rate': 4.3090986394557826e-05, 'epoch': 0.41}
{'loss': 0.1255, 'grad_norm': 2.3308300971984863, 'learning_rate': 4.255952380952381e-05, 'epoch': 0.45}
{'loss': 0.1834, 'grad_norm': 3.2745347023010254, 'learning_rate': 4.20280612244898e-05, 'epoch': 0.48}
{'loss': 0.1328, 'grad_norm': 16.241605758666992, 'learning_rate': 4.149659863945579e-05, 'epoch': 0.51}
{'loss': 0.128, 'grad_norm': 4.001339912414551, 'learning_rate': 4.096513605442177e-05, 'epoch': 0.54}
{'loss': 0.1373, 'grad_norm': 4.761608123779297, 'learning_rate': 4.043367346938776e-05, 'epoch': 0.57}
{'loss': 0.1069, 'grad_norm': 2.788573980331421, 'learning_rate': 3.990221088435374e-05, 'epoch': 0.61}
{'loss': 0.1517, 'grad_norm': 3.5314674377441406, 'learning_rate': 3.937074829931973e-05, 'epoch': 0.64}
{'loss': 0.1223, 'grad_norm': 1.3783732652664185, 'learning_rate': 3.883928571428572e-05, 'epoch': 0.67}
{'loss': 0.111, 'grad_norm': 0.8326973915100098, 'learning_rate': 3.83078231292517e-05, 'epoch': 0.7}
{'loss': 0.111, 'grad_norm': 2.8222618103027344, 'learning_rate': 3.777636054421769e-05, 'epoch': 0.73}
{'loss': 0.1152, 'grad_norm': 0.7433456778526306, 'learning_rate': 3.724489795918368e-05, 'epoch': 0.77}
{'loss': 0.1262, 'grad_norm': 5.192315578460693, 'learning_rate': 3.671343537414966e-05, 'epoch': 0.8}
{'loss': 0.0999, 'grad_norm': 0.4399052560329437, 'learning_rate': 3.618197278911565e-05, 'epoch': 0.83}
{'loss': 0.1196, 'grad_norm': 8.397335052490234, 'learning_rate': 3.565051020408163e-05, 'epoch': 0.86}
{'loss': 0.1257, 'grad_norm': 2.4273781776428223, 'learning_rate': 3.511904761904762e-05, 'epoch': 0.89}
{'loss': 0.1194, 'grad_norm': 2.507136821746826, 'learning_rate': 3.458758503401361e-05, 'epoch': 0.92}
{'loss': 0.1016, 'grad_norm': 1.5573835372924805, 'learning_rate': 3.405612244897959e-05, 'epoch': 0.96}
{'loss': 0.1266, 'grad_norm': 9.165238380432129, 'learning_rate': 3.352465986394558e-05, 'epoch': 0.99}
{'eval_loss': 0.16062971949577332, 'eval_accuracy': 0.9588899600605977, 'eval_precision_macro': 0.8814931317055487, 'eval_recall_macro': 0.8538632045515081, 'eval_f1_macro': 0.8560040914191638, 'eval_precision_micro': 0.9588899600605977, 'eval_recall_micro': 0.9588899600605977, 'eval_f1_micro': 0.9588899600605977, 'eval_runtime': 96.6431, 'eval_samples_per_second': 20.715, 'eval_steps_per_second': 2.597, 'epoch': 1.0}
{'loss': 0.0848, 'grad_norm': 0.9852877259254456, 'learning_rate': 3.2993197278911564e-05, 'epoch': 1.02}
{'loss': 0.0487, 'grad_norm': 0.03514533489942551, 'learning_rate': 3.246173469387755e-05, 'epoch': 1.05}
{'loss': 0.0667, 'grad_norm': 3.4047012329101562, 'learning_rate': 3.193027210884354e-05, 'epoch': 1.08}
{'loss': 0.0832, 'grad_norm': 1.5175713300704956, 'learning_rate': 3.1398809523809525e-05, 'epoch': 1.12}
{'loss': 0.0701, 'grad_norm': 1.749105453491211, 'learning_rate': 3.086734693877551e-05, 'epoch': 1.15}
{'loss': 0.0799, 'grad_norm': 2.480213165283203, 'learning_rate': 3.03358843537415e-05, 'epoch': 1.18}
{'loss': 0.0505, 'grad_norm': 5.6889262199401855, 'learning_rate': 2.9804421768707485e-05, 'epoch': 1.21}
{'loss': 0.0568, 'grad_norm': 1.6858110427856445, 'learning_rate': 2.927295918367347e-05, 'epoch': 1.24}
{'loss': 0.0676, 'grad_norm': 0.252985417842865, 'learning_rate': 2.8741496598639456e-05, 'epoch': 1.28}
{'loss': 0.0622, 'grad_norm': 1.2210630178451538, 'learning_rate': 2.8210034013605442e-05, 'epoch': 1.31}
{'loss': 0.0757, 'grad_norm': 1.12152898311615, 'learning_rate': 2.767857142857143e-05, 'epoch': 1.34}
{'loss': 0.0703, 'grad_norm': 2.617388963699341, 'learning_rate': 2.7147108843537417e-05, 'epoch': 1.37}
{'loss': 0.0648, 'grad_norm': 1.4879817962646484, 'learning_rate': 2.6615646258503402e-05, 'epoch': 1.4}
{'loss': 0.0497, 'grad_norm': 4.442806243896484, 'learning_rate': 2.6084183673469388e-05, 'epoch': 1.43}
{'loss': 0.0547, 'grad_norm': 3.7966811656951904, 'learning_rate': 2.5552721088435377e-05, 'epoch': 1.47}
{'loss': 0.0473, 'grad_norm': 0.41319936513900757, 'learning_rate': 2.5021258503401363e-05, 'epoch': 1.5}
{'loss': 0.0751, 'grad_norm': 2.460292100906372, 'learning_rate': 2.448979591836735e-05, 'epoch': 1.53}
{'loss': 0.0507, 'grad_norm': 1.1354448795318604, 'learning_rate': 2.3958333333333334e-05, 'epoch': 1.56}
{'loss': 0.0584, 'grad_norm': 0.8857676982879639, 'learning_rate': 2.342687074829932e-05, 'epoch': 1.59}
{'loss': 0.0532, 'grad_norm': 7.948622703552246, 'learning_rate': 2.289540816326531e-05, 'epoch': 1.63}
{'loss': 0.0468, 'grad_norm': 0.389117568731308, 'learning_rate': 2.2363945578231294e-05, 'epoch': 1.66}
{'loss': 0.0533, 'grad_norm': 1.2459965944290161, 'learning_rate': 2.183248299319728e-05, 'epoch': 1.69}
{'loss': 0.0548, 'grad_norm': 0.6528679132461548, 'learning_rate': 2.1301020408163266e-05, 'epoch': 1.72}
{'loss': 0.0553, 'grad_norm': 3.881239891052246, 'learning_rate': 2.076955782312925e-05, 'epoch': 1.75}
{'loss': 0.0535, 'grad_norm': 7.839347839355469, 'learning_rate': 2.023809523809524e-05, 'epoch': 1.79}
{'loss': 0.0496, 'grad_norm': 1.0373271703720093, 'learning_rate': 1.9706632653061223e-05, 'epoch': 1.82}
{'loss': 0.0736, 'grad_norm': 1.6896023750305176, 'learning_rate': 1.9175170068027212e-05, 'epoch': 1.85}
{'loss': 0.0465, 'grad_norm': 0.8177493810653687, 'learning_rate': 1.8643707482993198e-05, 'epoch': 1.88}
{'loss': 0.0579, 'grad_norm': 0.22082467377185822, 'learning_rate': 1.8112244897959187e-05, 'epoch': 1.91}
{'loss': 0.0519, 'grad_norm': 2.5242559909820557, 'learning_rate': 1.758078231292517e-05, 'epoch': 1.95}
{'loss': 0.0666, 'grad_norm': 1.6839925050735474, 'learning_rate': 1.7049319727891158e-05, 'epoch': 1.98}
{'eval_loss': 0.13748927414417267, 'eval_accuracy': 0.9669122710370472, 'eval_precision_macro': 0.884902662234198, 'eval_recall_macro': 0.8699883737304458, 'eval_f1_macro': 0.8708056238887555, 'eval_precision_micro': 0.9669122710370472, 'eval_recall_micro': 0.9669122710370472, 'eval_f1_micro': 0.9669122710370472, 'eval_runtime': 95.3131, 'eval_samples_per_second': 21.004, 'eval_steps_per_second': 2.633, 'epoch': 2.0}
{'loss': 0.0455, 'grad_norm': 0.8205501437187195, 'learning_rate': 1.6517857142857144e-05, 'epoch': 2.01}
{'loss': 0.033, 'grad_norm': 1.2112853527069092, 'learning_rate': 1.5986394557823133e-05, 'epoch': 2.04}
{'loss': 0.0236, 'grad_norm': 0.7625674605369568, 'learning_rate': 1.5454931972789115e-05, 'epoch': 2.07}
{'loss': 0.0257, 'grad_norm': 3.307070732116699, 'learning_rate': 1.4923469387755104e-05, 'epoch': 2.1}
{'loss': 0.0223, 'grad_norm': 1.4492987394332886, 'learning_rate': 1.439200680272109e-05, 'epoch': 2.14}
{'loss': 0.0212, 'grad_norm': 1.9123435020446777, 'learning_rate': 1.3860544217687074e-05, 'epoch': 2.17}
{'loss': 0.0268, 'grad_norm': 1.0063904523849487, 'learning_rate': 1.3329081632653063e-05, 'epoch': 2.2}
{'loss': 0.0234, 'grad_norm': 1.64284348487854, 'learning_rate': 1.2797619047619047e-05, 'epoch': 2.23}
{'loss': 0.0182, 'grad_norm': 1.5391156673431396, 'learning_rate': 1.2266156462585036e-05, 'epoch': 2.26}
{'loss': 0.0287, 'grad_norm': 0.45055273175239563, 'learning_rate': 1.1734693877551021e-05, 'epoch': 2.3}
{'loss': 0.0278, 'grad_norm': 0.04466330260038376, 'learning_rate': 1.1203231292517009e-05, 'epoch': 2.33}
{'loss': 0.0232, 'grad_norm': 1.6453211307525635, 'learning_rate': 1.0671768707482993e-05, 'epoch': 2.36}
{'loss': 0.0146, 'grad_norm': 3.7085514068603516, 'learning_rate': 1.014030612244898e-05, 'epoch': 2.39}
{'loss': 0.0192, 'grad_norm': 5.159766674041748, 'learning_rate': 9.608843537414966e-06, 'epoch': 2.42}
{'loss': 0.025, 'grad_norm': 0.4918728768825531, 'learning_rate': 9.077380952380953e-06, 'epoch': 2.46}
{'loss': 0.0194, 'grad_norm': 0.6267568469047546, 'learning_rate': 8.545918367346939e-06, 'epoch': 2.49}
{'loss': 0.032, 'grad_norm': 0.488130658864975, 'learning_rate': 8.014455782312926e-06, 'epoch': 2.52}
{'loss': 0.0212, 'grad_norm': 0.14098979532718658, 'learning_rate': 7.482993197278912e-06, 'epoch': 2.55}
{'loss': 0.0315, 'grad_norm': 0.855661928653717, 'learning_rate': 6.951530612244898e-06, 'epoch': 2.58}
{'loss': 0.0356, 'grad_norm': 1.9364949464797974, 'learning_rate': 6.420068027210885e-06, 'epoch': 2.61}
{'loss': 0.0196, 'grad_norm': 0.6178730130195618, 'learning_rate': 5.888605442176871e-06, 'epoch': 2.65}
{'loss': 0.0233, 'grad_norm': 1.0739645957946777, 'learning_rate': 5.357142857142857e-06, 'epoch': 2.68}
{'loss': 0.0286, 'grad_norm': 1.3899648189544678, 'learning_rate': 4.8256802721088436e-06, 'epoch': 2.71}
{'loss': 0.0239, 'grad_norm': 0.039138078689575195, 'learning_rate': 4.29421768707483e-06, 'epoch': 2.74}
{'loss': 0.0246, 'grad_norm': 2.08718204498291, 'learning_rate': 3.7627551020408166e-06, 'epoch': 2.77}
{'loss': 0.0255, 'grad_norm': 0.0457083024084568, 'learning_rate': 3.231292517006803e-06, 'epoch': 2.81}
{'loss': 0.0235, 'grad_norm': 0.09946563839912415, 'learning_rate': 2.699829931972789e-06, 'epoch': 2.84}
{'loss': 0.0252, 'grad_norm': 0.6097291707992554, 'learning_rate': 2.1683673469387757e-06, 'epoch': 2.87}
{'loss': 0.0233, 'grad_norm': 1.0756237506866455, 'learning_rate': 1.636904761904762e-06, 'epoch': 2.9}
{'loss': 0.0207, 'grad_norm': 1.6282234191894531, 'learning_rate': 1.1054421768707483e-06, 'epoch': 2.93}
{'loss': 0.0252, 'grad_norm': 0.4588093161582947, 'learning_rate': 5.739795918367347e-07, 'epoch': 2.97}
{'loss': 0.022, 'grad_norm': 1.7556936740875244, 'learning_rate': 4.251700680272109e-08, 'epoch': 3.0}
{'eval_loss': 0.14641138911247253, 'eval_accuracy': 0.9706651976311803, 'eval_precision_macro': 0.9150958814124206, 'eval_recall_macro': 0.8805496995945282, 'eval_f1_macro': 0.8876318705564819, 'eval_precision_micro': 0.9706651976311803, 'eval_recall_micro': 0.9706651976311803, 'eval_f1_micro': 0.9706651976311803, 'eval_runtime': 95.4759, 'eval_samples_per_second': 20.969, 'eval_steps_per_second': 2.629, 'epoch': 3.0}
{'train_runtime': 5495.6482, 'train_samples_per_second': 6.847, 'train_steps_per_second': 0.856, 'train_loss': 0.09864862742722921, 'epoch': 3.0}
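The learning rates in the log above fall linearly from 5e-05 toward zero, which is consistent with the Hugging Face Trainer's default linear decay schedule. A sketch of that schedule (the `total_steps` value is an estimate from the summary line, not read from the script):

```python
# Sketch: linear learning-rate decay with no warmup, matching the shape of
# the learning_rate values in the training log above.
def linear_lr(step, total_steps, base_lr=5e-5):
    """Linearly decay base_lr to 0 over total_steps."""
    return base_lr * (1.0 - step / total_steps)

total_steps = 4704  # ~0.856 steps/s * 5495.6 s from the summary line
print(linear_lr(0, total_steps))            # 5e-05 at the start
print(linear_lr(total_steps, total_steps))  # 0.0 at the end
```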