DeBERTa POS Tagger

DeBERTa (Decoding-enhanced BERT with Disentangled Attention) is a language model introduced by Microsoft that refines how words and their relationships are represented and processed, often leading to better performance on tasks like text classification and question answering. It builds on the original BERT architecture by separating the concepts of “content” (the meaning of a word) and “position” (where it appears in a sentence) during self-attention, reducing ambiguity about word context and improving the model’s ability to capture nuance.
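
For intuition, the sketch below (toy NumPy code, not DeBERTa's actual implementation) shows the three-term disentangled attention score from the DeBERTa paper: each score is the sum of content-to-content, content-to-position, and position-to-content terms, scaled by 1/sqrt(3d).

import numpy as np

# Toy sizes: head dimension d, sequence length n.
d, n = 64, 8
rng = np.random.default_rng(0)
H = rng.normal(size=(n, d))      # content embeddings, one row per token
P = rng.normal(size=(n, n, d))   # relative-position embeddings P[i, j]

Wq, Wk = rng.normal(size=(d, d)), rng.normal(size=(d, d))      # content projections
Wq_r, Wk_r = rng.normal(size=(d, d)), rng.normal(size=(d, d))  # position projections

scores = np.empty((n, n))
for i in range(n):
    for j in range(n):
        c2c = (H[i] @ Wq) @ (H[j] @ Wk)        # content -> content
        c2p = (H[i] @ Wq) @ (P[i, j] @ Wk_r)   # content -> relative position
        p2c = (P[j, i] @ Wq_r) @ (H[j] @ Wk)   # relative position -> content
        scores[i, j] = (c2c + c2p + p2c) / np.sqrt(3 * d)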

Part-of-speech (POS) tagging is the process of automatically assigning each word in a sentence a specific grammatical role—such as noun, verb, or adjective—so that a machine can better interpret how words fit together. It’s one of the earliest steps in many natural language processing pipelines because knowing whether “run” is used as a noun (“a morning run”) or a verb (“to run quickly”) can make downstream tasks like parsing or text classification more accurate.

Training data: Universal Dependencies en_ewt

A Gold Standard Universal Dependencies Corpus for English, built over the source material of the English Web Treebank LDC2012T13 (https://catalog.ldc.upenn.edu/LDC2012T13).

See: universaldependencies.org
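
If you want to inspect the training data yourself, the corpus is also mirrored on the Hugging Face Hub. A minimal sketch (the universal_dependencies dataset is script-based, so recent versions of the datasets library may require trust_remote_code=True):

from datasets import load_dataset

# en_ewt is the English Web Treebank configuration of Universal Dependencies.
# The tags this model predicts correspond to the PTB-style "xpos" field.
ud = load_dataset("universal_dependencies", "en_ewt", trust_remote_code=True)
example = ud["train"][0]
print(list(zip(example["tokens"], example["xpos"])))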

How to use this model

Check out the example pipeline in pos_pipeline.py in this repo. The code below shows how to register and use such a pipeline.

from transformers import pipeline
from transformers.pipelines import PIPELINE_REGISTRY
from pos_pipeline import PosTaggingPipeline

# Register the custom task so pipeline() knows how to build a PosTaggingPipeline.
task_name = "deberta-pos-ud_en_ewt"
PIPELINE_REGISTRY.register_pipeline(
    task=task_name,
    pipeline_class=PosTaggingPipeline
)

# Load this model from the Hub and tag a sentence.
tagging_pipeline = pipeline(task=task_name, model=f"veryfansome/{task_name}")
predictions = tagging_pipeline("This is a sentence!")
print(predictions)

This prints [('This', 'DT'), ('is', 'VBZ'), ('a', 'DT'), ('sentence', 'NN'), ('!', '.')]. If you clone this repo, you can also run the script directly:

$ python -m pos_pipeline --text "This is another sentence!"
Device set to use mps:0
Input: This is another sentence!
Predictions: [('This', 'DT'), ('is', 'VBZ'), ('another', 'DT'), ('sentence', 'NN'), ('!', '.')]
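
As an illustration of the noun/verb disambiguation mentioned earlier, the same surface form should receive different tags depending on context. Using the tagging_pipeline from the Python example above (the tags in the comments are the expected PTB labels, not captured output):

print(tagging_pipeline("I went for a morning run."))  # 'run' expected as NN (noun)
print(tagging_pipeline("They run every morning."))    # 'run' expected as VBP (verb)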

Tune this model from scratch

Check out pos_tuner_ud.py in this repo. The Training logs section below shows the exact invocation of this script that produced this version of the model. Feel free to adjust the mostly generic hyperparameters in the tuning script and submit a PR if you achieve better results.
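
For reference, the standard recipe for token classification with subword tokenizers is to label only the first subword of each word and mask the rest with -100 so the loss ignores them. A minimal sketch of that alignment step (this shows the general approach, not necessarily the exact code in pos_tuner_ud.py):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-base")

def align_labels(words, word_tags, tag2id):
    """Tokenize pre-split words and align word-level tags to subword tokens."""
    enc = tokenizer(words, is_split_into_words=True, truncation=True)
    labels, previous = [], None
    for word_id in enc.word_ids():
        if word_id is None:          # special tokens like [CLS] / [SEP]
            labels.append(-100)
        elif word_id != previous:    # first subword of a new word gets the tag
            labels.append(tag2id[word_tags[word_id]])
        else:                        # continuation subwords are ignored
            labels.append(-100)
        previous = word_id
    enc["labels"] = labels
    return enc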

Test set performance

You should be able to replicate the results below by running:

$ python -m pos_tuner_ud -d en_ewt --test --auto-tokenizer --save-path .
              precision    recall  f1-score   support

           $       0.97      1.00      0.98        30
          ''       0.99      0.99      0.99        88
           ,       0.97      0.99      0.98       981
       -LRB-       1.00      1.00      1.00       114
       -RRB-       1.00      1.00      1.00       114
           .       1.00      1.00      1.00      1492
           :       0.95      0.89      0.92       100
         ADD       1.00      0.99      1.00      1648
         AFX       0.67      0.33      0.44         6
          CC       1.00      0.99      0.99       742
          CD       0.92      0.99      0.96      1144
          DT       1.00      1.00      1.00      1952
          EX       1.00      0.98      0.99        48
          FW       0.86      0.25      0.39        24
          GW       0.75      0.80      0.77        30
        HYPH       0.98      0.97      0.97        98
          IN       0.99      0.99      0.99      2325
          JJ       0.95      0.95      0.95      1717
         JJR       0.90      1.00      0.95        47
         JJS       0.90      0.97      0.93        88
          LS       1.00      1.00      1.00         3
          MD       1.00      1.00      1.00       423
         NFP       0.97      0.98      0.98       130
          NN       0.93      0.93      0.93      3898
         NNP       0.94      0.92      0.93      3002
        NNPS       0.76      0.83      0.79       117
         NNS       0.96      0.96      0.96      1102
         PDT       0.95      1.00      0.97        19
         POS       0.99      1.00      1.00       129
         PRP       1.00      1.00      1.00      1426
        PRP$       0.99      0.99      0.99       335
          RB       0.97      0.97      0.97      1407
         RBR       0.85      0.88      0.87        26
         RBS       0.82      0.70      0.76        20
          RP       0.88      0.90      0.89        92
         SYM       0.78      0.67      0.72        21
          TO       1.00      1.00      1.00       375
          UH       0.94      0.90      0.92       135
          VB       0.97      0.99      0.98      1151
         VBD       1.00      0.99      0.99       551
         VBG       0.98      0.97      0.97       370
         VBN       0.96      0.98      0.97       486
         VBP       0.99      0.98      0.98       801
         VBZ       1.00      1.00      1.00       638
         WDT       0.97      0.99      0.98       111
          WP       0.99      0.97      0.98        94
         WRB       0.99      1.00      0.99        90
          XX       0.00      0.00      0.00         1
          ``       0.99      0.99      0.99        89

    accuracy                           0.97     29830
   macro avg       0.93      0.91      0.91     29830
weighted avg       0.97      0.97      0.97     29830
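
The report above is in the format produced by scikit-learn's classification_report; if you evaluate with your own loop, something like this reproduces it (a sketch with toy data; in practice y_true and y_pred are flat per-token lists collected over the test set):

from sklearn.metrics import classification_report

y_true = ["DT", "VBZ", "DT", "NN", "."]  # gold tags, one per token
y_pred = ["DT", "VBZ", "DT", "NN", "."]  # predicted tags
print(classification_report(y_true, y_pred, zero_division=0))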

Training logs

The output below was captured while training this model.

$ python -m pos_tuner_ud --from-base microsoft/deberta-base -d en_ewt --train
2025-02-07 02:04:47,167 - __main__ - INFO - Total test dataset size: 2077
2025-02-07 02:04:47,167 - __main__ - INFO - Total train dataset size: 12543
2025-02-07 02:04:47,167 - __main__ - INFO - Total validation dataset size: 2002
2025-02-07 02:04:47,238 - __main__ - INFO - Unique tag count: 50
2025-02-07 02:04:47,238 - __main__ - INFO - Unique tags: ['$', "''", ',', '-LRB-', '-RRB-', '.', ':', 'ADD', 'AFX', 'CC', 'CD', 'DT', 'EX', 'FW', 'GW', 'HYPH', 'IN', 'JJ', 'JJR', 'JJS', 'LS', 'MD', 'NFP', 'NN', 'NNP', 'NNPS', 'NNS', 'PDT', 'POS', 'PRP', 'PRP$', 'RB', 'RBR', 'RBS', 'RP', 'SYM', 'TO', 'UH', 'VB', 'VBD', 'VBG', 'VBN', 'VBP', 'VBZ', 'WDT', 'WP', 'WP$', 'WRB', 'XX', '``']
Some weights of DebertaForTokenClassification were not initialized from the model checkpoint at microsoft/deberta-base and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
{'loss': 1.9114, 'grad_norm': 9.790359497070312, 'learning_rate': 4.9468537414965985e-05, 'epoch': 0.03}
{'loss': 0.4223, 'grad_norm': 3.621584415435791, 'learning_rate': 4.8937074829931974e-05, 'epoch': 0.06}
{'loss': 0.2691, 'grad_norm': 3.539963960647583, 'learning_rate': 4.840561224489796e-05, 'epoch': 0.1}
{'loss': 0.214, 'grad_norm': 7.832743167877197, 'learning_rate': 4.7874149659863945e-05, 'epoch': 0.13}
{'loss': 0.1841, 'grad_norm': 3.4455857276916504, 'learning_rate': 4.7342687074829934e-05, 'epoch': 0.16}
{'loss': 0.2053, 'grad_norm': 4.7766523361206055, 'learning_rate': 4.6811224489795916e-05, 'epoch': 0.19}
{'loss': 0.1588, 'grad_norm': 0.8650943636894226, 'learning_rate': 4.627976190476191e-05, 'epoch': 0.22}
{'loss': 0.1992, 'grad_norm': 1.5485981702804565, 'learning_rate': 4.5748299319727895e-05, 'epoch': 0.26}
{'loss': 0.1661, 'grad_norm': 2.333674192428589, 'learning_rate': 4.521683673469388e-05, 'epoch': 0.29}
{'loss': 0.1518, 'grad_norm': 1.3744149208068848, 'learning_rate': 4.4685374149659866e-05, 'epoch': 0.32}
{'loss': 0.1411, 'grad_norm': 2.857898473739624, 'learning_rate': 4.4153911564625855e-05, 'epoch': 0.35}
{'loss': 0.1445, 'grad_norm': 2.07376766204834, 'learning_rate': 4.362244897959184e-05, 'epoch': 0.38}
{'loss': 0.1843, 'grad_norm': 2.962191343307495, 'learning_rate': 4.3090986394557826e-05, 'epoch': 0.41}
{'loss': 0.1255, 'grad_norm': 2.3308300971984863, 'learning_rate': 4.255952380952381e-05, 'epoch': 0.45}
{'loss': 0.1834, 'grad_norm': 3.2745347023010254, 'learning_rate': 4.20280612244898e-05, 'epoch': 0.48}
{'loss': 0.1328, 'grad_norm': 16.241605758666992, 'learning_rate': 4.149659863945579e-05, 'epoch': 0.51}
{'loss': 0.128, 'grad_norm': 4.001339912414551, 'learning_rate': 4.096513605442177e-05, 'epoch': 0.54}
{'loss': 0.1373, 'grad_norm': 4.761608123779297, 'learning_rate': 4.043367346938776e-05, 'epoch': 0.57}
{'loss': 0.1069, 'grad_norm': 2.788573980331421, 'learning_rate': 3.990221088435374e-05, 'epoch': 0.61}
{'loss': 0.1517, 'grad_norm': 3.5314674377441406, 'learning_rate': 3.937074829931973e-05, 'epoch': 0.64}
{'loss': 0.1223, 'grad_norm': 1.3783732652664185, 'learning_rate': 3.883928571428572e-05, 'epoch': 0.67}
{'loss': 0.111, 'grad_norm': 0.8326973915100098, 'learning_rate': 3.83078231292517e-05, 'epoch': 0.7}
{'loss': 0.111, 'grad_norm': 2.8222618103027344, 'learning_rate': 3.777636054421769e-05, 'epoch': 0.73}
{'loss': 0.1152, 'grad_norm': 0.7433456778526306, 'learning_rate': 3.724489795918368e-05, 'epoch': 0.77}
{'loss': 0.1262, 'grad_norm': 5.192315578460693, 'learning_rate': 3.671343537414966e-05, 'epoch': 0.8}
{'loss': 0.0999, 'grad_norm': 0.4399052560329437, 'learning_rate': 3.618197278911565e-05, 'epoch': 0.83}
{'loss': 0.1196, 'grad_norm': 8.397335052490234, 'learning_rate': 3.565051020408163e-05, 'epoch': 0.86}
{'loss': 0.1257, 'grad_norm': 2.4273781776428223, 'learning_rate': 3.511904761904762e-05, 'epoch': 0.89}
{'loss': 0.1194, 'grad_norm': 2.507136821746826, 'learning_rate': 3.458758503401361e-05, 'epoch': 0.92}
{'loss': 0.1016, 'grad_norm': 1.5573835372924805, 'learning_rate': 3.405612244897959e-05, 'epoch': 0.96}
{'loss': 0.1266, 'grad_norm': 9.165238380432129, 'learning_rate': 3.352465986394558e-05, 'epoch': 0.99}
{'eval_loss': 0.16062971949577332, 'eval_accuracy': 0.9588899600605977, 'eval_precision_macro': 0.8814931317055487, 'eval_recall_macro': 0.8538632045515081, 'eval_f1_macro': 0.8560040914191638, 'eval_precision_micro': 0.9588899600605977, 'eval_recall_micro': 0.9588899600605977, 'eval_f1_micro': 0.9588899600605977, 'eval_runtime': 96.6431, 'eval_samples_per_second': 20.715, 'eval_steps_per_second': 2.597, 'epoch': 1.0}
{'loss': 0.0848, 'grad_norm': 0.9852877259254456, 'learning_rate': 3.2993197278911564e-05, 'epoch': 1.02}
{'loss': 0.0487, 'grad_norm': 0.03514533489942551, 'learning_rate': 3.246173469387755e-05, 'epoch': 1.05}
{'loss': 0.0667, 'grad_norm': 3.4047012329101562, 'learning_rate': 3.193027210884354e-05, 'epoch': 1.08}
{'loss': 0.0832, 'grad_norm': 1.5175713300704956, 'learning_rate': 3.1398809523809525e-05, 'epoch': 1.12}
{'loss': 0.0701, 'grad_norm': 1.749105453491211, 'learning_rate': 3.086734693877551e-05, 'epoch': 1.15}
{'loss': 0.0799, 'grad_norm': 2.480213165283203, 'learning_rate': 3.03358843537415e-05, 'epoch': 1.18}
{'loss': 0.0505, 'grad_norm': 5.6889262199401855, 'learning_rate': 2.9804421768707485e-05, 'epoch': 1.21}
{'loss': 0.0568, 'grad_norm': 1.6858110427856445, 'learning_rate': 2.927295918367347e-05, 'epoch': 1.24}
{'loss': 0.0676, 'grad_norm': 0.252985417842865, 'learning_rate': 2.8741496598639456e-05, 'epoch': 1.28}
{'loss': 0.0622, 'grad_norm': 1.2210630178451538, 'learning_rate': 2.8210034013605442e-05, 'epoch': 1.31}
{'loss': 0.0757, 'grad_norm': 1.12152898311615, 'learning_rate': 2.767857142857143e-05, 'epoch': 1.34}
{'loss': 0.0703, 'grad_norm': 2.617388963699341, 'learning_rate': 2.7147108843537417e-05, 'epoch': 1.37}
{'loss': 0.0648, 'grad_norm': 1.4879817962646484, 'learning_rate': 2.6615646258503402e-05, 'epoch': 1.4}
{'loss': 0.0497, 'grad_norm': 4.442806243896484, 'learning_rate': 2.6084183673469388e-05, 'epoch': 1.43}
{'loss': 0.0547, 'grad_norm': 3.7966811656951904, 'learning_rate': 2.5552721088435377e-05, 'epoch': 1.47}
{'loss': 0.0473, 'grad_norm': 0.41319936513900757, 'learning_rate': 2.5021258503401363e-05, 'epoch': 1.5}
{'loss': 0.0751, 'grad_norm': 2.460292100906372, 'learning_rate': 2.448979591836735e-05, 'epoch': 1.53}
{'loss': 0.0507, 'grad_norm': 1.1354448795318604, 'learning_rate': 2.3958333333333334e-05, 'epoch': 1.56}
{'loss': 0.0584, 'grad_norm': 0.8857676982879639, 'learning_rate': 2.342687074829932e-05, 'epoch': 1.59}
{'loss': 0.0532, 'grad_norm': 7.948622703552246, 'learning_rate': 2.289540816326531e-05, 'epoch': 1.63}
{'loss': 0.0468, 'grad_norm': 0.389117568731308, 'learning_rate': 2.2363945578231294e-05, 'epoch': 1.66}
{'loss': 0.0533, 'grad_norm': 1.2459965944290161, 'learning_rate': 2.183248299319728e-05, 'epoch': 1.69}
{'loss': 0.0548, 'grad_norm': 0.6528679132461548, 'learning_rate': 2.1301020408163266e-05, 'epoch': 1.72}
{'loss': 0.0553, 'grad_norm': 3.881239891052246, 'learning_rate': 2.076955782312925e-05, 'epoch': 1.75}
{'loss': 0.0535, 'grad_norm': 7.839347839355469, 'learning_rate': 2.023809523809524e-05, 'epoch': 1.79}
{'loss': 0.0496, 'grad_norm': 1.0373271703720093, 'learning_rate': 1.9706632653061223e-05, 'epoch': 1.82}
{'loss': 0.0736, 'grad_norm': 1.6896023750305176, 'learning_rate': 1.9175170068027212e-05, 'epoch': 1.85}
{'loss': 0.0465, 'grad_norm': 0.8177493810653687, 'learning_rate': 1.8643707482993198e-05, 'epoch': 1.88}
{'loss': 0.0579, 'grad_norm': 0.22082467377185822, 'learning_rate': 1.8112244897959187e-05, 'epoch': 1.91}
{'loss': 0.0519, 'grad_norm': 2.5242559909820557, 'learning_rate': 1.758078231292517e-05, 'epoch': 1.95}
{'loss': 0.0666, 'grad_norm': 1.6839925050735474, 'learning_rate': 1.7049319727891158e-05, 'epoch': 1.98}
{'eval_loss': 0.13748927414417267, 'eval_accuracy': 0.9669122710370472, 'eval_precision_macro': 0.884902662234198, 'eval_recall_macro': 0.8699883737304458, 'eval_f1_macro': 0.8708056238887555, 'eval_precision_micro': 0.9669122710370472, 'eval_recall_micro': 0.9669122710370472, 'eval_f1_micro': 0.9669122710370472, 'eval_runtime': 95.3131, 'eval_samples_per_second': 21.004, 'eval_steps_per_second': 2.633, 'epoch': 2.0}
{'loss': 0.0455, 'grad_norm': 0.8205501437187195, 'learning_rate': 1.6517857142857144e-05, 'epoch': 2.01}
{'loss': 0.033, 'grad_norm': 1.2112853527069092, 'learning_rate': 1.5986394557823133e-05, 'epoch': 2.04}
{'loss': 0.0236, 'grad_norm': 0.7625674605369568, 'learning_rate': 1.5454931972789115e-05, 'epoch': 2.07}
{'loss': 0.0257, 'grad_norm': 3.307070732116699, 'learning_rate': 1.4923469387755104e-05, 'epoch': 2.1}
{'loss': 0.0223, 'grad_norm': 1.4492987394332886, 'learning_rate': 1.439200680272109e-05, 'epoch': 2.14}
{'loss': 0.0212, 'grad_norm': 1.9123435020446777, 'learning_rate': 1.3860544217687074e-05, 'epoch': 2.17}
{'loss': 0.0268, 'grad_norm': 1.0063904523849487, 'learning_rate': 1.3329081632653063e-05, 'epoch': 2.2}
{'loss': 0.0234, 'grad_norm': 1.64284348487854, 'learning_rate': 1.2797619047619047e-05, 'epoch': 2.23}
{'loss': 0.0182, 'grad_norm': 1.5391156673431396, 'learning_rate': 1.2266156462585036e-05, 'epoch': 2.26}
{'loss': 0.0287, 'grad_norm': 0.45055273175239563, 'learning_rate': 1.1734693877551021e-05, 'epoch': 2.3}
{'loss': 0.0278, 'grad_norm': 0.04466330260038376, 'learning_rate': 1.1203231292517009e-05, 'epoch': 2.33}
{'loss': 0.0232, 'grad_norm': 1.6453211307525635, 'learning_rate': 1.0671768707482993e-05, 'epoch': 2.36}
{'loss': 0.0146, 'grad_norm': 3.7085514068603516, 'learning_rate': 1.014030612244898e-05, 'epoch': 2.39}
{'loss': 0.0192, 'grad_norm': 5.159766674041748, 'learning_rate': 9.608843537414966e-06, 'epoch': 2.42}
{'loss': 0.025, 'grad_norm': 0.4918728768825531, 'learning_rate': 9.077380952380953e-06, 'epoch': 2.46}
{'loss': 0.0194, 'grad_norm': 0.6267568469047546, 'learning_rate': 8.545918367346939e-06, 'epoch': 2.49}
{'loss': 0.032, 'grad_norm': 0.488130658864975, 'learning_rate': 8.014455782312926e-06, 'epoch': 2.52}
{'loss': 0.0212, 'grad_norm': 0.14098979532718658, 'learning_rate': 7.482993197278912e-06, 'epoch': 2.55}
{'loss': 0.0315, 'grad_norm': 0.855661928653717, 'learning_rate': 6.951530612244898e-06, 'epoch': 2.58}
{'loss': 0.0356, 'grad_norm': 1.9364949464797974, 'learning_rate': 6.420068027210885e-06, 'epoch': 2.61}
{'loss': 0.0196, 'grad_norm': 0.6178730130195618, 'learning_rate': 5.888605442176871e-06, 'epoch': 2.65}
{'loss': 0.0233, 'grad_norm': 1.0739645957946777, 'learning_rate': 5.357142857142857e-06, 'epoch': 2.68}
{'loss': 0.0286, 'grad_norm': 1.3899648189544678, 'learning_rate': 4.8256802721088436e-06, 'epoch': 2.71}
{'loss': 0.0239, 'grad_norm': 0.039138078689575195, 'learning_rate': 4.29421768707483e-06, 'epoch': 2.74}
{'loss': 0.0246, 'grad_norm': 2.08718204498291, 'learning_rate': 3.7627551020408166e-06, 'epoch': 2.77}
{'loss': 0.0255, 'grad_norm': 0.0457083024084568, 'learning_rate': 3.231292517006803e-06, 'epoch': 2.81}
{'loss': 0.0235, 'grad_norm': 0.09946563839912415, 'learning_rate': 2.699829931972789e-06, 'epoch': 2.84}
{'loss': 0.0252, 'grad_norm': 0.6097291707992554, 'learning_rate': 2.1683673469387757e-06, 'epoch': 2.87}
{'loss': 0.0233, 'grad_norm': 1.0756237506866455, 'learning_rate': 1.636904761904762e-06, 'epoch': 2.9}
{'loss': 0.0207, 'grad_norm': 1.6282234191894531, 'learning_rate': 1.1054421768707483e-06, 'epoch': 2.93}
{'loss': 0.0252, 'grad_norm': 0.4588093161582947, 'learning_rate': 5.739795918367347e-07, 'epoch': 2.97}
{'loss': 0.022, 'grad_norm': 1.7556936740875244, 'learning_rate': 4.251700680272109e-08, 'epoch': 3.0}
{'eval_loss': 0.14641138911247253, 'eval_accuracy': 0.9706651976311803, 'eval_precision_macro': 0.9150958814124206, 'eval_recall_macro': 0.8805496995945282, 'eval_f1_macro': 0.8876318705564819, 'eval_precision_micro': 0.9706651976311803, 'eval_recall_micro': 0.9706651976311803, 'eval_f1_micro': 0.9706651976311803, 'eval_runtime': 95.4759, 'eval_samples_per_second': 20.969, 'eval_steps_per_second': 2.629, 'epoch': 3.0}
{'train_runtime': 5495.6482, 'train_samples_per_second': 6.847, 'train_steps_per_second': 0.856, 'train_loss': 0.09864862742722921, 'epoch': 3.0}