Upload folder using huggingface_hub

Files changed (6)

README.md +68 -68
config.json +200 -207
head_config.json +1 -1
model.safetensors +2 -2
modeling_havelock.py +0 -1
type_to_idx.json +11 -12

README.md CHANGED

@@ -22,7 +22,7 @@ datasets:
 BERT-based token classifier for detecting **oral and literate markers** in text, based on Walter Ong's "Orality and Literacy" (1982).
-This model performs multi-label span-level detection of 53 rhetorical marker types, where each token independently carries B/I/O labels per type — allowing overlapping spans (e.g. a token that is simultaneously part of a concessive and a nested clause).
 ## Model Details
@@ -30,15 +30,14 @@ This model performs multi-label span-level detection of 53 rhetorical marker typ
 |----------|-------|
 | Base model | `bert-base-uncased` |
 | Task | Multi-label token classification (independent B/I/O per type) |
-| Marker types | 53 (22 oral, 31 literate) |
-| Test macro F1 | **0.388** (per-type detection, binary positive = B or I) |
 | Training | 20 epochs, batch 24, lr 3e-5, fp16 |
 | Regularization | Mixout (p=0.1) — stochastic L2 anchor to pretrained weights |
 | Loss | Per-type weighted cross-entropy with inverse-frequency type weights |
 | Min examples | 150 (types below this threshold excluded) |
 ## Usage
 ```python
 import json
 import torch
@@ -81,16 +80,16 @@ for i, token in enumerate(tokens):
 - Types with fewer than 150 annotated spans are excluded from training
 - Multi-label BIO annotation: tokens can carry labels for multiple overlapping marker types simultaneously
-## Marker Types (53)
-### Oral Markers (22 types)
 Characteristics of oral tradition and spoken discourse:
 | Category | Markers |
 |----------|---------|
 | **Address & Interaction** | vocative, imperative, second_person, inclusive_we, rhetorical_question, phatic_check, phatic_filler |
-| **Repetition & Pattern** | anaphora, parallelism, tricolon, lexical_repetition, antithesis |
 | **Conjunction** | simple_conjunction |
 | **Formulas** | discourse_formula, intensifier_doubling |
 | **Narrative** | named_individual, specific_place, temporal_anchor, sensory_detail, embodied_action, everyday_example |
@@ -119,71 +118,71 @@ Per-type detection F1 on test set (binary: B or I = positive, O = negative):
 ```
 Type                                            Prec    Rec     F1    Sup
 ========================================================================
-literate_abstract_noun                         0.119  0.114  0.116    466
-literate_additive_formal                       0.225  0.576  0.323     85
-literate_agent_demoted                         0.345  0.670  0.455    288
-literate_agentless_passive                     0.399  0.750  0.521   1286
-literate_aside                                 0.399  0.599  0.479    461
-literate_categorical_statement                 0.191  0.277  0.226    393
-literate_causal_explicit                       0.285  0.370  0.322    376
-literate_citation                              0.515  0.671  0.582    237
-literate_conceptual_metaphor                   0.172  0.387  0.238    222
-literate_concessive                            0.475  0.596  0.529    740
-literate_concessive_connector                  0.107  0.514  0.178     37
-literate_concrete_setting                      0.189  0.462  0.269    292
-literate_conditional                           0.511  0.823  0.631   1609
-literate_contrastive                           0.310  0.460  0.370    383
-literate_cross_reference                       0.390  0.366  0.377     82
-literate_definitional_move                     0.288  0.515  0.370     66
-literate_enumeration                           0.285  0.743  0.412    855
-literate_epistemic_hedge                       0.339  0.564  0.424    541
-literate_evidential                            0.323  0.630  0.427    162
-literate_institutional_subject                 0.237  0.532  0.328    250
-literate_list_structure                        0.795  0.529  0.635    652
-literate_metadiscourse                         0.243  0.446  0.314    361
-literate_nested_clauses                        0.148  0.398  0.216   1271
-literate_nominalization                        0.241  0.490  0.323   1140
-literate_objectifying_stance                   0.474  0.469  0.471    192
-literate_probability                           0.572  0.728  0.641    114
-literate_qualified_assertion                   0.132  0.163  0.146    123
-literate_relative_chain                        0.282  0.572  0.378   1753
-literate_technical_abbreviation                0.381  0.773  0.510    132
-literate_technical_term                        0.264  0.481  0.341    908
-literate_temporal_embedding                    0.187  0.318  0.235    550
-oral_anaphora                                  0.120  0.348  0.179    141
-oral_antithesis                                0.213  0.249  0.230    453
-oral_discourse_formula                         0.287  0.432  0.345    570
-oral_embodied_action                           0.247  0.430  0.314    465
-oral_everyday_example                          0.263  0.411  0.320    358
-oral_imperative                                0.402  0.787  0.532    211
-oral_inclusive_we                              0.485  0.819  0.609    747
-oral_intensifier_doubling                      0.291  0.316  0.303     79
-oral_lexical_repetition                        0.331  0.550  0.414    218
-oral_named_individual                          0.386  0.708  0.500    818
-oral_parallelism                               0.674  0.041  0.077    710
-oral_phatic_check                              0.432  0.829  0.568     76
-oral_phatic_filler                             0.340  0.630  0.442    184
-oral_rhetorical_question                       0.587  0.899  0.710   1276
-oral_second_person                             0.421  0.610  0.498    839
-oral_self_correction                           0.479  0.372  0.419    156
-oral_sensory_detail                            0.249  0.452  0.321    367
-oral_simple_conjunction                        0.096  0.343  0.150     70
-oral_specific_place                            0.396  0.717  0.510    367
-oral_temporal_anchor                           0.347  0.831  0.490    555
-oral_tricolon                                  0.217  0.220  0.218    560
-oral_vocative                                  0.505  0.759  0.607    133
 ========================================================================
-Macro avg (types w/ support)                                 0.388
 ```
 </details>
-**Missing labels (test set):** 0/53 — all types detected at least once.
 Notable patterns:
-- **Strong performers** (F1 > 0.5): rhetorical_question (0.710), probability (0.641), list_structure (0.635), conditional (0.631), inclusive_we (0.609), vocative (0.607), citation (0.582), phatic_check (0.568)
-- **Weak performers** (F1 < 0.2): parallelism (0.077), simple_conjunction (0.150), abstract_noun (0.116), qualified_assertion (0.146), concessive_connector (0.178), anaphora (0.179)
-- **Precision-recall tradeoff**: Most types show higher recall than precision, indicating the model over-predicts rather than under-predicts markers
 ## Architecture
@@ -215,8 +214,9 @@ classifier.bias    → randomly initialized
 ## Limitations
-- **Low-precision types**: Several types show precision below 0.2, meaning most predictions for those types are false positives
-- **Parallelism collapse**: `oral_parallelism` has high precision (0.674) but near-zero recall (0.041), suggesting the model learned a very narrow pattern
 - **Context window**: 128 tokens max; longer spans may be truncated
 - **Domain**: Trained primarily on historical/literary texts; may underperform on modern social media
 - **Subjectivity**: Some marker boundaries are inherently ambiguous
@@ -238,4 +238,4 @@ classifier.bias    → randomly initialized
 ---
-*Trained: February 2026*

 BERT-based token classifier for detecting **oral and literate markers** in text, based on Walter Ong's "Orality and Literacy" (1982).
+This model performs multi-label span-level detection of 52 rhetorical marker types, where each token independently carries B/I/O labels per type — allowing overlapping spans (e.g. a token that is simultaneously part of a concessive and a nested clause).
 ## Model Details
 |----------|-------|
 | Base model | `bert-base-uncased` |
 | Task | Multi-label token classification (independent B/I/O per type) |
+| Marker types | 52 (21 oral, 31 literate) |
+| Test macro F1 | **0.394** (per-type detection, binary positive = B or I) |
 | Training | 20 epochs, batch 24, lr 3e-5, fp16 |
 | Regularization | Mixout (p=0.1) — stochastic L2 anchor to pretrained weights |
 | Loss | Per-type weighted cross-entropy with inverse-frequency type weights |
 | Min examples | 150 (types below this threshold excluded) |
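The Loss row names inverse-frequency type weights but gives no formula. The sketch below shows one common way such weights are built and applied. Everything in it is an assumption made for illustration: the normalisation `total / (num_types * count)`, the weighted-average reduction, the helper names, and the span counts are hypothetical and are not taken from the training code.

```python
import math
from typing import Dict, List


def inverse_frequency_weights(span_counts: Dict[str, int]) -> Dict[str, float]:
    """Assumed form: w_t = total / (num_types * count_t), so rare types weigh more."""
    total = sum(span_counts.values())
    num_types = len(span_counts)
    return {t: total / (num_types * n) for t, n in span_counts.items()}


def cross_entropy(logits: List[float], target: int) -> float:
    """Numerically stable softmax cross-entropy for one 3-way O/B/I decision."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_norm - logits[target]


def weighted_token_loss(
    logits_by_type: Dict[str, List[float]],
    targets_by_type: Dict[str, int],
    weights: Dict[str, float],
) -> float:
    """Weighted average of the per-type losses for a single token (assumed reduction)."""
    weighted = sum(
        weights[t] * cross_entropy(logits_by_type[t], targets_by_type[t])
        for t in weights
    )
    return weighted / sum(weights.values())


# Hypothetical training span counts, chosen only to make the arithmetic easy to follow.
counts = {"oral_vocative": 150, "oral_inclusive_we": 600, "literate_conditional": 1500}
weights = inverse_frequency_weights(counts)
print(weights)

# One token: a separate O/B/I logit triple and target (0=O, 1=B, 2=I) per type.
logits = {t: [0.0, 0.0, 0.0] for t in counts}
targets = {"oral_vocative": 1, "oral_inclusive_we": 0, "literate_conditional": 2}
print(weighted_token_loss(logits, targets, weights))
```

With this normalisation a type at the 150-span minimum receives ten times the weight of a type with 1,500 spans, which is the effect the table row describes.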
 ## Usage
 ```python
 import json
 import torch
 - Types with fewer than 150 annotated spans are excluded from training
 - Multi-label BIO annotation: tokens can carry labels for multiple overlapping marker types simultaneously
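A minimal sketch of how the per-type B/I/O scheme can be decoded. It assumes each marker type owns three consecutive label ids in the order O, B, I, which is the layout visible in the `id2label` map of `config.json` (for example, id 10 is `B-literate_agentless_passive`). The helper names are hypothetical and are not part of `modeling_havelock.py`.

```python
from typing import Dict, List, Tuple

TAGS = ("O", "B", "I")  # offsets 0, 1, 2 inside each type's block of three ids


def split_label_id(label_id: int, type_names: List[str]) -> Tuple[str, str]:
    """Map a flat label id to (tag, type_name) under the assumed 3-per-type layout."""
    type_idx, offset = divmod(label_id, len(TAGS))
    return TAGS[offset], type_names[type_idx]


def extract_spans(tags_per_type: Dict[str, List[str]]) -> List[Tuple[str, int, int]]:
    """Collect (type, start, end_exclusive) spans from independent B/I/O sequences."""
    spans = []
    for type_name, tags in tags_per_type.items():
        start = None
        for i, tag in enumerate(tags):
            # A "B" always opens a span; a stray "I" with no open span is treated leniently as a start.
            if tag == "B" or (tag == "I" and start is None):
                if start is not None:
                    spans.append((type_name, start, i))
                start = i
            elif tag == "O" and start is not None:
                spans.append((type_name, start, i))
                start = None
        if start is not None:
            spans.append((type_name, start, len(tags)))
    return spans


# First four type names, in the order they appear in id2label.
type_names = [
    "literate_abstract_noun",
    "literate_additive_formal",
    "literate_agent_demoted",
    "literate_agentless_passive",
]
print(split_label_id(10, type_names))

# Tokens 2 and 3 belong to both spans, which a single-label BIO scheme could not express.
example = {
    "literate_concessive": ["O", "B", "I", "I", "O"],
    "literate_nested_clauses": ["O", "O", "B", "I", "I"],
}
print(extract_spans(example))
```

Because every type is decoded on its own, overlapping spans need no special handling. The same layout also explains the config change in this commit: dropping one type removes three ids, and every later type shifts down by three.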
+## Marker Types (52)
+### Oral Markers (21 types)
 Characteristics of oral tradition and spoken discourse:
 | Category | Markers |
 |----------|---------|
 | **Address & Interaction** | vocative, imperative, second_person, inclusive_we, rhetorical_question, phatic_check, phatic_filler |
+| **Repetition & Pattern** | anaphora, tricolon, lexical_repetition, antithesis |
 | **Conjunction** | simple_conjunction |
 | **Formulas** | discourse_formula, intensifier_doubling |
 | **Narrative** | named_individual, specific_place, temporal_anchor, sensory_detail, embodied_action, everyday_example |
 ```
 Type                                            Prec    Rec     F1    Sup
 ========================================================================
+literate_abstract_noun                         0.283  0.036  0.064    474
+literate_additive_formal                       0.458  0.388  0.420     85
+literate_agent_demoted                         0.495  0.569  0.530    288
+literate_agentless_passive                     0.659  0.592  0.624   1285
+literate_aside                                 0.468  0.524  0.494    481
+literate_categorical_statement                 0.256  0.141  0.182    389
+literate_causal_explicit                       0.457  0.196  0.275    382
+literate_citation                              0.624  0.539  0.578    243
+literate_conceptual_metaphor                   0.366  0.242  0.291    219
+literate_concessive                            0.558  0.290  0.382    742
+literate_concessive_connector                  0.286  0.324  0.304     37
+literate_concrete_setting                      0.222  0.132  0.166    303
+literate_conditional                           0.664  0.597  0.629   1642
+literate_contrastive                           0.481  0.227  0.308    388
+literate_cross_reference                       0.644  0.326  0.433     89
+literate_definitional_move                     0.279  0.284  0.281     67
+literate_enumeration                           0.507  0.580  0.541    855
+literate_epistemic_hedge                       0.523  0.405  0.456    543
+literate_evidential                            0.487  0.457  0.471    162
+literate_institutional_subject                 0.330  0.274  0.300    248
+literate_list_structure                        0.929  0.464  0.619    653
+literate_metadiscourse                         0.355  0.251  0.294    355
+literate_nested_clauses                        0.212  0.140  0.169   1250
+literate_nominalization                        0.527  0.397  0.453   1147
+literate_objectifying_stance                   0.593  0.400  0.478    200
+literate_probability                           0.740  0.544  0.627    136
+literate_qualified_assertion                   0.153  0.073  0.099    123
+literate_relative_chain                        0.333  0.179  0.233   1717
+literate_technical_abbreviation                0.613  0.725  0.665    153
+literate_technical_term                        0.490  0.311  0.381    897
+literate_temporal_embedding                    0.210  0.143  0.170    553
+oral_anaphora                                  0.205  0.128  0.157    141
+oral_antithesis                                0.389  0.181  0.247    453
+oral_discourse_formula                         0.557  0.173  0.263    568
+oral_embodied_action                           0.421  0.213  0.283    489
+oral_everyday_example                          0.219  0.209  0.214    358
+oral_imperative                                0.537  0.695  0.606    200
+oral_inclusive_we                              0.616  0.599  0.608    751
+oral_intensifier_doubling                      0.632  0.152  0.245     79
+oral_lexical_repetition                        0.406  0.468  0.435    218
+oral_named_individual                          0.535  0.566  0.550    813
+oral_phatic_check                              0.591  0.684  0.634     76
+oral_phatic_filler                             0.469  0.524  0.495    189
+oral_rhetorical_question                       0.677  0.646  0.661   1273
+oral_second_person                             0.618  0.493  0.549    842
+oral_self_correction                           0.582  0.205  0.303    156
+oral_sensory_detail                            0.281  0.247  0.263    352
+oral_simple_conjunction                        0.146  0.085  0.107     71
+oral_specific_place                            0.534  0.582  0.557    373
+oral_temporal_anchor                           0.518  0.510  0.514    563
+oral_tricolon                                  0.247  0.185  0.212    562
+oral_vocative                                  0.667  0.684  0.675    158
 ========================================================================
+Macro avg (types w/ support)                                 0.394
 ```
 </details>
+**Missing labels (test set):** 0/52 — all types detected at least once.
 Notable patterns:
+- **Strong performers** (F1 > 0.5): vocative (0.675), technical_abbreviation (0.665), rhetorical_question (0.661), phatic_check (0.634), conditional (0.629), probability (0.627), agentless_passive (0.624), list_structure (0.619), inclusive_we (0.608), imperative (0.606), citation (0.578), specific_place (0.557), named_individual (0.550), second_person (0.549), enumeration (0.541), agent_demoted (0.530), temporal_anchor (0.514)
+- **Weak performers** (F1 < 0.2): abstract_noun (0.064), qualified_assertion (0.099), simple_conjunction (0.107), anaphora (0.157), concrete_setting (0.166), nested_clauses (0.169), temporal_embedding (0.170), categorical_statement (0.182)
+- **Precision-recall tradeoff**: Most types now show higher precision than recall, indicating the model under-predicts rather than over-predicts markers (reversed from the previous release)
+- **Dropped type**: `oral_parallelism` was excluded from this training run (fell below the 150-span minimum threshold)
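The figures above can be sanity-checked from the table itself. The sketch below recomputes F1 as the harmonic mean of precision and recall for three rows, then averages the 52 reported per-type F1 values without weighting, which is the usual reading of "macro" and is assumed here. Because the table values are rounded to three decimals, recomputed numbers may differ from the reported ones in the last digit.

```python
from statistics import mean


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from three rows of the new table.
SPOT_CHECKS = {
    "oral_vocative": (0.667, 0.684, 0.675),
    "literate_list_structure": (0.929, 0.464, 0.619),
    "literate_abstract_noun": (0.283, 0.036, 0.064),
}

# Reported per-type F1 values in table order: 31 literate types, then 21 oral types.
REPORTED_F1 = [
    0.064, 0.420, 0.530, 0.624, 0.494, 0.182, 0.275, 0.578, 0.291, 0.382,
    0.304, 0.166, 0.629, 0.308, 0.433, 0.281, 0.541, 0.456, 0.471, 0.300,
    0.619, 0.294, 0.169, 0.453, 0.478, 0.627, 0.099, 0.233, 0.665, 0.381,
    0.170,
    0.157, 0.247, 0.263, 0.283, 0.214, 0.606, 0.608, 0.245, 0.435, 0.550,
    0.634, 0.495, 0.661, 0.549, 0.303, 0.263, 0.107, 0.557, 0.514, 0.212,
    0.675,
]

for name, (p, r, reported) in SPOT_CHECKS.items():
    print(f"{name}: computed {f1_score(p, r):.3f}, reported {reported:.3f}")

macro_f1 = mean(REPORTED_F1)
print(f"types: {len(REPORTED_F1)}, macro F1: {macro_f1:.3f}")
```

An unweighted mean gives a 37-support type such as `literate_concessive_connector` the same influence as `literate_relative_chain` with 1,717, so the macro figure is sensitive to the rare, weak types listed above.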
 ## Architecture
 ## Limitations
+- **Low-precision types**: Several types show precision below 0.25, meaning most predictions for those types are false positives
+- **Low-recall types**: `abstract_noun` (0.036 recall), `simple_conjunction` (0.085), and `qualified_assertion` (0.073) are near-invisible to the model despite nonzero precision
+- **Excluded type**: `oral_parallelism` fell below the 150-span minimum and was excluded; structural parallelism remains undetected
 - **Context window**: 128 tokens max; longer spans may be truncated
 - **Domain**: Trained primarily on historical/literary texts; may underperform on modern social media
 - **Subjectivity**: Some marker boundaries are inherently ambiguous
 ---
+*Trained: February 2026*

config.json CHANGED

@@ -1,9 +1,12 @@
 {
   "add_cross_attention": false,
   "architectures": [
-    "BertModel"
   ],
   "attention_probs_dropout_prob": 0.1,
   "bos_token_id": null,
   "classifier_dropout": null,
   "dtype": "float32",
@@ -12,43 +15,76 @@
   "hidden_act": "gelu",
   "hidden_dropout_prob": 0.1,
   "hidden_size": 768,
-  "initializer_range": 0.02,
-  "intermediate_size": 3072,
-  "is_decoder": false,
-  "layer_norm_eps": 1e-12,
-  "max_position_embeddings": 512,
-  "model_type": "bert",
-  "num_attention_heads": 12,
-  "num_hidden_layers": 12,
-  "pad_token_id": 0,
-  "position_embedding_type": "absolute",
-  "tie_word_embeddings": true,
-  "transformers_version": "5.0.0",
-  "type_vocab_size": 2,
-  "use_cache": true,
-  "vocab_size": 30522,
-  "num_labels": 159,
   "id2label": {
     "0": "O-literate_abstract_noun",
     "1": "B-literate_abstract_noun",
-    "2": "I-literate_abstract_noun",
-    "3": "O-literate_additive_formal",
-    "4": "B-literate_additive_formal",
-    "5": "I-literate_additive_formal",
-    "6": "O-literate_agent_demoted",
-    "7": "B-literate_agent_demoted",
-    "8": "I-literate_agent_demoted",
-    "9": "O-literate_agentless_passive",
     "10": "B-literate_agentless_passive",
     "11": "I-literate_agentless_passive",
     "12": "O-literate_aside",
     "13": "B-literate_aside",
     "14": "I-literate_aside",
     "15": "O-literate_categorical_statement",
     "16": "B-literate_categorical_statement",
     "17": "I-literate_categorical_statement",
     "18": "O-literate_causal_explicit",
     "19": "B-literate_causal_explicit",
     "20": "I-literate_causal_explicit",
     "21": "O-literate_citation",
     "22": "B-literate_citation",
@@ -59,6 +95,7 @@
     "27": "O-literate_concessive",
     "28": "B-literate_concessive",
     "29": "I-literate_concessive",
     "30": "O-literate_concessive_connector",
     "31": "B-literate_concessive_connector",
     "32": "I-literate_concessive_connector",
@@ -69,6 +106,7 @@
     "37": "B-literate_conditional",
     "38": "I-literate_conditional",
     "39": "O-literate_contrastive",
     "40": "B-literate_contrastive",
     "41": "I-literate_contrastive",
     "42": "O-literate_cross_reference",
@@ -79,6 +117,7 @@
     "47": "I-literate_definitional_move",
     "48": "O-literate_enumeration",
     "49": "B-literate_enumeration",
     "50": "I-literate_enumeration",
     "51": "O-literate_epistemic_hedge",
     "52": "B-literate_epistemic_hedge",
@@ -89,6 +128,7 @@
     "57": "O-literate_institutional_subject",
     "58": "B-literate_institutional_subject",
     "59": "I-literate_institutional_subject",
     "60": "O-literate_list_structure",
     "61": "B-literate_list_structure",
     "62": "I-literate_list_structure",
@@ -99,6 +139,7 @@
     "67": "B-literate_nested_clauses",
     "68": "I-literate_nested_clauses",
     "69": "O-literate_nominalization",
     "70": "B-literate_nominalization",
     "71": "I-literate_nominalization",
     "72": "O-literate_objectifying_stance",
@@ -109,6 +150,7 @@
     "77": "I-literate_probability",
     "78": "O-literate_qualified_assertion",
     "79": "B-literate_qualified_assertion",
     "80": "I-literate_qualified_assertion",
     "81": "O-literate_relative_chain",
     "82": "B-literate_relative_chain",
@@ -119,6 +161,7 @@
     "87": "O-literate_technical_term",
     "88": "B-literate_technical_term",
     "89": "I-literate_technical_term",
     "90": "O-literate_temporal_embedding",
     "91": "B-literate_temporal_embedding",
     "92": "I-literate_temporal_embedding",
@@ -128,230 +171,180 @@
     "96": "O-oral_antithesis",
     "97": "B-oral_antithesis",
     "98": "I-oral_antithesis",
-    "99": "O-oral_discourse_formula",
-    "100": "B-oral_discourse_formula",
-    "101": "I-oral_discourse_formula",
-    "102": "O-oral_embodied_action",
-    "103": "B-oral_embodied_action",
-    "104": "I-oral_embodied_action",
-    "105": "O-oral_everyday_example",
-    "106": "B-oral_everyday_example",
-    "107": "I-oral_everyday_example",
-    "108": "O-oral_imperative",
-    "109": "B-oral_imperative",
-    "110": "I-oral_imperative",
-    "111": "O-oral_inclusive_we",
-    "112": "B-oral_inclusive_we",
-    "113": "I-oral_inclusive_we",
-    "114": "O-oral_intensifier_doubling",
-    "115": "B-oral_intensifier_doubling",
-    "116": "I-oral_intensifier_doubling",
-    "117": "O-oral_lexical_repetition",
-    "118": "B-oral_lexical_repetition",
-    "119": "I-oral_lexical_repetition",
-    "120": "O-oral_named_individual",
-    "121": "B-oral_named_individual",
-    "122": "I-oral_named_individual",
-    "123": "O-oral_parallelism",
-    "124": "B-oral_parallelism",
-    "125": "I-oral_parallelism",
-    "126": "O-oral_phatic_check",
-    "127": "B-oral_phatic_check",
-    "128": "I-oral_phatic_check",
-    "129": "O-oral_phatic_filler",
-    "130": "B-oral_phatic_filler",
-    "131": "I-oral_phatic_filler",
-    "132": "O-oral_rhetorical_question",
-    "133": "B-oral_rhetorical_question",
-    "134": "I-oral_rhetorical_question",
-    "135": "O-oral_second_person",
-    "136": "B-oral_second_person",
-    "137": "I-oral_second_person",
-    "138": "O-oral_self_correction",
-    "139": "B-oral_self_correction",
-    "140": "I-oral_self_correction",
-    "141": "O-oral_sensory_detail",
-    "142": "B-oral_sensory_detail",
-    "143": "I-oral_sensory_detail",
-    "144": "O-oral_simple_conjunction",
-    "145": "B-oral_simple_conjunction",
-    "146": "I-oral_simple_conjunction",
-    "147": "O-oral_specific_place",
-    "148": "B-oral_specific_place",
-    "149": "I-oral_specific_place",
-    "150": "O-oral_temporal_anchor",
-    "151": "B-oral_temporal_anchor",
-    "152": "I-oral_temporal_anchor",
-    "153": "O-oral_tricolon",
-    "154": "B-oral_tricolon",
-    "155": "I-oral_tricolon",
-    "156": "O-oral_vocative",
-    "157": "B-oral_vocative",
-    "158": "I-oral_vocative"
   },
   "label2id": {
-    "O-literate_abstract_noun": 0,
     "B-literate_abstract_noun": 1,
-    "I-literate_abstract_noun": 2,
-    "O-literate_additive_formal": 3,
     "B-literate_additive_formal": 4,
-    "I-literate_additive_formal": 5,
-    "O-literate_agent_demoted": 6,
     "B-literate_agent_demoted": 7,
-    "I-literate_agent_demoted": 8,
-    "O-literate_agentless_passive": 9,
     "B-literate_agentless_passive": 10,
-    "I-literate_agentless_passive": 11,
-    "O-literate_aside": 12,
     "B-literate_aside": 13,
-    "I-literate_aside": 14,
-    "O-literate_categorical_statement": 15,
     "B-literate_categorical_statement": 16,
-    "I-literate_categorical_statement": 17,
-    "O-literate_causal_explicit": 18,
     "B-literate_causal_explicit": 19,
-    "I-literate_causal_explicit": 20,
-    "O-literate_citation": 21,
     "B-literate_citation": 22,
-    "I-literate_citation": 23,
-    "O-literate_conceptual_metaphor": 24,
     "B-literate_conceptual_metaphor": 25,
-    "I-literate_conceptual_metaphor": 26,
-    "O-literate_concessive": 27,
     "B-literate_concessive": 28,
-    "I-literate_concessive": 29,
-    "O-literate_concessive_connector": 30,
     "B-literate_concessive_connector": 31,
-    "I-literate_concessive_connector": 32,
-    "O-literate_concrete_setting": 33,
     "B-literate_concrete_setting": 34,
-    "I-literate_concrete_setting": 35,
-    "O-literate_conditional": 36,
     "B-literate_conditional": 37,
-    "I-literate_conditional": 38,
-    "O-literate_contrastive": 39,
     "B-literate_contrastive": 40,
-    "I-literate_contrastive": 41,
-    "O-literate_cross_reference": 42,
     "B-literate_cross_reference": 43,
-    "I-literate_cross_reference": 44,
-    "O-literate_definitional_move": 45,
     "B-literate_definitional_move": 46,
-    "I-literate_definitional_move": 47,
-    "O-literate_enumeration": 48,
     "B-literate_enumeration": 49,
-    "I-literate_enumeration": 50,
-    "O-literate_epistemic_hedge": 51,
     "B-literate_epistemic_hedge": 52,
-    "I-literate_epistemic_hedge": 53,
-    "O-literate_evidential": 54,
     "B-literate_evidential": 55,
-    "I-literate_evidential": 56,
-    "O-literate_institutional_subject": 57,
     "B-literate_institutional_subject": 58,
-    "I-literate_institutional_subject": 59,
-    "O-literate_list_structure": 60,
     "B-literate_list_structure": 61,
-    "I-literate_list_structure": 62,
-    "O-literate_metadiscourse": 63,
     "B-literate_metadiscourse": 64,
-    "I-literate_metadiscourse": 65,
-    "O-literate_nested_clauses": 66,
     "B-literate_nested_clauses": 67,
-    "I-literate_nested_clauses": 68,
-    "O-literate_nominalization": 69,
     "B-literate_nominalization": 70,
-    "I-literate_nominalization": 71,
-    "O-literate_objectifying_stance": 72,
     "B-literate_objectifying_stance": 73,
-    "I-literate_objectifying_stance": 74,
-    "O-literate_probability": 75,
     "B-literate_probability": 76,
-    "I-literate_probability": 77,
-    "O-literate_qualified_assertion": 78,
     "B-literate_qualified_assertion": 79,
-    "I-literate_qualified_assertion": 80,
-    "O-literate_relative_chain": 81,
     "B-literate_relative_chain": 82,
-    "I-literate_relative_chain": 83,
-    "O-literate_technical_abbreviation": 84,
     "B-literate_technical_abbreviation": 85,
-    "I-literate_technical_abbreviation": 86,
-    "O-literate_technical_term": 87,
     "B-literate_technical_term": 88,
-    "I-literate_technical_term": 89,
-    "O-literate_temporal_embedding": 90,
     "B-literate_temporal_embedding": 91,
-    "I-literate_temporal_embedding": 92,
-    "O-oral_anaphora": 93,
     "B-oral_anaphora": 94,
-    "I-oral_anaphora": 95,
-    "O-oral_antithesis": 96,
     "B-oral_antithesis": 97,
-    "I-oral_antithesis": 98,
-    "O-oral_discourse_formula": 99,
     "B-oral_discourse_formula": 100,
-    "I-oral_discourse_formula": 101,
-    "O-oral_embodied_action": 102,
     "B-oral_embodied_action": 103,
-    "I-oral_embodied_action": 104,
-    "O-oral_everyday_example": 105,
     "B-oral_everyday_example": 106,
-    "I-oral_everyday_example": 107,
-    "O-oral_imperative": 108,
     "B-oral_imperative": 109,
-    "I-oral_imperative": 110,
-    "O-oral_inclusive_we": 111,
     "B-oral_inclusive_we": 112,
-    "I-oral_inclusive_we": 113,
-    "O-oral_intensifier_doubling": 114,
     "B-oral_intensifier_doubling": 115,
-    "I-oral_intensifier_doubling": 116,
-    "O-oral_lexical_repetition": 117,
     "B-oral_lexical_repetition": 118,
-    "I-oral_lexical_repetition": 119,
-    "O-oral_named_individual": 120,
     "B-oral_named_individual": 121,
     "I-oral_named_individual": 122,
-    "O-oral_parallelism": 123,
-    "B-oral_parallelism": 124,
-    "I-oral_parallelism": 125,
-    "O-oral_phatic_check": 126,
-    "B-oral_phatic_check": 127,
-    "I-oral_phatic_check": 128,
-    "O-oral_phatic_filler": 129,
-    "B-oral_phatic_filler": 130,
-    "I-oral_phatic_filler": 131,
-    "O-oral_rhetorical_question": 132,
-    "B-oral_rhetorical_question": 133,
-    "I-oral_rhetorical_question": 134,
-    "O-oral_second_person": 135,
-    "B-oral_second_person": 136,
-    "I-oral_second_person": 137,
-    "O-oral_self_correction": 138,
-    "B-oral_self_correction": 139,
-    "I-oral_self_correction": 140,
-    "O-oral_sensory_detail": 141,
-    "B-oral_sensory_detail": 142,
-    "I-oral_sensory_detail": 143,
-    "O-oral_simple_conjunction": 144,
-    "B-oral_simple_conjunction": 145,
-    "I-oral_simple_conjunction": 146,
-    "O-oral_specific_place": 147,
-    "B-oral_specific_place": 148,
-    "I-oral_specific_place": 149,
-    "O-oral_temporal_anchor": 150,
-    "B-oral_temporal_anchor": 151,
-    "I-oral_temporal_anchor": 152,
-    "O-oral_tricolon": 153,
-    "B-oral_tricolon": 154,
-    "I-oral_tricolon": 155,
-    "O-oral_vocative": 156,
-    "B-oral_vocative": 157,
-    "I-oral_vocative": 158
   },
-  "num_types": 53,
-  "auto_map": {
-    "AutoModel": "modeling_havelock.HavelockTokenClassifier"
-  }
-}

 {
   "add_cross_attention": false,
   "architectures": [
+    "BertForMaskedLM"
   ],
   "attention_probs_dropout_prob": 0.1,
+  "auto_map": {
+    "AutoModel": "modeling_havelock.HavelockTokenClassifier"
+  },
   "bos_token_id": null,
   "classifier_dropout": null,
   "dtype": "float32",
   "hidden_act": "gelu",
   "hidden_dropout_prob": 0.1,
   "hidden_size": 768,
   "id2label": {
     "0": "O-literate_abstract_noun",
     "1": "B-literate_abstract_noun",
     "10": "B-literate_agentless_passive",
+    "100": "B-oral_discourse_formula",
+    "101": "I-oral_discourse_formula",
+    "102": "O-oral_embodied_action",
+    "103": "B-oral_embodied_action",
+    "104": "I-oral_embodied_action",
+    "105": "O-oral_everyday_example",
+    "106": "B-oral_everyday_example",
+    "107": "I-oral_everyday_example",
+    "108": "O-oral_imperative",
+    "109": "B-oral_imperative",
     "11": "I-literate_agentless_passive",
+    "110": "I-oral_imperative",
+    "111": "O-oral_inclusive_we",
+    "112": "B-oral_inclusive_we",
+    "113": "I-oral_inclusive_we",
+    "114": "O-oral_intensifier_doubling",
+    "115": "B-oral_intensifier_doubling",
+    "116": "I-oral_intensifier_doubling",
+    "117": "O-oral_lexical_repetition",
+    "118": "B-oral_lexical_repetition",
+    "119": "I-oral_lexical_repetition",
     "12": "O-literate_aside",
+    "120": "O-oral_named_individual",
+    "121": "B-oral_named_individual",
+    "122": "I-oral_named_individual",
+    "123": "O-oral_phatic_check",
+    "124": "B-oral_phatic_check",
+    "125": "I-oral_phatic_check",
+    "126": "O-oral_phatic_filler",
+    "127": "B-oral_phatic_filler",
+    "128": "I-oral_phatic_filler",
+    "129": "O-oral_rhetorical_question",
     "13": "B-literate_aside",
+    "130": "B-oral_rhetorical_question",
+    "131": "I-oral_rhetorical_question",
+    "132": "O-oral_second_person",
+    "133": "B-oral_second_person",
+    "134": "I-oral_second_person",
+    "135": "O-oral_self_correction",
+    "136": "B-oral_self_correction",
+    "137": "I-oral_self_correction",
+    "138": "O-oral_sensory_detail",
+    "139": "B-oral_sensory_detail",
     "14": "I-literate_aside",
+    "140": "I-oral_sensory_detail",
+    "141": "O-oral_simple_conjunction",
+    "142": "B-oral_simple_conjunction",
+    "143": "I-oral_simple_conjunction",
+    "144": "O-oral_specific_place",
+    "145": "B-oral_specific_place",
+    "146": "I-oral_specific_place",
+    "147": "O-oral_temporal_anchor",
+    "148": "B-oral_temporal_anchor",
+    "149": "I-oral_temporal_anchor",
     "15": "O-literate_categorical_statement",
+    "150": "O-oral_tricolon",
+    "151": "B-oral_tricolon",
+    "152": "I-oral_tricolon",
+    "153": "O-oral_vocative",
+    "154": "B-oral_vocative",
+    "155": "I-oral_vocative",
     "16": "B-literate_categorical_statement",
     "17": "I-literate_categorical_statement",
     "18": "O-literate_causal_explicit",
     "19": "B-literate_causal_explicit",
+    "2": "I-literate_abstract_noun",
     "20": "I-literate_causal_explicit",
     "21": "O-literate_citation",
     "22": "B-literate_citation",
     "27": "O-literate_concessive",
     "28": "B-literate_concessive",
     "29": "I-literate_concessive",
+    "3": "O-literate_additive_formal",
     "30": "O-literate_concessive_connector",
     "31": "B-literate_concessive_connector",
     "32": "I-literate_concessive_connector",
     "37": "B-literate_conditional",
     "38": "I-literate_conditional",
     "39": "O-literate_contrastive",
+    "4": "B-literate_additive_formal",
     "40": "B-literate_contrastive",
     "41": "I-literate_contrastive",
     "42": "O-literate_cross_reference",
     "47": "I-literate_definitional_move",
     "48": "O-literate_enumeration",
     "49": "B-literate_enumeration",
+    "5": "I-literate_additive_formal",
     "50": "I-literate_enumeration",
     "51": "O-literate_epistemic_hedge",
     "52": "B-literate_epistemic_hedge",
     "57": "O-literate_institutional_subject",
     "58": "B-literate_institutional_subject",
     "59": "I-literate_institutional_subject",
+    "6": "O-literate_agent_demoted",
     "60": "O-literate_list_structure",
     "61": "B-literate_list_structure",
     "62": "I-literate_list_structure",
     "67": "B-literate_nested_clauses",
     "68": "I-literate_nested_clauses",
     "69": "O-literate_nominalization",
+    "7": "B-literate_agent_demoted",
     "70": "B-literate_nominalization",
     "71": "I-literate_nominalization",
     "72": "O-literate_objectifying_stance",
     "77": "I-literate_probability",
     "78": "O-literate_qualified_assertion",
     "79": "B-literate_qualified_assertion",
+    "8": "I-literate_agent_demoted",
     "80": "I-literate_qualified_assertion",
     "81": "O-literate_relative_chain",
     "82": "B-literate_relative_chain",
     "87": "O-literate_technical_term",
     "88": "B-literate_technical_term",
     "89": "I-literate_technical_term",
+    "9": "O-literate_agentless_passive",
     "90": "O-literate_temporal_embedding",
     "91": "B-literate_temporal_embedding",
     "92": "I-literate_temporal_embedding",
     "96": "O-oral_antithesis",
     "97": "B-oral_antithesis",
     "98": "I-oral_antithesis",
+    "99": "O-oral_discourse_formula"
   },
+  "initializer_range": 0.02,
+  "intermediate_size": 3072,
+  "is_decoder": false,
   "label2id": {
     "B-literate_abstract_noun": 1,
     "B-literate_additive_formal": 4,
     "B-literate_agent_demoted": 7,
     "B-literate_agentless_passive": 10,
     "B-literate_aside": 13,
     "B-literate_categorical_statement": 16,
     "B-literate_causal_explicit": 19,
     "B-literate_citation": 22,
     "B-literate_conceptual_metaphor": 25,
     "B-literate_concessive": 28,
     "B-literate_concessive_connector": 31,
     "B-literate_concrete_setting": 34,
     "B-literate_conditional": 37,
     "B-literate_contrastive": 40,
     "B-literate_cross_reference": 43,
     "B-literate_definitional_move": 46,
     "B-literate_enumeration": 49,
     "B-literate_epistemic_hedge": 52,
     "B-literate_evidential": 55,
     "B-literate_institutional_subject": 58,
     "B-literate_list_structure": 61,
     "B-literate_metadiscourse": 64,
     "B-literate_nested_clauses": 67,
     "B-literate_nominalization": 70,
     "B-literate_objectifying_stance": 73,
     "B-literate_probability": 76,
     "B-literate_qualified_assertion": 79,
     "B-literate_relative_chain": 82,
     "B-literate_technical_abbreviation": 85,
     "B-literate_technical_term": 88,
     "B-literate_temporal_embedding": 91,
     "B-oral_anaphora": 94,
     "B-oral_antithesis": 97,
     "B-oral_discourse_formula": 100,
     "B-oral_embodied_action": 103,
     "B-oral_everyday_example": 106,
     "B-oral_imperative": 109,
     "B-oral_inclusive_we": 112,
     "B-oral_intensifier_doubling": 115,
     "B-oral_lexical_repetition": 118,
     "B-oral_named_individual": 121,
+    "B-oral_phatic_check": 124,
+    "B-oral_phatic_filler": 127,
+    "B-oral_rhetorical_question": 130,
+    "B-oral_second_person": 133,
+    "B-oral_self_correction": 136,
+    "B-oral_sensory_detail": 139,
+    "B-oral_simple_conjunction": 142,
+    "B-oral_specific_place": 145,
+    "B-oral_temporal_anchor": 148,
+    "B-oral_tricolon": 151,
+    "B-oral_vocative": 154,
+    "I-literate_abstract_noun": 2,
+    "I-literate_additive_formal": 5,
+    "I-literate_agent_demoted": 8,
+    "I-literate_agentless_passive": 11,
+    "I-literate_aside": 14,
+    "I-literate_categorical_statement": 17,
+    "I-literate_causal_explicit": 20,
+    "I-literate_citation": 23,
+    "I-literate_conceptual_metaphor": 26,
+    "I-literate_concessive": 29,
+    "I-literate_concessive_connector": 32,
+    "I-literate_concrete_setting": 35,
+    "I-literate_conditional": 38,
+    "I-literate_contrastive": 41,
+    "I-literate_cross_reference": 44,
+    "I-literate_definitional_move": 47,
+    "I-literate_enumeration": 50,
+    "I-literate_epistemic_hedge": 53,
+    "I-literate_evidential": 56,
+    "I-literate_institutional_subject": 59,
+    "I-literate_list_structure": 62,
+    "I-literate_metadiscourse": 65,
+    "I-literate_nested_clauses": 68,
+    "I-literate_nominalization": 71,
+    "I-literate_objectifying_stance": 74,
+    "I-literate_probability": 77,
+    "I-literate_qualified_assertion": 80,
+    "I-literate_relative_chain": 83,
+    "I-literate_technical_abbreviation": 86,
+    "I-literate_technical_term": 89,
+    "I-literate_temporal_embedding": 92,
+    "I-oral_anaphora": 95,
+    "I-oral_antithesis": 98,
+    "I-oral_discourse_formula": 101,
+    "I-oral_embodied_action": 104,
+    "I-oral_everyday_example": 107,
+    "I-oral_imperative": 110,
+    "I-oral_inclusive_we": 113,
+    "I-oral_intensifier_doubling": 116,
+    "I-oral_lexical_repetition": 119,
     "I-oral_named_individual": 122,
+    "I-oral_phatic_check": 125,
+    "I-oral_phatic_filler": 128,
+    "I-oral_rhetorical_question": 131,
+    "I-oral_second_person": 134,
+    "I-oral_self_correction": 137,
+    "I-oral_sensory_detail": 140,
+    "I-oral_simple_conjunction": 143,
+    "I-oral_specific_place": 146,
+    "I-oral_temporal_anchor": 149,
+    "I-oral_tricolon": 152,
+    "I-oral_vocative": 155,
+    "O-literate_abstract_noun": 0,
+    "O-literate_additive_formal": 3,
+    "O-literate_agent_demoted": 6,
+    "O-literate_agentless_passive": 9,
+    "O-literate_aside": 12,
+    "O-literate_categorical_statement": 15,
+    "O-literate_causal_explicit": 18,
+    "O-literate_citation": 21,
+    "O-literate_conceptual_metaphor": 24,
+    "O-literate_concessive": 27,
+    "O-literate_concessive_connector": 30,
+    "O-literate_concrete_setting": 33,
+    "O-literate_conditional": 36,
+    "O-literate_contrastive": 39,
+    "O-literate_cross_reference": 42,
+    "O-literate_definitional_move": 45,
+    "O-literate_enumeration": 48,
+    "O-literate_epistemic_hedge": 51,
+    "O-literate_evidential": 54,
+    "O-literate_institutional_subject": 57,
+    "O-literate_list_structure": 60,
+    "O-literate_metadiscourse": 63,
+    "O-literate_nested_clauses": 66,
+    "O-literate_nominalization": 69,
+    "O-literate_objectifying_stance": 72,
+    "O-literate_probability": 75,
+    "O-literate_qualified_assertion": 78,
+    "O-literate_relative_chain": 81,
+    "O-literate_technical_abbreviation": 84,
+    "O-literate_technical_term": 87,
+    "O-literate_temporal_embedding": 90,
+    "O-oral_anaphora": 93,
+    "O-oral_antithesis": 96,
+    "O-oral_discourse_formula": 99,
+    "O-oral_embodied_action": 102,
+    "O-oral_everyday_example": 105,
+    "O-oral_imperative": 108,
+    "O-oral_inclusive_we": 111,
+    "O-oral_intensifier_doubling": 114,
+    "O-oral_lexical_repetition": 117,
+    "O-oral_named_individual": 120,
+    "O-oral_phatic_check": 123,
+    "O-oral_phatic_filler": 126,
+    "O-oral_rhetorical_question": 129,
+    "O-oral_second_person": 132,
+    "O-oral_self_correction": 135,
+    "O-oral_sensory_detail": 138,
+    "O-oral_simple_conjunction": 141,
+    "O-oral_specific_place": 144,
+    "O-oral_temporal_anchor": 147,
+    "O-oral_tricolon": 150,
+    "O-oral_vocative": 153
   },
+  "layer_norm_eps": 1e-12,
+  "max_position_embeddings": 512,
+  "model_type": "bert",
+  "num_attention_heads": 12,
+  "num_hidden_layers": 12,
+  "num_types": 52,
+  "pad_token_id": 0,
+  "position_embedding_type": "absolute",
+  "tie_word_embeddings": true,
+  "transformers_version": "5.0.0",
+  "type_vocab_size": 2,
+  "use_cache": true,
+  "vocab_size": 30522
+}
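
The `label2id` entries above follow a regular layout: each marker type owns three consecutive ids in the order O, B, I, so an id appears to equal `3 * type_index + tag_offset`. The sketch below illustrates that arithmetic. It is inferred from the values shown in this diff, not taken from the repository's code; the helper names `encode_label` and `decode_label_id` are hypothetical, and the small `type_to_idx` dict is only a three-entry excerpt of the new `type_to_idx.json`.

```python
# Tag order within each type's block of three ids, as observed in label2id
# (e.g. O-oral_vocative=153, B-oral_vocative=154, I-oral_vocative=155).
TAGS = ("O", "B", "I")


def encode_label(tag, type_name, type_to_idx):
    """Map a (tag, type) pair to a flat label id (hypothetical helper)."""
    return type_to_idx[type_name] * len(TAGS) + TAGS.index(tag)


def decode_label_id(label_id, idx_to_type):
    """Split a flat label id back into (tag, type) (hypothetical helper)."""
    type_idx, tag_idx = divmod(label_id, len(TAGS))
    return TAGS[tag_idx], idx_to_type[type_idx]


# Excerpt of the updated type_to_idx.json; the real file has 52 entries.
type_to_idx = {
    "oral_named_individual": 40,
    "oral_phatic_check": 41,
    "oral_vocative": 51,
}
idx_to_type = {idx: name for name, idx in type_to_idx.items()}

print(encode_label("B", "oral_vocative", type_to_idx))
print(decode_label_id(122, idx_to_type))
```

Under this layout, 52 types give 156 flat labels (ids 0 through 155), which agrees with the highest id in the mapping, `I-oral_vocative: 155`.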

head_config.json CHANGED

@@ -1,5 +1,5 @@
 {
   "model_name": "bert-base-uncased",
-  "num_types": 53,
   "hidden_size": 768
 }

 {
   "model_name": "bert-base-uncased",
+  "num_types": 52,
   "hidden_size": 768
 }

model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:934733db298a26d41556120f86ad64efcb49728be77efdcb14b66664a38a28af
-size 436078996

 version https://git-lfs.github.com/spec/v1
+oid sha256:e7f65c36ddbc7fa2756a9e31ff2735c9708f2d891e4d637519a90675c6aa7088
+size 436073152

modeling_havelock.py CHANGED

@@ -1,6 +1,5 @@
 """Custom multi-label token classifier for HuggingFace Hub."""
-import torch
 import torch.nn as nn
 from transformers import BertModel, BertPreTrainedModel

 """Custom multi-label token classifier for HuggingFace Hub."""
 import torch.nn as nn
 from transformers import BertModel, BertPreTrainedModel

type_to_idx.json CHANGED

@@ -40,16 +40,15 @@
   "oral_intensifier_doubling": 38,
   "oral_lexical_repetition": 39,
   "oral_named_individual": 40,
-  "oral_parallelism": 41,
-  "oral_phatic_check": 42,
-  "oral_phatic_filler": 43,
-  "oral_rhetorical_question": 44,
-  "oral_second_person": 45,
-  "oral_self_correction": 46,
-  "oral_sensory_detail": 47,
-  "oral_simple_conjunction": 48,
-  "oral_specific_place": 49,
-  "oral_temporal_anchor": 50,
-  "oral_tricolon": 51,
-  "oral_vocative": 52
 }

   "oral_intensifier_doubling": 38,
   "oral_lexical_repetition": 39,
   "oral_named_individual": 40,
+  "oral_phatic_check": 41,
+  "oral_phatic_filler": 42,
+  "oral_rhetorical_question": 43,
+  "oral_second_person": 44,
+  "oral_self_correction": 45,
+  "oral_sensory_detail": 46,
+  "oral_simple_conjunction": 47,
+  "oral_specific_place": 48,
+  "oral_temporal_anchor": 49,
+  "oral_tricolon": 50,
+  "oral_vocative": 51
 }
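
The change to `type_to_idx.json` removes `oral_parallelism` and shifts every later type down by one so the indices stay contiguous. A minimal sketch of that re-indexing follows, assuming the mapping is a plain name-to-integer dict; `drop_type` is a hypothetical helper, not a function from this repository, and the `old` dict is only an excerpt of the previous file.

```python
def drop_type(type_to_idx, removed):
    """Return a new mapping without `removed`, closing the gap it leaves.

    Every index above the removed one is decremented; lower indices are kept.
    """
    cut = type_to_idx[removed]
    return {
        name: (idx - 1 if idx > cut else idx)
        for name, idx in type_to_idx.items()
        if name != removed
    }


# Excerpt of the old type_to_idx.json (tail of the oral types).
old = {
    "oral_named_individual": 40,
    "oral_parallelism": 41,
    "oral_phatic_check": 42,
    "oral_tricolon": 51,
    "oral_vocative": 52,
}

new = drop_type(old, "oral_parallelism")
print(new)
```

Because flat label ids are derived from the type index, any re-indexing like this also changes `id2label` and `label2id` in `config.json`, and `num_types` has to drop from 53 to 52 in both config files, as this commit does.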