permutans committed on
Commit d347dae · verified · 1 Parent(s): eb2291d

Upload folder using huggingface_hub

Files changed (6)
  1. README.md +68 -68
  2. config.json +200 -207
  3. head_config.json +1 -1
  4. model.safetensors +2 -2
  5. modeling_havelock.py +0 -1
  6. type_to_idx.json +11 -12
README.md CHANGED
@@ -22,7 +22,7 @@ datasets:
22
 
23
  BERT-based token classifier for detecting **oral and literate markers** in text, based on Walter Ong's "Orality and Literacy" (1982).
24
 
25
- This model performs multi-label span-level detection of 53 rhetorical marker types, where each token independently carries B/I/O labels per type — allowing overlapping spans (e.g. a token that is simultaneously part of a concessive and a nested clause).
26
 
27
  ## Model Details
28
 
@@ -30,15 +30,14 @@ This model performs multi-label span-level detection of 53 rhetorical marker typ
30
  |----------|-------|
31
  | Base model | `bert-base-uncased` |
32
  | Task | Multi-label token classification (independent B/I/O per type) |
33
- | Marker types | 53 (22 oral, 31 literate) |
34
- | Test macro F1 | **0.388** (per-type detection, binary positive = B or I) |
35
  | Training | 20 epochs, batch 24, lr 3e-5, fp16 |
36
  | Regularization | Mixout (p=0.1) — stochastic L2 anchor to pretrained weights |
37
  | Loss | Per-type weighted cross-entropy with inverse-frequency type weights |
38
  | Min examples | 150 (types below this threshold excluded) |
39
 
40
  ## Usage
41
-
42
  ```python
43
  import json
44
  import torch
@@ -81,16 +80,16 @@ for i, token in enumerate(tokens):
81
  - Types with fewer than 150 annotated spans are excluded from training
82
  - Multi-label BIO annotation: tokens can carry labels for multiple overlapping marker types simultaneously
83
 
84
- ## Marker Types (53)
85
 
86
- ### Oral Markers (22 types)
87
 
88
  Characteristics of oral tradition and spoken discourse:
89
 
90
  | Category | Markers |
91
  |----------|---------|
92
  | **Address & Interaction** | vocative, imperative, second_person, inclusive_we, rhetorical_question, phatic_check, phatic_filler |
93
- | **Repetition & Pattern** | anaphora, parallelism, tricolon, lexical_repetition, antithesis |
94
  | **Conjunction** | simple_conjunction |
95
  | **Formulas** | discourse_formula, intensifier_doubling |
96
  | **Narrative** | named_individual, specific_place, temporal_anchor, sensory_detail, embodied_action, everyday_example |
@@ -119,71 +118,71 @@ Per-type detection F1 on test set (binary: B or I = positive, O = negative):
119
  ```
120
  Type Prec Rec F1 Sup
121
  ========================================================================
122
- literate_abstract_noun 0.119 0.114 0.116 466
123
- literate_additive_formal 0.225 0.576 0.323 85
124
- literate_agent_demoted 0.345 0.670 0.455 288
125
- literate_agentless_passive 0.399 0.750 0.521 1286
126
- literate_aside 0.399 0.599 0.479 461
127
- literate_categorical_statement 0.191 0.277 0.226 393
128
- literate_causal_explicit 0.285 0.370 0.322 376
129
- literate_citation 0.515 0.671 0.582 237
130
- literate_conceptual_metaphor 0.172 0.387 0.238 222
131
- literate_concessive 0.475 0.596 0.529 740
132
- literate_concessive_connector 0.107 0.514 0.178 37
133
- literate_concrete_setting 0.189 0.462 0.269 292
134
- literate_conditional 0.511 0.823 0.631 1609
135
- literate_contrastive 0.310 0.460 0.370 383
136
- literate_cross_reference 0.390 0.366 0.377 82
137
- literate_definitional_move 0.288 0.515 0.370 66
138
- literate_enumeration 0.285 0.743 0.412 855
139
- literate_epistemic_hedge 0.339 0.564 0.424 541
140
- literate_evidential 0.323 0.630 0.427 162
141
- literate_institutional_subject 0.237 0.532 0.328 250
142
- literate_list_structure 0.795 0.529 0.635 652
143
- literate_metadiscourse 0.243 0.446 0.314 361
144
- literate_nested_clauses 0.148 0.398 0.216 1271
145
- literate_nominalization 0.241 0.490 0.323 1140
146
- literate_objectifying_stance 0.474 0.469 0.471 192
147
- literate_probability 0.572 0.728 0.641 114
148
- literate_qualified_assertion 0.132 0.163 0.146 123
149
- literate_relative_chain 0.282 0.572 0.378 1753
150
- literate_technical_abbreviation 0.381 0.773 0.510 132
151
- literate_technical_term 0.264 0.481 0.341 908
152
- literate_temporal_embedding 0.187 0.318 0.235 550
153
- oral_anaphora 0.120 0.348 0.179 141
154
- oral_antithesis 0.213 0.249 0.230 453
155
- oral_discourse_formula 0.287 0.432 0.345 570
156
- oral_embodied_action 0.247 0.430 0.314 465
157
- oral_everyday_example 0.263 0.411 0.320 358
158
- oral_imperative 0.402 0.787 0.532 211
159
- oral_inclusive_we 0.485 0.819 0.609 747
160
- oral_intensifier_doubling 0.291 0.316 0.303 79
161
- oral_lexical_repetition 0.331 0.550 0.414 218
162
- oral_named_individual 0.386 0.708 0.500 818
163
- oral_parallelism 0.674 0.041 0.077 710
164
- oral_phatic_check 0.432 0.829 0.568 76
165
- oral_phatic_filler 0.340 0.630 0.442 184
166
- oral_rhetorical_question 0.587 0.899 0.710 1276
167
- oral_second_person 0.421 0.610 0.498 839
168
- oral_self_correction 0.479 0.372 0.419 156
169
- oral_sensory_detail 0.249 0.452 0.321 367
170
- oral_simple_conjunction 0.096 0.343 0.150 70
171
- oral_specific_place 0.396 0.717 0.510 367
172
- oral_temporal_anchor 0.347 0.831 0.490 555
173
- oral_tricolon 0.217 0.220 0.218 560
174
- oral_vocative 0.505 0.759 0.607 133
175
  ========================================================================
176
- Macro avg (types w/ support) 0.388
177
  ```
178
 
179
  </details>
180
 
181
- **Missing labels (test set):** 0/53 — all types detected at least once.
182
 
183
  Notable patterns:
184
- - **Strong performers** (F1 > 0.5): rhetorical_question (0.710), probability (0.641), list_structure (0.635), conditional (0.631), inclusive_we (0.609), vocative (0.607), citation (0.582), phatic_check (0.568)
185
- - **Weak performers** (F1 < 0.2): parallelism (0.077), simple_conjunction (0.150), abstract_noun (0.116), qualified_assertion (0.146), concessive_connector (0.178), anaphora (0.179)
186
- - **Precision-recall tradeoff**: Most types show higher recall than precision, indicating the model over-predicts rather than under-predicts markers
 
187
 
188
  ## Architecture
189
 
@@ -215,8 +214,9 @@ classifier.bias → randomly initialized
215
 
216
  ## Limitations
217
 
218
- - **Low-precision types**: Several types show precision below 0.2, meaning most predictions for those types are false positives
219
- - **Parallelism collapse**: `oral_parallelism` has high precision (0.674) but near-zero recall (0.041), suggesting the model learned a very narrow pattern
 
220
  - **Context window**: 128 tokens max; longer spans may be truncated
221
  - **Domain**: Trained primarily on historical/literary texts; may underperform on modern social media
222
  - **Subjectivity**: Some marker boundaries are inherently ambiguous
@@ -238,4 +238,4 @@ classifier.bias → randomly initialized
238
 
239
  ---
240
 
241
- *Trained: February 2026*
 
22
 
23
  BERT-based token classifier for detecting **oral and literate markers** in text, based on Walter Ong's "Orality and Literacy" (1982).
24
 
25
+ This model performs multi-label span-level detection of 52 rhetorical marker types, where each token independently carries B/I/O labels per type — allowing overlapping spans (e.g. a token that is simultaneously part of a concessive and a nested clause).
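
To make the independent per-type B/I/O scheme concrete, here is a minimal sketch of how one tag sequence per marker type could be turned into spans. The `extract_spans` helper and the toy tags are hypothetical and not part of the model's code; only the B/I/O semantics and the two type names come from the description above.

```python
from typing import Dict, List, Tuple

Span = Tuple[str, int, int]  # (marker type, start token, end token exclusive)

def extract_spans(tags_by_type: Dict[str, List[str]]) -> List[Span]:
    """Collect spans from one B/I/O tag sequence per marker type."""
    spans: List[Span] = []
    for marker_type, tags in tags_by_type.items():
        start = None
        for i, tag in enumerate(tags):
            if tag == "B":
                if start is not None:      # a new B closes the open span
                    spans.append((marker_type, start, i))
                start = i
            elif tag == "I":
                if start is None:          # assumption: a stray I opens a span
                    start = i
            else:                          # "O" closes any open span
                if start is not None:
                    spans.append((marker_type, start, i))
                    start = None
        if start is not None:              # span running to the end of the input
            spans.append((marker_type, start, len(tags)))
    return sorted(spans, key=lambda s: (s[1], s[2], s[0]))

# Toy example: tokens 2 and 3 belong to both marker types at once.
toy_tags = {
    "literate_concessive":     ["B", "I", "I", "I", "O"],
    "literate_nested_clauses": ["O", "O", "B", "I", "I"],
}
spans = extract_spans(toy_tags)
```

Because each type is decoded on its own, overlapping spans need no special handling: the two spans above simply share tokens 2 and 3.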
26
 
27
  ## Model Details
28
 
 
30
  |----------|-------|
31
  | Base model | `bert-base-uncased` |
32
  | Task | Multi-label token classification (independent B/I/O per type) |
33
+ | Marker types | 52 (21 oral, 31 literate) |
34
+ | Test macro F1 | **0.394** (per-type detection, binary positive = B or I) |
35
  | Training | 20 epochs, batch 24, lr 3e-5, fp16 |
36
  | Regularization | Mixout (p=0.1) — stochastic L2 anchor to pretrained weights |
37
  | Loss | Per-type weighted cross-entropy with inverse-frequency type weights |
38
  | Min examples | 150 (types below this threshold excluded) |
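
The table names a 150-example minimum and inverse-frequency type weights but does not give the weighting formula. The sketch below shows one plausible reading, assuming weights proportional to `1 / count` and rescaled to a mean of 1. The function name, the normalisation, and the counts are all hypothetical.

```python
from typing import Dict

MIN_EXAMPLES = 150  # threshold stated in the table above

def type_weights(span_counts: Dict[str, int],
                 min_examples: int = MIN_EXAMPLES) -> Dict[str, float]:
    """Drop rare types, then weight the rest by inverse frequency (mean weight 1)."""
    kept = {t: n for t, n in span_counts.items() if n >= min_examples}
    if not kept:
        return {}
    inverse = {t: 1.0 / n for t, n in kept.items()}
    mean_inverse = sum(inverse.values()) / len(inverse)
    # Assumption: rescale so the average weight is 1, keeping the loss scale stable.
    return {t: w / mean_inverse for t, w in inverse.items()}

# Hypothetical span counts, not taken from the training data.
counts = {"type_a": 1200, "type_b": 300, "type_c": 90}
weights = type_weights(counts)
```

With these counts, `type_c` falls below the threshold and is dropped, and the rarer of the two remaining types receives the larger weight.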
39
 
40
  ## Usage
 
41
  ```python
42
  import json
43
  import torch
 
80
  - Types with fewer than 150 annotated spans are excluded from training
81
  - Multi-label BIO annotation: tokens can carry labels for multiple overlapping marker types simultaneously
82
 
83
+ ## Marker Types (52)
84
 
85
+ ### Oral Markers (21 types)
86
 
87
  Characteristics of oral tradition and spoken discourse:
88
 
89
  | Category | Markers |
90
  |----------|---------|
91
  | **Address & Interaction** | vocative, imperative, second_person, inclusive_we, rhetorical_question, phatic_check, phatic_filler |
92
+ | **Repetition & Pattern** | anaphora, tricolon, lexical_repetition, antithesis |
93
  | **Conjunction** | simple_conjunction |
94
  | **Formulas** | discourse_formula, intensifier_doubling |
95
  | **Narrative** | named_individual, specific_place, temporal_anchor, sensory_detail, embodied_action, everyday_example |
 
118
  ```
119
  Type Prec Rec F1 Sup
120
  ========================================================================
121
+ literate_abstract_noun 0.283 0.036 0.064 474
122
+ literate_additive_formal 0.458 0.388 0.420 85
123
+ literate_agent_demoted 0.495 0.569 0.530 288
124
+ literate_agentless_passive 0.659 0.592 0.624 1285
125
+ literate_aside 0.468 0.524 0.494 481
126
+ literate_categorical_statement 0.256 0.141 0.182 389
127
+ literate_causal_explicit 0.457 0.196 0.275 382
128
+ literate_citation 0.624 0.539 0.578 243
129
+ literate_conceptual_metaphor 0.366 0.242 0.291 219
130
+ literate_concessive 0.558 0.290 0.382 742
131
+ literate_concessive_connector 0.286 0.324 0.304 37
132
+ literate_concrete_setting 0.222 0.132 0.166 303
133
+ literate_conditional 0.664 0.597 0.629 1642
134
+ literate_contrastive 0.481 0.227 0.308 388
135
+ literate_cross_reference 0.644 0.326 0.433 89
136
+ literate_definitional_move 0.279 0.284 0.281 67
137
+ literate_enumeration 0.507 0.580 0.541 855
138
+ literate_epistemic_hedge 0.523 0.405 0.456 543
139
+ literate_evidential 0.487 0.457 0.471 162
140
+ literate_institutional_subject 0.330 0.274 0.300 248
141
+ literate_list_structure 0.929 0.464 0.619 653
142
+ literate_metadiscourse 0.355 0.251 0.294 355
143
+ literate_nested_clauses 0.212 0.140 0.169 1250
144
+ literate_nominalization 0.527 0.397 0.453 1147
145
+ literate_objectifying_stance 0.593 0.400 0.478 200
146
+ literate_probability 0.740 0.544 0.627 136
147
+ literate_qualified_assertion 0.153 0.073 0.099 123
148
+ literate_relative_chain 0.333 0.179 0.233 1717
149
+ literate_technical_abbreviation 0.613 0.725 0.665 153
150
+ literate_technical_term 0.490 0.311 0.381 897
151
+ literate_temporal_embedding 0.210 0.143 0.170 553
152
+ oral_anaphora 0.205 0.128 0.157 141
153
+ oral_antithesis 0.389 0.181 0.247 453
154
+ oral_discourse_formula 0.557 0.173 0.263 568
155
+ oral_embodied_action 0.421 0.213 0.283 489
156
+ oral_everyday_example 0.219 0.209 0.214 358
157
+ oral_imperative 0.537 0.695 0.606 200
158
+ oral_inclusive_we 0.616 0.599 0.608 751
159
+ oral_intensifier_doubling 0.632 0.152 0.245 79
160
+ oral_lexical_repetition 0.406 0.468 0.435 218
161
+ oral_named_individual 0.535 0.566 0.550 813
162
+ oral_phatic_check 0.591 0.684 0.634 76
163
+ oral_phatic_filler 0.469 0.524 0.495 189
164
+ oral_rhetorical_question 0.677 0.646 0.661 1273
165
+ oral_second_person 0.618 0.493 0.549 842
166
+ oral_self_correction 0.582 0.205 0.303 156
167
+ oral_sensory_detail 0.281 0.247 0.263 352
168
+ oral_simple_conjunction 0.146 0.085 0.107 71
169
+ oral_specific_place 0.534 0.582 0.557 373
170
+ oral_temporal_anchor 0.518 0.510 0.514 563
171
+ oral_tricolon 0.247 0.185 0.212 562
172
+ oral_vocative 0.667 0.684 0.675 158
 
173
  ========================================================================
174
+ Macro avg (types w/ support) 0.394
175
  ```
176
 
177
  </details>
178
 
179
+ **Missing labels (test set):** 0/52 — all types detected at least once.
180
 
181
  Notable patterns:
182
+ - **Strong performers** (F1 > 0.5): vocative (0.675), technical_abbreviation (0.665), rhetorical_question (0.661), phatic_check (0.634), conditional (0.629), probability (0.627), agentless_passive (0.624), list_structure (0.619), inclusive_we (0.608), imperative (0.606), citation (0.578), specific_place (0.557), named_individual (0.550), second_person (0.549), enumeration (0.541), agent_demoted (0.530), temporal_anchor (0.514)
183
+ - **Weak performers** (F1 < 0.2): abstract_noun (0.064), qualified_assertion (0.099), simple_conjunction (0.107), anaphora (0.157), concrete_setting (0.166), nested_clauses (0.169), temporal_embedding (0.170), categorical_statement (0.182)
184
+ - **Precision-recall tradeoff**: Most types now show higher precision than recall, indicating the model under-predicts rather than over-predicts markers (reversed from the previous release)
185
+ - **Dropped type**: `oral_parallelism` was excluded from this training run (fell below the 150-span minimum threshold)
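
The F1 column can be rechecked from the precision and recall columns, since F1 is their harmonic mean and the macro average is the unweighted mean of per-type F1. A minimal sketch using three rows copied from the table above (the helper name is hypothetical, and the three-row mean is only an illustration, not the reported 0.394):

```python
from typing import Dict, Tuple

def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported F1) for three rows of the results table.
rows: Dict[str, Tuple[float, float, float]] = {
    "oral_vocative":           (0.667, 0.684, 0.675),
    "literate_abstract_noun":  (0.283, 0.036, 0.064),
    "literate_list_structure": (0.929, 0.464, 0.619),
}

recomputed = {name: f1_score(p, r) for name, (p, r, _) in rows.items()}

# Macro average over this subset only: every type counts equally.
macro_f1_subset = sum(recomputed.values()) / len(recomputed)

# Types where precision exceeds recall, i.e. the model under-predicts.
precision_led = [name for name, (p, r, _) in rows.items() if p > r]
```

The recomputed values agree with the table to within rounding of the three-decimal inputs.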
186
 
187
  ## Architecture
188
 
 
214
 
215
  ## Limitations
216
 
217
+ - **Low-precision types**: Several types show precision below 0.25, meaning most predictions for those types are false positives
218
+ - **Low-recall types**: `abstract_noun` (0.036 recall), `simple_conjunction` (0.085), and `qualified_assertion` (0.073) are near-invisible to the model despite nonzero precision
219
+ - **Excluded type**: `oral_parallelism` fell below the 150-span minimum and was excluded; structural parallelism remains undetected
220
  - **Context window**: 128 tokens max; longer spans may be truncated
221
  - **Domain**: Trained primarily on historical/literary texts; may underperform on modern social media
222
  - **Subjectivity**: Some marker boundaries are inherently ambiguous
 
238
 
239
  ---
240
 
241
+ *Trained: February 2026*
config.json CHANGED
@@ -1,9 +1,12 @@
1
  {
2
  "add_cross_attention": false,
3
  "architectures": [
4
- "BertModel"
5
  ],
6
  "attention_probs_dropout_prob": 0.1,
 
 
 
7
  "bos_token_id": null,
8
  "classifier_dropout": null,
9
  "dtype": "float32",
@@ -12,43 +15,76 @@
12
  "hidden_act": "gelu",
13
  "hidden_dropout_prob": 0.1,
14
  "hidden_size": 768,
15
- "initializer_range": 0.02,
16
- "intermediate_size": 3072,
17
- "is_decoder": false,
18
- "layer_norm_eps": 1e-12,
19
- "max_position_embeddings": 512,
20
- "model_type": "bert",
21
- "num_attention_heads": 12,
22
- "num_hidden_layers": 12,
23
- "pad_token_id": 0,
24
- "position_embedding_type": "absolute",
25
- "tie_word_embeddings": true,
26
- "transformers_version": "5.0.0",
27
- "type_vocab_size": 2,
28
- "use_cache": true,
29
- "vocab_size": 30522,
30
- "num_labels": 159,
31
  "id2label": {
32
  "0": "O-literate_abstract_noun",
33
  "1": "B-literate_abstract_noun",
34
- "2": "I-literate_abstract_noun",
35
- "3": "O-literate_additive_formal",
36
- "4": "B-literate_additive_formal",
37
- "5": "I-literate_additive_formal",
38
- "6": "O-literate_agent_demoted",
39
- "7": "B-literate_agent_demoted",
40
- "8": "I-literate_agent_demoted",
41
- "9": "O-literate_agentless_passive",
42
  "10": "B-literate_agentless_passive",
43
  "11": "I-literate_agentless_passive",
44
  "12": "O-literate_aside",
45
  "13": "B-literate_aside",
46
  "14": "I-literate_aside",
47
  "15": "O-literate_categorical_statement",
48
  "16": "B-literate_categorical_statement",
49
  "17": "I-literate_categorical_statement",
50
  "18": "O-literate_causal_explicit",
51
  "19": "B-literate_causal_explicit",
 
52
  "20": "I-literate_causal_explicit",
53
  "21": "O-literate_citation",
54
  "22": "B-literate_citation",
@@ -59,6 +95,7 @@
59
  "27": "O-literate_concessive",
60
  "28": "B-literate_concessive",
61
  "29": "I-literate_concessive",
 
62
  "30": "O-literate_concessive_connector",
63
  "31": "B-literate_concessive_connector",
64
  "32": "I-literate_concessive_connector",
@@ -69,6 +106,7 @@
69
  "37": "B-literate_conditional",
70
  "38": "I-literate_conditional",
71
  "39": "O-literate_contrastive",
 
72
  "40": "B-literate_contrastive",
73
  "41": "I-literate_contrastive",
74
  "42": "O-literate_cross_reference",
@@ -79,6 +117,7 @@
79
  "47": "I-literate_definitional_move",
80
  "48": "O-literate_enumeration",
81
  "49": "B-literate_enumeration",
 
82
  "50": "I-literate_enumeration",
83
  "51": "O-literate_epistemic_hedge",
84
  "52": "B-literate_epistemic_hedge",
@@ -89,6 +128,7 @@
89
  "57": "O-literate_institutional_subject",
90
  "58": "B-literate_institutional_subject",
91
  "59": "I-literate_institutional_subject",
 
92
  "60": "O-literate_list_structure",
93
  "61": "B-literate_list_structure",
94
  "62": "I-literate_list_structure",
@@ -99,6 +139,7 @@
99
  "67": "B-literate_nested_clauses",
100
  "68": "I-literate_nested_clauses",
101
  "69": "O-literate_nominalization",
 
102
  "70": "B-literate_nominalization",
103
  "71": "I-literate_nominalization",
104
  "72": "O-literate_objectifying_stance",
@@ -109,6 +150,7 @@
109
  "77": "I-literate_probability",
110
  "78": "O-literate_qualified_assertion",
111
  "79": "B-literate_qualified_assertion",
 
112
  "80": "I-literate_qualified_assertion",
113
  "81": "O-literate_relative_chain",
114
  "82": "B-literate_relative_chain",
@@ -119,6 +161,7 @@
119
  "87": "O-literate_technical_term",
120
  "88": "B-literate_technical_term",
121
  "89": "I-literate_technical_term",
 
122
  "90": "O-literate_temporal_embedding",
123
  "91": "B-literate_temporal_embedding",
124
  "92": "I-literate_temporal_embedding",
@@ -128,230 +171,180 @@
128
  "96": "O-oral_antithesis",
129
  "97": "B-oral_antithesis",
130
  "98": "I-oral_antithesis",
131
- "99": "O-oral_discourse_formula",
132
- "100": "B-oral_discourse_formula",
133
- "101": "I-oral_discourse_formula",
134
- "102": "O-oral_embodied_action",
135
- "103": "B-oral_embodied_action",
136
- "104": "I-oral_embodied_action",
137
- "105": "O-oral_everyday_example",
138
- "106": "B-oral_everyday_example",
139
- "107": "I-oral_everyday_example",
140
- "108": "O-oral_imperative",
141
- "109": "B-oral_imperative",
142
- "110": "I-oral_imperative",
143
- "111": "O-oral_inclusive_we",
144
- "112": "B-oral_inclusive_we",
145
- "113": "I-oral_inclusive_we",
146
- "114": "O-oral_intensifier_doubling",
147
- "115": "B-oral_intensifier_doubling",
148
- "116": "I-oral_intensifier_doubling",
149
- "117": "O-oral_lexical_repetition",
150
- "118": "B-oral_lexical_repetition",
151
- "119": "I-oral_lexical_repetition",
152
- "120": "O-oral_named_individual",
153
- "121": "B-oral_named_individual",
154
- "122": "I-oral_named_individual",
155
- "123": "O-oral_parallelism",
156
- "124": "B-oral_parallelism",
157
- "125": "I-oral_parallelism",
158
- "126": "O-oral_phatic_check",
159
- "127": "B-oral_phatic_check",
160
- "128": "I-oral_phatic_check",
161
- "129": "O-oral_phatic_filler",
162
- "130": "B-oral_phatic_filler",
163
- "131": "I-oral_phatic_filler",
164
- "132": "O-oral_rhetorical_question",
165
- "133": "B-oral_rhetorical_question",
166
- "134": "I-oral_rhetorical_question",
167
- "135": "O-oral_second_person",
168
- "136": "B-oral_second_person",
169
- "137": "I-oral_second_person",
170
- "138": "O-oral_self_correction",
171
- "139": "B-oral_self_correction",
172
- "140": "I-oral_self_correction",
173
- "141": "O-oral_sensory_detail",
174
- "142": "B-oral_sensory_detail",
175
- "143": "I-oral_sensory_detail",
176
- "144": "O-oral_simple_conjunction",
177
- "145": "B-oral_simple_conjunction",
178
- "146": "I-oral_simple_conjunction",
179
- "147": "O-oral_specific_place",
180
- "148": "B-oral_specific_place",
181
- "149": "I-oral_specific_place",
182
- "150": "O-oral_temporal_anchor",
183
- "151": "B-oral_temporal_anchor",
184
- "152": "I-oral_temporal_anchor",
185
- "153": "O-oral_tricolon",
186
- "154": "B-oral_tricolon",
187
- "155": "I-oral_tricolon",
188
- "156": "O-oral_vocative",
189
- "157": "B-oral_vocative",
190
- "158": "I-oral_vocative"
191
  },
 
 
 
192
  "label2id": {
193
- "O-literate_abstract_noun": 0,
194
  "B-literate_abstract_noun": 1,
195
- "I-literate_abstract_noun": 2,
196
- "O-literate_additive_formal": 3,
197
  "B-literate_additive_formal": 4,
198
- "I-literate_additive_formal": 5,
199
- "O-literate_agent_demoted": 6,
200
  "B-literate_agent_demoted": 7,
201
- "I-literate_agent_demoted": 8,
202
- "O-literate_agentless_passive": 9,
203
  "B-literate_agentless_passive": 10,
204
- "I-literate_agentless_passive": 11,
205
- "O-literate_aside": 12,
206
  "B-literate_aside": 13,
207
- "I-literate_aside": 14,
208
- "O-literate_categorical_statement": 15,
209
  "B-literate_categorical_statement": 16,
210
- "I-literate_categorical_statement": 17,
211
- "O-literate_causal_explicit": 18,
212
  "B-literate_causal_explicit": 19,
213
- "I-literate_causal_explicit": 20,
214
- "O-literate_citation": 21,
215
  "B-literate_citation": 22,
216
- "I-literate_citation": 23,
217
- "O-literate_conceptual_metaphor": 24,
218
  "B-literate_conceptual_metaphor": 25,
219
- "I-literate_conceptual_metaphor": 26,
220
- "O-literate_concessive": 27,
221
  "B-literate_concessive": 28,
222
- "I-literate_concessive": 29,
223
- "O-literate_concessive_connector": 30,
224
  "B-literate_concessive_connector": 31,
225
- "I-literate_concessive_connector": 32,
226
- "O-literate_concrete_setting": 33,
227
  "B-literate_concrete_setting": 34,
228
- "I-literate_concrete_setting": 35,
229
- "O-literate_conditional": 36,
230
  "B-literate_conditional": 37,
231
- "I-literate_conditional": 38,
232
- "O-literate_contrastive": 39,
233
  "B-literate_contrastive": 40,
234
- "I-literate_contrastive": 41,
235
- "O-literate_cross_reference": 42,
236
  "B-literate_cross_reference": 43,
237
- "I-literate_cross_reference": 44,
238
- "O-literate_definitional_move": 45,
239
  "B-literate_definitional_move": 46,
240
- "I-literate_definitional_move": 47,
241
- "O-literate_enumeration": 48,
242
  "B-literate_enumeration": 49,
243
- "I-literate_enumeration": 50,
244
- "O-literate_epistemic_hedge": 51,
245
  "B-literate_epistemic_hedge": 52,
246
- "I-literate_epistemic_hedge": 53,
247
- "O-literate_evidential": 54,
248
  "B-literate_evidential": 55,
249
- "I-literate_evidential": 56,
250
- "O-literate_institutional_subject": 57,
251
  "B-literate_institutional_subject": 58,
252
- "I-literate_institutional_subject": 59,
253
- "O-literate_list_structure": 60,
254
  "B-literate_list_structure": 61,
255
- "I-literate_list_structure": 62,
256
- "O-literate_metadiscourse": 63,
257
  "B-literate_metadiscourse": 64,
258
- "I-literate_metadiscourse": 65,
259
- "O-literate_nested_clauses": 66,
260
  "B-literate_nested_clauses": 67,
261
- "I-literate_nested_clauses": 68,
262
- "O-literate_nominalization": 69,
263
  "B-literate_nominalization": 70,
264
- "I-literate_nominalization": 71,
265
- "O-literate_objectifying_stance": 72,
266
  "B-literate_objectifying_stance": 73,
267
- "I-literate_objectifying_stance": 74,
268
- "O-literate_probability": 75,
269
  "B-literate_probability": 76,
270
- "I-literate_probability": 77,
271
- "O-literate_qualified_assertion": 78,
272
  "B-literate_qualified_assertion": 79,
273
- "I-literate_qualified_assertion": 80,
274
- "O-literate_relative_chain": 81,
275
  "B-literate_relative_chain": 82,
276
- "I-literate_relative_chain": 83,
277
- "O-literate_technical_abbreviation": 84,
278
  "B-literate_technical_abbreviation": 85,
279
- "I-literate_technical_abbreviation": 86,
280
- "O-literate_technical_term": 87,
281
  "B-literate_technical_term": 88,
282
- "I-literate_technical_term": 89,
283
- "O-literate_temporal_embedding": 90,
284
  "B-literate_temporal_embedding": 91,
285
- "I-literate_temporal_embedding": 92,
286
- "O-oral_anaphora": 93,
287
  "B-oral_anaphora": 94,
288
- "I-oral_anaphora": 95,
289
- "O-oral_antithesis": 96,
290
  "B-oral_antithesis": 97,
291
- "I-oral_antithesis": 98,
292
- "O-oral_discourse_formula": 99,
293
  "B-oral_discourse_formula": 100,
294
- "I-oral_discourse_formula": 101,
295
- "O-oral_embodied_action": 102,
296
  "B-oral_embodied_action": 103,
297
- "I-oral_embodied_action": 104,
298
- "O-oral_everyday_example": 105,
299
  "B-oral_everyday_example": 106,
300
- "I-oral_everyday_example": 107,
301
- "O-oral_imperative": 108,
302
  "B-oral_imperative": 109,
303
- "I-oral_imperative": 110,
304
- "O-oral_inclusive_we": 111,
305
  "B-oral_inclusive_we": 112,
306
- "I-oral_inclusive_we": 113,
307
- "O-oral_intensifier_doubling": 114,
308
  "B-oral_intensifier_doubling": 115,
309
- "I-oral_intensifier_doubling": 116,
310
- "O-oral_lexical_repetition": 117,
311
  "B-oral_lexical_repetition": 118,
312
- "I-oral_lexical_repetition": 119,
313
- "O-oral_named_individual": 120,
314
  "B-oral_named_individual": 121,
315
  "I-oral_named_individual": 122,
316
- "O-oral_parallelism": 123,
317
- "B-oral_parallelism": 124,
318
- "I-oral_parallelism": 125,
319
- "O-oral_phatic_check": 126,
320
- "B-oral_phatic_check": 127,
321
- "I-oral_phatic_check": 128,
322
- "O-oral_phatic_filler": 129,
323
- "B-oral_phatic_filler": 130,
324
- "I-oral_phatic_filler": 131,
325
- "O-oral_rhetorical_question": 132,
326
- "B-oral_rhetorical_question": 133,
327
- "I-oral_rhetorical_question": 134,
328
- "O-oral_second_person": 135,
329
- "B-oral_second_person": 136,
330
- "I-oral_second_person": 137,
331
- "O-oral_self_correction": 138,
332
- "B-oral_self_correction": 139,
333
- "I-oral_self_correction": 140,
334
- "O-oral_sensory_detail": 141,
335
- "B-oral_sensory_detail": 142,
336
- "I-oral_sensory_detail": 143,
337
- "O-oral_simple_conjunction": 144,
338
- "B-oral_simple_conjunction": 145,
339
- "I-oral_simple_conjunction": 146,
340
- "O-oral_specific_place": 147,
341
- "B-oral_specific_place": 148,
342
- "I-oral_specific_place": 149,
343
- "O-oral_temporal_anchor": 150,
344
- "B-oral_temporal_anchor": 151,
345
- "I-oral_temporal_anchor": 152,
346
- "O-oral_tricolon": 153,
347
- "B-oral_tricolon": 154,
348
- "I-oral_tricolon": 155,
349
- "O-oral_vocative": 156,
350
- "B-oral_vocative": 157,
351
- "I-oral_vocative": 158
352
  },
353
- "num_types": 53,
354
- "auto_map": {
355
- "AutoModel": "modeling_havelock.HavelockTokenClassifier"
356
- }
357
- }
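
The `id2label` entries in both versions of `config.json` follow a regular layout: each marker type owns three consecutive ids in O, B, I order. A minimal sketch of that arithmetic, with hypothetical helper names and type indices read off the listing:

```python
from typing import Tuple

TAGS = ("O", "B", "I")  # order of the three ids within each type's block

def label_id(type_index: int, tag: str) -> int:
    """Flat label id for a (type index, tag) pair."""
    return 3 * type_index + TAGS.index(tag)

def split_label_id(flat_id: int) -> Tuple[int, str]:
    """Inverse mapping: flat id -> (type index, tag)."""
    type_index, offset = divmod(flat_id, 3)
    return type_index, TAGS[offset]

# literate_agentless_passive is the fourth type (index 3) in both versions.
agentless_passive_b = label_id(3, "B")   # 10, as in the listing

# oral_parallelism sat at type index 41 in the old config (ids 123-125).
# Removing it moves every later type down one slot, i.e. three ids lower.
old_phatic_check_b = label_id(42, "B")   # 127 in the old config
new_phatic_check_b = label_id(41, "B")   # 124 in the new config
```

One consequence is that flat label ids are not stable across the two versions: predictions stored as raw ids from the 53-type model need remapping by label name before they can be compared with output from the 52-type model.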
 
 
 
 
 
 
 
 
 
 
1
  {
2
  "add_cross_attention": false,
3
  "architectures": [
4
+ "BertForMaskedLM"
5
  ],
6
  "attention_probs_dropout_prob": 0.1,
7
+ "auto_map": {
8
+ "AutoModel": "modeling_havelock.HavelockTokenClassifier"
9
+ },
10
  "bos_token_id": null,
11
  "classifier_dropout": null,
12
  "dtype": "float32",
 
15
  "hidden_act": "gelu",
16
  "hidden_dropout_prob": 0.1,
17
  "hidden_size": 768,
18
  "id2label": {
19
  "0": "O-literate_abstract_noun",
20
  "1": "B-literate_abstract_noun",
21
  "10": "B-literate_agentless_passive",
22
+ "100": "B-oral_discourse_formula",
23
+ "101": "I-oral_discourse_formula",
24
+ "102": "O-oral_embodied_action",
25
+ "103": "B-oral_embodied_action",
26
+ "104": "I-oral_embodied_action",
27
+ "105": "O-oral_everyday_example",
28
+ "106": "B-oral_everyday_example",
29
+ "107": "I-oral_everyday_example",
30
+ "108": "O-oral_imperative",
31
+ "109": "B-oral_imperative",
32
  "11": "I-literate_agentless_passive",
33
+ "110": "I-oral_imperative",
34
+ "111": "O-oral_inclusive_we",
35
+ "112": "B-oral_inclusive_we",
36
+ "113": "I-oral_inclusive_we",
37
+ "114": "O-oral_intensifier_doubling",
38
+ "115": "B-oral_intensifier_doubling",
39
+ "116": "I-oral_intensifier_doubling",
40
+ "117": "O-oral_lexical_repetition",
41
+ "118": "B-oral_lexical_repetition",
42
+ "119": "I-oral_lexical_repetition",
43
  "12": "O-literate_aside",
44
+ "120": "O-oral_named_individual",
45
+ "121": "B-oral_named_individual",
46
+ "122": "I-oral_named_individual",
47
+ "123": "O-oral_phatic_check",
48
+ "124": "B-oral_phatic_check",
49
+ "125": "I-oral_phatic_check",
50
+ "126": "O-oral_phatic_filler",
51
+ "127": "B-oral_phatic_filler",
52
+ "128": "I-oral_phatic_filler",
53
+ "129": "O-oral_rhetorical_question",
54
  "13": "B-literate_aside",
55
+ "130": "B-oral_rhetorical_question",
56
+ "131": "I-oral_rhetorical_question",
57
+ "132": "O-oral_second_person",
58
+ "133": "B-oral_second_person",
59
+ "134": "I-oral_second_person",
60
+ "135": "O-oral_self_correction",
61
+ "136": "B-oral_self_correction",
62
+ "137": "I-oral_self_correction",
63
+ "138": "O-oral_sensory_detail",
64
+ "139": "B-oral_sensory_detail",
65
  "14": "I-literate_aside",
66
+ "140": "I-oral_sensory_detail",
67
+ "141": "O-oral_simple_conjunction",
68
+ "142": "B-oral_simple_conjunction",
69
+ "143": "I-oral_simple_conjunction",
70
+ "144": "O-oral_specific_place",
71
+ "145": "B-oral_specific_place",
72
+ "146": "I-oral_specific_place",
73
+ "147": "O-oral_temporal_anchor",
74
+ "148": "B-oral_temporal_anchor",
75
+ "149": "I-oral_temporal_anchor",
76
  "15": "O-literate_categorical_statement",
77
+ "150": "O-oral_tricolon",
78
+ "151": "B-oral_tricolon",
79
+ "152": "I-oral_tricolon",
80
+ "153": "O-oral_vocative",
81
+ "154": "B-oral_vocative",
82
+ "155": "I-oral_vocative",
83
  "16": "B-literate_categorical_statement",
84
  "17": "I-literate_categorical_statement",
85
  "18": "O-literate_causal_explicit",
86
  "19": "B-literate_causal_explicit",
87
+ "2": "I-literate_abstract_noun",
88
  "20": "I-literate_causal_explicit",
89
  "21": "O-literate_citation",
90
  "22": "B-literate_citation",
 
95
  "27": "O-literate_concessive",
96
  "28": "B-literate_concessive",
97
  "29": "I-literate_concessive",
98
+ "3": "O-literate_additive_formal",
99
  "30": "O-literate_concessive_connector",
100
  "31": "B-literate_concessive_connector",
101
  "32": "I-literate_concessive_connector",
 
106
  "37": "B-literate_conditional",
107
  "38": "I-literate_conditional",
108
  "39": "O-literate_contrastive",
109
+ "4": "B-literate_additive_formal",
110
  "40": "B-literate_contrastive",
111
  "41": "I-literate_contrastive",
112
  "42": "O-literate_cross_reference",
 
117
  "47": "I-literate_definitional_move",
118
  "48": "O-literate_enumeration",
119
  "49": "B-literate_enumeration",
120
+ "5": "I-literate_additive_formal",
121
  "50": "I-literate_enumeration",
122
  "51": "O-literate_epistemic_hedge",
123
  "52": "B-literate_epistemic_hedge",
 
128
  "57": "O-literate_institutional_subject",
129
  "58": "B-literate_institutional_subject",
130
  "59": "I-literate_institutional_subject",
131
+ "6": "O-literate_agent_demoted",
132
  "60": "O-literate_list_structure",
133
  "61": "B-literate_list_structure",
134
  "62": "I-literate_list_structure",
 
139
  "67": "B-literate_nested_clauses",
140
  "68": "I-literate_nested_clauses",
141
  "69": "O-literate_nominalization",
142
+ "7": "B-literate_agent_demoted",
143
  "70": "B-literate_nominalization",
144
  "71": "I-literate_nominalization",
145
  "72": "O-literate_objectifying_stance",
 
150
  "77": "I-literate_probability",
151
  "78": "O-literate_qualified_assertion",
152
  "79": "B-literate_qualified_assertion",
153
+ "8": "I-literate_agent_demoted",
154
  "80": "I-literate_qualified_assertion",
155
  "81": "O-literate_relative_chain",
156
  "82": "B-literate_relative_chain",
 
161
  "87": "O-literate_technical_term",
162
  "88": "B-literate_technical_term",
163
  "89": "I-literate_technical_term",
164
+ "9": "O-literate_agentless_passive",
165
  "90": "O-literate_temporal_embedding",
166
  "91": "B-literate_temporal_embedding",
167
  "92": "I-literate_temporal_embedding",
 
171
  "96": "O-oral_antithesis",
172
  "97": "B-oral_antithesis",
173
  "98": "I-oral_antithesis",
174
+ "99": "O-oral_discourse_formula"
175
  },
176
+ "initializer_range": 0.02,
177
+ "intermediate_size": 3072,
178
+ "is_decoder": false,
179
  "label2id": {
 
180
  "B-literate_abstract_noun": 1,
181
  "B-literate_additive_formal": 4,
182
  "B-literate_agent_demoted": 7,
183
  "B-literate_agentless_passive": 10,
184
  "B-literate_aside": 13,
185
  "B-literate_categorical_statement": 16,
186
  "B-literate_causal_explicit": 19,
187
  "B-literate_citation": 22,
188
  "B-literate_conceptual_metaphor": 25,
189
  "B-literate_concessive": 28,
190
  "B-literate_concessive_connector": 31,
191
  "B-literate_concrete_setting": 34,
192
  "B-literate_conditional": 37,
193
  "B-literate_contrastive": 40,
194
  "B-literate_cross_reference": 43,
195
  "B-literate_definitional_move": 46,
196
  "B-literate_enumeration": 49,
197
  "B-literate_epistemic_hedge": 52,
198
  "B-literate_evidential": 55,
199
  "B-literate_institutional_subject": 58,
200
  "B-literate_list_structure": 61,
201
  "B-literate_metadiscourse": 64,
202
  "B-literate_nested_clauses": 67,
203
  "B-literate_nominalization": 70,
204
  "B-literate_objectifying_stance": 73,
205
  "B-literate_probability": 76,
206
  "B-literate_qualified_assertion": 79,
207
  "B-literate_relative_chain": 82,
208
  "B-literate_technical_abbreviation": 85,
209
  "B-literate_technical_term": 88,
210
  "B-literate_temporal_embedding": 91,
211
  "B-oral_anaphora": 94,
212
  "B-oral_antithesis": 97,
213
  "B-oral_discourse_formula": 100,
214
  "B-oral_embodied_action": 103,
215
  "B-oral_everyday_example": 106,
216
  "B-oral_imperative": 109,
217
  "B-oral_inclusive_we": 112,
218
  "B-oral_intensifier_doubling": 115,
219
  "B-oral_lexical_repetition": 118,
220
  "B-oral_named_individual": 121,
+ "B-oral_phatic_check": 124,
+ "B-oral_phatic_filler": 127,
+ "B-oral_rhetorical_question": 130,
+ "B-oral_second_person": 133,
+ "B-oral_self_correction": 136,
+ "B-oral_sensory_detail": 139,
+ "B-oral_simple_conjunction": 142,
+ "B-oral_specific_place": 145,
+ "B-oral_temporal_anchor": 148,
+ "B-oral_tricolon": 151,
+ "B-oral_vocative": 154,
+ "I-literate_abstract_noun": 2,
+ "I-literate_additive_formal": 5,
+ "I-literate_agent_demoted": 8,
+ "I-literate_agentless_passive": 11,
+ "I-literate_aside": 14,
+ "I-literate_categorical_statement": 17,
+ "I-literate_causal_explicit": 20,
+ "I-literate_citation": 23,
+ "I-literate_conceptual_metaphor": 26,
+ "I-literate_concessive": 29,
+ "I-literate_concessive_connector": 32,
+ "I-literate_concrete_setting": 35,
+ "I-literate_conditional": 38,
+ "I-literate_contrastive": 41,
+ "I-literate_cross_reference": 44,
+ "I-literate_definitional_move": 47,
+ "I-literate_enumeration": 50,
+ "I-literate_epistemic_hedge": 53,
+ "I-literate_evidential": 56,
+ "I-literate_institutional_subject": 59,
+ "I-literate_list_structure": 62,
+ "I-literate_metadiscourse": 65,
+ "I-literate_nested_clauses": 68,
+ "I-literate_nominalization": 71,
+ "I-literate_objectifying_stance": 74,
+ "I-literate_probability": 77,
+ "I-literate_qualified_assertion": 80,
+ "I-literate_relative_chain": 83,
+ "I-literate_technical_abbreviation": 86,
+ "I-literate_technical_term": 89,
+ "I-literate_temporal_embedding": 92,
+ "I-oral_anaphora": 95,
+ "I-oral_antithesis": 98,
+ "I-oral_discourse_formula": 101,
+ "I-oral_embodied_action": 104,
+ "I-oral_everyday_example": 107,
+ "I-oral_imperative": 110,
+ "I-oral_inclusive_we": 113,
+ "I-oral_intensifier_doubling": 116,
+ "I-oral_lexical_repetition": 119,
  "I-oral_named_individual": 122,
+ "I-oral_phatic_check": 125,
+ "I-oral_phatic_filler": 128,
+ "I-oral_rhetorical_question": 131,
+ "I-oral_second_person": 134,
+ "I-oral_self_correction": 137,
+ "I-oral_sensory_detail": 140,
+ "I-oral_simple_conjunction": 143,
+ "I-oral_specific_place": 146,
+ "I-oral_temporal_anchor": 149,
+ "I-oral_tricolon": 152,
+ "I-oral_vocative": 155,
+ "O-literate_abstract_noun": 0,
+ "O-literate_additive_formal": 3,
+ "O-literate_agent_demoted": 6,
+ "O-literate_agentless_passive": 9,
+ "O-literate_aside": 12,
+ "O-literate_categorical_statement": 15,
+ "O-literate_causal_explicit": 18,
+ "O-literate_citation": 21,
+ "O-literate_conceptual_metaphor": 24,
+ "O-literate_concessive": 27,
+ "O-literate_concessive_connector": 30,
+ "O-literate_concrete_setting": 33,
+ "O-literate_conditional": 36,
+ "O-literate_contrastive": 39,
+ "O-literate_cross_reference": 42,
+ "O-literate_definitional_move": 45,
+ "O-literate_enumeration": 48,
+ "O-literate_epistemic_hedge": 51,
+ "O-literate_evidential": 54,
+ "O-literate_institutional_subject": 57,
+ "O-literate_list_structure": 60,
+ "O-literate_metadiscourse": 63,
+ "O-literate_nested_clauses": 66,
+ "O-literate_nominalization": 69,
+ "O-literate_objectifying_stance": 72,
+ "O-literate_probability": 75,
+ "O-literate_qualified_assertion": 78,
+ "O-literate_relative_chain": 81,
+ "O-literate_technical_abbreviation": 84,
+ "O-literate_technical_term": 87,
+ "O-literate_temporal_embedding": 90,
+ "O-oral_anaphora": 93,
+ "O-oral_antithesis": 96,
+ "O-oral_discourse_formula": 99,
+ "O-oral_embodied_action": 102,
+ "O-oral_everyday_example": 105,
+ "O-oral_imperative": 108,
+ "O-oral_inclusive_we": 111,
+ "O-oral_intensifier_doubling": 114,
+ "O-oral_lexical_repetition": 117,
+ "O-oral_named_individual": 120,
+ "O-oral_phatic_check": 123,
+ "O-oral_phatic_filler": 126,
+ "O-oral_rhetorical_question": 129,
+ "O-oral_second_person": 132,
+ "O-oral_self_correction": 135,
+ "O-oral_sensory_detail": 138,
+ "O-oral_simple_conjunction": 141,
+ "O-oral_specific_place": 144,
+ "O-oral_temporal_anchor": 147,
+ "O-oral_tricolon": 150,
+ "O-oral_vocative": 153
  },
+ "layer_norm_eps": 1e-12,
+ "max_position_embeddings": 512,
+ "model_type": "bert",
+ "num_attention_heads": 12,
+ "num_hidden_layers": 12,
+ "num_types": 52,
+ "pad_token_id": 0,
+ "position_embedding_type": "absolute",
+ "tie_word_embeddings": true,
+ "transformers_version": "5.0.0",
+ "type_vocab_size": 2,
+ "use_cache": true,
+ "vocab_size": 30522
+ }
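Note on the label layout: the `label2id` values above appear to follow a regular pattern, with each marker type owning three consecutive ids in the order O, B, I (for example `O-oral_vocative` = 153, `B-oral_vocative` = 154, `I-oral_vocative` = 155 for type index 51). The sketch below reproduces that arithmetic. It is inferred from the values shown in this diff rather than taken from the training code, and `TAGS`, `label_to_id` and `id_to_label` are hypothetical helper names.

```python
TAGS = ("O", "B", "I")  # assumed offset of each tag inside a type's block of three ids


def label_to_id(label: str, type_to_idx: dict[str, int]) -> int:
    """Map a label such as 'B-oral_vocative' to its flat id (hypothetical helper)."""
    tag, type_name = label.split("-", 1)
    return 3 * type_to_idx[type_name] + TAGS.index(tag)


def id_to_label(label_id: int, idx_to_type: dict[int, str]) -> str:
    """Invert label_to_id: recover the tag and type name from a flat id."""
    type_idx, offset = divmod(label_id, 3)
    return f"{TAGS[offset]}-{idx_to_type[type_idx]}"


# The oral_* indices appear in type_to_idx.json in this commit;
# the literate_abstract_noun index is inferred from label2id.
type_to_idx = {
    "literate_abstract_noun": 0,
    "oral_named_individual": 40,
    "oral_vocative": 51,
}
idx_to_type = {idx: name for name, idx in type_to_idx.items()}

print(label_to_id("B-oral_vocative", type_to_idx))
print(id_to_label(122, idx_to_type))
```

If the pattern holds for every type, 52 types give 156 flat labels (ids 0 to 155), which matches the highest id listed in `label2id`.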
head_config.json CHANGED
@@ -1,5 +1,5 @@
  {
  "model_name": "bert-base-uncased",
- "num_types": 53,
+ "num_types": 52,
  "hidden_size": 768
  }
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:934733db298a26d41556120f86ad64efcb49728be77efdcb14b66664a38a28af
- size 436078996
+ oid sha256:e7f65c36ddbc7fa2756a9e31ff2735c9708f2d891e4d637519a90675c6aa7088
+ size 436073152
modeling_havelock.py CHANGED
@@ -1,6 +1,5 @@
  """Custom multi-label token classifier for HuggingFace Hub."""

- import torch
  import torch.nn as nn
  from transformers import BertModel, BertPreTrainedModel

 
type_to_idx.json CHANGED
@@ -40,16 +40,15 @@
  "oral_intensifier_doubling": 38,
  "oral_lexical_repetition": 39,
  "oral_named_individual": 40,
- "oral_parallelism": 41,
- "oral_phatic_check": 42,
- "oral_phatic_filler": 43,
- "oral_rhetorical_question": 44,
- "oral_second_person": 45,
- "oral_self_correction": 46,
- "oral_sensory_detail": 47,
- "oral_simple_conjunction": 48,
- "oral_specific_place": 49,
- "oral_temporal_anchor": 50,
- "oral_tricolon": 51,
- "oral_vocative": 52
+ "oral_phatic_check": 41,
+ "oral_phatic_filler": 42,
+ "oral_rhetorical_question": 43,
+ "oral_second_person": 44,
+ "oral_self_correction": 45,
+ "oral_sensory_detail": 46,
+ "oral_simple_conjunction": 47,
+ "oral_specific_place": 48,
+ "oral_temporal_anchor": 49,
+ "oral_tricolon": 50,
+ "oral_vocative": 51
  }
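
Note on the renumbering: this hunk removes `oral_parallelism` (old index 41) and shifts every later type down by one, so `oral_vocative` moves from 52 to 51. A minimal sketch of that shift follows, assuming indices are dense and that only one type is dropped. `drop_type` is a hypothetical helper, not the script that produced this commit.

```python
def drop_type(type_to_idx: dict[str, int], removed: str) -> dict[str, int]:
    """Return a new mapping without `removed`, closing the gap it leaves."""
    cut = type_to_idx[removed]
    return {
        name: (idx - 1 if idx > cut else idx)  # only later types shift down
        for name, idx in type_to_idx.items()
        if name != removed
    }


# A few entries from the old side of the hunk above.
old = {
    "oral_named_individual": 40,
    "oral_parallelism": 41,
    "oral_phatic_check": 42,
    "oral_vocative": 52,
}

new = drop_type(old, "oral_parallelism")
print(new)
```

Any code that stored the old integer indices (for example cached predictions keyed by type index) would presumably need the same remapping before being compared against this checkpoint.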