Upload folder using huggingface_hub

Files changed (5)

README.md +126 -319
config.json +1 -295
head_config.json +5 -0
model.safetensors +2 -2
type_to_idx.json +55 -0

README.md CHANGED

@@ -5,7 +5,7 @@ tags:
 - bert
 - orality
 - linguistics
-- ner
 language:
 - en
 metrics:
@@ -22,389 +22,196 @@ datasets:
 BERT-based token classifier for detecting **oral and literate markers** in text, based on Walter Ong's "Orality and Literacy" (1982).
-This model performs span-level detection of 72 rhetorical marker types using BIO tagging (145 labels total).
 ## Model Details
 | Property | Value |
 |----------|-------|
 | Base model | `bert-base-uncased` |
-| Task | Token classification (BIO tagging) |
-| Labels | 145 (72 marker types × B/I + O) |
-| Best F1 | **0.5003** (macro, markers only) |
-| Training | 20 epochs, batch 8, lr 2e-5 |
-| Loss | Focal loss (γ=1.0) for class imbalance |
 ## Usage
 ```python
-from transformers import AutoTokenizer, AutoModelForTokenClassification
 import torch
-model_name = "HavelockAI/bert-token-classifier"
-tokenizer = AutoTokenizer.from_pretrained(model_name)
-model = AutoModelForTokenClassification.from_pretrained(model_name)
 text = "Tell me, O Muse, of that ingenious hero who travelled far and wide"
-inputs = tokenizer(text, return_tensors="pt", return_offsets_mapping=True)
-offset_mapping = inputs.pop("offset_mapping")
 with torch.no_grad():
-    outputs = model(**inputs)
-    predictions = torch.argmax(outputs.logits, dim=-1)
-# Decode predictions
 tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
-labels = [model.config.id2label[p.item()] for p in predictions[0]]
-for token, label in zip(tokens, labels):
-    if label != "O":
-        print(f"{token:15} {label}")
-```
-**Output:**
-```
-tell            B-oral_imperative
-me              I-oral_imperative
-,               I-oral_imperative
-o               B-oral_vocative
-muse            I-oral_vocative
 ```
 ## Training Data
-- **3,119 examples** with BIO-tagged spans
-- **4,474 marker annotations** across 72 types
 - Sources: Project Gutenberg, textfiles.com, Reddit, Wikipedia talk pages
-- Synthetic examples for rare marker types (30 examples minimum per type)
-### Class Distribution
-The dataset exhibits extreme class imbalance (72 marker types, long-tail distribution). We use focal loss to down-weight easy examples and focus learning on rare markers.
-| Frequency | Marker types |
-|-----------|--------------|
-| >100 examples | 15 types (21%) |
-| 30-100 examples | 37 types (51%) |
-| <30 examples | 20 types (28%) |
-## Marker Types (72)
-### Oral Markers (36 types)
 Characteristics of oral tradition and spoken discourse:
 | Category | Markers |
 |----------|---------|
-| **Repetition & Pattern** | anaphora, epistrophe, parallelism, tricolon, lexical_repetition, refrain |
-| **Sound & Rhythm** | alliteration, rhythm, assonance, rhyme |
-| **Address & Interaction** | vocative, imperative, second_person, inclusive_we, rhetorical_question, audience_response, phatic_check, phatic_filler |
-| **Conjunction** | polysyndeton, asyndeton, simple_conjunction, binomial_expression |
-| **Formulas** | discourse_formula, proverb, religious_formula, epithet |
 | **Narrative** | named_individual, specific_place, temporal_anchor, sensory_detail, embodied_action, everyday_example |
-| **Performance** | dramatic_pause, self_correction, conflict_frame, us_them, first_person, paradox |
-### Literate Markers (36 types)
 Characteristics of written, analytical discourse:
 | Category | Markers |
 |----------|---------|
 | **Abstraction** | nominalization, abstract_noun, conceptual_metaphor, categorical_statement |
-| **Syntax** | nested_clauses, relative_chain, conditional, concessive, temporal_embedding, causal_chain |
 | **Hedging** | epistemic_hedge, probability, evidential, qualified_assertion, concessive_connector |
-| **Impersonality** | agentless_passive, agent_demoted, institutional_subject, objectifying_stance, third_person_reference |
-| **Scholarly apparatus** | citation, footnote_reference, cross_reference, metadiscourse, methodological_framing |
-| **Technical** | technical_term, technical_abbreviation, enumeration, list_structure, definitional_move |
-| **Connectives** | contrastive, causal_explicit, additive_formal, paradox |
 ## Evaluation
-Per-class F1 on test set:
 <details><summary>Click to show per-marker precision/recall/F1/support</summary>
 ```
-                                   precision    recall  f1-score   support
-                                O      0.733     0.828     0.778      3556
-         B-literate_abstract_noun      0.333     0.286     0.308        14
-       B-literate_additive_formal      1.000     0.667     0.800         3
-         B-literate_agent_demoted      0.800     1.000     0.889         4
-     B-literate_agentless_passive      0.357     0.417     0.385        24
-                 B-literate_aside      0.429     0.667     0.522         9
- B-literate_categorical_statement      0.500     0.750     0.600         4
-          B-literate_causal_chain      1.000     0.333     0.500         3
-       B-literate_causal_explicit      0.538     0.636     0.583        11
-              B-literate_citation      0.000     0.000     0.000        10
-   B-literate_conceptual_metaphor      0.667     0.333     0.444         6
-            B-literate_concessive      1.000     1.000     1.000         2
-  B-literate_concessive_connector      0.800     0.800     0.800         5
-           B-literate_conditional      0.643     0.643     0.643        14
-           B-literate_contrastive      0.400     0.500     0.444         8
-     B-literate_definitional_move      1.000     1.000     1.000         1
-           B-literate_enumeration      0.500     0.667     0.571         3
-       B-literate_epistemic_hedge      0.387     0.500     0.436        24
-            B-literate_evidential      0.333     0.091     0.143        11
-    B-literate_footnote_reference      0.500     0.667     0.571         3
- B-literate_institutional_subject      0.750     1.000     0.857         3
-        B-literate_list_structure      0.000     0.000     0.000         1
-         B-literate_metadiscourse      0.500     0.500     0.500         4
-B-literate_methodological_framing      1.000     0.500     0.667         4
-        B-literate_nested_clauses      0.293     0.545     0.381        22
-        B-literate_nominalization      0.750     0.300     0.429        10
-   B-literate_objectifying_stance      0.500     0.500     0.500         4
-               B-literate_paradox      0.500     0.333     0.400         3
-           B-literate_probability      0.333     0.200     0.250         5
-   B-literate_qualified_assertion      0.000     0.000     0.000         5
-        B-literate_relative_chain      0.314     0.727     0.438        22
-B-literate_technical_abbreviation      0.000     0.000     0.000         2
-        B-literate_technical_term      0.333     0.667     0.444         3
-    B-literate_temporal_embedding      1.000     0.500     0.667         4
-B-literate_third_person_reference      0.333     0.333     0.333         3
-              B-oral_alliteration      1.000     0.667     0.800         3
-                  B-oral_anaphora      0.130     0.200     0.158        15
-                 B-oral_asyndeton      0.000     0.000     0.000         1
-         B-oral_audience_response      1.000     1.000     1.000         4
-       B-oral_binomial_expression      0.400     0.400     0.400         5
-            B-oral_conflict_frame      0.800     0.800     0.800         5
-         B-oral_discourse_formula      0.500     0.500     0.500         6
-            B-oral_dramatic_pause      0.000     0.000     0.000         2
-           B-oral_embodied_action      0.333     0.167     0.222         6
-                B-oral_epistrophe      0.000     0.000     0.000         3
-                   B-oral_epithet      0.000     0.000     0.000         2
-          B-oral_everyday_example      1.000     1.000     1.000         3
-              B-oral_first_person      0.000     0.000     0.000         5
-                B-oral_imperative      0.600     0.643     0.621        14
-              B-oral_inclusive_we      0.486     0.586     0.531        29
-      B-oral_intensifier_doubling      1.000     0.667     0.800         3
-        B-oral_lexical_repetition      0.273     0.300     0.286        10
-          B-oral_named_individual      0.600     0.450     0.514        20
-               B-oral_parallelism      0.083     0.143     0.105         7
-              B-oral_phatic_check      1.000     1.000     1.000         1
-             B-oral_phatic_filler      0.429     0.600     0.500         5
-              B-oral_polysyndeton      0.250     0.200     0.222        10
-                   B-oral_proverb      1.000     0.500     0.667         6
-                   B-oral_refrain      1.000     1.000     1.000         1
-         B-oral_religious_formula      1.000     0.500     0.667         2
-       B-oral_rhetorical_question      0.250     1.000     0.400         2
-                    B-oral_rhythm      0.714     0.833     0.769         6
-             B-oral_second_person      0.516     0.640     0.571        25
-           B-oral_self_correction      0.750     1.000     0.857         3
-            B-oral_sensory_detail      1.000     1.000     1.000         1
-        B-oral_simple_conjunction      0.000     0.000     0.000         3
-            B-oral_specific_place      0.400     0.667     0.500         3
-           B-oral_temporal_anchor      0.000     0.000     0.000         3
-                  B-oral_tricolon      0.222     1.000     0.364         2
-                   B-oral_us_them      0.667     0.667     0.667         3
-                  B-oral_vocative      0.941     0.593     0.727        27
-         I-literate_abstract_noun      0.000     0.000     0.000        14
-       I-literate_additive_formal      0.000     0.000     0.000         6
-         I-literate_agent_demoted      0.583     0.933     0.718        15
-     I-literate_agentless_passive      0.420     0.397     0.408        73
-                 I-literate_aside      0.544     0.523     0.533       107
- I-literate_categorical_statement      0.571     0.348     0.432        23
-          I-literate_causal_chain      0.800     0.640     0.711        25
-       I-literate_causal_explicit      0.576     0.826     0.679        23
-              I-literate_citation      0.706     0.250     0.369        48
-   I-literate_conceptual_metaphor      0.714     0.333     0.455        15
-            I-literate_concessive      0.778     1.000     0.875         7
-  I-literate_concessive_connector      0.200     0.333     0.250         3
-           I-literate_conditional      0.676     0.410     0.511       117
-           I-literate_contrastive      0.286     0.400     0.333        15
-       I-literate_cross_reference      0.000     0.000     0.000         0
-     I-literate_definitional_move      1.000     1.000     1.000         5
-           I-literate_enumeration      1.000     0.375     0.545        40
-       I-literate_epistemic_hedge      0.486     0.370     0.420        46
-            I-literate_evidential      0.250     0.034     0.061        29
-    I-literate_footnote_reference      0.800     0.727     0.762        11
- I-literate_institutional_subject      0.833     1.000     0.909         5
-        I-literate_list_structure      0.000     0.000     0.000         3
-         I-literate_metadiscourse      0.200     0.125     0.154        16
-I-literate_methodological_framing      0.667     0.500     0.571        12
-        I-literate_nested_clauses      0.489     0.292     0.366       390
-        I-literate_nominalization      0.000     0.000     0.000        14
-   I-literate_objectifying_stance      0.833     0.769     0.800        13
-               I-literate_paradox      0.100     0.062     0.077        16
-           I-literate_probability      0.000     0.000     0.000         7
-   I-literate_qualified_assertion      0.000     0.000     0.000        21
-        I-literate_relative_chain      0.479     0.531     0.504       262
-I-literate_technical_abbreviation      0.667     0.182     0.286        11
-        I-literate_technical_term      0.455     0.357     0.400        14
-    I-literate_temporal_embedding      1.000     0.588     0.741        51
-I-literate_third_person_reference      0.500     0.167     0.250         6
-              I-oral_alliteration      0.857     0.545     0.667        11
-                  I-oral_anaphora      0.208     0.198     0.203       101
-                 I-oral_asyndeton      0.000     0.000     0.000         7
-         I-oral_audience_response      0.905     0.905     0.905        21
-       I-oral_binomial_expression      0.400     0.727     0.516        11
-            I-oral_conflict_frame      1.000     0.714     0.833         7
-         I-oral_discourse_formula      0.667     0.667     0.667         6
-            I-oral_dramatic_pause      0.400     0.500     0.444         4
-           I-oral_embodied_action      0.000     0.000     0.000        16
-                I-oral_epistrophe      0.000     0.000     0.000         3
-                   I-oral_epithet      0.429     0.600     0.500         5
-          I-oral_everyday_example      0.955     1.000     0.977        21
-              I-oral_first_person      0.000     0.000     0.000         2
-                I-oral_imperative      0.615     0.276     0.381        29
-              I-oral_inclusive_we      0.904     0.922     0.913        51
-      I-oral_intensifier_doubling      0.800     1.000     0.889         4
-        I-oral_lexical_repetition      0.196     0.244     0.217        41
-          I-oral_named_individual      0.579     0.589     0.584        56
-               I-oral_parallelism      0.471     0.287     0.357       143
-              I-oral_phatic_check      1.000     1.000     1.000         3
-             I-oral_phatic_filler      0.667     0.400     0.500         5
-              I-oral_polysyndeton      1.000     0.217     0.356        83
-                   I-oral_proverb      1.000     0.568     0.724        37
-                   I-oral_refrain      1.000     1.000     1.000         4
-         I-oral_religious_formula      1.000     0.125     0.222        16
-       I-oral_rhetorical_question      0.429     0.600     0.500        15
-                    I-oral_rhythm      0.957     0.571     0.715        77
-             I-oral_second_person      0.333     0.143     0.200         7
-           I-oral_self_correction      0.842     0.800     0.821        20
-            I-oral_sensory_detail      1.000     0.800     0.889         5
-        I-oral_simple_conjunction      0.667     1.000     0.800         6
-            I-oral_specific_place      0.714     0.625     0.667         8
-           I-oral_temporal_anchor      0.056     0.100     0.071        10
-                  I-oral_tricolon      0.309     0.806     0.446        31
-                   I-oral_us_them      0.571     0.444     0.500         9
-                  I-oral_vocative      0.897     0.745     0.814        47
-                         accuracy                          0.653      6441
-                        macro avg      0.530     0.487     0.481      6441
-                     weighted avg      0.653     0.653     0.637      6441
 ```
 </details>
-<details><summary>Click to show split proportions per marker</summary>
-```
-bio_train.jsonl: 3460 markers across 72 types
-bio_val.jsonl: 514 markers across 70 types
-bio_test.jsonl: 500 markers across 70 types
-======================================================================
-Marker                                          Train     Val    Test   Total
-======================================================================
-oral_inclusive_we                                 207      26      29     262
-oral_second_person                                160      25      25     210
-literate_agentless_passive                        158      22      24     204
-oral_named_individual                             157      26      20     203
-literate_relative_chain                           146       8      22     176
-literate_epistemic_hedge                          125      23      24     172
-oral_vocative                                     118      17      27     162
-oral_rhetorical_question                          132      16       2     150
-oral_anaphora                                     115      10      15     140
-oral_imperative                                   104      16      14     134
-literate_nested_clauses                           103       4      22     129
-literate_abstract_noun                             95      20      14     129
-oral_discourse_formula                             93      15       6     114
-literate_conditional                               85      10      14     109
-oral_specific_place                                81      22       3     106
-literate_contrastive                               65      11       8      84
-literate_causal_explicit                           69       3      11      83
-oral_temporal_anchor                               66      14       3      83
-oral_parallelism                                   66      10       7      83
-oral_lexical_repetition                            48      12      10      70
-literate_technical_term                            56       8       3      67
-literate_aside                                     51       6       9      66
-literate_nominalization                            44       3      10      57
-oral_tricolon                                      43       8       2      53
-literate_concessive                                37       6       2      45
-oral_epithet                                       36       5       2      43
-literate_additive_formal                           29       4       3      36
-oral_polysyndeton                                  15      10      10      35
-literate_list_structure                            28       5       1      34
-oral_embodied_action                               19       6       6      31
-literate_metadiscourse                             22       5       4      31
-oral_binomial_expression                           23       3       5      31
-oral_alliteration                                  23       5       3      31
-literate_causal_chain                              22       5       3      30
-oral_epistrophe                                    23       4       3      30
-oral_refrain                                       25       4       1      30
-oral_audience_response                             25       1       4      30
-oral_self_correction                               23       4       3      30
-literate_methodological_framing                    21       5       4      30
-oral_rhythm                                        21       3       6      30
-oral_conflict_frame                                24       1       5      30
-literate_footnote_reference                        25       2       3      30
-literate_definitional_move                         25       4       1      30
-literate_evidential                                13       6      11      30
-oral_phatic_filler                                 24       1       5      30
-oral_phatic_check                                  25       4       1      30
-literate_agent_demoted                             21       5       4      30
-literate_enumeration                               24       3       3      30
-literate_conceptual_metaphor                       21       3       6      30
-oral_everyday_example                              22       5       3      30
-oral_us_them                                       24       3       3      30
-oral_intensifier_doubling                          25       2       3      30
-literate_institutional_subject                     22       4       3      29
-literate_temporal_embedding                        23       2       4      29
-literate_concessive_connector                      22       2       5      29
-literate_third_person_reference                    21       5       3      29
-literate_probability                               21       3       5      29
-literate_citation                                  12       7      10      29
-oral_religious_formula                             24       3       2      29
-literate_technical_abbreviation                    24       3       2      29
-literate_qualified_assertion                       23       1       5      29
-literate_categorical_statement                     24       1       4      29
-oral_first_person                                  22       2       5      29
-oral_simple_conjunction                            21       5       3      29
-literate_paradox                                   18       7       3      28
-oral_proverb                                       22       0       6      28 ⚠️
-literate_objectifying_stance                       21       3       4      28
-oral_asyndeton                                     24       3       1      28
-oral_sensory_detail                                21       5       1      27
-oral_dramatic_pause                                20       4       2      26
-literate_cross_reference                           21       5       0      26 ⚠️
-oral_paradox                                        2       0       0       2 ⚠️
-======================================================================
-TOTAL                                            3460     514     500    4474
---- Long Tail Summary ---
-Markers with < 10 examples:   1 (1%)
-Markers with < 20 examples:   1 (1%)
-Markers with < 30 examples:  20 (28%)
-Markers with < 50 examples:  48 (67%)
-Markers with <100 examples:  57 (79%)
-```
-- **Note**: ⚠️  indicates a 0 sized split
-  - `oral_proverb`: 0 val split
-  - `literate_cross_reference`: 0 test split
-  - `oral_paradox`: 0 val/test splits
-</details>
-**Best Val F1 (markers only):** 0.5003
-**Macro F1 (all 145 labels, test):** 0.481
-**Weighted F1 (test):** 0.637
-**Accuracy (test):** 65.3%
 ## Architecture
-Custom `BertTokenClassifier` with focal loss:
 ```
 BertModel (bert-base-uncased)
     └── Dropout (p=0.1)
-        └── Linear (768 → 145)
-            └── FocalLoss (α=1.0, γ=1.0)
 ```
-Focal loss addresses class imbalance by down-weighting well-classified tokens (mostly "O") and focusing on hard examples (rare markers).
 ### Initialization
-Fine-tuned from `bert-base-uncased`. The classification head (`classifier.weight`, `classifier.bias`) is randomly initialized:
 ```
-bert.* layers      → loaded from checkpoint
 classifier.weight  → randomly initialized
 classifier.bias    → randomly initialized
 ```
 ## Limitations
-- **Rare markers**: Types with <10 training examples (e.g., `oral_paradox`, `oral_dramatic_pause`) have poor recall
 - **Context window**: 128 tokens max; longer spans may be truncated
 - **Domain**: Trained primarily on historical/literary texts; may underperform on modern social media
 - **Subjectivity**: Some marker boundaries are inherently ambiguous
@@ -422,8 +229,8 @@ classifier.bias    → randomly initialized
 ## References
 - Ong, Walter J. *Orality and Literacy: The Technologizing of the Word*. Routledge, 1982.
-- Lin, T.-Y. et al. "Focal Loss for Dense Object Detection." ICCV 2017.
 ---
-*Model version: 668564aa • Trained: February 2026*

 - bert
 - orality
 - linguistics
+- multi-label
 language:
 - en
 metrics:
 BERT-based token classifier for detecting **oral and literate markers** in text, based on Walter Ong's "Orality and Literacy" (1982).
+This model performs multi-label, span-level detection of 53 rhetorical marker types. Each token carries an independent B/I/O label for every type, so spans can overlap (e.g., a token can be part of both a concessive and a nested clause).
 ## Model Details
 | Property | Value |
 |----------|-------|
 | Base model | `bert-base-uncased` |
+| Task | Multi-label token classification (independent B/I/O per type) |
+| Marker types | 53 (22 oral, 31 literate) |
+| Test macro F1 | **0.388** (per-type detection, binary positive = B or I) |
+| Training | 20 epochs, batch 24, lr 3e-5, fp16 |
+| Regularization | Mixout (p=0.1) — stochastic L2 anchor to pretrained weights |
+| Loss | Per-type weighted cross-entropy with inverse-frequency type weights |
+| Min examples | 150 (types below this threshold excluded) |
 ## Usage
 ```python
+import json
+from pathlib import Path
 import torch
+from transformers import AutoTokenizer
+from estimators.tokens.model import MultiLabelTokenClassifier
+# Use a Path (not a str) so the "/" join below works
+model_path = Path("models/bert_token_classifier")
+tokenizer = AutoTokenizer.from_pretrained(model_path)
+model = MultiLabelTokenClassifier.load(model_path, device="cpu")
+model.eval()
+# Map output indices back to marker-type names
+type_to_idx = json.loads((model_path / "type_to_idx.json").read_text())
+idx_to_type = {v: k for k, v in type_to_idx.items()}
 text = "Tell me, O Muse, of that ingenious hero who travelled far and wide"
+inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
 with torch.no_grad():
+    logits = model(inputs["input_ids"], inputs["attention_mask"])
+    preds = logits.argmax(dim=-1)  # (1, seq, num_types)
 tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
+for i, token in enumerate(tokens):
+    active = [
+        f"{idx_to_type[t]}={'OBI'[v]}"
+        for t, v in enumerate(preds[0, i].tolist())
+        if v > 0
+    ]
+    if active:
+        print(f"{token:15} {', '.join(active)}")
 ```
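The loop above prints labels token by token. To recover contiguous spans per marker type, the per-type B/I/O sequences can be grouped as in the minimal sketch below. `extract_spans` is a hypothetical helper and not part of the released package; it assumes the O=0, B=1, I=2 class order implied by the `'OBI'[v]` lookup, and it treats an I with no preceding B as the start of a new span, which is a lenient convention chosen here for illustration.

```python
from typing import Dict, List, Tuple


def extract_spans(
    preds: List[List[int]], idx_to_type: Dict[int, str]
) -> List[Tuple[str, int, int]]:
    """Group per-type O/B/I predictions into (type, start, end) token spans.

    preds[i][t] is the class (0=O, 1=B, 2=I) of token i for marker type t.
    `end` is exclusive. Hypothetical helper for illustration only.
    """
    spans = []
    num_types = len(preds[0]) if preds else 0
    for t in range(num_types):
        start = None  # index where the currently open span began
        for i, row in enumerate(preds):
            tag = row[t]
            if tag == 1 or (tag == 2 and start is None):
                # B always opens a span; a stray I opens one too (assumption)
                if start is not None:
                    spans.append((idx_to_type[t], start, i))
                start = i
            elif tag == 0 and start is not None:
                # O closes the open span
                spans.append((idx_to_type[t], start, i))
                start = None
        if start is not None:
            spans.append((idx_to_type[t], start, len(preds)))
    return spans


# Toy input: 4 tokens, 2 types, with overlapping spans on token 1
toy_preds = [[1, 0], [2, 1], [0, 2], [0, 0]]
toy_names = {0: "oral_imperative", 1: "oral_vocative"}
toy_spans = extract_spans(toy_preds, toy_names)
```

With the variables from the usage snippet, a call might look like `extract_spans(preds[0].tolist(), idx_to_type)`. The returned indices refer to wordpiece tokens, including `[CLS]` and `[SEP]`, so mapping back to characters would need the tokenizer's offsets.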
 ## Training Data
 - Sources: Project Gutenberg, textfiles.com, Reddit, Wikipedia talk pages
+- Types with fewer than 150 annotated spans are excluded from training
+- Multi-label BIO annotation: tokens can carry labels for multiple overlapping marker types simultaneously
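To make the multi-label scheme concrete, here is a small sketch of how overlapping span annotations could be turned into a per-token, per-type tag matrix. The function name `spans_to_tags` and the `(type, start, end)` span format are assumptions for illustration; the actual preprocessing code is not shown in this commit.

```python
from typing import Dict, List, Tuple

O, B, I = 0, 1, 2  # class order assumed from the usage snippet's 'OBI' lookup


def spans_to_tags(
    num_tokens: int,
    spans: List[Tuple[str, int, int]],
    type_to_idx: Dict[str, int],
) -> List[List[int]]:
    """Build a [num_tokens][num_types] matrix of O/B/I tags.

    Each span is (marker_type, start, end) with `end` exclusive.
    Every type has its own column, so overlapping spans of
    different types do not overwrite each other.
    """
    tags = [[O] * len(type_to_idx) for _ in range(num_tokens)]
    for marker, start, end in spans:
        t = type_to_idx[marker]
        tags[start][t] = B
        for i in range(start + 1, end):
            tags[i][t] = I
    return tags


# Two overlapping spans over five tokens (toy example)
toy_types = {"literate_concessive": 0, "literate_nested_clauses": 1}
toy_tags = spans_to_tags(
    5,
    [("literate_concessive", 0, 3), ("literate_nested_clauses", 1, 4)],
    toy_types,
)
```

In this toy example, tokens 1 and 2 carry a non-O tag in both columns, which a single-label BIO scheme could not represent.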
+## Marker Types (53)
+### Oral Markers (22 types)
 Characteristics of oral tradition and spoken discourse:
 | Category | Markers |
 |----------|---------|
+| **Address & Interaction** | vocative, imperative, second_person, inclusive_we, rhetorical_question, phatic_check, phatic_filler |
+| **Repetition & Pattern** | anaphora, parallelism, tricolon, lexical_repetition, antithesis |
+| **Conjunction** | simple_conjunction |
+| **Formulas** | discourse_formula, intensifier_doubling |
 | **Narrative** | named_individual, specific_place, temporal_anchor, sensory_detail, embodied_action, everyday_example |
+| **Performance** | self_correction |
+### Literate Markers (31 types)
 Characteristics of written, analytical discourse:
 | Category | Markers |
 |----------|---------|
 | **Abstraction** | nominalization, abstract_noun, conceptual_metaphor, categorical_statement |
+| **Syntax** | nested_clauses, relative_chain, conditional, concessive, temporal_embedding, causal_explicit |
 | **Hedging** | epistemic_hedge, probability, evidential, qualified_assertion, concessive_connector |
+| **Impersonality** | agentless_passive, agent_demoted, institutional_subject, objectifying_stance |
+| **Scholarly apparatus** | citation, cross_reference, metadiscourse, definitional_move |
+| **Technical** | technical_term, technical_abbreviation, enumeration, list_structure |
+| **Connectives** | contrastive, additive_formal |
+| **Setting** | concrete_setting, aside |
 ## Evaluation
+Per-type detection F1 on test set (binary: B or I = positive, O = negative):
 <details><summary>Click to show per-marker precision/recall/F1/support</summary>
 ```
+Type                                            Prec    Rec     F1    Sup
+========================================================================
+literate_abstract_noun                         0.119  0.114  0.116    466
+literate_additive_formal                       0.225  0.576  0.323     85
+literate_agent_demoted                         0.345  0.670  0.455    288
+literate_agentless_passive                     0.399  0.750  0.521   1286
+literate_aside                                 0.399  0.599  0.479    461
+literate_categorical_statement                 0.191  0.277  0.226    393
+literate_causal_explicit                       0.285  0.370  0.322    376
+literate_citation                              0.515  0.671  0.582    237
+literate_conceptual_metaphor                   0.172  0.387  0.238    222
+literate_concessive                            0.475  0.596  0.529    740
+literate_concessive_connector                  0.107  0.514  0.178     37
+literate_concrete_setting                      0.189  0.462  0.269    292
+literate_conditional                           0.511  0.823  0.631   1609
+literate_contrastive                           0.310  0.460  0.370    383
+literate_cross_reference                       0.390  0.366  0.377     82
+literate_definitional_move                     0.288  0.515  0.370     66
+literate_enumeration                           0.285  0.743  0.412    855
+literate_epistemic_hedge                       0.339  0.564  0.424    541
+literate_evidential                            0.323  0.630  0.427    162
+literate_institutional_subject                 0.237  0.532  0.328    250
+literate_list_structure                        0.795  0.529  0.635    652
+literate_metadiscourse                         0.243  0.446  0.314    361
+literate_nested_clauses                        0.148  0.398  0.216   1271
+literate_nominalization                        0.241  0.490  0.323   1140
+literate_objectifying_stance                   0.474  0.469  0.471    192
+literate_probability                           0.572  0.728  0.641    114
+literate_qualified_assertion                   0.132  0.163  0.146    123
+literate_relative_chain                        0.282  0.572  0.378   1753
+literate_technical_abbreviation                0.381  0.773  0.510    132
+literate_technical_term                        0.264  0.481  0.341    908
+literate_temporal_embedding                    0.187  0.318  0.235    550
+oral_anaphora                                  0.120  0.348  0.179    141
+oral_antithesis                                0.213  0.249  0.230    453
+oral_discourse_formula                         0.287  0.432  0.345    570
+oral_embodied_action                           0.247  0.430  0.314    465
+oral_everyday_example                          0.263  0.411  0.320    358
+oral_imperative                                0.402  0.787  0.532    211
+oral_inclusive_we                              0.485  0.819  0.609    747
+oral_intensifier_doubling                      0.291  0.316  0.303     79
+oral_lexical_repetition                        0.331  0.550  0.414    218
+oral_named_individual                          0.386  0.708  0.500    818
+oral_parallelism                               0.674  0.041  0.077    710
+oral_phatic_check                              0.432  0.829  0.568     76
+oral_phatic_filler                             0.340  0.630  0.442    184
+oral_rhetorical_question                       0.587  0.899  0.710   1276
+oral_second_person                             0.421  0.610  0.498    839
+oral_self_correction                           0.479  0.372  0.419    156
+oral_sensory_detail                            0.249  0.452  0.321    367
+oral_simple_conjunction                        0.096  0.343  0.150     70
+oral_specific_place                            0.396  0.717  0.510    367
+oral_temporal_anchor                           0.347  0.831  0.490    555
+oral_tricolon                                  0.217  0.220  0.218    560
+oral_vocative                                  0.505  0.759  0.607    133
+========================================================================
+Macro avg (types w/ support)                                 0.388
 ```
 </details>
+**Missing labels (test set):** 0/53. Every type was detected at least once.
+Notable patterns:
+- **Strong performers** (F1 > 0.5): rhetorical_question (0.710), probability (0.641), list_structure (0.635), conditional (0.631), inclusive_we (0.609), vocative (0.607), citation (0.582), phatic_check (0.568), imperative (0.532), technical_abbreviation (0.510), specific_place (0.510)
+- **Weak performers** (F1 < 0.2): parallelism (0.077), simple_conjunction (0.150), abstract_noun (0.116), qualified_assertion (0.146), concessive_connector (0.178), anaphora (0.179)
+- **Precision-recall tradeoff**: Most types show higher recall than precision, indicating the model over-predicts rather than under-predicts markers
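The F1 column in the table is the harmonic mean of precision and recall, and a macro average is the unweighted mean of per-type F1. A minimal sketch that re-derives a few of the rows above; the helper names `f1_score` and `macro_f1` are illustrative and not part of the released code, and the macro value it computes covers only the three sample rows, not all 53 types.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(rows: dict) -> float:
    """Unweighted mean of per-type F1, ignoring support."""
    return sum(f1_score(p, r) for p, r in rows.values()) / len(rows)


# (precision, recall) pairs copied from the per-type table above
rows = {
    "oral_parallelism": (0.674, 0.041),
    "oral_rhetorical_question": (0.587, 0.899),
    "oral_simple_conjunction": (0.096, 0.343),
}

for name, (p, r) in rows.items():
    print(f"{name:28} F1 = {f1_score(p, r):.3f}")

print(f"macro F1 over these three rows = {macro_f1(rows):.3f}")
```

The `oral_parallelism` row shows why high precision alone does not help: with recall at 0.041, the harmonic mean stays close to the smaller of the two values.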
 ## Architecture
+Custom `MultiLabelTokenClassifier` with independent B/I/O heads per marker type:
 ```
 BertModel (bert-base-uncased)
     └── Dropout (p=0.1)
+        └── Linear (768 → num_types × 3)
+            └── Reshape to (batch, seq, num_types, 3)
 ```
+Each marker type gets an independent 3-way O/B/I classification, so a token can simultaneously carry labels for multiple overlapping marker types. Types share the full backbone representation but make independent predictions.
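One way to read the reshaped output, sketched in plain Python for a single sequence: take the argmax over the last axis for each (token, type) pair, then collect B/I runs separately for every type. The class order `0=O, 1=B, 2=I`, the function name `decode_spans`, and the toy scores are assumptions for illustration; the released head may order its classes differently.

```python
O, B, I = 0, 1, 2  # assumed class order within each 3-way head


def decode_spans(logits, type_names):
    """logits[t][k] is a 3-element list of O/B/I scores for token t, type k.

    Returns (type_name, start, end) spans with end exclusive. Each type is
    decoded on its own, so spans of different types may overlap.
    """
    spans = []
    for k, name in enumerate(type_names):
        start = None
        for t, token_scores in enumerate(logits):
            scores = token_scores[k]
            tag = max(range(3), key=lambda c: scores[c])  # argmax over O/B/I
            if tag == B:
                if start is not None:
                    spans.append((name, start, t))  # close the previous span
                start = t
            elif tag == I and start is not None:
                continue  # extend the open span
            else:
                # O, or a stray I with no preceding B: close any open span
                if start is not None:
                    spans.append((name, start, t))
                start = None
        if start is not None:
            spans.append((name, start, len(logits)))
    return spans


# Toy example: 4 tokens, 2 types. Token 1 continues an imperative span
# and also begins a vocative span.
types = ["oral_imperative", "oral_vocative"]
logits = [
    [[0.1, 0.8, 0.1], [0.9, 0.05, 0.05]],    # B-imperative, O
    [[0.1, 0.1, 0.8], [0.1, 0.8, 0.1]],      # I-imperative, B-vocative
    [[0.9, 0.05, 0.05], [0.1, 0.1, 0.8]],    # O,            I-vocative
    [[0.9, 0.05, 0.05], [0.9, 0.05, 0.05]],  # O,            O
]
print(decode_spans(logits, types))
# [('oral_imperative', 0, 2), ('oral_vocative', 1, 3)]
```

A single flat BIO tag per token could not express token 1 here, which is the case the per-type heads are meant to handle.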
+### Regularization
+- **Mixout** (p=0.1): During training, each backbone weight element has a 10% chance per forward pass of being replaced by its pretrained value. This acts as a stochastic L2 anchor toward the pretrained weights and limits representation drift (Lee et al., 2020)
+- **Inverse-frequency type weights**: Rare marker types receive higher loss weighting
+- **Inverse-frequency OBI weights**: B and I classes upweighted relative to dominant O class
+- **Weighted random sampling**: Examples containing rarer markers sampled more frequently
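The Mixout bullet can be made concrete with a small pure-Python sketch over a flat list of weights. It follows the formulation in Lee et al.: each element is swapped for its pretrained value with probability `p`, and the result is shifted and rescaled so that its expectation equals the current weight. Whether this model's training code applies that rescaling is an assumption, as are the function name `mixout` and the toy values.

```python
import random


def mixout(weights, pretrained, p=0.1, rng=random):
    """Return a mixed copy of `weights` for one training forward pass.

    With probability p an element is replaced by its pretrained value.
    The `- p * u` shift and `1 / (1 - p)` scale keep the expected value
    equal to the current weight (assumed here, following the paper).
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    mixed = []
    for w, u in zip(weights, pretrained):
        swapped = u if rng.random() < p else w
        mixed.append((swapped - p * u) / (1.0 - p))
    return mixed


pretrained = [0.50, -0.20, 0.10, 0.00]  # toy "pretrained" values
current = [0.55, -0.10, 0.10, 0.30]     # toy fine-tuned values

rng = random.Random(0)  # seeded so the sketch is reproducible
print(mixout(current, pretrained, p=0.1, rng=rng))
```

When a weight has not moved from its pretrained value, the mix leaves it unchanged whichever branch is taken; the further a weight drifts, the larger the random pull back toward the anchor.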
 ### Initialization
+Fine-tuned from `bert-base-uncased`. During training, the backbone's linear layers are wrapped with Mixout, using a frozen copy of the pretrained weights as the anchor. The classification head is randomly initialized:
 ```
+backbone.* layers  → loaded from pretrained, anchored via Mixout
 classifier.weight  → randomly initialized
 classifier.bias    → randomly initialized
 ```
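For a sense of scale, the head's shape follows from the architecture diagram (`768 → num_types × 3`) and the values in `head_config.json` from this commit (`num_types = 53`, `hidden_size = 768`). A quick sketch of the arithmetic, assuming the head is a single linear layer with bias, stored in PyTorch's `(out_features, in_features)` layout:

```python
# Values taken from head_config.json in this commit
num_types = 53
hidden_size = 768
classes_per_type = 3  # O / B / I

out_features = num_types * classes_per_type
weight_shape = (out_features, hidden_size)  # nn.Linear layout (assumed)
bias_shape = (out_features,)
head_params = out_features * hidden_size + out_features

print(weight_shape)  # (159, 768)
print(bias_shape)    # (159,)
print(head_params)   # 122271
```

Under these assumptions the randomly initialized part of the model is small next to the BERT backbone.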
 ## Limitations
+- **Low-precision types**: Several types show precision below 0.2, meaning most predictions for those types are false positives
+- **Parallelism collapse**: `oral_parallelism` has high precision (0.674) but near-zero recall (0.041), suggesting the model learned a very narrow pattern
 - **Context window**: 128 tokens max; text beyond the limit may be truncated, cutting off spans that cross it
 - **Domain**: Trained primarily on historical/literary texts; may underperform on modern social media
 - **Subjectivity**: Some marker boundaries are inherently ambiguous
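One common workaround for the 128-token limit is to split long inputs into overlapping windows and run the model on each window. This is a generic sketch, not something the model card describes; `sliding_windows`, the stride of 64, and the use of plain token lists are illustrative assumptions, and a real pipeline would also need to reserve room for special tokens such as `[CLS]` and `[SEP]`.

```python
def sliding_windows(tokens, max_len=128, stride=64):
    """Yield (offset, window) pairs covering `tokens` with overlap.

    Consecutive windows start `stride` tokens apart, so any span of at
    most `max_len - stride` tokens fits entirely inside some window.
    """
    if not 0 < stride <= max_len:
        raise ValueError("stride must be in (0, max_len]")
    start = 0
    while True:
        yield start, tokens[start:start + max_len]
        if start + max_len >= len(tokens):
            break  # this window already reaches the end of the input
        start += stride


tokens = list(range(300))  # stand-in for 300 token ids
print([(offset, len(window)) for offset, window in sliding_windows(tokens)])
# [(0, 128), (64, 128), (128, 128), (192, 108)]
```

Predictions from overlapping windows still have to be merged, for example by keeping each span from the window where it lies furthest from an edge; that policy is a design choice left open here.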
 ## References
 - Ong, Walter J. *Orality and Literacy: The Technologizing of the Word*. Routledge, 1982.
+- Lee, C. et al. "Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models." ICLR 2020.
 ---
+*Trained: February 2026*

config.json CHANGED

@@ -1,7 +1,7 @@
 {
   "add_cross_attention": false,
   "architectures": [
-    "BertForTokenClassification"
   ],
   "attention_probs_dropout_prob": 0.1,
   "bos_token_id": null,
@@ -12,303 +12,9 @@
   "hidden_act": "gelu",
   "hidden_dropout_prob": 0.1,
   "hidden_size": 768,
-  "id2label": {
-    "0": "O",
-    "1": "B-literate_abstract_noun",
-    "2": "B-literate_additive_formal",
-    "3": "B-literate_agent_demoted",
-    "4": "B-literate_agentless_passive",
-    "5": "B-literate_aside",
-    "6": "B-literate_categorical_statement",
-    "7": "B-literate_causal_chain",
-    "8": "B-literate_causal_explicit",
-    "9": "B-literate_citation",
-    "10": "B-literate_conceptual_metaphor",
-    "11": "B-literate_concessive",
-    "12": "B-literate_concessive_connector",
-    "13": "B-literate_conditional",
-    "14": "B-literate_contrastive",
-    "15": "B-literate_cross_reference",
-    "16": "B-literate_definitional_move",
-    "17": "B-literate_enumeration",
-    "18": "B-literate_epistemic_hedge",
-    "19": "B-literate_evidential",
-    "20": "B-literate_footnote_reference",
-    "21": "B-literate_institutional_subject",
-    "22": "B-literate_list_structure",
-    "23": "B-literate_metadiscourse",
-    "24": "B-literate_methodological_framing",
-    "25": "B-literate_nested_clauses",
-    "26": "B-literate_nominalization",
-    "27": "B-literate_objectifying_stance",
-    "28": "B-literate_paradox",
-    "29": "B-literate_probability",
-    "30": "B-literate_qualified_assertion",
-    "31": "B-literate_relative_chain",
-    "32": "B-literate_technical_abbreviation",
-    "33": "B-literate_technical_term",
-    "34": "B-literate_temporal_embedding",
-    "35": "B-literate_third_person_reference",
-    "36": "B-oral_alliteration",
-    "37": "B-oral_anaphora",
-    "38": "B-oral_asyndeton",
-    "39": "B-oral_audience_response",
-    "40": "B-oral_binomial_expression",
-    "41": "B-oral_conflict_frame",
-    "42": "B-oral_discourse_formula",
-    "43": "B-oral_dramatic_pause",
-    "44": "B-oral_embodied_action",
-    "45": "B-oral_epistrophe",
-    "46": "B-oral_epithet",
-    "47": "B-oral_everyday_example",
-    "48": "B-oral_first_person",
-    "49": "B-oral_imperative",
-    "50": "B-oral_inclusive_we",
-    "51": "B-oral_intensifier_doubling",
-    "52": "B-oral_lexical_repetition",
-    "53": "B-oral_named_individual",
-    "54": "B-oral_paradox",
-    "55": "B-oral_parallelism",
-    "56": "B-oral_phatic_check",
-    "57": "B-oral_phatic_filler",
-    "58": "B-oral_polysyndeton",
-    "59": "B-oral_proverb",
-    "60": "B-oral_refrain",
-    "61": "B-oral_religious_formula",
-    "62": "B-oral_rhetorical_question",
-    "63": "B-oral_rhythm",
-    "64": "B-oral_second_person",
-    "65": "B-oral_self_correction",
-    "66": "B-oral_sensory_detail",
-    "67": "B-oral_simple_conjunction",
-    "68": "B-oral_specific_place",
-    "69": "B-oral_temporal_anchor",
-    "70": "B-oral_tricolon",
-    "71": "B-oral_us_them",
-    "72": "B-oral_vocative",
-    "73": "I-literate_abstract_noun",
-    "74": "I-literate_additive_formal",
-    "75": "I-literate_agent_demoted",
-    "76": "I-literate_agentless_passive",
-    "77": "I-literate_aside",
-    "78": "I-literate_categorical_statement",
-    "79": "I-literate_causal_chain",
-    "80": "I-literate_causal_explicit",
-    "81": "I-literate_citation",
-    "82": "I-literate_conceptual_metaphor",
-    "83": "I-literate_concessive",
-    "84": "I-literate_concessive_connector",
-    "85": "I-literate_conditional",
-    "86": "I-literate_contrastive",
-    "87": "I-literate_cross_reference",
-    "88": "I-literate_definitional_move",
-    "89": "I-literate_enumeration",
-    "90": "I-literate_epistemic_hedge",
-    "91": "I-literate_evidential",
-    "92": "I-literate_footnote_reference",
-    "93": "I-literate_institutional_subject",
-    "94": "I-literate_list_structure",
-    "95": "I-literate_metadiscourse",
-    "96": "I-literate_methodological_framing",
-    "97": "I-literate_nested_clauses",
-    "98": "I-literate_nominalization",
-    "99": "I-literate_objectifying_stance",
-    "100": "I-literate_paradox",
-    "101": "I-literate_probability",
-    "102": "I-literate_qualified_assertion",
-    "103": "I-literate_relative_chain",
-    "104": "I-literate_technical_abbreviation",
-    "105": "I-literate_technical_term",
-    "106": "I-literate_temporal_embedding",
-    "107": "I-literate_third_person_reference",
-    "108": "I-oral_alliteration",
-    "109": "I-oral_anaphora",
-    "110": "I-oral_asyndeton",
-    "111": "I-oral_audience_response",
-    "112": "I-oral_binomial_expression",
-    "113": "I-oral_conflict_frame",
-    "114": "I-oral_discourse_formula",
-    "115": "I-oral_dramatic_pause",
-    "116": "I-oral_embodied_action",
-    "117": "I-oral_epistrophe",
-    "118": "I-oral_epithet",
-    "119": "I-oral_everyday_example",
-    "120": "I-oral_first_person",
-    "121": "I-oral_imperative",
-    "122": "I-oral_inclusive_we",
-    "123": "I-oral_intensifier_doubling",
-    "124": "I-oral_lexical_repetition",
-    "125": "I-oral_named_individual",
-    "126": "I-oral_paradox",
-    "127": "I-oral_parallelism",
-    "128": "I-oral_phatic_check",
-    "129": "I-oral_phatic_filler",
-    "130": "I-oral_polysyndeton",
-    "131": "I-oral_proverb",
-    "132": "I-oral_refrain",
-    "133": "I-oral_religious_formula",
-    "134": "I-oral_rhetorical_question",
-    "135": "I-oral_rhythm",
-    "136": "I-oral_second_person",
-    "137": "I-oral_self_correction",
-    "138": "I-oral_sensory_detail",
-    "139": "I-oral_simple_conjunction",
-    "140": "I-oral_specific_place",
-    "141": "I-oral_temporal_anchor",
-    "142": "I-oral_tricolon",
-    "143": "I-oral_us_them",
-    "144": "I-oral_vocative"
-  },
   "initializer_range": 0.02,
   "intermediate_size": 3072,
   "is_decoder": false,
-  "label2id": {
-    "B-literate_abstract_noun": 1,
-    "B-literate_additive_formal": 2,
-    "B-literate_agent_demoted": 3,
-    "B-literate_agentless_passive": 4,
-    "B-literate_aside": 5,
-    "B-literate_categorical_statement": 6,
-    "B-literate_causal_chain": 7,
-    "B-literate_causal_explicit": 8,
-    "B-literate_citation": 9,
-    "B-literate_conceptual_metaphor": 10,
-    "B-literate_concessive": 11,
-    "B-literate_concessive_connector": 12,
-    "B-literate_conditional": 13,
-    "B-literate_contrastive": 14,
-    "B-literate_cross_reference": 15,
-    "B-literate_definitional_move": 16,
-    "B-literate_enumeration": 17,
-    "B-literate_epistemic_hedge": 18,
-    "B-literate_evidential": 19,
-    "B-literate_footnote_reference": 20,
-    "B-literate_institutional_subject": 21,
-    "B-literate_list_structure": 22,
-    "B-literate_metadiscourse": 23,
-    "B-literate_methodological_framing": 24,
-    "B-literate_nested_clauses": 25,
-    "B-literate_nominalization": 26,
-    "B-literate_objectifying_stance": 27,
-    "B-literate_paradox": 28,
-    "B-literate_probability": 29,
-    "B-literate_qualified_assertion": 30,
-    "B-literate_relative_chain": 31,
-    "B-literate_technical_abbreviation": 32,
-    "B-literate_technical_term": 33,
-    "B-literate_temporal_embedding": 34,
-    "B-literate_third_person_reference": 35,
-    "B-oral_alliteration": 36,
-    "B-oral_anaphora": 37,
-    "B-oral_asyndeton": 38,
-    "B-oral_audience_response": 39,
-    "B-oral_binomial_expression": 40,
-    "B-oral_conflict_frame": 41,
-    "B-oral_discourse_formula": 42,
-    "B-oral_dramatic_pause": 43,
-    "B-oral_embodied_action": 44,
-    "B-oral_epistrophe": 45,
-    "B-oral_epithet": 46,
-    "B-oral_everyday_example": 47,
-    "B-oral_first_person": 48,
-    "B-oral_imperative": 49,
-    "B-oral_inclusive_we": 50,
-    "B-oral_intensifier_doubling": 51,
-    "B-oral_lexical_repetition": 52,
-    "B-oral_named_individual": 53,
-    "B-oral_paradox": 54,
-    "B-oral_parallelism": 55,
-    "B-oral_phatic_check": 56,
-    "B-oral_phatic_filler": 57,
-    "B-oral_polysyndeton": 58,
-    "B-oral_proverb": 59,
-    "B-oral_refrain": 60,
-    "B-oral_religious_formula": 61,
-    "B-oral_rhetorical_question": 62,
-    "B-oral_rhythm": 63,
-    "B-oral_second_person": 64,
-    "B-oral_self_correction": 65,
-    "B-oral_sensory_detail": 66,
-    "B-oral_simple_conjunction": 67,
-    "B-oral_specific_place": 68,
-    "B-oral_temporal_anchor": 69,
-    "B-oral_tricolon": 70,
-    "B-oral_us_them": 71,
-    "B-oral_vocative": 72,
-    "I-literate_abstract_noun": 73,
-    "I-literate_additive_formal": 74,
-    "I-literate_agent_demoted": 75,
-    "I-literate_agentless_passive": 76,
-    "I-literate_aside": 77,
-    "I-literate_categorical_statement": 78,
-    "I-literate_causal_chain": 79,
-    "I-literate_causal_explicit": 80,
-    "I-literate_citation": 81,
-    "I-literate_conceptual_metaphor": 82,
-    "I-literate_concessive": 83,
-    "I-literate_concessive_connector": 84,
-    "I-literate_conditional": 85,
-    "I-literate_contrastive": 86,
-    "I-literate_cross_reference": 87,
-    "I-literate_definitional_move": 88,
-    "I-literate_enumeration": 89,
-    "I-literate_epistemic_hedge": 90,
-    "I-literate_evidential": 91,
-    "I-literate_footnote_reference": 92,
-    "I-literate_institutional_subject": 93,
-    "I-literate_list_structure": 94,
-    "I-literate_metadiscourse": 95,
-    "I-literate_methodological_framing": 96,
-    "I-literate_nested_clauses": 97,
-    "I-literate_nominalization": 98,
-    "I-literate_objectifying_stance": 99,
-    "I-literate_paradox": 100,
-    "I-literate_probability": 101,
-    "I-literate_qualified_assertion": 102,
-    "I-literate_relative_chain": 103,
-    "I-literate_technical_abbreviation": 104,
-    "I-literate_technical_term": 105,
-    "I-literate_temporal_embedding": 106,
-    "I-literate_third_person_reference": 107,
-    "I-oral_alliteration": 108,
-    "I-oral_anaphora": 109,
-    "I-oral_asyndeton": 110,
-    "I-oral_audience_response": 111,
-    "I-oral_binomial_expression": 112,
-    "I-oral_conflict_frame": 113,
-    "I-oral_discourse_formula": 114,
-    "I-oral_dramatic_pause": 115,
-    "I-oral_embodied_action": 116,
-    "I-oral_epistrophe": 117,
-    "I-oral_epithet": 118,
-    "I-oral_everyday_example": 119,
-    "I-oral_first_person": 120,
-    "I-oral_imperative": 121,
-    "I-oral_inclusive_we": 122,
-    "I-oral_intensifier_doubling": 123,
-    "I-oral_lexical_repetition": 124,
-    "I-oral_named_individual": 125,
-    "I-oral_paradox": 126,
-    "I-oral_parallelism": 127,
-    "I-oral_phatic_check": 128,
-    "I-oral_phatic_filler": 129,
-    "I-oral_polysyndeton": 130,
-    "I-oral_proverb": 131,
-    "I-oral_refrain": 132,
-    "I-oral_religious_formula": 133,
-    "I-oral_rhetorical_question": 134,
-    "I-oral_rhythm": 135,
-    "I-oral_second_person": 136,
-    "I-oral_self_correction": 137,
-    "I-oral_sensory_detail": 138,
-    "I-oral_simple_conjunction": 139,
-    "I-oral_specific_place": 140,
-    "I-oral_temporal_anchor": 141,
-    "I-oral_tricolon": 142,
-    "I-oral_us_them": 143,
-    "I-oral_vocative": 144,
-    "O": 0
-  },
   "layer_norm_eps": 1e-12,
   "max_position_embeddings": 512,
   "model_type": "bert",

 {
   "add_cross_attention": false,
   "architectures": [
+    "BertModel"
   ],
   "attention_probs_dropout_prob": 0.1,
   "bos_token_id": null,
   "hidden_act": "gelu",
   "hidden_dropout_prob": 0.1,
   "hidden_size": 768,
   "initializer_range": 0.02,
   "intermediate_size": 3072,
   "is_decoder": false,
   "layer_norm_eps": 1e-12,
   "max_position_embeddings": 512,
   "model_type": "bert",

head_config.json ADDED

	@@ -0,0 +1,5 @@

+{
+  "model_name": "bert-base-uncased",
+  "num_types": 53,
+  "hidden_size": 768
+}

model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:843209761c32ebf9e994fe50058c120e9b945d381da0cbec76b14f1fce7fe250
-size 436035932

 version https://git-lfs.github.com/spec/v1
+oid sha256:39a2573fe1ef3b14efb3f578f96bb56543fed59684832e04eee92a232654a65a
+size 438442348

type_to_idx.json ADDED

	@@ -0,0 +1,55 @@

+{
+  "literate_abstract_noun": 0,
+  "literate_additive_formal": 1,
+  "literate_agent_demoted": 2,
+  "literate_agentless_passive": 3,
+  "literate_aside": 4,
+  "literate_categorical_statement": 5,
+  "literate_causal_explicit": 6,
+  "literate_citation": 7,
+  "literate_conceptual_metaphor": 8,
+  "literate_concessive": 9,
+  "literate_concessive_connector": 10,
+  "literate_concrete_setting": 11,
+  "literate_conditional": 12,
+  "literate_contrastive": 13,
+  "literate_cross_reference": 14,
+  "literate_definitional_move": 15,
+  "literate_enumeration": 16,
+  "literate_epistemic_hedge": 17,
+  "literate_evidential": 18,
+  "literate_institutional_subject": 19,
+  "literate_list_structure": 20,
+  "literate_metadiscourse": 21,
+  "literate_nested_clauses": 22,
+  "literate_nominalization": 23,
+  "literate_objectifying_stance": 24,
+  "literate_probability": 25,
+  "literate_qualified_assertion": 26,
+  "literate_relative_chain": 27,
+  "literate_technical_abbreviation": 28,
+  "literate_technical_term": 29,
+  "literate_temporal_embedding": 30,
+  "oral_anaphora": 31,
+  "oral_antithesis": 32,
+  "oral_discourse_formula": 33,
+  "oral_embodied_action": 34,
+  "oral_everyday_example": 35,
+  "oral_imperative": 36,
+  "oral_inclusive_we": 37,
+  "oral_intensifier_doubling": 38,
+  "oral_lexical_repetition": 39,
+  "oral_named_individual": 40,
+  "oral_parallelism": 41,
+  "oral_phatic_check": 42,
+  "oral_phatic_filler": 43,
+  "oral_rhetorical_question": 44,
+  "oral_second_person": 45,
+  "oral_self_correction": 46,
+  "oral_sensory_detail": 47,
+  "oral_simple_conjunction": 48,
+  "oral_specific_place": 49,
+  "oral_temporal_anchor": 50,
+  "oral_tricolon": 51,
+  "oral_vocative": 52
+}