permutans committed on
Commit ef4ddf1 · verified · 1 Parent(s): 32c2c50

Upload folder using huggingface_hub

Files changed (1)
  1. README.md +60 -34
README.md CHANGED
@@ -2,7 +2,7 @@
2
  license: mit
3
  tags:
4
  - text-classification
5
- - bert
6
  - orality
7
  - linguistics
8
  - rhetorical-analysis
@@ -12,7 +12,7 @@ metrics:
12
  - f1
13
  - accuracy
14
  base_model:
15
- - google-bert/bert-base-uncased
16
  pipeline_tag: text-classification
17
  library_name: transformers
18
  datasets:
@@ -25,7 +25,7 @@ model-index:
25
  name: Marker Type Classification
26
  metrics:
27
  - type: f1
28
- value: 0.583
29
  name: F1 (macro)
30
  - type: accuracy
31
  value: 0.584
@@ -34,7 +34,7 @@ model-index:
34
 
35
  # Havelock Marker Type Classifier
36
 
37
- BERT-based classifier for **18 rhetorical marker types** on the oral–literate spectrum, grounded in Walter Ong's *Orality and Literacy* (1982).
38
 
39
  This is the mid-level of the Havelock span classification hierarchy. Given a text span identified as a rhetorical marker, the model classifies it into one of 18 functional types (e.g., `repetition`, `subordination`, `direct_address`, `hedging_qualification`).
40
 
@@ -42,14 +42,14 @@ This is the mid-level of the Havelock span classification hierarchy. Given a tex
42
 
43
  | Property | Value |
44
  |----------|-------|
45
- | Base model | `bert-base-uncased` |
46
- | Architecture | `BertForSequenceClassification` |
47
  | Task | Multi-class classification (18 classes) |
48
  | Max sequence length | 128 tokens |
49
- | Test F1 (macro) | **0.583** |
50
  | Test Accuracy | **0.584** |
51
  | Missing labels | **0/18** |
52
- | Parameters | ~109M |
53
 
54
  ## Usage
55
  ```python
@@ -91,7 +91,7 @@ The 18 types group fine-grained subtypes into functional families. Prior version
91
 
92
  ### Data
93
 
94
- Span-level annotations from the Havelock corpus. Each span carries a `marker_type` field normalized against a canonical taxonomy at build time. A stratified 80/10/10 train/val/test split was used with swap-based optimization to balance label distributions across splits. The test set contains 2,178 spans.
95
 
96
  ### Hyperparameters
97
 
@@ -109,53 +109,77 @@ Span-level annotations from the Havelock corpus. Each span carries a `marker_typ
109
  | Mixed precision | FP16 |
110
  | Min examples per class | 50 |
111
 
112
  ### Test Set Classification Report
113
 
114
  <details><summary>Click to expand per-class precision/recall/F1/support</summary>
115
  ```
116
  precision recall f1-score support
117
 
118
- abstraction 0.379 0.667 0.483 117
119
- agonistic_framing 0.806 0.781 0.794 32
120
- analytical_distance 0.542 0.483 0.511 120
121
- concrete_situational 0.495 0.385 0.433 143
122
- direct_address 0.693 0.640 0.666 367
123
- formulaic_phrases 0.214 0.471 0.294 51
124
- hedging_qualification 0.512 0.544 0.528 114
125
- literate_feature 0.520 0.803 0.631 66
126
- logical_connectives 0.527 0.548 0.538 124
127
- oral_feature 0.813 0.465 0.592 159
128
- parallelism 0.714 0.789 0.750 19
129
- parataxis 0.647 0.473 0.547 93
130
- passive_agentless 0.643 0.581 0.610 62
131
- performance_markers 0.638 0.481 0.548 77
132
- repetition 0.661 0.724 0.691 156
133
- sound_patterns 0.661 0.536 0.592 69
134
- subordination 0.626 0.639 0.632 296
135
- textual_apparatus 0.711 0.611 0.657 113
136
 
137
  accuracy 0.584 2178
138
- macro avg 0.600 0.590 0.583 2178
139
- weighted avg 0.613 0.584 0.589 2178
140
  ```
141
 
142
  </details>
143
 
144
- **Top performing types (F1 ≥ 0.65):** `agonistic_framing` (0.794), `parallelism` (0.750), `repetition` (0.691), `direct_address` (0.666), `textual_apparatus` (0.657), `literate_feature` (0.631), `subordination` (0.632), `passive_agentless` (0.610).
145
 
146
- **Weakest types (F1 < 0.50):** `formulaic_phrases` (0.294), `concrete_situational` (0.433), `abstraction` (0.483). These are high-frequency classes where confusion with related types is common.
147
 
148
  ## Limitations
149
 
150
- - **Class imbalance**: `direct_address` has 367 test examples while `parallelism` has 19. Weighted F1 (0.589) is close to macro F1 (0.583), indicating reasonably balanced performance, but rare types remain harder.
151
  - **Span-level only**: Requires pre-extracted spans. Does not detect boundaries.
152
  - **128-token context window**: Longer spans are truncated.
153
- - **Abstraction underperforms**: At 0.483 F1 despite being the 3rd largest class (117 test spans), suggesting the type may be too broad or overlapping with `analytical_distance` and `literate_feature`.
154
 
155
  ## Theoretical Background
156
 
157
  The type level captures functional groupings within the oral–literate framework. Oral types reflect Ong's characterization of oral discourse as additive (`parataxis`), aggregative (`formulaic_phrases`), redundant (`repetition`), agonistically toned (`agonistic_framing`), empathetic and participatory (`direct_address`), and close to the human lifeworld (`concrete_situational`). Literate types capture the analytic (`abstraction`, `subordination`), distanced (`analytical_distance`, `passive_agentless`), and self-referential (`textual_apparatus`) qualities of written discourse.
158
 
159
  ## Citation
160
  ```bibtex
161
  @misc{havelock2026type,
@@ -169,7 +193,9 @@ The type level captures functional groupings within the oral–literate framewor
169
  ## References
170
 
171
  - Ong, Walter J. *Orality and Literacy: The Technologizing of the Word*. Routledge, 1982.
172
 
173
  ---
174
 
175
- *Model version: b31f147d · Trained: February 2026*
 
2
  license: mit
3
  tags:
4
  - text-classification
5
+ - modernbert
6
  - orality
7
  - linguistics
8
  - rhetorical-analysis
 
12
  - f1
13
  - accuracy
14
  base_model:
15
+ - answerdotai/ModernBERT-base
16
  pipeline_tag: text-classification
17
  library_name: transformers
18
  datasets:
 
25
  name: Marker Type Classification
26
  metrics:
27
  - type: f1
28
+ value: 0.573
29
  name: F1 (macro)
30
  - type: accuracy
31
  value: 0.584
 
34
 
35
  # Havelock Marker Type Classifier
36
 
37
+ ModernBERT-based classifier for **18 rhetorical marker types** on the oral–literate spectrum, grounded in Walter Ong's *Orality and Literacy* (1982).
38
 
39
  This is the mid-level of the Havelock span classification hierarchy. Given a text span identified as a rhetorical marker, the model classifies it into one of 18 functional types (e.g., `repetition`, `subordination`, `direct_address`, `hedging_qualification`).
40
 
 
42
 
43
  | Property | Value |
44
  |----------|-------|
45
+ | Base model | `answerdotai/ModernBERT-base` |
46
+ | Architecture | `ModernBertForSequenceClassification` |
47
  | Task | Multi-class classification (18 classes) |
48
  | Max sequence length | 128 tokens |
49
+ | Test F1 (macro) | **0.573** |
50
  | Test Accuracy | **0.584** |
51
  | Missing labels | **0/18** |
52
+ | Parameters | ~149M |
53
 
54
  ## Usage
55
  ```python
 
91
 
92
  ### Data
93
 
94
+ 22,367 span-level annotations from the Havelock corpus. Each span carries a `marker_type` field normalized against a canonical taxonomy at build time. A stratified 80/10/10 train/val/test split was used with swap-based optimization to balance label distributions across splits. The test set contains 2,178 spans.
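The stratification step can be illustrated with a minimal per-label sketch (the swap-based balancing pass is omitted, and the toy label counts below are hypothetical):

```python
import random

# Minimal sketch of a stratified 80/10/10 split: shuffle within each label,
# then slice each label group proportionally. Toy data, not the real corpus.
random.seed(0)
spans = [{"id": i, "marker_type": t} for i, t in enumerate(
    ["repetition"] * 50 + ["parataxis"] * 30 + ["direct_address"] * 20)]

by_label = {}
for s in spans:
    by_label.setdefault(s["marker_type"], []).append(s)

train, val, test = [], [], []
for group in by_label.values():
    random.shuffle(group)
    n = len(group)
    n_train, n_val = int(0.8 * n), int(0.1 * n)
    train += group[:n_train]
    val += group[n_train:n_train + n_val]
    test += group[n_train + n_val:]

print(len(train), len(val), len(test))  # -> 80 10 10
```

Slicing per label (rather than over the pooled list) keeps each split's label distribution close to the corpus distribution; the card's swap-based optimization then refines that balance further.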
95
 
96
  ### Hyperparameters
97
 
 
109
  | Mixed precision | FP16 |
110
  | Min examples per class | 50 |
111
 
112
+ ### Training Metrics
113
+
114
+ Best checkpoint selected at epoch 15, using missing-label count as the primary criterion and macro F1 as the tiebreaker (0 missing labels, F1 0.590).
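This selection rule can be sketched in a few lines (an illustration only; the `checkpoints` list below is hypothetical, not the actual training log):

```python
# Sketch of the checkpoint-selection rule described above: prefer the fewest
# missing labels, then break ties on the highest macro F1.
# The candidate values here are hypothetical, not the real training history.
checkpoints = [
    {"epoch": 12, "missing_labels": 2, "macro_f1": 0.601},
    {"epoch": 15, "missing_labels": 0, "macro_f1": 0.590},
    {"epoch": 18, "missing_labels": 0, "macro_f1": 0.584},
]

# Sort key: missing-label count ascending, then macro F1 descending.
best = min(checkpoints, key=lambda c: (c["missing_labels"], -c["macro_f1"]))
print(best["epoch"])  # -> 15
```

Note that epoch 12 has the highest F1 overall but is rejected because two classes never appear in its predictions.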
115
+
116
  ### Test Set Classification Report
117
 
118
  <details><summary>Click to expand per-class precision/recall/F1/support</summary>
119
  ```
120
  precision recall f1-score support
121
 
122
+ abstraction 0.368 0.658 0.472 117
123
+ agonistic_framing 0.857 0.750 0.800 32
124
+ analytical_distance 0.504 0.475 0.489 120
125
+ concrete_situational 0.509 0.385 0.438 143
126
+ direct_address 0.671 0.689 0.680 367
127
+ formulaic_phrases 0.205 0.608 0.307 51
128
+ hedging_qualification 0.600 0.500 0.545 114
129
+ literate_feature 0.478 0.833 0.608 66
130
+ logical_connectives 0.621 0.516 0.564 124
131
+ oral_feature 0.784 0.365 0.498 159
132
+ parallelism 0.688 0.579 0.629 19
133
+ parataxis 0.655 0.387 0.486 93
134
+ passive_agentless 0.721 0.500 0.590 62
135
+ performance_markers 0.660 0.403 0.500 77
136
+ repetition 0.738 0.705 0.721 156
137
+ sound_patterns 0.672 0.623 0.647 69
138
+ subordination 0.622 0.689 0.654 296
139
+ textual_apparatus 0.718 0.655 0.685 113
140
 
141
  accuracy 0.584 2178
142
+ macro avg 0.615 0.573 0.573 2178
143
+ weighted avg 0.624 0.584 0.587 2178
144
  ```
145
 
146
  </details>
147
 
148
+ **Top performing types (F1 ≥ 0.60):** `agonistic_framing` (0.800), `repetition` (0.721), `textual_apparatus` (0.685), `direct_address` (0.680), `subordination` (0.654), `sound_patterns` (0.647), `parallelism` (0.629), `literate_feature` (0.608).
149
+
150
+ **Weakest types (F1 < 0.50):** `formulaic_phrases` (0.307), `concrete_situational` (0.438), `abstraction` (0.472), `parataxis` (0.486), `oral_feature` (0.498). `formulaic_phrases` suffers from severe precision collapse (P=0.205) despite reasonable recall, suggesting heavy confusion with other oral types. `oral_feature` shows the inverse pattern (P=0.784, R=0.365) — the model is confident but conservative.
151
 
152
+ ## Class Distribution
153
+
154
+ | Support Range | Classes | Count |
155
+ |---------------|---------|-------|
156
+ | >2500 | `direct_address`, `subordination`, `abstraction` | 3 |
157
+ | 1000–2500 | `repetition`, `formulaic_phrases`, `hedging_qualification`, `analytical_distance`, `concrete_situational`, `logical_connectives`, `textual_apparatus` | 7 |
158
+ | 500–1000 | `sound_patterns`, `passive_agentless`, `performance_markers`, `parataxis`, `literate_feature`, `oral_feature` | 6 |
159
+ | <500 | `agonistic_framing`, `parallelism` | 2 |
160
 
161
  ## Limitations
162
 
163
+ - **Class imbalance**: `direct_address` has 367 test examples while `parallelism` has 19. Weighted F1 (0.587) is close to macro F1 (0.573), indicating reasonably balanced performance, but rare types remain harder.
164
  - **Span-level only**: Requires pre-extracted spans. Does not detect boundaries.
165
  - **128-token context window**: Longer spans are truncated.
166
+ - **Abstraction underperforms**: At 0.472 F1 despite being a large class (117 test spans), suggesting the type may be too broad or overlapping with `analytical_distance` and `literate_feature`.
167
+ - **Precision-recall asymmetry**: Several types show strong precision–recall imbalance (`oral_feature` P=0.784/R=0.365; `formulaic_phrases` P=0.205/R=0.608), indicating the focal loss weighting could be further tuned.
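The macro/weighted gap cited above comes down to how per-class scores are averaged; a minimal sketch using three per-class scores taken from the report in this card:

```python
# Macro F1 averages per-class scores equally; weighted F1 weights each class
# by its support. Per-class (f1, support) pairs below are from the test report.
per_class = {
    "direct_address":    (0.680, 367),
    "parallelism":       (0.629, 19),
    "formulaic_phrases": (0.307, 51),
}

macro = sum(f1 for f1, _ in per_class.values()) / len(per_class)
total = sum(n for _, n in per_class.values())
weighted = sum(f1 * n for f1, n in per_class.values()) / total

# On this subset the high-support class dominates the weighted average,
# while macro treats the rare `parallelism` class equally.
print(round(macro, 3), round(weighted, 3))  # -> 0.539 0.634
```

Over all 18 classes the two averages nearly coincide (0.573 vs 0.587), which is why the card reads this as reasonably balanced performance.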
168
 
169
  ## Theoretical Background
170
 
171
  The type level captures functional groupings within the oral–literate framework. Oral types reflect Ong's characterization of oral discourse as additive (`parataxis`), aggregative (`formulaic_phrases`), redundant (`repetition`), agonistically toned (`agonistic_framing`), empathetic and participatory (`direct_address`), and close to the human lifeworld (`concrete_situational`). Literate types capture the analytic (`abstraction`, `subordination`), distanced (`analytical_distance`, `passive_agentless`), and self-referential (`textual_apparatus`) qualities of written discourse.
172
 
173
+ ## Related Models
174
+
175
+ | Model | Task | Classes | F1 |
176
+ |-------|------|---------|-----|
177
+ | [`HavelockAI/bert-marker-category`](https://huggingface.co/HavelockAI/bert-marker-category) | Binary (oral/literate) | 2 | 0.875 |
178
+ | **This model** | Functional type | 18 | 0.573 |
179
+ | [`HavelockAI/bert-marker-subtype`](https://huggingface.co/HavelockAI/bert-marker-subtype) | Fine-grained subtype | 71 | 0.493 |
180
+ | [`HavelockAI/bert-orality-regressor`](https://huggingface.co/HavelockAI/bert-orality-regressor) | Document-level score | Regression | MAE 0.079 |
181
+ | [`HavelockAI/bert-token-classifier`](https://huggingface.co/HavelockAI/bert-token-classifier) | Span detection (BIO) | 145 | 0.500 |
182
+
183
  ## Citation
184
  ```bibtex
185
  @misc{havelock2026type,
 
193
  ## References
194
 
195
  - Ong, Walter J. *Orality and Literacy: The Technologizing of the Word*. Routledge, 1982.
196
+ - Lee, C. et al. "Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models." ICLR 2020.
197
+ - Warner, B. et al. "Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference." 2024.
198
 
199
  ---
200
 
201
+ *Trained: February 2026*