---
license: mit
tags:
- text-classification
- modernbert
- orality
- linguistics
- rhetorical-analysis
language:
- en
metrics:
- f1
- accuracy
base_model:
- answerdotai/ModernBERT-base
pipeline_tag: text-classification
library_name: transformers
datasets:
- custom
model-index:
- name: bert-marker-type
  results:
  - task:
      type: text-classification
      name: Marker Type Classification
    metrics:
    - type: f1
      value: 0.573
      name: F1 (macro)
    - type: accuracy
      value: 0.584
      name: Accuracy
---

# Havelock Marker Type Classifier

ModernBERT-based classifier for **18 rhetorical marker types** on the oral–literate spectrum, grounded in Walter Ong's *Orality and Literacy* (1982). This is the mid-level of the Havelock span classification hierarchy. Given a text span identified as a rhetorical marker, the model classifies it into one of 18 functional types (e.g., `repetition`, `subordination`, `direct_address`, `hedging_qualification`).

## Model Details

| Property | Value |
|----------|-------|
| Base model | `answerdotai/ModernBERT-base` |
| Architecture | `ModernBertForSequenceClassification` |
| Task | Multi-class classification (18 classes) |
| Max sequence length | 128 tokens |
| Test F1 (macro) | **0.573** |
| Test Accuracy | **0.584** |
| Missing labels | **0/18** |
| Parameters | ~149M |

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "HavelockAI/bert-marker-type"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

span = "whether or not the underlying assumptions hold true"
inputs = tokenizer(span, return_tensors="pt", truncation=True, max_length=128)

with torch.no_grad():
    logits = model(**inputs).logits

pred = torch.argmax(logits, dim=1).item()
print(f"Marker type: {model.config.id2label[pred]}")
```

## Label Taxonomy (18 types)

The 18 types group fine-grained subtypes into functional families.
Prior versions carried spurious label variants (e.g., `hedging` alongside `hedging_qualification`, `passive` alongside `passive_agentless`) introduced by inconsistent upstream annotation. These have been resolved via a canonical taxonomy with normalization and validation at build time.

| Oral Types (10) | Literate Types (8) |
|-----------------|--------------------|
| `direct_address` | `subordination` |
| `repetition` | `abstraction` |
| `formulaic_phrases` | `hedging_qualification` |
| `parallelism` | `analytical_distance` |
| `parataxis` | `logical_connectives` |
| `sound_patterns` | `textual_apparatus` |
| `performance_markers` | `literate_feature` |
| `concrete_situational` | `passive_agentless` |
| `agonistic_framing` | |
| `oral_feature` | |

## Training

### Data

22,367 span-level annotations from the Havelock corpus. Each span carries a `marker_type` field normalized against the canonical taxonomy at build time. A stratified 80/10/10 train/val/test split was used, with swap-based optimization to balance label distributions across splits. The test set contains 2,178 spans.

### Hyperparameters

| Parameter | Value |
|-----------|-------|
| Epochs | 20 |
| Batch size | 16 |
| Learning rate | 3e-5 |
| Optimizer | AdamW (weight decay 0.01) |
| LR schedule | Cosine with 10% warmup |
| Gradient clipping | 1.0 |
| Loss | Focal loss (γ=2.0) + class weights |
| Label smoothing | 0.0 |
| Mixout | 0.1 |
| Mixed precision | FP16 |
| Min examples per class | 50 |

### Training Metrics

The best checkpoint was selected at epoch 15, using missing-label count as the primary criterion and macro F1 as the tiebreaker (0 missing labels, F1 0.590).

### Test Set Classification Report
<details>
<summary>Click to expand per-class precision/recall/F1/support</summary>

```
                        precision    recall  f1-score   support

           abstraction      0.368     0.658     0.472       117
     agonistic_framing      0.857     0.750     0.800        32
   analytical_distance      0.504     0.475     0.489       120
  concrete_situational      0.509     0.385     0.438       143
        direct_address      0.671     0.689     0.680       367
     formulaic_phrases      0.205     0.608     0.307        51
 hedging_qualification      0.600     0.500     0.545       114
      literate_feature      0.478     0.833     0.608        66
   logical_connectives      0.621     0.516     0.564       124
          oral_feature      0.784     0.365     0.498       159
           parallelism      0.688     0.579     0.629        19
             parataxis      0.655     0.387     0.486        93
     passive_agentless      0.721     0.500     0.590        62
   performance_markers      0.660     0.403     0.500        77
            repetition      0.738     0.705     0.721       156
        sound_patterns      0.672     0.623     0.647        69
         subordination      0.622     0.689     0.654       296
     textual_apparatus      0.718     0.655     0.685       113

              accuracy                          0.584      2178
             macro avg      0.615     0.573     0.573      2178
          weighted avg      0.624     0.584     0.587      2178
```

</details>
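The report above follows scikit-learn's `classification_report` layout. As a minimal sketch, the same per-class metrics can be regenerated from gold labels and model predictions; the `y_true`/`y_pred` lists below are illustrative toy data, not the actual evaluation set or pipeline.

```python
# Sketch: regenerate a per-class report in the format shown above.
# y_true / y_pred are toy stand-ins for the real test labels and predictions.
from sklearn.metrics import classification_report, f1_score

y_true = ["repetition", "subordination", "repetition", "direct_address"]
y_pred = ["repetition", "subordination", "direct_address", "direct_address"]

# digits=3 matches the three-decimal formatting used in the report above.
print(classification_report(y_true, y_pred, digits=3, zero_division=0))
print("macro F1:", f1_score(y_true, y_pred, average="macro", zero_division=0))
```

The macro average weights every class equally, which is why it is the headline metric for this imbalanced 18-class task.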
**Top performing types (F1 ≥ 0.60):** `agonistic_framing` (0.800), `repetition` (0.721), `textual_apparatus` (0.685), `direct_address` (0.680), `subordination` (0.654), `sound_patterns` (0.647), `parallelism` (0.629), `literate_feature` (0.608).

**Weakest types (F1 < 0.50):** `formulaic_phrases` (0.307), `concrete_situational` (0.438), `abstraction` (0.472), `parataxis` (0.486), `oral_feature` (0.498).

`formulaic_phrases` suffers from severe precision collapse (P=0.205) despite reasonable recall, suggesting heavy confusion with other oral types. `oral_feature` shows the inverse pattern (P=0.784, R=0.365): the model is confident but conservative.

## Class Distribution

| Support Range | Classes | Count |
|---------------|---------|-------|
| >2500 | `direct_address`, `subordination`, `abstraction` | 3 |
| 1000–2500 | `repetition`, `formulaic_phrases`, `hedging_qualification`, `analytical_distance`, `concrete_situational`, `logical_connectives`, `textual_apparatus` | 7 |
| 500–1000 | `sound_patterns`, `passive_agentless`, `performance_markers`, `parataxis`, `literate_feature`, `oral_feature` | 6 |
| <500 | `agonistic_framing`, `parallelism` | 2 |

## Limitations

- **Class imbalance**: `direct_address` has 367 test examples while `parallelism` has 19. Weighted F1 (0.587) is close to macro F1 (0.573), indicating reasonably balanced performance, but rare types remain harder.
- **Span-level only**: the model requires pre-extracted spans and does not detect span boundaries.
- **128-token context window**: longer spans are truncated.
- **Abstraction underperforms**: `abstraction` reaches only 0.472 F1 despite substantial support (117 test spans), suggesting the type may be too broad or may overlap with `analytical_distance` and `literate_feature`.
- **Precision–recall asymmetry**: several types show a strong precision–recall imbalance (`oral_feature` P=0.784/R=0.365; `formulaic_phrases` P=0.205/R=0.608), indicating the focal loss weighting could be further tuned.
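The hyperparameters table lists the training loss as focal loss (γ=2.0) combined with class weights. The exact training implementation is not published; the following is a minimal sketch of that combination, where `alpha` stands in for the per-class weight vector.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=None, gamma=2.0):
    """Multi-class focal loss with optional per-class weights (sketch).

    logits:  (batch, num_classes) raw scores
    targets: (batch,) integer class indices
    alpha:   optional (num_classes,) class-weight vector
    gamma:   focusing parameter; gamma=0 reduces to plain cross-entropy
    """
    log_probs = F.log_softmax(logits, dim=-1)
    log_pt = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)
    pt = log_pt.exp()
    # Down-weight easy examples: (1 - p_t)^gamma -> 0 as p_t -> 1.
    loss = -((1.0 - pt) ** gamma) * log_pt
    if alpha is not None:
        loss = loss * alpha[targets]
    return loss.mean()

logits = torch.tensor([[2.0, 0.5, -1.0], [0.1, 1.2, 0.3]])
targets = torch.tensor([0, 1])
# With gamma=0 and no weights, this matches F.cross_entropy(logits, targets).
print(focal_loss(logits, targets, gamma=0.0))
print(focal_loss(logits, targets, gamma=2.0))
```

Raising γ shrinks the loss contribution of well-classified examples, which is the lever the limitations above suggest tuning further for the types with skewed precision/recall.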
## Theoretical Background

The type level captures functional groupings within the oral–literate framework. Oral types reflect Ong's characterization of oral discourse as additive (`parataxis`), aggregative (`formulaic_phrases`), redundant (`repetition`), agonistically toned (`agonistic_framing`), empathetic and participatory (`direct_address`), and close to the human lifeworld (`concrete_situational`). Literate types capture the analytic (`abstraction`, `subordination`), distanced (`analytical_distance`, `passive_agentless`), and self-referential (`textual_apparatus`) qualities of written discourse.

## Related Models

| Model | Task | Classes | F1 |
|-------|------|---------|----|
| [`HavelockAI/bert-marker-category`](https://huggingface.co/HavelockAI/bert-marker-category) | Binary (oral/literate) | 2 | 0.875 |
| **This model** | Functional type | 18 | 0.573 |
| [`HavelockAI/bert-marker-subtype`](https://huggingface.co/HavelockAI/bert-marker-subtype) | Fine-grained subtype | 71 | 0.493 |
| [`HavelockAI/bert-orality-regressor`](https://huggingface.co/HavelockAI/bert-orality-regressor) | Document-level score | Regression | MAE 0.079 |
| [`HavelockAI/bert-token-classifier`](https://huggingface.co/HavelockAI/bert-token-classifier) | Span detection (BIO) | 145 | 0.500 |

## Citation

```bibtex
@misc{havelock2026type,
  title={Havelock Marker Type Classifier},
  author={Havelock AI},
  year={2026},
  url={https://huggingface.co/HavelockAI/bert-marker-type}
}
```

## References

- Ong, Walter J. *Orality and Literacy: The Technologizing of the Word*. Routledge, 1982.
- Lee, C. et al. "Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models." ICLR 2020.
- Warner, B. et al. "Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference." 2024.

---

*Trained: February 2026*