---
license: mit
tags:
- text-classification
- modernbert
- orality
- linguistics
- rhetorical-analysis
language:
- en
metrics:
- f1
- accuracy
base_model:
- answerdotai/ModernBERT-base
pipeline_tag: text-classification
library_name: transformers
datasets:
- custom
model-index:
- name: bert-marker-category
  results:
  - task:
      type: text-classification
      name: Oral/Literate Span Classification
    metrics:
    - type: f1
      value: 0.804
      name: F1 (macro)
    - type: accuracy
      value: 0.825
      name: Accuracy
---

# Havelock Marker Category Classifier

ModernBERT-based binary classifier that determines whether a rhetorical span is **oral** or **literate**, grounded in Walter Ong's *Orality and Literacy* (1982).

This is the coarsest level of the Havelock span classification hierarchy. Given a text span that has been identified as a rhetorical marker, the model classifies it into one of two categories: oral (characteristic of spoken, performative discourse) or literate (characteristic of written, analytic discourse).

## Model Details

| Property | Value |
|----------|-------|
| Base model | `answerdotai/ModernBERT-base` |
| Architecture | `ModernBertForSequenceClassification` |
| Task | Binary classification |
| Labels | 2 (`oral`, `literate`) |
| Max sequence length | 128 tokens |
| Test F1 (macro) | **0.804** |
| Test Accuracy | **0.825** |
| Missing labels | 0/2 |
| Parameters | ~149M |

## Usage
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "HavelockAI/bert-marker-category"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

span = "Tell me, O Muse, of that ingenious hero"
inputs = tokenizer(span, return_tensors="pt", truncation=True, max_length=128)

# Single forward pass; no gradients needed for inference
with torch.no_grad():
    logits = model(**inputs).logits
    pred = torch.argmax(logits, dim=1).item()

label_map = {0: "oral", 1: "literate"}
print(f"Category: {label_map[pred]}")
```
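To surface a confidence score alongside the predicted label, apply a softmax to the logits. A minimal sketch with stand-in logits (in practice, substitute the real `model(**inputs).logits` from the snippet above):

```python
import torch

# Stand-in logits for one span; shape (batch, num_labels)
logits = torch.tensor([[2.1, -0.7]])

probs = torch.softmax(logits, dim=1)        # normalize logits to probabilities
confidence, pred = torch.max(probs, dim=1)  # top class and its probability

label_map = {0: "oral", 1: "literate"}
print(f"{label_map[pred.item()]} ({confidence.item():.3f})")
```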

## Training

### Data

22,367 span-level annotations from the Havelock corpus, with marker types normalized against a canonical taxonomy at build time. Spans are drawn from documents sourced from Project Gutenberg, textfiles.com, Reddit, and Wikipedia talk pages. The data uses a stratified 80/10/10 train/val/test split, refined with swap-based optimization. The test set contains 1,609 spans (1,162 oral, 447 literate).
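A plain stratified split (without the swap-based refinement, which is not shown here) can be sketched as follows; the 720/280 label mix below is illustrative:

```python
import random
from collections import defaultdict

def stratified_split(labels, fracs=(0.8, 0.1, 0.1), seed=13):
    """Split example indices so each class keeps roughly the same
    proportions in train/val/test. `labels` is one class label per example."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    rng = random.Random(seed)
    splits = ([], [], [])
    for idxs in by_class.values():
        rng.shuffle(idxs)
        a = int(len(idxs) * fracs[0])           # end of train slice
        b = a + int(len(idxs) * fracs[1])       # end of val slice
        splits[0].extend(idxs[:a])
        splits[1].extend(idxs[a:b])
        splits[2].extend(idxs[b:])
    return splits

train, val, test = stratified_split(["oral"] * 720 + ["literate"] * 280)
print(len(train), len(val), len(test))  # 800 100 100
```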

### Hyperparameters

| Parameter | Value |
|-----------|-------|
| Epochs | 20 |
| Batch size | 16 |
| Learning rate | 2e-5 |
| Optimizer | AdamW (weight decay 0.01) |
| LR schedule | Cosine with 10% warmup |
| Gradient clipping | 1.0 |
| Loss | Focal loss (γ=2.0) + class weights |
| Mixout | 0.1 |
| Mixed precision | FP16 |
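Focal loss down-weights well-classified examples so training focuses on hard ones; combined with class weights, it counters the oral/literate imbalance. A minimal per-example sketch (the default `class_weight` here is illustrative, not a training value):

```python
import math

def focal_loss(p_correct, gamma=2.0, class_weight=1.0):
    """Weighted focal loss for one example, given the predicted
    probability of the true class."""
    return -class_weight * (1.0 - p_correct) ** gamma * math.log(p_correct)

# An easy example (p=0.95) contributes far less than a hard one (p=0.55).
print(round(focal_loss(0.95), 4), round(focal_loss(0.55), 4))  # 0.0001 0.1211
```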

### Training Metrics

Best checkpoint selected at epoch 13, using missing-label count as the primary criterion and validation F1 as the tiebreaker (0 missing labels, F1 0.850).
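The selection rule amounts to sorting checkpoints by missing-label count first and F1 second. A sketch with hypothetical per-epoch stats (the `missing` counts are invented for illustration):

```python
# Hypothetical per-epoch validation stats
epochs = [
    {"epoch": 9,  "missing": 0, "f1": 0.840},
    {"epoch": 13, "missing": 0, "f1": 0.850},
    {"epoch": 20, "missing": 1, "f1": 0.870},
]

# Fewest missing labels wins; higher F1 breaks ties
best = min(epochs, key=lambda e: (e["missing"], -e["f1"]))
print(best["epoch"])  # 13
```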

<details><summary>Click to show per-epoch metrics</summary>

| Epoch | Loss | Val F1 | F1 range |
|-------|------|--------|----------|
| 1 | 0.1231 | 0.815 | 0.786–0.843 |
| 2 | 0.0785 | 0.829 | 0.795–0.863 |
| 3 | 0.0599 | 0.835 | 0.804–0.866 |
| 4 | 0.0457 | 0.816 | 0.788–0.844 |
| 5 | 0.0356 | 0.826 | 0.794–0.857 |
| 6 | 0.0290 | 0.834 | 0.787–0.881 |
| 7 | 0.0235 | 0.836 | 0.802–0.869 |
| 8 | 0.0188 | 0.837 | 0.799–0.876 |
| 9 | 0.0175 | 0.840 | 0.805–0.875 |
| 10 | 0.0162 | 0.839 | 0.802–0.875 |
| 11 | 0.0115 | 0.834 | 0.796–0.872 |
| 12 | 0.0103 | 0.836 | 0.801–0.870 |
| **13** | **0.0097** | **0.850** | **0.812–0.887** |
| 14 | 0.0086 | 0.827 | 0.794–0.861 |
| 15 | 0.0075 | 0.835 | 0.799–0.871 |
| 16 | 0.0074 | 0.828 | 0.794–0.862 |
| 17 | 0.0071 | 0.830 | 0.796–0.863 |
| 18 | 0.0073 | 0.840 | 0.804–0.877 |
| 19 | 0.0068 | 0.843 | 0.806–0.880 |
| 20 | 0.0070 | 0.844 | 0.808–0.880 |

</details>

### Test Set Classification Report
```
              precision    recall  f1-score   support

        oral      0.953     0.798     0.868      1162
    literate      0.631     0.897     0.741       447

    accuracy                          0.825      1609
   macro avg      0.792     0.847     0.804      1609
weighted avg      0.863     0.825     0.833      1609
```

The model achieves high precision on oral spans (0.953) and high recall on literate spans (0.897). The lower precision on literate spans (0.631) means some oral spans are misclassified as literate, which is expected given the class imbalance (72% oral in the test set).
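The macro and weighted averages in the report follow directly from the per-class rows, and recomputing them is a quick sanity check (they match the reported 0.804 and 0.833 up to rounding):

```python
# Per-class F1 and support from the test set classification report
f1 = {"oral": 0.868, "literate": 0.741}
support = {"oral": 1162, "literate": 447}

macro_f1 = sum(f1.values()) / len(f1)                # unweighted mean
weighted_f1 = (sum(f1[c] * support[c] for c in f1)
               / sum(support.values()))              # support-weighted mean
print(macro_f1, weighted_f1)
```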

## Limitations

- **Class imbalance**: The test set is 72% oral / 28% literate, reflecting the corpus distribution. Literate precision suffers as a result.
- **Span-level only**: This model classifies pre-extracted spans. It does not detect span boundaries; pair it with a span detection model (e.g., [`HavelockAI/bert-token-classifier`](https://huggingface.co/HavelockAI/bert-token-classifier)) for end-to-end use.
- **128-token context window**: Longer spans are truncated.
- **Domain**: Trained on historical/literary and web text. Performance on other domains is untested.

## Theoretical Background

The oral–literate distinction follows Ong's framework. Oral markers include features like direct address, formulaic phrasing, parataxis, repetition, and sound patterning. Literate markers include subordination, abstraction, hedging, passive constructions, and textual apparatus (citations, cross-references). This binary classifier serves as the top level of a three-tier taxonomy: category → type → subtype.

## Related Models

| Model | Task | Classes | F1 |
|-------|------|---------|-----|
| **This model** | Binary (oral/literate) | 2 | 0.804 |
| [`HavelockAI/bert-marker-type`](https://huggingface.co/HavelockAI/bert-marker-type) | Functional type | 18 | 0.573 |
| [`HavelockAI/bert-marker-subtype`](https://huggingface.co/HavelockAI/bert-marker-subtype) | Fine-grained subtype | 71 | 0.493 |
| [`HavelockAI/bert-orality-regressor`](https://huggingface.co/HavelockAI/bert-orality-regressor) | Document-level score | Regression | MAE 0.079 |
| [`HavelockAI/bert-token-classifier`](https://huggingface.co/HavelockAI/bert-token-classifier) | Span detection (BIO) | 145 | 0.500 |

## Citation
```bibtex
@misc{havelock2026category,
  title={Havelock Marker Category Classifier},
  author={Havelock AI},
  year={2026},
  url={https://huggingface.co/HavelockAI/bert-marker-category}
}
```

## References

- Ong, Walter J. *Orality and Literacy: The Technologizing of the Word*. Routledge, 1982.
- Lee, C. et al. "Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models." ICLR 2020.
- Warner, A. et al. "Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference." 2024.

---

*Trained: February 2026*