---
license: mit
tags:
- text-classification
- regression
- modernbert
- orality
- linguistics
- rhetorical-analysis
language:
- en
metrics:
- mae
- r2
base_model:
- answerdotai/ModernBERT-base
pipeline_tag: text-classification
library_name: transformers
datasets:
- custom
model-index:
- name: bert-orality-regressor
  results:
  - task:
      type: text-classification
      name: Orality Regression
    metrics:
    - type: mae
      value: 0.0791
      name: Mean Absolute Error
    - type: r2
      value: 0.748
      name: R² Score
---

# Havelock Orality Regressor

ModernBERT-based regression model that scores text on the **oral–literate spectrum** (0–1), grounded in Walter Ong's *Orality and Literacy* (1982).

Given a passage of text, the model outputs a continuous score where higher values indicate greater orality (spoken, performative, additive discourse) and lower values indicate greater literacy (analytic, subordinative, abstract discourse).

## Model Details

| Property | Value |
|----------|-------|
| Base model | `answerdotai/ModernBERT-base` |
| Architecture | `HavelockOralityRegressor` (custom, mean pooling → linear) |
| Task | Single-value regression (MSE loss) |
| Output range | Continuous (not clamped) |
| Max sequence length | 512 tokens |
| Best MAE | **0.0791** |
| R² (at best MAE) | **0.748** |
| Parameters | ~149M |

## Usage
```python
import os
os.environ["TORCH_COMPILE_DISABLE"] = "1"  # must be set before torch is imported

import warnings
warnings.filterwarnings("ignore", message="Flash Attention 2 only supports")

import torch
from transformers import AutoModel, AutoTokenizer

# trust_remote_code is required: HavelockOralityRegressor is a custom architecture
model_name = "HavelockAI/bert-orality-regressor"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(model_name, trust_remote_code=True)
model.eval()

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

text = "Tell me, O Muse, of that ingenious hero who travelled far and wide"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
inputs = {k: v.to(device) for k, v in inputs.items()}

with torch.no_grad(), torch.autocast(device_type=device.type, enabled=device.type == "cuda"):
    score = model(**inputs).logits.squeeze().item()

# the regression head is unclamped, so clip to [0, 1] before reporting
print(f"Orality score: {max(0.0, min(1.0, score)):.3f}")
```

### Score Interpretation

| Score | Register |
|-------|----------|
| 0.8–1.0 | Highly oral — epic poetry, sermons, rap, oral storytelling |
| 0.6–0.8 | Oral-dominant — speeches, podcasts, conversational prose |
| 0.4–0.6 | Mixed — journalism, blog posts, dialogue-heavy fiction |
| 0.2–0.4 | Literate-dominant — essays, expository prose |
| 0.0–0.2 | Highly literate — academic papers, legal texts, philosophy |
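
For downstream use, the bands above can be folded into a small helper. The function name and label strings here are illustrative, not part of the model's API:

```python
def register_label(score: float) -> str:
    """Map a raw model output to the register bands in the table above.

    Scores are clamped to [0, 1] first, since the regression head is
    unclamped and can stray slightly outside the range.
    """
    s = max(0.0, min(1.0, score))
    if s >= 0.8:
        return "highly oral"
    if s >= 0.6:
        return "oral-dominant"
    if s >= 0.4:
        return "mixed"
    if s >= 0.2:
        return "literate-dominant"
    return "highly literate"
```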

## Training

### Data

The model was trained on a curated corpus of documents annotated with orality scores using a multi-pass scoring system. Scores were originally on a 0–100 scale and normalized to 0–1 for training. The corpus draws from Project Gutenberg, textfiles.com, Reddit, and Wikipedia talk pages, representing a range of registers from highly oral to highly literate.

An 80/20 train/test split was used (random seed 42).
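
The exact splitting code is not published; a deterministic sketch consistent with an 80/20 split at seed 42 might look like:

```python
import random

def split_corpus(docs, test_frac=0.2, seed=42):
    """Shuffle deterministically, then carve off the last test_frac as test."""
    rng = random.Random(seed)
    idx = list(range(len(docs)))
    rng.shuffle(idx)
    cut = int(len(docs) * (1 - test_frac))
    train = [docs[i] for i in idx[:cut]]
    test = [docs[i] for i in idx[cut:]]
    return train, test
```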

### Hyperparameters

| Parameter | Value |
|-----------|-------|
| Epochs | 20 |
| Learning rate | 2e-5 |
| Optimizer | AdamW (weight decay 0.01) |
| LR schedule | Cosine with warmup (10% of total steps) |
| Gradient clipping | 1.0 |
| Loss | MSE |
| Mixed precision | FP16 |
| Regularization | Mixout (p=0.1) |
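
The cosine-with-warmup schedule in the table can be written as a pure function of the step index (a sketch of the shape only; training likely used a library implementation such as `transformers`' scheduler, which is an assumption):

```python
import math

def lr_at_step(step, total_steps, base_lr=2e-5, warmup_frac=0.1):
    """Linear warmup over the first 10% of steps, then cosine decay to 0."""
    warmup_steps = int(total_steps * warmup_frac)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```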

### Training Metrics

<details><summary>Click to show per-epoch metrics</summary>

| Epoch | Loss | MAE | R² |
|-------|------|-----|-----|
| 1 | 0.3496 | 0.1173 | 0.476 |
| 2 | 0.0286 | 0.0992 | 0.593 |
| 3 | 0.0215 | 0.0872 | 0.704 |
| 4 | 0.0144 | 0.0879 | 0.714 |
| 5 | 0.0169 | 0.0865 | 0.712 |
| 6 | 0.0117 | 0.0853 | 0.700 |
| 7 | 0.0096 | 0.0922 | 0.691 |
| 8 | 0.0094 | 0.0850 | 0.722 |
| 9 | 0.0086 | 0.0822 | 0.745 |
| 10 | 0.0064 | 0.0841 | 0.723 |
| 11 | 0.0054 | 0.0921 | 0.682 |
| 12 | 0.0050 | 0.0840 | 0.720 |
| 13 | 0.0044 | 0.0806 | 0.744 |
| 14 | 0.0037 | 0.0805 | 0.740 |
| **15** | **0.0034** | **0.0791** | **0.748** |
| 16 | 0.0033 | 0.0807 | 0.738 |
| 17 | 0.0031 | 0.0803 | 0.742 |
| 18 | 0.0026 | 0.0797 | 0.745 |
| 19 | 0.0027 | 0.0803 | 0.742 |
| 20 | 0.0029 | 0.0805 | 0.741 |

</details>

Best checkpoint selected at epoch 15 by lowest MAE.

## Architecture

Custom `HavelockOralityRegressor` with mean pooling (ModernBERT has no pooler output):
```
ModernBERT (answerdotai/ModernBERT-base)
    └── Mean pooling over non-padded tokens
        └── Dropout (p=0.1)
            └── Linear (hidden_size → 1)
```
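
The mean-pooling step can be illustrated without tensors; the model's forward pass performs the same arithmetic batched in PyTorch (this toy function is not the model's actual code):

```python
def mean_pool(hidden_states, attention_mask):
    """Average token embeddings, ignoring padded positions.

    hidden_states: list of per-token vectors (lists of floats)
    attention_mask: list of 0/1 flags, 1 = real token
    """
    dim = len(hidden_states[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(hidden_states, attention_mask):
        if keep:
            total = [t + v for t, v in zip(total, vec)]
            count += 1
    return [t / max(count, 1) for t in total]
```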

### Regularization

- **Mixout** (p=0.1): During training, each backbone weight element has a 10% chance of being replaced by its pretrained value per forward pass, acting as a stochastic L2 anchor that prevents representation drift (Lee et al., 2020)
- **Weight decay** (0.01) via AdamW
- **Gradient clipping** (max norm 1.0)
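
The Mixout replacement step can be sketched per weight element. Note that the published formulation also recenters and rescales by 1/(1 − p) so the expectation matches the finetuned weight; this minimal sketch shows only the stochastic replacement described above:

```python
import random

def mixout_step(finetuned, pretrained, p=0.1, rng=None):
    """With probability p, swap each finetuned weight element for its
    pretrained value (drawn fresh every forward pass during training)."""
    rng = rng or random.Random()
    return [w0 if rng.random() < p else w
            for w, w0 in zip(finetuned, pretrained)]
```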

## Limitations

- **No sigmoid clamping**: The model can output values outside [0, 1]. Consumers should clamp if needed.
- **Domain coverage**: Training corpus skews historical/literary. Performance on modern social media, code-switched text, or non-English text is untested.
- **Document length**: Texts longer than 512 tokens are truncated. The model sees only the first ~400 words, which may not be representative of longer documents.
- **Regression target subjectivity**: Orality scores involve human judgment; inter-annotator agreement bounds the ceiling for model performance.

## Theoretical Background

The oral–literate spectrum follows Ong's framework, which characterizes oral discourse as additive, aggregative, redundant, agonistic, empathetic, and situational, while literate discourse is subordinative, analytic, abstract, distanced, and context-free. The model learns to place text along this continuum from document-level annotations informed by 72 specific rhetorical markers (36 oral, 36 literate).

## Citation
```bibtex
@misc{havelock2026regressor,
  title={Havelock Orality Regressor},
  author={Havelock AI},
  year={2026},
  url={https://huggingface.co/HavelockAI/bert-orality-regressor}
}
```

## References

- Ong, Walter J. *Orality and Literacy: The Technologizing of the Word*. Routledge, 1982.
- Lee, C. et al. "Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models." ICLR 2020.
- Warner, B. et al. "Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference." 2024.

---

*Trained: February 2026*