---
license: mit
tags:
- text-classification
- regression
- modernbert
- orality
- linguistics
- rhetorical-analysis
language:
- en
metrics:
- mae
- r2
base_model:
- answerdotai/ModernBERT-base
pipeline_tag: text-classification
library_name: transformers
datasets:
- custom
model-index:
- name: bert-orality-regressor
  results:
  - task:
      type: text-classification
      name: Orality Regression
    metrics:
    - type: mae
      value: 0.0791
      name: Mean Absolute Error
    - type: r2
      value: 0.748
      name: R² Score
---

# Havelock Orality Regressor

A ModernBERT-based regression model that scores text on the **oral–literate spectrum** (0–1), grounded in Walter Ong's *Orality and Literacy* (1982). Given a passage of text, the model outputs a continuous score: higher values indicate greater orality (spoken, performative, additive discourse), lower values greater literacy (analytic, subordinative, abstract discourse).

## Model Details

| Property | Value |
|----------|-------|
| Base model | `answerdotai/ModernBERT-base` |
| Architecture | `HavelockOralityRegressor` (custom, mean pooling → linear) |
| Task | Single-value regression (MSE loss) |
| Output range | Continuous (not clamped) |
| Max sequence length | 512 tokens |
| Best MAE | **0.0791** |
| R² (at best MAE) | **0.748** |
| Parameters | ~149M |

## Usage

```python
import os

# Disable torch.compile before importing transformers.
os.environ["TORCH_COMPILE_DISABLE"] = "1"

import warnings

warnings.filterwarnings("ignore", message="Flash Attention 2 only supports")

import torch
from transformers import AutoModel, AutoTokenizer

model_name = "HavelockAI/bert-orality-regressor"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(model_name, trust_remote_code=True)
model.eval()

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

text = "Tell me, O Muse, of that ingenious hero who travelled far and wide"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
inputs = {k: v.to(device) for k, v in inputs.items()}

# Autocast only on CUDA; inference runs in full precision on CPU.
with torch.no_grad(), torch.autocast(device_type=device.type, enabled=device.type == "cuda"):
    score = model(**inputs).logits.squeeze().item()

# The head is unclamped, so clip to [0, 1] for interpretation.
print(f"Orality score: {max(0.0, min(1.0, score)):.3f}")
```

### Score Interpretation

| Score | Register |
|-------|----------|
| 0.8–1.0 | Highly oral — epic poetry, sermons, rap, oral storytelling |
| 0.6–0.8 | Oral-dominant — speeches, podcasts, conversational prose |
| 0.4–0.6 | Mixed — journalism, blog posts, dialogue-heavy fiction |
| 0.2–0.4 | Literate-dominant — essays, expository prose |
| 0.0–0.2 | Highly literate — academic papers, legal texts, philosophy |

## Training

### Data

The model was trained on a curated corpus of documents annotated with orality scores by a multi-pass scoring system. Scores were originally on a 0–100 scale and normalized to 0–1 for training. The corpus draws from Project Gutenberg, textfiles.com, Reddit, and Wikipedia talk pages, covering registers from highly oral to highly literate. An 80/20 train/test split was used (random seed 42).

### Hyperparameters

| Parameter | Value |
|-----------|-------|
| Epochs | 20 |
| Learning rate | 2e-5 |
| Optimizer | AdamW (weight decay 0.01) |
| LR schedule | Cosine with warmup (10% of total steps) |
| Gradient clipping | 1.0 |
| Loss | MSE |
| Mixed precision | FP16 |
| Regularization | Mixout (p=0.1) |

### Training Metrics
<details>
<summary>Click to show per-epoch metrics</summary>

| Epoch | Loss | MAE | R² |
|-------|------|-----|-----|
| 1 | 0.3496 | 0.1173 | 0.476 |
| 2 | 0.0286 | 0.0992 | 0.593 |
| 3 | 0.0215 | 0.0872 | 0.704 |
| 4 | 0.0144 | 0.0879 | 0.714 |
| 5 | 0.0169 | 0.0865 | 0.712 |
| 6 | 0.0117 | 0.0853 | 0.700 |
| 7 | 0.0096 | 0.0922 | 0.691 |
| 8 | 0.0094 | 0.0850 | 0.722 |
| 9 | 0.0086 | 0.0822 | 0.745 |
| 10 | 0.0064 | 0.0841 | 0.723 |
| 11 | 0.0054 | 0.0921 | 0.682 |
| 12 | 0.0050 | 0.0840 | 0.720 |
| 13 | 0.0044 | 0.0806 | 0.744 |
| 14 | 0.0037 | 0.0805 | 0.740 |
| **15** | **0.0034** | **0.0791** | **0.748** |
| 16 | 0.0033 | 0.0807 | 0.738 |
| 17 | 0.0031 | 0.0803 | 0.742 |
| 18 | 0.0026 | 0.0797 | 0.745 |
| 19 | 0.0027 | 0.0803 | 0.742 |
| 20 | 0.0029 | 0.0805 | 0.741 |

</details>
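The MAE and R² columns follow the standard definitions. A minimal NumPy sketch of how such metrics are computed (the prediction and target arrays here are hypothetical, not drawn from the actual evaluation set):

```python
import numpy as np

# Hypothetical 0-1 orality targets and model predictions for a small batch.
y_true = np.array([0.10, 0.35, 0.60, 0.85])
y_pred = np.array([0.15, 0.30, 0.55, 0.90])

# Mean Absolute Error: average |prediction - target|.
mae = np.mean(np.abs(y_pred - y_true))

# R^2: 1 minus residual sum of squares over total sum of squares.
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

print(f"MAE: {mae:.4f}  R2: {r2:.4f}")
```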
Best checkpoint selected at epoch 15 by lowest MAE.

## Architecture

Custom `HavelockOralityRegressor` with mean pooling (ModernBERT has no pooler output):

```
ModernBERT (answerdotai/ModernBERT-base)
 └── Mean pooling over non-padded tokens
 └── Dropout (p=0.1)
 └── Linear (hidden_size → 1)
```

### Regularization

- **Mixout** (p=0.1): during training, each backbone weight element has a 10% chance of being replaced by its pretrained value per forward pass, acting as a stochastic L2 anchor that prevents representation drift (Lee et al., 2020)
- **Weight decay** (0.01) via AdamW
- **Gradient clipping** (max norm 1.0)

## Limitations

- **No sigmoid clamping**: the model can output values outside [0, 1]; consumers should clamp if needed.
- **Domain coverage**: the training corpus skews historical/literary. Performance on modern social media, code-switched text, or non-English text is untested.
- **Document length**: texts longer than 512 tokens are truncated, so the model sees only the first ~400 words, which may not be representative of longer documents.
- **Regression target subjectivity**: orality scores involve human judgment; inter-annotator agreement bounds the ceiling for model performance.

## Theoretical Background

The oral–literate spectrum follows Ong's framework, which characterizes oral discourse as additive, aggregative, redundant, agonistic, empathetic, and situational, and literate discourse as subordinative, analytic, abstract, distanced, and context-free. The model learns to place text along this continuum from document-level annotations informed by 72 specific rhetorical markers (36 oral, 36 literate).

## Citation

```bibtex
@misc{havelock2026regressor,
  title={Havelock Orality Regressor},
  author={Havelock AI},
  year={2026},
  url={https://huggingface.co/HavelockAI/bert-orality-regressor}
}
```

## References

- Ong, Walter J. *Orality and Literacy: The Technologizing of the Word*. Routledge, 1982.
- Lee, C. et al. "Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models." ICLR 2020.
- Warner, B. et al. "Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference." 2024.

---

*Trained: February 2026*