---
license: mit
tags:
- text-classification
- regression
- modernbert
- orality
- linguistics
- rhetorical-analysis
language:
- en
metrics:
- mae
- r2
base_model:
- answerdotai/ModernBERT-base
pipeline_tag: text-classification
library_name: transformers
datasets:
- custom
model-index:
- name: bert-orality-regressor
  results:
  - task:
      type: text-classification
      name: Orality Regression
    metrics:
    - type: mae
      value: 0.0791
      name: Mean Absolute Error
    - type: r2
      value: 0.748
      name: R² Score
---

# Havelock Orality Regressor

ModernBERT-based regression model that scores text on the **oral–literate spectrum** (0–1), grounded in Walter Ong's *Orality and Literacy* (1982).

Given a passage of text, the model outputs a continuous score where higher values indicate greater orality (spoken, performative, additive discourse) and lower values indicate greater literacy (analytic, subordinative, abstract discourse).

## Model Details

| Property | Value |
|----------|-------|
| Base model | `answerdotai/ModernBERT-base` |
| Architecture | `HavelockOralityRegressor` (custom, mean pooling → linear) |
| Task | Single-value regression (MSE loss) |
| Output range | Continuous (not clamped) |
| Max sequence length | 512 tokens |
| Best MAE | **0.0791** |
| R² (at best MAE) | **0.748** |
| Parameters | ~149M |

## Usage
```python
import os
os.environ["TORCH_COMPILE_DISABLE"] = "1"  # must be set before torch is imported

import warnings
warnings.filterwarnings("ignore", message="Flash Attention 2 only supports")

import torch
from transformers import AutoModel, AutoTokenizer

# trust_remote_code is required: HavelockOralityRegressor is a custom architecture
model_name = "HavelockAI/bert-orality-regressor"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(model_name, trust_remote_code=True)
model.eval()

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

text = "Tell me, O Muse, of that ingenious hero who travelled far and wide"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
inputs = {k: v.to(device) for k, v in inputs.items()}

with torch.no_grad(), torch.autocast(device_type=device.type, enabled=device.type == "cuda"):
    score = model(**inputs).logits.squeeze().item()

# the regression head is unclamped, so clip to [0, 1] before reporting
print(f"Orality score: {max(0.0, min(1.0, score)):.3f}")
```

### Score Interpretation

| Score | Register |
|-------|----------|
| 0.8–1.0 | Highly oral — epic poetry, sermons, rap, oral storytelling |
| 0.6–0.8 | Oral-dominant — speeches, podcasts, conversational prose |
| 0.4–0.6 | Mixed — journalism, blog posts, dialogue-heavy fiction |
| 0.2–0.4 | Literate-dominant — essays, expository prose |
| 0.0–0.2 | Highly literate — academic papers, legal texts, philosophy |
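
For downstream use, the bands above can be folded into a small helper. The function name and label strings here are illustrative, not part of the model's API:

```python
def register_label(score: float) -> str:
    """Map a raw model output to the register bands in the table above.

    Scores are clamped to [0, 1] first, since the regression head is
    unclamped and can stray slightly outside the range.
    """
    s = max(0.0, min(1.0, score))
    if s >= 0.8:
        return "highly oral"
    if s >= 0.6:
        return "oral-dominant"
    if s >= 0.4:
        return "mixed"
    if s >= 0.2:
        return "literate-dominant"
    return "highly literate"
```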

## Training

### Data

The model was trained on a curated corpus of documents annotated with orality scores using a multi-pass scoring system. Scores were originally on a 0–100 scale and normalized to 0–1 for training. The corpus draws from Project Gutenberg, textfiles.com, Reddit, and Wikipedia talk pages, representing a range of registers from highly oral to highly literate.

An 80/20 train/test split was used (random seed 42).
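
The exact splitting code is not published; a deterministic sketch consistent with an 80/20 split at seed 42 might look like:

```python
import random

def split_corpus(docs, test_frac=0.2, seed=42):
    """Shuffle deterministically, then carve off the last test_frac as test."""
    rng = random.Random(seed)
    idx = list(range(len(docs)))
    rng.shuffle(idx)
    cut = int(len(docs) * (1 - test_frac))
    train = [docs[i] for i in idx[:cut]]
    test = [docs[i] for i in idx[cut:]]
    return train, test
```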

### Hyperparameters

| Parameter | Value |
|-----------|-------|
| Epochs | 20 |
| Learning rate | 2e-5 |
| Optimizer | AdamW (weight decay 0.01) |
| LR schedule | Cosine with warmup (10% of total steps) |
| Gradient clipping | 1.0 |
| Loss | MSE |
| Mixed precision | FP16 |
| Regularization | Mixout (p=0.1) |
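
The cosine-with-warmup schedule in the table can be written as a pure function of the step index (a sketch of the shape only; training likely used a library implementation such as `transformers`' scheduler, which is an assumption):

```python
import math

def lr_at_step(step, total_steps, base_lr=2e-5, warmup_frac=0.1):
    """Linear warmup over the first 10% of steps, then cosine decay to 0."""
    warmup_steps = int(total_steps * warmup_frac)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```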

### Training Metrics

<details><summary>Click to show per-epoch metrics</summary>

| Epoch | Loss | MAE | R² |
|-------|------|-----|-----|
| 1 | 0.3496 | 0.1173 | 0.476 |
| 2 | 0.0286 | 0.0992 | 0.593 |
| 3 | 0.0215 | 0.0872 | 0.704 |
| 4 | 0.0144 | 0.0879 | 0.714 |
| 5 | 0.0169 | 0.0865 | 0.712 |
| 6 | 0.0117 | 0.0853 | 0.700 |
| 7 | 0.0096 | 0.0922 | 0.691 |
| 8 | 0.0094 | 0.0850 | 0.722 |
| 9 | 0.0086 | 0.0822 | 0.745 |
| 10 | 0.0064 | 0.0841 | 0.723 |
| 11 | 0.0054 | 0.0921 | 0.682 |
| 12 | 0.0050 | 0.0840 | 0.720 |
| 13 | 0.0044 | 0.0806 | 0.744 |
| 14 | 0.0037 | 0.0805 | 0.740 |
| **15** | **0.0034** | **0.0791** | **0.748** |
| 16 | 0.0033 | 0.0807 | 0.738 |
| 17 | 0.0031 | 0.0803 | 0.742 |
| 18 | 0.0026 | 0.0797 | 0.745 |
| 19 | 0.0027 | 0.0803 | 0.742 |
| 20 | 0.0029 | 0.0805 | 0.741 |

</details>

Best checkpoint selected at epoch 15 by lowest MAE.

## Architecture

Custom `HavelockOralityRegressor` with mean pooling (ModernBERT has no pooler output):
```
ModernBERT (answerdotai/ModernBERT-base)
    └── Mean pooling over non-padded tokens
        └── Dropout (p=0.1)
            └── Linear (hidden_size → 1)
```
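
The mean-pooling step can be illustrated without tensors; the model's forward pass performs the same arithmetic batched in PyTorch (this toy function is not the model's actual code):

```python
def mean_pool(hidden_states, attention_mask):
    """Average token embeddings, ignoring padded positions.

    hidden_states: list of per-token vectors (lists of floats)
    attention_mask: list of 0/1 flags, 1 = real token
    """
    dim = len(hidden_states[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(hidden_states, attention_mask):
        if keep:
            total = [t + v for t, v in zip(total, vec)]
            count += 1
    return [t / max(count, 1) for t in total]
```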

### Regularization

- **Mixout** (p=0.1): During training, each backbone weight element has a 10% chance of being replaced by its pretrained value per forward pass, acting as a stochastic L2 anchor that prevents representation drift (Lee et al., 2020)
- **Weight decay** (0.01) via AdamW
- **Gradient clipping** (max norm 1.0)
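
The Mixout replacement step can be sketched per weight element. Note that the published formulation also recenters and rescales by 1/(1 − p) so the expectation matches the finetuned weight; this minimal sketch shows only the stochastic replacement described above:

```python
import random

def mixout_step(finetuned, pretrained, p=0.1, rng=None):
    """With probability p, swap each finetuned weight element for its
    pretrained value (drawn fresh every forward pass during training)."""
    rng = rng or random.Random()
    return [w0 if rng.random() < p else w
            for w, w0 in zip(finetuned, pretrained)]
```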

## Limitations

- **No sigmoid clamping**: The model can output values outside [0, 1]. Consumers should clamp if needed.
- **Domain coverage**: Training corpus skews historical/literary. Performance on modern social media, code-switched text, or non-English text is untested.
- **Document length**: Texts longer than 512 tokens are truncated. The model sees only the first ~400 words, which may not be representative of longer documents.
- **Regression target subjectivity**: Orality scores involve human judgment; inter-annotator agreement bounds the ceiling for model performance.

## Theoretical Background

The oral–literate spectrum follows Ong's framework, which characterizes oral discourse as additive, aggregative, redundant, agonistic, empathetic, and situational, while literate discourse is subordinative, analytic, abstract, distanced, and context-free. The model learns to place text along this continuum from document-level annotations informed by 72 specific rhetorical markers (36 oral, 36 literate).

## Citation
```bibtex
@misc{havelock2026regressor,
  title={Havelock Orality Regressor},
  author={Havelock AI},
  year={2026},
  url={https://huggingface.co/HavelockAI/bert-orality-regressor}
}
```

## References

- Ong, Walter J. *Orality and Literacy: The Technologizing of the Word*. Routledge, 1982.
- Lee, C. et al. "Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models." ICLR 2020.
- Warner, B. et al. "Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference." 2024.

---

*Trained: February 2026*