---
license: mit
tags:
- text-classification
- regression
- modernbert
- orality
- linguistics
- rhetorical-analysis
language:
- en
metrics:
- mae
- r2
base_model:
- answerdotai/ModernBERT-base
pipeline_tag: text-classification
library_name: transformers
datasets:
- custom
model-index:
- name: bert-orality-regressor
  results:
  - task:
      type: text-classification
      name: Orality Regression
    metrics:
    - type: mae
      value: 0.0791
      name: Mean Absolute Error
    - type: r2
      value: 0.748
      name: R² Score
---

# Havelock Orality Regressor

A ModernBERT-based regression model that scores text on the **oral–literate spectrum** (0–1), grounded in Walter Ong's *Orality and Literacy* (1982). Given a passage of text, the model outputs a continuous score: higher values indicate greater orality (spoken, performative, additive discourse), lower values greater literacy (analytic, subordinative, abstract discourse).

## Model Details

| Property | Value |
|----------|-------|
| Base model | `answerdotai/ModernBERT-base` |
| Architecture | `HavelockOralityRegressor` (custom, mean pooling → linear) |
| Task | Single-value regression (MSE loss) |
| Output range | Continuous (not clamped) |
| Max sequence length | 512 tokens |
| Best MAE | **0.0791** |
| R² (at best MAE) | **0.748** |
| Parameters | ~149M |

## Usage

```python
import os

# Disable torch.compile before importing transformers.
os.environ["TORCH_COMPILE_DISABLE"] = "1"

import warnings

warnings.filterwarnings("ignore", message="Flash Attention 2 only supports")

import torch
from transformers import AutoModel, AutoTokenizer

model_name = "HavelockAI/bert-orality-regressor"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(model_name, trust_remote_code=True)
model.eval()

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

text = "Tell me, O Muse, of that ingenious hero who travelled far and wide"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
inputs = {k: v.to(device) for k, v in inputs.items()}

# Autocast only on CUDA; inference runs in full precision on CPU.
with torch.no_grad(), torch.autocast(device_type=device.type, enabled=device.type == "cuda"):
    score = model(**inputs).logits.squeeze().item()

# The head is unclamped, so clip to [0, 1] for interpretation.
print(f"Orality score: {max(0.0, min(1.0, score)):.3f}")
```

### Score Interpretation

| Score | Register |
|-------|----------|
| 0.8–1.0 | Highly oral — epic poetry, sermons, rap, oral storytelling |
| 0.6–0.8 | Oral-dominant — speeches, podcasts, conversational prose |
| 0.4–0.6 | Mixed — journalism, blog posts, dialogue-heavy fiction |
| 0.2–0.4 | Literate-dominant — essays, expository prose |
| 0.0–0.2 | Highly literate — academic papers, legal texts, philosophy |

## Training

### Data

The model was trained on a curated corpus of documents annotated with orality scores by a multi-pass scoring system. Scores were originally on a 0–100 scale and normalized to 0–1 for training. The corpus draws from Project Gutenberg, textfiles.com, Reddit, and Wikipedia talk pages, covering registers from highly oral to highly literate. An 80/20 train/test split was used (random seed 42).

### Hyperparameters

| Parameter | Value |
|-----------|-------|
| Epochs | 20 |
| Learning rate | 2e-5 |
| Optimizer | AdamW (weight decay 0.01) |
| LR schedule | Cosine with warmup (10% of total steps) |
| Gradient clipping | 1.0 |
| Loss | MSE |
| Mixed precision | FP16 |
| Regularization | Mixout (p=0.1) |

### Training Metrics
<details>
<summary>Click to show per-epoch metrics</summary>

| Epoch | Loss | MAE | R² |
|-------|------|-----|-----|
| 1 | 0.3496 | 0.1173 | 0.476 |
| 2 | 0.0286 | 0.0992 | 0.593 |
| 3 | 0.0215 | 0.0872 | 0.704 |
| 4 | 0.0144 | 0.0879 | 0.714 |
| 5 | 0.0169 | 0.0865 | 0.712 |
| 6 | 0.0117 | 0.0853 | 0.700 |
| 7 | 0.0096 | 0.0922 | 0.691 |
| 8 | 0.0094 | 0.0850 | 0.722 |
| 9 | 0.0086 | 0.0822 | 0.745 |
| 10 | 0.0064 | 0.0841 | 0.723 |
| 11 | 0.0054 | 0.0921 | 0.682 |
| 12 | 0.0050 | 0.0840 | 0.720 |
| 13 | 0.0044 | 0.0806 | 0.744 |
| 14 | 0.0037 | 0.0805 | 0.740 |
| **15** | **0.0034** | **0.0791** | **0.748** |
| 16 | 0.0033 | 0.0807 | 0.738 |
| 17 | 0.0031 | 0.0803 | 0.742 |
| 18 | 0.0026 | 0.0797 | 0.745 |
| 19 | 0.0027 | 0.0803 | 0.742 |
| 20 | 0.0029 | 0.0805 | 0.741 |

</details>
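The MAE and R² columns follow the standard definitions. A minimal NumPy sketch of how such metrics are computed (the prediction and target arrays here are hypothetical, not drawn from the actual evaluation set):

```python
import numpy as np

# Hypothetical 0-1 orality targets and model predictions for a small batch.
y_true = np.array([0.10, 0.35, 0.60, 0.85])
y_pred = np.array([0.15, 0.30, 0.55, 0.90])

# Mean Absolute Error: average |prediction - target|.
mae = np.mean(np.abs(y_pred - y_true))

# R^2: 1 minus residual sum of squares over total sum of squares.
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

print(f"MAE: {mae:.4f}  R2: {r2:.4f}")
```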
Best checkpoint selected at epoch 15 by lowest MAE.

## Architecture

Custom `HavelockOralityRegressor` with mean pooling (ModernBERT has no pooler output):

```
ModernBERT (answerdotai/ModernBERT-base)
 └── Mean pooling over non-padded tokens
 └── Dropout (p=0.1)
 └── Linear (hidden_size → 1)
```

### Regularization

- **Mixout** (p=0.1): during training, each backbone weight element has a 10% chance of being replaced by its pretrained value per forward pass, acting as a stochastic L2 anchor that prevents representation drift (Lee et al., 2020)
- **Weight decay** (0.01) via AdamW
- **Gradient clipping** (max norm 1.0)

## Limitations

- **No sigmoid clamping**: the model can output values outside [0, 1]; consumers should clamp if needed.
- **Domain coverage**: the training corpus skews historical/literary. Performance on modern social media, code-switched text, or non-English text is untested.
- **Document length**: texts longer than 512 tokens are truncated, so the model sees only the first ~400 words, which may not be representative of longer documents.
- **Regression target subjectivity**: orality scores involve human judgment; inter-annotator agreement bounds the ceiling for model performance.

## Theoretical Background

The oral–literate spectrum follows Ong's framework, which characterizes oral discourse as additive, aggregative, redundant, agonistic, empathetic, and situational, and literate discourse as subordinative, analytic, abstract, distanced, and context-free. The model learns to place text along this continuum from document-level annotations informed by 72 specific rhetorical markers (36 oral, 36 literate).

## Citation

```bibtex
@misc{havelock2026regressor,
  title={Havelock Orality Regressor},
  author={Havelock AI},
  year={2026},
  url={https://huggingface.co/HavelockAI/bert-orality-regressor}
}
```

## References

- Ong, Walter J. *Orality and Literacy: The Technologizing of the Word*. Routledge, 1982.
- Lee, C. et al. "Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models." ICLR 2020.
- Warner, B. et al. "Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference." 2024.

---

*Trained: February 2026*