license: mit
tags:
- text-classification
- regression
- modernbert
- orality
- linguistics
- rhetorical-analysis
metrics:
- mae
- r2
base_model:
- answerdotai/ModernBERT-base
pipeline_tag: text-classification
library_name: transformers
datasets:
model-index:
- name: Orality Regression
  metrics:
  - type: mae
    value: 0.0819
    name: Mean Absolute Error
  - type: r2
    value: 0.734
    name: R² Score
---

# Havelock Orality Regressor

ModernBERT-based regression model that scores text on the **oral–literate spectrum** (0–1), grounded in Walter Ong's *Orality and Literacy* (1982).

Given a passage of text, the model outputs a continuous score where higher values indicate greater orality (spoken, performative, additive discourse) and lower values indicate greater literacy (analytic, subordinative, abstract discourse).

| Property | Value |
|----------|-------|
| Base model | `answerdotai/ModernBERT-base` |
| Architecture | `HavelockOralityRegressor` (custom, mean pooling → linear) |
| Task | Single-value regression (MSE loss) |
| Output range | Continuous (not clamped) |
| Max sequence length | 512 tokens |
| Best MAE | **0.0819** |
| R² (at best MAE) | **0.734** |
| Parameters | ~149M |
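The headline numbers in the table are MAE and R². As a reference for how those two metrics are computed, here is a minimal plain-Python sketch (not the project's evaluation code):

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute gap between gold and predicted orality scores."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

An MAE of 0.0819 therefore means predictions are off by about 0.08 on the 0–1 orality scale on average.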

## Usage
An 80/20 train/test split was used (random seed 42).

| Parameter | Value |
|-----------|-------|
| Epochs | 20 |
| Learning rate | 2e-5 |
| Optimizer | AdamW (weight decay 0.01) |
| LR schedule | Cosine with warmup (10% of total steps) |
| Gradient clipping | 1.0 |
| Loss | MSE |
| Mixed precision | FP16 |
| Regularization | Mixout (p=0.1) |
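The LR schedule row can be made concrete. A sketch of the learning-rate multiplier implied by "cosine with warmup (10% of total steps)", assuming the warmup ramp is linear and the cosine decays to zero (the card does not spell either out):

```python
import math

def lr_multiplier(step, total_steps, warmup_frac=0.1):
    """Linear warmup over the first 10% of steps, then cosine decay
    from 1 down to 0 over the remaining steps."""
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step < warmup_steps:
        return step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))
```

The actual learning rate at each step is `2e-5 * lr_multiplier(step, total_steps)`.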

### Training Metrics

<details><summary>Click to show per-epoch metrics</summary>

| Epoch | Loss | MAE | R² |
|-------|------|-----|-----|
| 1 | 0.3485 | 0.1151 | 0.485 |
| 2 | 0.0269 | 0.1145 | 0.446 |
| 3 | 0.0235 | 0.0962 | 0.636 |
| 4 | 0.0162 | 0.0937 | 0.648 |
| 5 | 0.0228 | 0.1099 | 0.566 |
| 6 | 0.0153 | 0.0971 | 0.605 |
| 7 | 0.0115 | 0.0883 | 0.707 |
| 8 | 0.0112 | 0.0906 | 0.681 |
| 9 | 0.0095 | 0.0872 | 0.713 |
| 10 | 0.0076 | 0.0898 | 0.691 |
| 11 | 0.0060 | 0.0840 | 0.727 |
| 12 | 0.0054 | 0.0850 | 0.715 |
| 13 | 0.0050 | 0.0821 | 0.738 |
| 14 | 0.0043 | 0.0820 | 0.737 |
| **15** | **0.0040** | **0.0819** | **0.734** |
| 16 | 0.0041 | 0.0891 | 0.689 |
| 17 | 0.0035 | 0.0829 | 0.727 |
| 18 | 0.0031 | 0.0825 | 0.729 |
| 19 | 0.0032 | 0.0831 | 0.725 |
| 20 | 0.0033 | 0.0833 | 0.724 |

</details>

Best checkpoint selected at epoch 15 by lowest MAE.
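Note that the selection criterion is test-set MAE, not training loss — later epochs reach lower loss while generalizing slightly worse. With a few rows taken from the per-epoch table, the rule looks like:

```python
# Checkpoint selection: pick the epoch with the lowest MAE, not the
# lowest loss. Values copied from the per-epoch metrics table.
metrics = {
    14: {"loss": 0.0043, "mae": 0.0820, "r2": 0.737},
    15: {"loss": 0.0040, "mae": 0.0819, "r2": 0.734},
    16: {"loss": 0.0041, "mae": 0.0891, "r2": 0.689},
    17: {"loss": 0.0035, "mae": 0.0829, "r2": 0.727},
}
best_epoch = min(metrics, key=lambda e: metrics[e]["mae"])  # epoch 15
```

Had loss been the criterion, epoch 17 would have been chosen instead, despite its higher MAE.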

## Architecture

Custom `HavelockOralityRegressor` with mean pooling (ModernBERT has no pooler output):

```
ModernBERT (answerdotai/ModernBERT-base)
└── Mean pooling over non-padded tokens
    └── Dropout (p=0.1)
        └── Linear (hidden_size → 1)
```
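A minimal plain-Python sketch of the head in the diagram — masked mean pooling followed by a `hidden_size → 1` projection. This is illustrative, not the repo's actual module; dropout is omitted since it is the identity at inference:

```python
def masked_mean_pool(hidden_states, attention_mask):
    """Average token embeddings over non-padded positions only."""
    dim = len(hidden_states[0])
    summed = [0.0] * dim
    count = 0
    for vec, keep in zip(hidden_states, attention_mask):
        if keep:
            summed = [s + v for s, v in zip(summed, vec)]
            count += 1
    return [s / count for s in summed]

def regression_head(pooled, weight, bias):
    """Linear hidden_size -> 1 projection producing the orality score."""
    return sum(w * x for w, x in zip(weight, pooled)) + bias

# Toy example: 3 tokens, the last one is padding and is ignored.
hidden = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
score = regression_head(masked_mean_pool(hidden, mask), [0.5, 0.5], 0.0)
```

Masking matters: averaging over padded positions as well would let padding length shift the score.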

### Regularization

- **Mixout** (p=0.1): During training, each backbone weight element has a 10% chance of being replaced by its pretrained value per forward pass, acting as a stochastic L2 anchor that prevents representation drift (Lee et al., 2019).
- **Weight decay** (0.01) via AdamW
- **Gradient clipping** (max norm 1.0)
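A toy illustration of the Mixout step described above, assuming independent per-element Bernoulli swaps; real implementations (Lee et al., 2019) also rescale the kept weights, which is skipped here for brevity:

```python
import random

def mixout_step(finetuned, pretrained, p=0.1, rng=None):
    """With probability p, swap each weight element back to its
    pretrained value for this forward pass (no rescaling here)."""
    rng = rng or random.Random(0)
    return [pre if rng.random() < p else ft
            for ft, pre in zip(finetuned, pretrained)]
```

Because the swaps are resampled every forward pass, the fine-tuned weights are stochastically pulled back toward the pretrained ones rather than being free to drift.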
## Limitations

- **No sigmoid clamping**: The model can output values outside [0, 1]. Consumers should clamp if needed.
- **Domain coverage**: Training corpus skews historical/literary. Performance on modern social media, code-switched text, or non-English text is untested.
- **Document length**: Texts longer than 512 tokens are truncated. The model sees only the first ~400 words, which may not be representative of longer documents.
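Per the first limitation, callers that need scores inside the nominal range can clamp the raw output themselves; a one-line sketch:

```python
def clamp_score(raw, lo=0.0, hi=1.0):
    """Map the unclamped regression output into the nominal [0, 1] range."""
    return max(lo, min(hi, raw))
```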

## References

- Ong, Walter J. *Orality and Literacy: The Technologizing of the Word*. Routledge, 1982.
- Lee, C., et al. "Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models." ICLR 2020.
- Warner, B., et al. "Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference." 2024.

---

*Trained: February 2026*