permutans committed · c8eb1a7 · verified · 1 parent: 880f21a

Upload folder using huggingface_hub

Files changed (1): README.md (+61, -20)
README.md CHANGED
@@ -3,7 +3,7 @@ license: mit
  tags:
  - text-classification
  - regression
- - bert
  - orality
  - linguistics
  - rhetorical-analysis
@@ -13,7 +13,7 @@ metrics:
  - mae
  - r2
  base_model:
- - google-bert/bert-base-uncased
  pipeline_tag: text-classification
  library_name: transformers
  datasets:
@@ -26,16 +26,16 @@ model-index:
  name: Orality Regression
  metrics:
  - type: mae
- value: 0.0786
  name: Mean Absolute Error
  - type: r2
- value: 0.756
  name: R² Score
  ---

  # Havelock Orality Regressor

- BERT-based regression model that scores text on the **oral–literate spectrum** (0–1), grounded in Walter Ong's *Orality and Literacy* (1982).

  Given a passage of text, the model outputs a continuous score where higher values indicate greater orality (spoken, performative, additive discourse) and lower values indicate greater literacy (analytic, subordinative, abstract discourse).
@@ -43,14 +43,14 @@ Given a passage of text, the model outputs a continuous score where higher value

  | Property | Value |
  |----------|-------|
- | Base model | `bert-base-uncased` |
- | Architecture | `BertForSequenceClassification` (num_labels=1) |
  | Task | Single-value regression (MSE loss) |
  | Output range | Continuous (not clamped) |
  | Max sequence length | 512 tokens |
- | Best MAE | **0.0786** |
- | R² | **0.756** |
- | Parameters | ~109M |

  ## Usage
  ```python
@@ -92,25 +92,64 @@ An 80/20 train/test split was used (random seed 42).

  | Parameter | Value |
  |-----------|-------|
- | Epochs | 3 |
- | Batch size | 8 |
  | Learning rate | 2e-5 |
- | Optimizer | AdamW |
- | LR schedule | Linear warmup (10% of total steps) |
  | Gradient clipping | 1.0 |
- | Loss | MSE (via HF `num_labels=1`) |

  ### Training Metrics

  | Epoch | Loss | MAE | R² |
  |-------|------|-----|-----|
- | 1 | 0.0382 | 0.1443 | 0.317 |
- | 2 | 0.0187 | 0.0852 | 0.722 |
- | 3 | 0.0128 | 0.0786 | 0.756 |

  ## Limitations

- - **Short training**: Only 3 epochs — likely undertrained. Further epochs or hyperparameter search would probably improve R².
  - **No sigmoid clamping**: The model can output values outside [0, 1]. Consumers should clamp if needed.
  - **Domain coverage**: Training corpus skews historical/literary. Performance on modern social media, code-switched text, or non-English text is untested.
  - **Document length**: Texts longer than 512 tokens are truncated. The model sees only the first ~400 words, which may not be representative of longer documents.
@@ -133,7 +172,9 @@ The oral–literate spectrum follows Ong's framework, which characterizes oral d
  ## References

  - Ong, Walter J. *Orality and Literacy: The Technologizing of the Word*. Routledge, 1982.

  ---

- *Model version: 33b6eccc · Trained: February 2026*
 
  tags:
  - text-classification
  - regression
+ - modernbert
  - orality
  - linguistics
  - rhetorical-analysis
 
  - mae
  - r2
  base_model:
+ - answerdotai/ModernBERT-base
  pipeline_tag: text-classification
  library_name: transformers
  datasets:
 
  name: Orality Regression
  metrics:
  - type: mae
+ value: 0.0819
  name: Mean Absolute Error
  - type: r2
+ value: 0.734
  name: R² Score
  ---

  # Havelock Orality Regressor

+ A ModernBERT-based regression model that scores text on the **oral–literate spectrum** (0–1), grounded in Walter Ong's *Orality and Literacy* (1982).

  Given a passage of text, the model outputs a continuous score where higher values indicate greater orality (spoken, performative, additive discourse) and lower values indicate greater literacy (analytic, subordinative, abstract discourse).
 

  | Property | Value |
  |----------|-------|
+ | Base model | `answerdotai/ModernBERT-base` |
+ | Architecture | `HavelockOralityRegressor` (custom, mean pooling → linear) |
  | Task | Single-value regression (MSE loss) |
  | Output range | Continuous (not clamped) |
  | Max sequence length | 512 tokens |
+ | Best MAE | **0.0819** |
+ | R² (at best MAE) | **0.734** |
+ | Parameters | ~149M |

  ## Usage
  ```python
 

  | Parameter | Value |
  |-----------|-------|
+ | Epochs | 20 |
  | Learning rate | 2e-5 |
+ | Optimizer | AdamW (weight decay 0.01) |
+ | LR schedule | Cosine with warmup (10% of total steps) |
  | Gradient clipping | 1.0 |
+ | Loss | MSE |
+ | Mixed precision | FP16 |
+ | Regularization | Mixout (p=0.1) |
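The configuration in the table above can be sketched with standard PyTorch and `transformers` utilities. This is a hedged sketch only: the card does not include its training script, so the function and `num_steps` are illustrative stand-ins.

```python
import torch
from torch.optim import AdamW
from transformers import get_cosine_schedule_with_warmup

def build_optimizer_and_schedule(model: torch.nn.Module, num_steps: int,
                                 lr: float = 2e-5, weight_decay: float = 0.01):
    """Mirror the table: AdamW (weight decay 0.01) + cosine schedule with 10% warmup."""
    optimizer = AdamW(model.parameters(), lr=lr, weight_decay=weight_decay)
    scheduler = get_cosine_schedule_with_warmup(
        optimizer,
        num_warmup_steps=int(0.1 * num_steps),
        num_training_steps=num_steps,
    )
    return optimizer, scheduler

# Inside the training loop, per the table:
# loss = torch.nn.functional.mse_loss(preds.squeeze(-1), targets)   # MSE loss
# torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)           # gradient clipping
```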
 
  ### Training Metrics

+ <details><summary>Click to show per-epoch metrics</summary>
+
  | Epoch | Loss | MAE | R² |
  |-------|------|-----|-----|
+ | 1 | 0.3485 | 0.1151 | 0.485 |
+ | 2 | 0.0269 | 0.1145 | 0.446 |
+ | 3 | 0.0235 | 0.0962 | 0.636 |
+ | 4 | 0.0162 | 0.0937 | 0.648 |
+ | 5 | 0.0228 | 0.1099 | 0.566 |
+ | 6 | 0.0153 | 0.0971 | 0.605 |
+ | 7 | 0.0115 | 0.0883 | 0.707 |
+ | 8 | 0.0112 | 0.0906 | 0.681 |
+ | 9 | 0.0095 | 0.0872 | 0.713 |
+ | 10 | 0.0076 | 0.0898 | 0.691 |
+ | 11 | 0.0060 | 0.0840 | 0.727 |
+ | 12 | 0.0054 | 0.0850 | 0.715 |
+ | 13 | 0.0050 | 0.0821 | 0.738 |
+ | 14 | 0.0043 | 0.0820 | 0.737 |
+ | **15** | **0.0040** | **0.0819** | **0.734** |
+ | 16 | 0.0041 | 0.0891 | 0.689 |
+ | 17 | 0.0035 | 0.0829 | 0.727 |
+ | 18 | 0.0031 | 0.0825 | 0.729 |
+ | 19 | 0.0032 | 0.0831 | 0.725 |
+ | 20 | 0.0033 | 0.0833 | 0.724 |
+
+ </details>
+
+
+ Best checkpoint selected at epoch 15 by lowest MAE.
+
+ ## Architecture
+
+ Custom `HavelockOralityRegressor` with mean pooling (ModernBERT has no pooler output):
+ ```
+ ModernBERT (answerdotai/ModernBERT-base)
+ └── Mean pooling over non-padded tokens
+ └── Dropout (p=0.1)
+ └── Linear (hidden_size → 1)
+ ```
+
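The diagram above can be read as the following PyTorch sketch. It is illustrative only: the actual `HavelockOralityRegressor` source is not included in this card, so the class and argument names here are stand-ins for the real implementation.

```python
import torch
import torch.nn as nn

class MeanPoolRegressor(nn.Module):
    """Stand-in for the custom head: backbone -> masked mean pool -> dropout -> linear."""
    def __init__(self, backbone: nn.Module, hidden_size: int, dropout: float = 0.1):
        super().__init__()
        self.backbone = backbone
        self.dropout = nn.Dropout(dropout)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, input_ids: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
        # ModernBERT exposes no pooler output, so pool the last hidden state manually.
        hidden = self.backbone(input_ids=input_ids,
                               attention_mask=attention_mask).last_hidden_state
        mask = attention_mask.unsqueeze(-1).to(hidden.dtype)          # (batch, seq, 1)
        pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
        return self.head(self.dropout(pooled)).squeeze(-1)            # (batch,)
```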
+
+ ### Regularization
+
+ - **Mixout** (p=0.1): During training, each backbone weight element has a 10% chance of being replaced by its pretrained value per forward pass, acting as a stochastic L2 anchor that prevents representation drift (Lee et al., 2020)
+ - **Weight decay** (0.01) via AdamW
+ - **Gradient clipping** (max norm 1.0)
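Functionally, the Mixout step in the first bullet looks like the sketch below. The card's actual implementation is not shown (Lee et al. apply Mixout by wrapping linear layers rather than as a free function); the rescaling keeps the expected weight equal to the current weight, analogous to inverted dropout.

```python
import torch

def mixout(weight: torch.Tensor, pretrained: torch.Tensor,
           p: float = 0.1, training: bool = True) -> torch.Tensor:
    """Randomly swap weight elements back to their pretrained values (Lee et al., ICLR 2020)."""
    if not training or p == 0.0:
        return weight
    # mask == 1 -> use the pretrained value for this element (probability p)
    mask = torch.bernoulli(torch.full_like(weight, p))
    mixed = mask * pretrained + (1.0 - mask) * weight
    # Rescale so E[output] == weight
    return (mixed - p * pretrained) / (1.0 - p)
```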
 
  ## Limitations

  - **No sigmoid clamping**: The model can output values outside [0, 1]. Consumers should clamp if needed.
  - **Domain coverage**: Training corpus skews historical/literary. Performance on modern social media, code-switched text, or non-English text is untested.
  - **Document length**: Texts longer than 512 tokens are truncated. The model sees only the first ~400 words, which may not be representative of longer documents.
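Given the first bullet, consumers can clamp raw scores onto [0, 1] before use. A minimal sketch; how the raw score is obtained depends on loading code not reproduced in this card.

```python
def clamp_score(raw: float) -> float:
    """Clamp a raw regression output onto the card's 0-1 orality scale."""
    return max(0.0, min(1.0, raw))

# e.g. clamp_score(1.08) -> 1.0, clamp_score(-0.03) -> 0.0, clamp_score(0.42) -> 0.42
```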
 
  ## References

  - Ong, Walter J. *Orality and Literacy: The Technologizing of the Word*. Routledge, 1982.
+ - Lee, Cheolhyoung, et al. "Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models." ICLR 2020.
+ - Warner, Benjamin, et al. "Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference." 2024.

  ---

+ *Trained: February 2026*