permutans committed
Commit da248ed · verified · 1 parent: 23afe9e

Upload folder using huggingface_hub

Files changed (5):
1. README.md +55 -22
2. config.json +66 -18
3. model.safetensors +2 -2
4. tokenizer.json +0 -0
5. tokenizer_config.json +7 -5
README.md CHANGED
@@ -2,7 +2,7 @@
 license: mit
 tags:
 - text-classification
-- bert
+- modernbert
 - orality
 - linguistics
 - rhetorical-analysis
@@ -12,7 +12,7 @@ metrics:
 - f1
 - accuracy
 base_model:
-- google-bert/bert-base-uncased
+- answerdotai/ModernBERT-base
 pipeline_tag: text-classification
 library_name: transformers
 datasets:
@@ -25,16 +25,16 @@ model-index:
 name: Oral/Literate Span Classification
 metrics:
 - type: f1
-  value: 0.835
+  value: 0.804
   name: F1 (macro)
 - type: accuracy
-  value: 0.858
+  value: 0.825
   name: Accuracy
 ---
 
 # Havelock Marker Category Classifier
 
-BERT-based binary classifier that determines whether a rhetorical span is **oral** or **literate**, grounded in Walter Ong's *Orality and Literacy* (1982).
+ModernBERT-based binary classifier that determines whether a rhetorical span is **oral** or **literate**, grounded in Walter Ong's *Orality and Literacy* (1982).
 
 This is the coarsest level of the Havelock span classification hierarchy. Given a text span that has been identified as a rhetorical marker, the model classifies it into one of two categories: oral (characteristic of spoken, performative discourse) or literate (characteristic of written, analytic discourse).
 
@@ -42,15 +42,15 @@ This is the coarsest level of the Havelock span classification hierarchy. Given
 
 | Property | Value |
 |----------|-------|
-| Base model | `bert-base-uncased` |
-| Architecture | `BertForSequenceClassification` |
+| Base model | `answerdotai/ModernBERT-base` |
+| Architecture | `ModernBertForSequenceClassification` |
 | Task | Binary classification |
 | Labels | 2 (`oral`, `literate`) |
 | Max sequence length | 128 tokens |
-| Test F1 (macro) | **0.835** |
-| Test Accuracy | **0.858** |
+| Test F1 (macro) | **0.804** |
+| Test Accuracy | **0.825** |
 | Missing labels | 0/2 |
-| Parameters | ~109M |
+| Parameters | ~149M |
 
 ## Usage
 ```python
@@ -76,7 +76,7 @@ print(f"Category: {label_map[pred]}")
 
 ### Data
 
-Span-level annotations from the Havelock corpus with marker types normalized against a canonical taxonomy at build time. Spans are drawn from documents sourced from Project Gutenberg, textfiles.com, Reddit, and Wikipedia talk pages. A stratified 80/10/10 train/val/test split was used with swap-based optimization. The test set contains 1,609 spans (1,162 oral, 447 literate).
+22,367 span-level annotations from the Havelock corpus with marker types normalized against a canonical taxonomy at build time. Spans are drawn from documents sourced from Project Gutenberg, textfiles.com, Reddit, and Wikipedia talk pages. A stratified 80/10/10 train/val/test split was used with swap-based optimization. The test set contains 1,609 spans (1,162 oral, 447 literate).
 
 ### Hyperparameters
 
@@ -84,7 +84,7 @@ Span-level annotations from the Havelock corpus with marker types normalized aga
 |-----------|-------|
 | Epochs | 20 |
 | Batch size | 16 |
-| Learning rate | 3e-5 |
+| Learning rate | 2e-5 |
 | Optimizer | AdamW (weight decay 0.01) |
 | LR schedule | Cosine with 10% warmup |
 | Gradient clipping | 1.0 |
@@ -92,19 +92,50 @@ Span-level annotations from the Havelock corpus with marker types normalized aga
 | Mixout | 0.1 |
 | Mixed precision | FP16 |
 
+### Training Metrics
+
+Best checkpoint selected at epoch 13 by missing-label-primary, F1-tiebreaker (0 missing, F1 0.850).
+
+<details><summary>Click to show per-epoch metrics</summary>
+
+| Epoch | Loss | Val F1 | F1 range |
+|-------|------|--------|----------|
+| 1 | 0.1231 | 0.815 | 0.786–0.843 |
+| 2 | 0.0785 | 0.829 | 0.795–0.863 |
+| 3 | 0.0599 | 0.835 | 0.804–0.866 |
+| 4 | 0.0457 | 0.816 | 0.788–0.844 |
+| 5 | 0.0356 | 0.826 | 0.794–0.857 |
+| 6 | 0.0290 | 0.834 | 0.787–0.881 |
+| 7 | 0.0235 | 0.836 | 0.802–0.869 |
+| 8 | 0.0188 | 0.837 | 0.799–0.876 |
+| 9 | 0.0175 | 0.840 | 0.805–0.875 |
+| 10 | 0.0162 | 0.839 | 0.802–0.875 |
+| 11 | 0.0115 | 0.834 | 0.796–0.872 |
+| 12 | 0.0103 | 0.836 | 0.801–0.870 |
+| **13** | **0.0097** | **0.850** | **0.812–0.887** |
+| 14 | 0.0086 | 0.827 | 0.794–0.861 |
+| 15 | 0.0075 | 0.835 | 0.799–0.871 |
+| 16 | 0.0074 | 0.828 | 0.794–0.862 |
+| 17 | 0.0071 | 0.830 | 0.796–0.863 |
+| 18 | 0.0073 | 0.840 | 0.804–0.877 |
+| 19 | 0.0068 | 0.843 | 0.806–0.880 |
+| 20 | 0.0070 | 0.844 | 0.808–0.880 |
+
+</details>
+
 ### Test Set Classification Report
 ```
               precision    recall  f1-score   support
 
-        oral      0.945     0.853     0.896      1162
-    literate      0.695     0.870     0.773       447
+        oral      0.953     0.798     0.868      1162
+    literate      0.631     0.897     0.741       447
 
-    accuracy                          0.858      1609
-   macro avg      0.820     0.862     0.835      1609
-weighted avg      0.875     0.858     0.862      1609
+    accuracy                          0.825      1609
+   macro avg      0.792     0.847     0.804      1609
+weighted avg      0.863     0.825     0.833      1609
 ```
 
-The model achieves high precision on oral spans (0.945) and high recall on literate spans (0.870). The precision gap on literate (0.695) indicates some oral spans are misclassified as literate — expected given the class imbalance (72% oral in test).
+The model achieves high precision on oral spans (0.953) and high recall on literate spans (0.897). The precision gap on literate (0.631) indicates some oral spans are misclassified as literate — expected given the class imbalance (72% oral in test).
 
 ## Limitations
 
@@ -121,9 +152,9 @@ The oral–literate distinction follows Ong's framework. Oral markers include fe
 
 | Model | Task | Classes | F1 |
 |-------|------|---------|-----|
-| **This model** | Binary (oral/literate) | 2 | 0.835 |
-| [`HavelockAI/bert-marker-type`](https://huggingface.co/HavelockAI/bert-marker-type) | Functional type | 18 | 0.583 |
-| [`HavelockAI/bert-marker-subtype`](https://huggingface.co/HavelockAI/bert-marker-subtype) | Fine-grained subtype | 71 | 0.500 |
+| **This model** | Binary (oral/literate) | 2 | 0.804 |
+| [`HavelockAI/bert-marker-type`](https://huggingface.co/HavelockAI/bert-marker-type) | Functional type | 18 | 0.573 |
+| [`HavelockAI/bert-marker-subtype`](https://huggingface.co/HavelockAI/bert-marker-subtype) | Fine-grained subtype | 71 | 0.493 |
 | [`HavelockAI/bert-orality-regressor`](https://huggingface.co/HavelockAI/bert-orality-regressor) | Document-level score | Regression | MAE 0.079 |
 | [`HavelockAI/bert-token-classifier`](https://huggingface.co/HavelockAI/bert-token-classifier) | Span detection (BIO) | 145 | 0.500 |
@@ -140,7 +171,9 @@ The oral–literate distinction follows Ong's framework. Oral markers include fe
 ## References
 
 - Ong, Walter J. *Orality and Literacy: The Technologizing of the Word*. Routledge, 1982.
+- Lee, C. et al. "Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models." ICLR 2020.
+- Warner, A. et al. "Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference." 2024.
 
 ---
 
-*Model version: b31f147d · Trained: February 2026*
+*Trained: February 2026*
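As a quick consistency check on the updated classification report above, the aggregate rows can be recomputed from the per-class figures. A minimal sketch (numbers taken from the new README; tolerances allow for rounding in the report):

```python
# Per-class figures from the updated test-set classification report.
support = {"oral": 1162, "literate": 447}
f1 = {"oral": 0.868, "literate": 0.741}
recall = {"oral": 0.798, "literate": 0.897}

n = sum(support.values())                                    # 1609 test spans
macro_f1 = sum(f1.values()) / len(f1)                        # unweighted mean of class F1s
weighted_f1 = sum(f1[c] * support[c] for c in f1) / n        # support-weighted mean
# Overall accuracy: per-class correct counts are recall * support.
accuracy = sum(recall[c] * support[c] for c in recall) / n

assert abs(macro_f1 - 0.804) < 2e-3
assert abs(weighted_f1 - 0.833) < 2e-3
assert abs(accuracy - 0.825) < 2e-3
```

The reported macro F1 (0.804), weighted F1 (0.833), and accuracy (0.825) all reproduce from the per-class rows to within rounding.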
config.json CHANGED
@@ -1,30 +1,78 @@
 {
-  "add_cross_attention": false,
   "architectures": [
-    "BertForSequenceClassification"
+    "ModernBertForSequenceClassification"
   ],
-  "attention_probs_dropout_prob": 0.1,
-  "bos_token_id": null,
-  "classifier_dropout": null,
+  "attention_bias": false,
+  "attention_dropout": 0.0,
+  "bos_token_id": 50281,
+  "classifier_activation": "gelu",
+  "classifier_bias": false,
+  "classifier_dropout": 0.0,
+  "classifier_pooling": "mean",
+  "cls_token_id": 50281,
+  "decoder_bias": true,
+  "deterministic_flash_attn": false,
   "dtype": "float32",
-  "eos_token_id": null,
+  "embedding_dropout": 0.0,
+  "eos_token_id": 50282,
+  "global_attn_every_n_layers": 3,
   "gradient_checkpointing": false,
-  "hidden_act": "gelu",
-  "hidden_dropout_prob": 0.1,
+  "hidden_activation": "gelu",
   "hidden_size": 768,
+  "initializer_cutoff_factor": 2.0,
   "initializer_range": 0.02,
-  "intermediate_size": 3072,
-  "is_decoder": false,
-  "layer_norm_eps": 1e-12,
-  "max_position_embeddings": 512,
-  "model_type": "bert",
+  "intermediate_size": 1152,
+  "layer_norm_eps": 1e-05,
+  "layer_types": [
+    "full_attention",
+    "sliding_attention",
+    "sliding_attention",
+    "full_attention",
+    "sliding_attention",
+    "sliding_attention",
+    "full_attention",
+    "sliding_attention",
+    "sliding_attention",
+    "full_attention",
+    "sliding_attention",
+    "sliding_attention",
+    "full_attention",
+    "sliding_attention",
+    "sliding_attention",
+    "full_attention",
+    "sliding_attention",
+    "sliding_attention",
+    "full_attention",
+    "sliding_attention",
+    "sliding_attention",
+    "full_attention"
+  ],
+  "local_attention": 128,
+  "max_position_embeddings": 8192,
+  "mlp_bias": false,
+  "mlp_dropout": 0.0,
+  "model_type": "modernbert",
+  "norm_bias": false,
+  "norm_eps": 1e-05,
   "num_attention_heads": 12,
-  "num_hidden_layers": 12,
-  "pad_token_id": 0,
+  "num_hidden_layers": 22,
+  "pad_token_id": 50283,
   "position_embedding_type": "absolute",
+  "repad_logits_with_grad": false,
+  "rope_parameters": {
+    "full_attention": {
+      "rope_theta": 160000.0,
+      "rope_type": "default"
+    },
+    "sliding_attention": {
+      "rope_theta": 10000.0,
+      "rope_type": "default"
+    }
+  },
+  "sep_token_id": 50282,
+  "sparse_pred_ignore_index": -100,
+  "sparse_prediction": false,
   "tie_word_embeddings": true,
   "transformers_version": "5.0.0",
-  "type_vocab_size": 2,
-  "use_cache": true,
-  "vocab_size": 30522
+  "vocab_size": 50368
 }
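The long `layer_types` array in the new config follows directly from `global_attn_every_n_layers: 3` and `num_hidden_layers: 22`: every third layer is full attention, the rest use sliding-window attention (`local_attention: 128`). A minimal sketch of the pattern (the modulo rule is inferred from the values shown here, not taken from the ModernBERT source):

```python
# Reconstruct layer_types from the two scalar config values above.
num_hidden_layers = 22
global_attn_every_n_layers = 3

layer_types = [
    # Layers 0, 3, 6, ... get full (global) attention; the rest are local.
    "full_attention" if i % global_attn_every_n_layers == 0 else "sliding_attention"
    for i in range(num_hidden_layers)
]

assert len(layer_types) == 22
assert layer_types[0] == layer_types[3] == layer_types[21] == "full_attention"
assert layer_types.count("full_attention") == 8   # matches the array in config.json
```

Note the list both starts and ends on a full-attention layer, since 21 is divisible by 3.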
model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:d59549d144944b5af2731a495cf0bc210fdfbda0825c20917d016f0f2921a121
-size 780065488
+oid sha256:0b38e8b6bcb2ea4652922602925a3a332772ab650ba7f1644b9b7e00e7d32d3d
+size 1039637504
tokenizer.json CHANGED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json CHANGED
@@ -1,14 +1,16 @@
 {
   "backend": "tokenizers",
+  "clean_up_tokenization_spaces": true,
   "cls_token": "[CLS]",
-  "do_lower_case": true,
   "is_local": false,
   "mask_token": "[MASK]",
-  "model_max_length": 512,
+  "model_input_names": [
+    "input_ids",
+    "attention_mask"
+  ],
+  "model_max_length": 8192,
   "pad_token": "[PAD]",
   "sep_token": "[SEP]",
-  "strip_accents": null,
-  "tokenize_chinese_chars": true,
-  "tokenizer_class": "BertTokenizer",
+  "tokenizer_class": "TokenizersBackend",
   "unk_token": "[UNK]"
 }
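The README in this commit lists the learning-rate schedule as "Cosine with 10% warmup" at a peak rate of 2e-5. A minimal sketch of that schedule, assuming linear warmup and cosine decay to zero (the trainer implementation itself is not part of this commit):

```python
import math

def lr_at(step, total_steps, peak_lr=2e-5, warmup_frac=0.10):
    """Cosine LR schedule with linear warmup over the first 10% of steps."""
    warmup_steps = int(total_steps * warmup_frac)
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr.
        return peak_lr * step / warmup_steps
    # Cosine decay from peak_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

assert abs(lr_at(50, 1000) - 1e-5) < 1e-12    # halfway through warmup
assert abs(lr_at(100, 1000) - 2e-5) < 1e-12   # peak at end of warmup
assert lr_at(1000, 1000) < 1e-9               # decayed to ~0 at the end
```

This mirrors what `transformers.get_cosine_schedule_with_warmup` produces for `warmup_ratio=0.1`, up to step discretization.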