---
license: mit
tags:
- text-classification
- modernbert
- orality
- linguistics
- rhetorical-analysis
language:
- en
metrics:
- f1
- accuracy
base_model:
- answerdotai/ModernBERT-base
pipeline_tag: text-classification
library_name: transformers
datasets:
- custom
model-index:
- name: bert-marker-subtype
  results:
  - task:
      type: text-classification
      name: Marker Subtype Classification
    metrics:
    - type: f1
      value: 0.493
      name: F1 (macro)
    - type: accuracy
      value: 0.500
      name: Accuracy
---

# Havelock Marker Subtype Classifier

ModernBERT-based classifier for **71 fine-grained rhetorical marker subtypes** on the oral–literate spectrum, grounded in Walter Ong's *Orality and Literacy* (1982).

This is the finest level of the Havelock span classification hierarchy. Given a text span identified as a rhetorical marker, the model classifies it into one of 71 specific rhetorical devices (e.g., `anaphora`, `epistemic_hedge`, `vocative`, `nested_clauses`).

## Model Details

| Property | Value |
|----------|-------|
| Base model | `answerdotai/ModernBERT-base` |
| Architecture | `ModernBertForSequenceClassification` |
| Task | Multi-class classification (71 classes) |
| Max sequence length | 128 tokens |
| Test F1 (macro) | **0.493** |
| Test Accuracy | **0.500** |
| Missing labels (test) | 1/71 (`proverb`) |
| Parameters | ~149M |

## Usage
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "HavelockAI/bert-marker-subtype"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

span = "it seems likely that this would, in principle, be feasible"
inputs = tokenizer(span, return_tensors="pt", truncation=True, max_length=128)

with torch.no_grad():
    logits = model(**inputs).logits
    pred = torch.argmax(logits, dim=1).item()

print(f"Marker subtype: {model.config.id2label[pred]}")
```
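
With 71 closely related classes, the runner-up labels are often as informative as the top prediction. The following is a minimal sketch, not part of the released code: `top_k_labels` is a hypothetical helper that works on a plain list of floats, so with the snippet above it could be called as `top_k_labels(logits[0].tolist(), model.config.id2label)`. The toy logits and three-label mapping are made up for illustration.

```python
import math

def top_k_labels(logits, id2label, k=3):
    """Return the k most probable (label, probability) pairs.

    `logits` is a flat list of floats (one per class) and `id2label`
    maps class index -> label name. Hypothetical helper for illustration.
    """
    # Numerically stable softmax: subtract the max logit before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank class indices by probability, highest first.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]

# Toy example with made-up logits; a real call would pass all 71 logits.
toy_id2label = {0: "epistemic_hedge", 1: "probability", 2: "vocative"}
top = top_k_labels([2.0, 1.0, -1.0], toy_id2label, k=2)
for label, prob in top:
    print(f"{label}: {prob:.3f}")
```

Given the reported accuracy of 0.500, inspecting the top few candidates may be more useful than relying on the argmax alone.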

## Label Taxonomy (71 subtypes)

### Oral Subtypes (37)

| Category | Subtypes |
|----------|----------|
| **Repetition & Pattern** | `anaphora`, `epistrophe`, `parallelism`, `tricolon`, `lexical_repetition`, `refrain` |
| **Sound & Rhythm** | `alliteration`, `assonance`, `rhyme`, `rhythm` |
| **Address & Interaction** | `vocative`, `imperative`, `second_person`, `inclusive_we`, `rhetorical_question`, `audience_response`, `phatic_check`, `phatic_filler` |
| **Conjunction** | `polysyndeton`, `asyndeton`, `simple_conjunction` |
| **Formulas** | `discourse_formula`, `proverb`, `religious_formula`, `epithet` |
| **Narrative** | `named_individual`, `specific_place`, `temporal_anchor`, `sensory_detail`, `embodied_action`, `everyday_example` |
| **Performance** | `dramatic_pause`, `self_correction`, `conflict_frame`, `us_them`, `intensifier_doubling`, `antithesis` |

### Literate Subtypes (34)

| Category | Subtypes |
|----------|----------|
| **Abstraction** | `nominalization`, `abstract_noun`, `conceptual_metaphor`, `categorical_statement` |
| **Syntax** | `nested_clauses`, `relative_chain`, `conditional`, `concessive`, `temporal_embedding`, `causal_chain` |
| **Hedging** | `epistemic_hedge`, `probability`, `evidential`, `qualified_assertion`, `concessive_connector` |
| **Impersonality** | `agentless_passive`, `agent_demoted`, `institutional_subject`, `objectifying_stance`, `third_person_reference` |
| **Scholarly Apparatus** | `citation`, `footnote_reference`, `cross_reference`, `metadiscourse`, `methodological_framing` |
| **Technical** | `technical_term`, `technical_abbreviation`, `enumeration`, `list_structure`, `definitional_move` |
| **Connectives** | `contrastive`, `causal_explicit`, `additive_formal`, `aside` |

## Training

### Data

22,367 span-level annotations from the Havelock corpus with marker types normalized against a canonical taxonomy at build time. Each span carries a `marker_subtype` field. Only subtypes with ≥10 examples are included. A stratified 80/10/10 train/val/test split was used with swap-based optimization to balance label distributions across splits. The test set contains 2,357 spans.
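
The filtering and splitting steps can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual build script: `filter_and_split` is a hypothetical function, the ≥10 threshold is applied to total counts per subtype, and the swap-based balancing step is not shown because the card does not describe it in detail. The data is synthetic.

```python
import random
from collections import Counter, defaultdict

def filter_and_split(spans, min_examples=10, seed=0):
    """Drop rare subtypes, then split each subtype's spans 80/10/10.

    `spans` is a list of dicts with a "marker_subtype" key.
    Hypothetical helper; the real pipeline also runs a swap-based
    balancing pass that is not reproduced here.
    """
    counts = Counter(s["marker_subtype"] for s in spans)
    by_label = defaultdict(list)
    for s in spans:
        if counts[s["marker_subtype"]] >= min_examples:
            by_label[s["marker_subtype"]].append(s)

    rng = random.Random(seed)
    splits = {"train": [], "val": [], "test": []}
    # Splitting per label keeps each subtype's share similar across splits.
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        n_val = max(1, round(0.1 * len(items)))
        n_test = max(1, round(0.1 * len(items)))
        splits["val"].extend(items[:n_val])
        splits["test"].extend(items[n_val:n_val + n_test])
        splits["train"].extend(items[n_val + n_test:])
    return splits

# Synthetic data: 50 "anaphora", 20 "vocative", and 3 "rare" spans.
toy = (
    [{"text": f"a{i}", "marker_subtype": "anaphora"} for i in range(50)]
    + [{"text": f"v{i}", "marker_subtype": "vocative"} for i in range(20)]
    + [{"text": f"r{i}", "marker_subtype": "rare"} for i in range(3)]
)
splits = filter_and_split(toy)
print({name: len(items) for name, items in splits.items()})
```

In this toy run the `rare` subtype falls below the threshold and is dropped before splitting, which mirrors how low-count subtypes are excluded from the taxonomy used for training.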

### Hyperparameters

| Parameter | Value |
|-----------|-------|
| Epochs | 20 |
| Batch size | 16 |
| Learning rate | 3e-5 |
| Optimizer | AdamW (weight decay 0.01) |
| LR schedule | Cosine with 10% warmup |
| Gradient clipping | 1.0 |
| Loss | Focal loss (γ=2.0) + class weights |
| Mixout | 0.1 |
| Mixed precision | FP16 |
| Min examples per class | 10 |
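
For readers unfamiliar with the loss row, the sketch below shows one common form of class-weighted focal loss for a single example. It is written in plain Python for clarity and is an assumption about the general formulation, not the training code itself; the function name, the toy logits, and the weights are made up.

```python
import math

def focal_loss(logits, target, class_weights, gamma=2.0):
    """Class-weighted focal loss for one example.

    loss = -w[target] * (1 - p_target) ** gamma * log(p_target)

    With gamma = 0 this reduces to weighted cross-entropy.
    """
    # Log-softmax via the log-sum-exp trick for numerical stability.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    log_p = logits[target] - log_z
    p = math.exp(log_p)
    # Clamp to guard against tiny negative values from rounding.
    modulator = max(0.0, 1.0 - p) ** gamma
    return -class_weights[target] * modulator * log_p

weights = [1.0, 2.0, 0.5]  # made-up per-class weights
easy = focal_loss([4.0, 0.0, 0.0], target=0, class_weights=weights)
hard = focal_loss([0.0, 4.0, 0.0], target=0, class_weights=weights)
print(f"easy example: {easy:.5f}, hard example: {hard:.5f}")
```

The `(1 - p) ** gamma` factor shrinks the contribution of examples the model already classifies confidently, so training signal concentrates on hard or rare cases.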

### Training Metrics

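The best checkpoint was selected at epoch 15. Selection used the number of missing labels as the primary criterion and F1 as the tiebreaker; the selected checkpoint had 0 missing labels and an F1 of 0.486.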
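
A two-level selection rule like this can be expressed as a sort key. The sketch below is illustrative only: `EpochResult` and `select_best` are hypothetical names, the exact definition of a "missing label" is not stated in this card, and every row of the history except epoch 15 is invented.

```python
from dataclasses import dataclass

@dataclass
class EpochResult:
    epoch: int
    missing_labels: int  # assumed: number of classes the model fails to cover
    macro_f1: float

def select_best(results):
    """Fewest missing labels wins; ties are broken by higher F1."""
    return min(results, key=lambda r: (r.missing_labels, -r.macro_f1))

# Invented history, apart from epoch 15, which uses the figures reported above.
history = [
    EpochResult(epoch=12, missing_labels=1, macro_f1=0.495),
    EpochResult(epoch=14, missing_labels=0, macro_f1=0.480),
    EpochResult(epoch=15, missing_labels=0, macro_f1=0.486),
]
best = select_best(history)
print(best)
```

Under this rule an epoch with a higher F1 but a missing label (epoch 12 in the invented history) loses to one that covers every class.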

### Test Set Classification Report

<details><summary>Click to expand per-class precision/recall/F1/support</summary>

```
                        precision    recall  f1-score   support

         abstract_noun      0.408     0.330     0.365        88
       additive_formal      0.286     0.167     0.211        12
         agent_demoted      0.667     1.000     0.800        10
     agentless_passive      0.583     0.491     0.533        57
          alliteration      0.500     0.200     0.286        10
              anaphora      0.500     0.537     0.518        41
            antithesis      0.947     0.818     0.878        22
                 aside      0.615     0.216     0.320        37
             assonance      1.000     0.960     0.980        25
             asyndeton      0.636     0.500     0.560        14
     audience_response      1.000     0.800     0.889        10
 categorical_statement      0.103     0.200     0.136        20
          causal_chain      0.442     0.452     0.447        42
       causal_explicit      0.400     0.468     0.431        47
              citation      0.743     0.565     0.642        46
   conceptual_metaphor      0.065     0.051     0.057        39
            concessive      0.595     0.556     0.575        45
  concessive_connector      0.882     0.833     0.857        18
           conditional      0.596     0.609     0.602        87
        conflict_frame      0.733     0.733     0.733        15
           contrastive      0.533     0.525     0.529        61
       cross_reference      0.733     0.458     0.564        24
     definitional_move      0.286     0.200     0.235        10
     discourse_formula      0.405     0.508     0.451       118
        dramatic_pause      0.833     0.500     0.625        10
       embodied_action      0.375     0.214     0.273        42
           enumeration      0.510     0.605     0.553        43
       epistemic_hedge      0.102     0.357     0.159        14
            epistrophe      0.824     0.875     0.848        16
               epithet      0.333     0.250     0.286        12
      everyday_example      0.312     0.179     0.227        28
            evidential      0.667     0.432     0.525        37
    footnote_reference      0.417     0.500     0.455        10
            imperative      0.645     0.600     0.622       100
          inclusive_we      0.630     0.576     0.602        59
 institutional_subject      0.938     0.714     0.811        21
  intensifier_doubling      0.944     0.773     0.850        22
    lexical_repetition      0.417     0.556     0.476        45
        list_structure      0.267     0.174     0.211        23
         metadiscourse      0.085     0.182     0.116        22
methodological_framing      0.500     0.190     0.276        21
      named_individual      0.500     0.300     0.375        30
        nested_clauses      0.500     0.348     0.410        46
        nominalization      0.288     0.304     0.296        56
   objectifying_stance      0.267     0.400     0.320        10
           parallelism      0.350     0.259     0.298        27
          phatic_check      0.500     0.364     0.421        11
         phatic_filler      0.333     0.800     0.471        10
          polysyndeton      1.000     0.792     0.884        24
           probability      0.500     0.455     0.476        22
               proverb      0.000     0.000     0.000        10
   qualified_assertion      0.250     0.241     0.246        29
               refrain      0.944     0.708     0.810        24
        relative_chain      0.350     0.509     0.415        55
     religious_formula      0.857     0.750     0.800        16
   rhetorical_question      0.688     0.762     0.723        84
                 rhyme      0.231     0.300     0.261        10
                rhythm      0.909     0.625     0.741        16
         second_person      0.571     0.586     0.579       116
       self_correction      0.821     0.575     0.676        40
        sensory_detail      0.364     0.200     0.258        20
    simple_conjunction      0.167     0.300     0.214        10
        specific_place      0.400     0.222     0.286        18
technical_abbreviation      0.900     0.321     0.474        28
        technical_term      0.426     0.703     0.531        74
       temporal_anchor      0.396     0.618     0.483        34
    temporal_embedding      0.500     0.562     0.529        48
third_person_reference      0.700     0.700     0.700        10
              tricolon      0.611     0.611     0.611        18
               us_them      0.733     0.611     0.667        18
              vocative      0.462     0.600     0.522        20

              accuracy                          0.500      2357
             macro avg      0.535     0.484     0.493      2357
          weighted avg      0.532     0.500     0.503      2357
```

</details>

**Top performing subtypes (F1 ≥ 0.70):** `assonance` (0.980), `audience_response` (0.889), `polysyndeton` (0.884), `antithesis` (0.878), `concessive_connector` (0.857), `intensifier_doubling` (0.850), `epistrophe` (0.848), `institutional_subject` (0.811), `refrain` (0.810), `agent_demoted` (0.800), `religious_formula` (0.800), `rhythm` (0.741), `conflict_frame` (0.733), `rhetorical_question` (0.723), `third_person_reference` (0.700).

**Weakest subtypes (F1 < 0.20):** `proverb` (0.000), `conceptual_metaphor` (0.057), `metadiscourse` (0.116), `categorical_statement` (0.136), `epistemic_hedge` (0.159). These tend to be semantically diffuse classes that overlap heavily with neighbouring subtypes or have very low test support.
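
The report gives both a macro average (0.493) and a weighted average (0.503) for F1. The macro average counts every class equally, so a class such as `proverb` at 0.000 pulls it down as much as a high-support class would, while the weighted average scales each class by its support. A minimal sketch of the two aggregations, using a hypothetical helper and only four rows from the report (so the printed values will not match the full 71-class figures):

```python
def macro_and_weighted_f1(rows):
    """`rows` is a list of (f1, support) pairs, one per class."""
    macro = sum(f1 for f1, _ in rows) / len(rows)
    total = sum(support for _, support in rows)
    weighted = sum(f1 * support for f1, support in rows) / total
    return macro, weighted

# Four rows copied from the classification report; a subset for illustration.
subset = [
    (0.980, 25),   # assonance
    (0.451, 118),  # discourse_formula
    (0.579, 116),  # second_person
    (0.000, 10),   # proverb
]
macro, weighted = macro_and_weighted_f1(subset)
print(f"macro={macro:.3f} weighted={weighted:.3f}")
```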

## Class Distribution

The training set exhibits significant imbalance across 71 classes:

| Support Range | Example Classes | Count |
|---------------|-----------------|-------|
| >1000 | `discourse_formula`, `second_person` | 2 |
| 500–1000 | `conditional`, `rhetorical_question`, `technical_term`, `imperative` | 8 |
| 200–500 | `abstract_noun`, `contrastive`, `inclusive_we`, `nominalization` | 27 |
| 100–200 | `alliteration`, `antithesis`, `asyndeton`, `epistrophe`, `refrain` | 30 |
| <100 | `footnote_reference`, `phatic_check`, `technical_abbreviation` | 4 |
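
Imbalance of this kind is what the class weights in the loss are meant to offset. This card does not say how the weights were computed; the sketch below uses the common "balanced" heuristic, `n_samples / (n_classes * count)`, purely as an assumed example. The function name and label counts are made up.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count).

    Assumed heuristic for illustration; not confirmed as the scheme
    used to train this model.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * c) for label, c in counts.items()}

# Made-up counts: one frequent class and one rare class.
toy_labels = ["discourse_formula"] * 90 + ["phatic_check"] * 10
weights = balanced_class_weights(toy_labels)
for label, w in sorted(weights.items()):
    print(f"{label}: {w:.3f}")
```

Rare classes receive weights above 1 and frequent classes below 1, so each class contributes a comparable total to the loss.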

## Limitations

- **71-way classification on ~22k spans**: The data budget per class is thin, particularly for classes near the minimum. More data or class consolidation would help.
- **Semantic overlap**: Some subtypes are difficult to distinguish from surface text alone (e.g., `parallelism` vs `anaphora` vs `tricolon`; `epistemic_hedge` vs `qualified_assertion` vs `probability`). The model may benefit from hierarchical classification that conditions on type-level predictions.
- **Recall-precision tradeoff on rare classes**: Many rare classes show high precision but lower recall (e.g., `self_correction`: P=0.821, R=0.575; `technical_abbreviation`: P=0.900, R=0.321), suggesting the model learns narrow prototypes but misses variation.
- **Span-level only**: Requires pre-extracted spans. Does not detect boundaries.
- **128-token context window**: Longer spans are truncated.
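
The hierarchical idea mentioned under semantic overlap could be prototyped at inference time by masking subtype logits that are incompatible with a coarser prediction. The sketch below is a hypothetical illustration, not a feature of this model: it conditions on the oral/literate category because that mapping appears in the taxonomy tables of this card, whereas the subtype-to-type mapping does not. A type-level version would follow the same pattern. The logits are invented.

```python
import math

# Partial subtype -> category mapping copied from the taxonomy tables.
CATEGORY_OF = {
    "anaphora": "oral",
    "vocative": "oral",
    "epistemic_hedge": "literate",
    "probability": "literate",
}

def constrained_argmax(subtype_logits, labels, category):
    """Pick the highest-scoring subtype whose category matches `category`."""
    masked = [
        logit if CATEGORY_OF[label] == category else -math.inf
        for logit, label in zip(subtype_logits, labels)
    ]
    best = max(range(len(masked)), key=lambda i: masked[i])
    return labels[best]

labels = ["anaphora", "vocative", "epistemic_hedge", "probability"]
toy_logits = [2.5, 0.1, 2.0, 1.0]  # invented; unconstrained argmax is "anaphora"
print(constrained_argmax(toy_logits, labels, category="literate"))
```

Whether this helps in practice depends on the accuracy of the coarser classifier, since its errors would propagate to the subtype prediction.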

## Theoretical Background

The 71 subtypes represent the full granularity of the Havelock taxonomy, operationalizing Ong's oral–literate framework into specific, annotatable rhetorical devices. Oral subtypes capture the textural signatures of spoken and performative discourse: repetitive structures (`anaphora`, `epistrophe`, `tricolon`), sound patterning (`alliteration`, `assonance`, `rhythm`), direct audience engagement (`vocative`, `imperative`, `rhetorical_question`), and formulas (`proverb`, `epithet`, `discourse_formula`). Literate subtypes capture the apparatus of analytic prose: complex syntax (`nested_clauses`, `relative_chain`, `conditional`), epistemic positioning (`epistemic_hedge`, `evidential`, `probability`), impersonal voice (`agentless_passive`, `institutional_subject`), and scholarly machinery (`citation`, `footnote_reference`, `metadiscourse`).

## Related Models

| Model | Task | Classes | F1 |
|-------|------|---------|-----|
| [`HavelockAI/bert-marker-category`](https://huggingface.co/HavelockAI/bert-marker-category) | Binary (oral/literate) | 2 | 0.875 |
| [`HavelockAI/bert-marker-type`](https://huggingface.co/HavelockAI/bert-marker-type) | Functional type | 18 | 0.583 |
| **This model** | Fine-grained subtype | 71 | 0.493 |
| [`HavelockAI/bert-orality-regressor`](https://huggingface.co/HavelockAI/bert-orality-regressor) | Document-level score | Regression | MAE 0.079 |
| [`HavelockAI/bert-token-classifier`](https://huggingface.co/HavelockAI/bert-token-classifier) | Span detection (BIO) | 145 | 0.500 |

## Citation
```bibtex
@misc{havelock2026subtype,
  title={Havelock Marker Subtype Classifier},
  author={Havelock AI},
  year={2026},
  url={https://huggingface.co/HavelockAI/bert-marker-subtype}
}
```

## References

- Ong, Walter J. *Orality and Literacy: The Technologizing of the Word*. Routledge, 1982.
- Lee, C. et al. "Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models." ICLR 2020.
- Warner, B. et al. "Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference." 2024.

---

*Trained: February 2026*