---
license: mit
tags:
- text-classification
- modernbert
- orality
- linguistics
- rhetorical-analysis
language:
- en
metrics:
- f1
- accuracy
base_model:
- answerdotai/ModernBERT-base
pipeline_tag: text-classification
library_name: transformers
datasets:
- custom
model-index:
- name: bert-marker-subtype
  results:
  - task:
      type: text-classification
      name: Marker Subtype Classification
    metrics:
    - type: f1
      value: 0.493
      name: F1 (macro)
    - type: accuracy
      value: 0.500
      name: Accuracy
---

# Havelock Marker Subtype Classifier

ModernBERT-based classifier for **71 fine-grained rhetorical marker subtypes** on the oral–literate spectrum, grounded in Walter Ong's *Orality and Literacy* (1982). This is the finest level of the Havelock span classification hierarchy.

Given a text span identified as a rhetorical marker, the model classifies it into one of 71 specific rhetorical devices (e.g., `anaphora`, `epistemic_hedge`, `vocative`, `nested_clauses`).

## Model Details

| Property | Value |
|----------|-------|
| Base model | `answerdotai/ModernBERT-base` |
| Architecture | `ModernBertForSequenceClassification` |
| Task | Multi-class classification (71 classes) |
| Max sequence length | 128 tokens |
| Test F1 (macro) | **0.493** |
| Test Accuracy | **0.500** |
| Missing labels (test) | 1/71 (`proverb`) |
| Parameters | ~149M |

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "HavelockAI/bert-marker-subtype"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

span = "it seems likely that this would, in principle, be feasible"
inputs = tokenizer(span, return_tensors="pt", truncation=True, max_length=128)

with torch.no_grad():
    logits = model(**inputs).logits

pred = torch.argmax(logits, dim=1).item()
print(f"Marker subtype: {model.config.id2label[pred]}")
```

## Label Taxonomy (71 subtypes)

### Oral Subtypes (37)

| Category | Subtypes |
|----------|----------|
| **Repetition & Pattern** | `anaphora`, `epistrophe`, `parallelism`, `tricolon`, `lexical_repetition`, `refrain` |
| **Sound & Rhythm** | `alliteration`, `assonance`, `rhyme`, `rhythm` |
| **Address & Interaction** | `vocative`, `imperative`, `second_person`, `inclusive_we`, `rhetorical_question`, `audience_response`, `phatic_check`, `phatic_filler` |
| **Conjunction** | `polysyndeton`, `asyndeton`, `simple_conjunction` |
| **Formulas** | `discourse_formula`, `proverb`, `religious_formula`, `epithet` |
| **Narrative** | `named_individual`, `specific_place`, `temporal_anchor`, `sensory_detail`, `embodied_action`, `everyday_example` |
| **Performance** | `dramatic_pause`, `self_correction`, `conflict_frame`, `us_them`, `intensifier_doubling`, `antithesis` |

### Literate Subtypes (34)

| Category | Subtypes |
|----------|----------|
| **Abstraction** | `nominalization`, `abstract_noun`, `conceptual_metaphor`, `categorical_statement` |
| **Syntax** | `nested_clauses`, `relative_chain`, `conditional`, `concessive`, `temporal_embedding`, `causal_chain` |
| **Hedging** | `epistemic_hedge`, `probability`, `evidential`, `qualified_assertion`, `concessive_connector` |
| **Impersonality** | `agentless_passive`, `agent_demoted`, `institutional_subject`, `objectifying_stance`, `third_person_reference` |
| **Scholarly Apparatus** | `citation`, `footnote_reference`, `cross_reference`, `metadiscourse`, `methodological_framing` |
| **Technical** | `technical_term`, `technical_abbreviation`, `enumeration`, `list_structure`, `definitional_move` |
| **Connectives** | `contrastive`, `causal_explicit`, `additive_formal`, `aside` |

## Training

### Data

22,367 span-level annotations from the Havelock corpus, with marker types normalized against a canonical taxonomy at build time. Each span carries a `marker_subtype` field. Only subtypes with ≥10 examples are included.
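The ≥10-example cutoff amounts to a simple frequency filter over the annotations. A minimal sketch (the `marker_subtype` field name comes from the description above; the rest of the record layout is an assumption for illustration):

```python
from collections import Counter

def filter_rare_subtypes(annotations, min_count=10):
    """Keep only spans whose marker_subtype has at least min_count examples."""
    counts = Counter(a["marker_subtype"] for a in annotations)
    return [a for a in annotations if counts[a["marker_subtype"]] >= min_count]

# Toy corpus: 'proverb' has only 2 spans here, so its spans are dropped.
spans = (
    [{"text": f"span {i}", "marker_subtype": "anaphora"} for i in range(12)]
    + [{"text": f"span {i}", "marker_subtype": "proverb"} for i in range(2)]
)
kept = filter_rare_subtypes(spans)
print(len(kept))  # 12
```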
A stratified 80/10/10 train/val/test split was used, with swap-based optimization to balance label distributions across splits. The test set contains 2,357 spans.

### Hyperparameters

| Parameter | Value |
|-----------|-------|
| Epochs | 20 |
| Batch size | 16 |
| Learning rate | 3e-5 |
| Optimizer | AdamW (weight decay 0.01) |
| LR schedule | Cosine with 10% warmup |
| Gradient clipping | 1.0 |
| Loss | Focal loss (γ=2.0) + class weights |
| Mixout | 0.1 |
| Mixed precision | FP16 |
| Min examples per class | 10 |

### Training Metrics

The best checkpoint was selected at epoch 15, using missing-label count as the primary criterion and macro F1 as the tiebreaker (0 missing labels, F1 0.486).

### Test Set Classification Report
<details>
<summary>Click to expand per-class precision/recall/F1/support</summary>

```
                         precision    recall  f1-score  support
abstract_noun                0.408     0.330     0.365       88
additive_formal              0.286     0.167     0.211       12
agent_demoted                0.667     1.000     0.800       10
agentless_passive            0.583     0.491     0.533       57
alliteration                 0.500     0.200     0.286       10
anaphora                     0.500     0.537     0.518       41
antithesis                   0.947     0.818     0.878       22
aside                        0.615     0.216     0.320       37
assonance                    1.000     0.960     0.980       25
asyndeton                    0.636     0.500     0.560       14
audience_response            1.000     0.800     0.889       10
categorical_statement        0.103     0.200     0.136       20
causal_chain                 0.442     0.452     0.447       42
causal_explicit              0.400     0.468     0.431       47
citation                     0.743     0.565     0.642       46
conceptual_metaphor          0.065     0.051     0.057       39
concessive                   0.595     0.556     0.575       45
concessive_connector         0.882     0.833     0.857       18
conditional                  0.596     0.609     0.602       87
conflict_frame               0.733     0.733     0.733       15
contrastive                  0.533     0.525     0.529       61
cross_reference              0.733     0.458     0.564       24
definitional_move            0.286     0.200     0.235       10
discourse_formula            0.405     0.508     0.451      118
dramatic_pause               0.833     0.500     0.625       10
embodied_action              0.375     0.214     0.273       42
enumeration                  0.510     0.605     0.553       43
epistemic_hedge              0.102     0.357     0.159       14
epistrophe                   0.824     0.875     0.848       16
epithet                      0.333     0.250     0.286       12
everyday_example             0.312     0.179     0.227       28
evidential                   0.667     0.432     0.525       37
footnote_reference           0.417     0.500     0.455       10
imperative                   0.645     0.600     0.622      100
inclusive_we                 0.630     0.576     0.602       59
institutional_subject        0.938     0.714     0.811       21
intensifier_doubling         0.944     0.773     0.850       22
lexical_repetition           0.417     0.556     0.476       45
list_structure               0.267     0.174     0.211       23
metadiscourse                0.085     0.182     0.116       22
methodological_framing       0.500     0.190     0.276       21
named_individual             0.500     0.300     0.375       30
nested_clauses               0.500     0.348     0.410       46
nominalization               0.288     0.304     0.296       56
objectifying_stance          0.267     0.400     0.320       10
parallelism                  0.350     0.259     0.298       27
phatic_check                 0.500     0.364     0.421       11
phatic_filler                0.333     0.800     0.471       10
polysyndeton                 1.000     0.792     0.884       24
probability                  0.500     0.455     0.476       22
proverb                      0.000     0.000     0.000       10
qualified_assertion          0.250     0.241     0.246       29
refrain                      0.944     0.708     0.810       24
relative_chain               0.350     0.509     0.415       55
religious_formula            0.857     0.750     0.800       16
rhetorical_question          0.688     0.762     0.723       84
rhyme                        0.231     0.300     0.261       10
rhythm                       0.909     0.625     0.741       16
second_person                0.571     0.586     0.579      116
self_correction              0.821     0.575     0.676       40
sensory_detail               0.364     0.200     0.258       20
simple_conjunction           0.167     0.300     0.214       10
specific_place               0.400     0.222     0.286       18
technical_abbreviation       0.900     0.321     0.474       28
technical_term               0.426     0.703     0.531       74
temporal_anchor              0.396     0.618     0.483       34
temporal_embedding           0.500     0.562     0.529       48
third_person_reference       0.700     0.700     0.700       10
tricolon                     0.611     0.611     0.611       18
us_them                      0.733     0.611     0.667       18
vocative                     0.462     0.600     0.522       20

accuracy                                         0.500     2357
macro avg                    0.535     0.484     0.493     2357
weighted avg                 0.532     0.500     0.503     2357
```

</details>
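The `macro avg` and `weighted avg` rows are plain aggregates of the per-class scores: macro weights every class equally, while weighted scales by support. A minimal sketch over three rows of the report, showing why a zero-F1 rare class like `proverb` drags the macro score down much harder than the weighted one:

```python
# Per-class (f1, support) taken from three rows of the report above.
per_class = {
    "assonance": (0.980, 25),
    "proverb": (0.000, 10),
    "second_person": (0.579, 116),
}

f1s = [f1 for f1, _ in per_class.values()]
supports = [n for _, n in per_class.values()]

# macro avg: unweighted mean over classes (each class counts equally)
macro = sum(f1s) / len(f1s)

# weighted avg: mean weighted by class support (frequent classes dominate)
weighted = sum(f1 * n for f1, n in per_class.values()) / sum(supports)

print(round(macro, 4), round(weighted, 4))  # → 0.5197 0.607
```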
**Top performing subtypes (F1 ≥ 0.70):** `assonance` (0.980), `audience_response` (0.889), `polysyndeton` (0.884), `antithesis` (0.878), `concessive_connector` (0.857), `intensifier_doubling` (0.850), `epistrophe` (0.848), `institutional_subject` (0.811), `refrain` (0.810), `agent_demoted` (0.800), `religious_formula` (0.800), `rhythm` (0.741), `conflict_frame` (0.733), `rhetorical_question` (0.723), `third_person_reference` (0.700).

**Weakest subtypes (F1 < 0.20):** `proverb` (0.000), `conceptual_metaphor` (0.057), `metadiscourse` (0.116), `categorical_statement` (0.136), `epistemic_hedge` (0.159). These tend to be semantically diffuse classes that overlap heavily with neighbouring subtypes or have very low test support.

## Class Distribution

The training set exhibits significant imbalance across the 71 classes:

| Support Range | Example Classes | Count |
|---------------|-----------------|-------|
| >1000 | `discourse_formula`, `second_person` | 2 |
| 500–1000 | `conditional`, `rhetorical_question`, `technical_term`, `imperative` | 8 |
| 200–500 | `abstract_noun`, `contrastive`, `inclusive_we`, `nominalization` | 27 |
| 100–200 | `alliteration`, `antithesis`, `asyndeton`, `epistrophe`, `refrain` | 30 |
| <100 | `footnote_reference`, `phatic_check`, `technical_abbreviation` | 4 |

## Limitations

- **71-way classification on ~22k spans**: The data budget per class is thin, particularly for classes near the minimum threshold. More data or class consolidation would help.
- **Semantic overlap**: Some subtypes are difficult to distinguish from surface text alone (e.g., `parallelism` vs `anaphora` vs `tricolon`; `epistemic_hedge` vs `qualified_assertion` vs `probability`). The model may benefit from hierarchical classification that conditions on type-level predictions.
- **High precision, low recall on rare classes**: Many rare classes show high precision but markedly lower recall (e.g., `self_correction`: P=0.821, R=0.575; `technical_abbreviation`: P=0.900, R=0.321), suggesting the model learns narrow prototypes but misses variation.
- **Span-level only**: Requires pre-extracted spans; does not detect span boundaries.
- **128-token context window**: Longer spans are truncated.

## Theoretical Background

The 71 subtypes represent the full granularity of the Havelock taxonomy, operationalizing Ong's oral–literate framework into specific, annotatable rhetorical devices.

Oral subtypes capture the textural signatures of spoken and performative discourse: repetitive structures (`anaphora`, `epistrophe`, `tricolon`), sound patterning (`alliteration`, `assonance`, `rhythm`), direct audience engagement (`vocative`, `imperative`, `rhetorical_question`), and formulas (`proverb`, `epithet`, `discourse_formula`).

Literate subtypes capture the apparatus of analytic prose: complex syntax (`nested_clauses`, `relative_chain`, `conditional`), epistemic positioning (`epistemic_hedge`, `evidential`, `probability`), impersonal voice (`agentless_passive`, `institutional_subject`), and scholarly machinery (`citation`, `footnote_reference`, `metadiscourse`).
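The taxonomy's category → type → subtype hierarchy suggests the staged inference mentioned in the limitations above: predict the coarse type first (e.g., with `HavelockAI/bert-marker-type`), then restrict this model's 71-way argmax to subtypes under that type. A minimal sketch of the masking step only; the subtype ids and the type→subtype compatibility set are hypothetical:

```python
import torch

def constrained_argmax(logits, allowed_ids):
    """Argmax over subtype logits, restricted to the subtypes that are
    compatible with a higher-level type prediction."""
    mask = torch.full_like(logits, float("-inf"))
    mask[allowed_ids] = 0.0  # leave allowed logits untouched, bury the rest
    return int(torch.argmax(logits + mask))

# Toy logits over 5 hypothetical subtype ids: id 3 scores highest overall,
# but only ids {0, 1} are compatible with the predicted type.
logits = torch.tensor([0.2, 1.1, 0.5, 2.0, -0.3])
print(constrained_argmax(logits, [0, 1]))  # 1
```

In a full pipeline, `allowed_ids` would be looked up from the type model's prediction via a mapping derived from the taxonomy tables above.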
## Related Models

| Model | Task | Classes | Metric |
|-------|------|---------|--------|
| [`HavelockAI/bert-marker-category`](https://huggingface.co/HavelockAI/bert-marker-category) | Binary (oral/literate) | 2 | F1 0.875 |
| [`HavelockAI/bert-marker-type`](https://huggingface.co/HavelockAI/bert-marker-type) | Functional type | 18 | F1 0.583 |
| **This model** | Fine-grained subtype | 71 | F1 0.493 |
| [`HavelockAI/bert-orality-regressor`](https://huggingface.co/HavelockAI/bert-orality-regressor) | Document-level score | Regression | MAE 0.079 |
| [`HavelockAI/bert-token-classifier`](https://huggingface.co/HavelockAI/bert-token-classifier) | Span detection (BIO) | 145 | F1 0.500 |

## Citation

```bibtex
@misc{havelock2026subtype,
  title={Havelock Marker Subtype Classifier},
  author={Havelock AI},
  year={2026},
  url={https://huggingface.co/HavelockAI/bert-marker-subtype}
}
```

## References

- Ong, Walter J. *Orality and Literacy: The Technologizing of the Word*. Routledge, 1982.
- Lee, C. et al. "Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models." ICLR 2020.
- Warner, B. et al. "Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference." 2024.

---

*Trained: February 2026*