# RoBERTa-Contextual-Sarcasm-Hybrid
## Model Description
This model is a fine-tuned version of cardiffnlp/twitter-roberta-base-irony, optimized for detecting sarcasm in modern narrative dialogue. Unlike standard sentiment-based irony detectors, it uses a Relational Attention mechanism enabled by a `[PREVIOUS_CONTEXT] [SEP] [DIALOGUE]` input schema.
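The two-segment schema can be assembled as a plain string before tokenization; a minimal sketch (the helper name and example strings are illustrative, not from the model card):

```python
def build_model_input(previous_context: str, dialogue: str, sep_token: str = "[SEP]") -> str:
    """Join prior context and the target dialogue with the schema's separator.

    Note: when using a Hugging Face tokenizer, it is usually preferable to pass
    the two segments as a sentence pair (tokenizer(context, dialogue)) so the
    model's actual special tokens are inserted automatically; the literal
    "[SEP]" here only mirrors the schema notation above.
    """
    return f"{previous_context} {sep_token} {dialogue}"

text = build_model_input(
    "I waited two hours for my order.",
    "Wow, what amazing service.",
)
# → "I waited two hours for my order. [SEP] Wow, what amazing service."
```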
## Training Data & Methodology
The model was trained on a balanced hybrid corpus designed to minimize "classifier paranoia" in modern conversational agents:
- Contextual JSON (152 samples): Primary high-quality dialogue with situational context.
- Joshi Snippets (50 samples): Targeted sarcastic signals for Class 1 (Sarcastic) expansion.
- Gutenberg Anchoring (100 samples): Formal Victorian prose used for Class 0 (Sincere) stabilization.
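The three-source mix above can be sketched as a simple corpus-assembly step. This is an assumption about the pipeline, not the author's actual code; the function name, field layout, and labeling convention are illustrative:

```python
import random

def build_corpus(contextual, joshi, gutenberg, seed=42):
    """Merge the three sources into one shuffled list of (text, label) pairs.

    - joshi: raw sarcastic snippets, all assigned Class 1 (Sarcastic)
    - gutenberg: formal prose anchors, all assigned Class 0 (Sincere)
    - contextual: dialogue examples that already carry their own labels
    """
    corpus = []
    corpus += [(text, 1) for text in joshi]      # targeted sarcastic signal
    corpus += [(text, 0) for text in gutenberg]  # sincere stabilization
    corpus += contextual                         # pre-labeled (text, label) pairs
    random.Random(seed).shuffle(corpus)          # fixed seed for reproducibility
    return corpus
```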
## Performance & Calibration
The model achieves high statistical recall but demonstrates specific behavioral biases:
- Modern Narrative: High calibration; successfully distinguishes between sincere frustration and ironic punchlines.
- Literary Irony: Exhibits a "Politeness Bias" where formal syntax is strongly correlated with sincerity (Class 0), leading to potential false negatives in classical irony.
| Metric | Score |
|---|---|
| Golden Set F1 | 0.8889 |
| Human Set F1 | 0.8000 |
| Threshold (Optimal) | 0.60 - 0.75 |
## Intended Use
This model is intended for use in hybrid LLM systems and conversational agents where distinguishing between sincere user complaints and situational irony is critical for deterministic routing.
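One way to turn the reported optimal threshold band (0.60–0.75) into deterministic routing is to treat scores inside the band as ambiguous and defer them. A sketch under that assumption; the route names are hypothetical:

```python
def route(prob_sarcastic: float, low: float = 0.60, high: float = 0.75) -> str:
    """Route a message based on the model's sarcasm probability.

    Scores below `low` are treated as sincere, scores above `high` as
    sarcastic, and scores inside the band are deferred to a downstream LLM.
    The band endpoints come from the model card's optimal-threshold range.
    """
    if prob_sarcastic < low:
        return "sincere_complaint_handler"
    if prob_sarcastic > high:
        return "sarcasm_aware_responder"
    return "llm_adjudication"
```

Deferring the in-band scores keeps routing deterministic at the extremes while acknowledging that the model's calibration is weakest near the decision boundary.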