Euphemism Detector (V1, English)

An updated multilingual version is available: hasancanbiyik/euphemism-detector-multilingual, fine-tuned on 7 languages (EN/TR/ZH/ES/YO/PL/UK) with 0.808 macro-F1 and zero-shot transfer to 22 additional languages.

Fine-tuned XLM-RoBERTa-base for euphemism disambiguation on English PETs (Potentially Euphemistic Terms). Given a sentence with a marked phrase, the model predicts whether the phrase is used euphemistically or literally.

This model was fine-tuned on the English PETs dataset created by the NLP Lab at Montclair State University, USA.

Performance (English)

| Class       | Precision | Recall | F1   |
|-------------|-----------|--------|------|
| Literal     | 0.81      | 0.83   | 0.82 |
| Euphemistic | 0.88      | 0.86   | 0.87 |
| Macro avg   | 0.84      | 0.84   | 0.84 |

Usage

The model expects input text with [PET_BOUNDARY] tokens marking the target phrase:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import torch.nn.functional as F

tokenizer = AutoTokenizer.from_pretrained("hasancanbiyik/euphemism-detector")
model = AutoModelForSequenceClassification.from_pretrained("hasancanbiyik/euphemism-detector")
model.eval()

text = "My grandmother [PET_BOUNDARY]passed away[PET_BOUNDARY] last Tuesday."
inputs = tokenizer(text, return_tensors="pt", max_length=256, truncation=True)

with torch.no_grad():
    probs = F.softmax(model(**inputs).logits, dim=1).squeeze()

print(f"Euphemistic: {probs[1].item():.1%}")
print(f"Literal:     {probs[0].item():.1%}")
```
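For classifying arbitrary sentence/phrase pairs, the boundary tokens can be inserted programmatically and the two class probabilities reduced to a single label with argmax. A minimal sketch; `mark_pet` and `label_from_logits` are hypothetical helpers (not part of the model's API), and the label order (0 = literal, 1 = euphemistic) matches the example above:

```python
import torch

LABELS = {0: "literal", 1: "euphemistic"}  # label order used in the example above


def mark_pet(sentence: str, phrase: str) -> str:
    """Wrap the first occurrence of `phrase` in [PET_BOUNDARY] tokens (hypothetical helper)."""
    if phrase not in sentence:
        raise ValueError(f"phrase {phrase!r} not found in sentence")
    return sentence.replace(phrase, f"[PET_BOUNDARY]{phrase}[PET_BOUNDARY]", 1)


def label_from_logits(logits: torch.Tensor) -> str:
    """Map a (1, 2) logits tensor to its predicted class name."""
    return LABELS[int(logits.argmax(dim=-1).item())]


marked = mark_pet("He let her go after ten years of service.", "let her go")
print(marked)

# Dummy logits stand in for model(**inputs).logits here.
print(label_from_logits(torch.tensor([[-0.4, 1.3]])))
```

The marked string can then be passed to the tokenizer exactly as in the example above.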

Updated Version

For multilingual support (7 training languages + zero-shot transfer to 22 additional languages), batch prediction, and improved performance, see the V2 model:

hasancanbiyik/euphemism-detector-multilingual

Research Context

  • Biyik, H. C., Lee, P., & Feldman, A. (2024). Turkish Delights: A Dataset on Turkish Euphemisms. SIGTURK at ACL 2024. arXiv:2407.13040
  • Biyik, H. C., Barak, L., Peng, J., & Feldman, A. (2026). When Semantic Overlap Is Not Enough: Cross-Lingual Euphemism Transfer Between Turkish and English. SIGTURK at EACL 2026. arXiv:2602.16957

License

MIT
