Binary MWE Detection with DeBERTa
DeBERTa-v3-large fine-tuned for multiword expression (MWE) identification. Detects both continuous MWEs (kick the bucket) and discontinuous MWEs (look [the information] up).
Paper: "Binary Token-Level Classification with DeBERTa for All-Type MWE Identification" (EACL 2026 Findings)
Code: github.com/DiegoRossini/binary-mwe-detection
Approach
Instead of traditional BIO sequence labeling, we predict three independent binary labels per token: START, END, and INSIDE. This captures whether each token begins, ends, or is inside an MWE.
Example: "looked the information up" where {looked, up} is a discontinuous MWE:
| Token | START | END | INSIDE |
|---|---|---|---|
| looked | ✓ | | |
| the | | | |
| information | | | |
| up | | ✓ | |
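This formulation handles discontinuous patterns naturally: the gap tokens (the, information) simply receive no label, so the interruption needs no special tag. It also provides richer training signals than span-level labeling.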
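To make the labeling scheme concrete, here is a minimal sketch. Based on the example table, it assumes that the first token of an MWE receives START, the last token receives END, any other member tokens receive INSIDE, and gap tokens receive nothing. The decoder is a hypothetical greedy one and not the paper's procedure. It pairs each predicted START with the nearest unused END to its right, then keeps the in-between tokens whose INSIDE score clears its threshold. It ignores nested or crossing MWEs. The default thresholds are the values listed under Training, and the `>=` comparison is an assumption. `encode_mwes` and `decode_mwes` are illustrative names.

```python
from typing import List, Sequence, Set, Tuple

# (START, END, INSIDE): one 0/1 value per token for each label
Labels = Tuple[List[int], List[int], List[int]]


def encode_mwes(n_tokens: int, mwes: Sequence[Set[int]]) -> Labels:
    """Turn MWEs (non-empty sets of token indices) into three binary label vectors.

    Assumption: first MWE token -> START, last -> END, other members -> INSIDE;
    gap tokens stay 0 on all three labels.
    """
    start, end, inside = [0] * n_tokens, [0] * n_tokens, [0] * n_tokens
    for mwe in mwes:
        idx = sorted(mwe)
        start[idx[0]] = 1
        end[idx[-1]] = 1
        for k in idx[1:-1]:
            inside[k] = 1
    return start, end, inside


def decode_mwes(
    p_start: Sequence[float],
    p_end: Sequence[float],
    p_inside: Sequence[float],
    tau_start: float = 0.5,
    tau_end: float = 0.6,
    tau_inside: float = 0.2,
) -> List[List[int]]:
    """Hypothetical greedy decoder: pair each START with the nearest unused later END."""
    n = len(p_start)
    starts = [i for i in range(n) if p_start[i] >= tau_start]
    ends = [j for j in range(n) if p_end[j] >= tau_end]  # ascending order
    mwes: List[List[int]] = []
    used_ends = set()
    for i in starts:
        j = next((e for e in ends if e > i and e not in used_ends), None)
        if j is None:
            continue  # START with no END to its right: discard
        used_ends.add(j)
        # Gap tokens are skipped; only confident INSIDE tokens join the MWE
        middle = [k for k in range(i + 1, j) if p_inside[k] >= tau_inside]
        mwes.append([i] + middle + [j])
    return mwes


# "looked the information up" with the MWE {looked, up}
labels = encode_mwes(4, [{0, 3}])
print(labels)                # ([1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 0, 0])
print(decode_mwes(*labels))  # [[0, 3]]
```

Round-tripping the example sentence yields START on `looked`, END on `up`, and no INSIDE labels, which matches the table. Feeding the gold labels back through the decoder recovers the discontinuous MWE.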
Results
Evaluated on CoAM. We outperform the previous state of the art (Qwen-72B) by 12 F1 points overall, with 165× fewer parameters.
| Model | Parameters | F1 | Continuous F1 | Discontinuous F1 |
|---|---|---|---|---|
| Ours | 435M | 69.8% | 75.9% | 29.7% |
| Qwen-72B (prev. SOTA) | 72B | 57.8% | 57.3% | 17.1% |
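The headline figures can be re-derived from the table. This sketch only recomputes the parameter ratio and the per-column F1 gains. The dictionary names are illustrative.

```python
# Figures copied from the Results table (parameters as raw counts, F1 in percent)
ours = {"params": 435e6, "F1": 69.8, "Continuous F1": 75.9, "Discontinuous F1": 29.7}
qwen = {"params": 72e9, "F1": 57.8, "Continuous F1": 57.3, "Discontinuous F1": 17.1}

ratio = qwen["params"] / ours["params"]
print(f"parameter ratio: {ratio:.1f}x")  # parameter ratio: 165.5x

for metric in ("F1", "Continuous F1", "Discontinuous F1"):
    gain = round(ours[metric] - qwen[metric], 1)  # rounding hides float noise
    print(f"{metric}: +{gain} points")
# F1: +12.0 points
# Continuous F1: +18.6 points
# Discontinuous F1: +12.6 points
```

The ratio of about 165.5 is consistent with the stated "165× fewer parameters". The overall gain of 12.0 points matches the stated 12 F1 points.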
Installation
```bash
pip install transformers torch spacy networkx
python -m spacy download en_core_web_lg
```
Usage
```python
from transformers import AutoModel

# trust_remote_code=True loads the custom model code (including detect()) from the model repository
model = AutoModel.from_pretrained("DiegoRossini/mwe-detection-deberta", trust_remote_code=True)

# Continuous MWE
mwes = model.detect("They made up their minds.")
print(mwes)  # ['made up']

# Another continuous MWE
mwes = model.detect("I ran into an old friend yesterday.")
print(mwes)  # ['ran into']

# Detailed output with scores
mwes = model.detect("He kicked the bucket.", return_details=True)
print(mwes)
```
Training
- Base model: DeBERTa-v3-large
- Dataset: CoAM (780 train / 521 test)
- Features: NP chunking + dependency distances (via spaCy)
- Augmentation: 30% oversampling
- Thresholds: τ_start=0.5, τ_end=0.6, τ_inside=0.2
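The card does not say how the spaCy features are computed. The sketch below rests on two assumptions. First, "NP chunking" is taken to mean a per-token flag for membership in a noun chunk. Second, "dependency distance" is taken to mean the number of arcs on the path between two tokens in the dependency tree. `np_chunk_flags` and `dependency_distance` are hypothetical helpers, and the sketch does not show how these features reach DeBERTa. It works on plain head indices and chunk offsets, so it needs no spaCy model download.

```python
from collections import deque
from typing import Dict, List, Sequence, Tuple


def np_chunk_flags(n_tokens: int, chunks: Sequence[Tuple[int, int]]) -> List[int]:
    """Assumed NP feature: 1 if a token lies inside a noun chunk, else 0.

    Chunks are (start, end) pairs with `end` exclusive, like spaCy Span.start/Span.end.
    """
    flags = [0] * n_tokens
    for start, end in chunks:
        for k in range(start, end):
            flags[k] = 1
    return flags


def dependency_distance(heads: List[int], a: int, b: int) -> int:
    """Assumed distance feature: number of arcs between tokens a and b.

    `heads[i]` is the index of token i's head; the root points to itself.
    The tree is treated as undirected and searched breadth-first.
    """
    graph: Dict[int, List[int]] = {i: [] for i in range(len(heads))}
    for child, head in enumerate(heads):
        if child != head:  # skip the root's self-loop
            graph[child].append(head)
            graph[head].append(child)
    dist = {a: 0}
    queue = deque([a])
    while queue:
        node = queue.popleft()
        if node == b:
            return dist[node]
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    raise ValueError("tokens are not connected in the dependency tree")


# Hand-written parse of "looked the information up" (illustrative, not real spaCy output):
# the -> information -> looked (root), up -> looked
heads = [0, 2, 0, 0]
print(dependency_distance(heads, 0, 3))  # 1: looked and up are directly linked
print(np_chunk_flags(4, [(1, 3)]))        # [0, 1, 1, 0]
```

With spaCy, the inputs would come from a parsed `Doc`. The heads are `[t.head.i for t in doc]`, since spaCy makes the root token its own head. The chunks are `[(c.start, c.end) for c in doc.noun_chunks]`. networkx, which the install line includes, offers `shortest_path_length` for the same distance. The sketch uses plain breadth-first search to stay dependency-free.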
Citation
```bibtex
@inproceedings{rossini2026binary,
  title = "Binary Token-Level Classification with {DeBERTa} for All-Type {MWE} Identification",
  author = "Rossini, Diego and van der Plas, Lonneke",
  booktitle = "Findings of EACL 2026",
  year = "2026"
}
```