---
language: en
license: mit
tags:
- multiword-expressions
- mwe
- token-classification
- deberta
- nlp
datasets:
- yusuke196/CoAM
metrics:
- f1
pipeline_tag: token-classification
---

# Binary MWE Detection with DeBERTa
DeBERTa-v3-large fine-tuned for multiword expression (MWE) identification. Detects both continuous MWEs (*kick the bucket*) and discontinuous MWEs (*look* [the information] *up*).
📄 Paper: "Binary Token-Level Classification with DeBERTa for All-Type MWE Identification" (EACL 2026 Findings)
💻 Code: [github.com/DiegoRossini/binary-mwe-detection](https://github.com/DiegoRossini/binary-mwe-detection)
## Approach
Instead of traditional BIO sequence labeling, we predict three independent binary labels per token: START, END, and INSIDE. This captures whether each token begins, ends, or is inside an MWE.
Example: "looked the information up" where {looked, up} is a discontinuous MWE:
| Token | START | END | INSIDE |
|---|---|---|---|
| looked | ✓ | | |
| the | | | |
| information | | | |
| up | | ✓ | |
This formulation naturally handles discontinuous patterns and provides richer training signals than span-level labeling.
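To make the formulation concrete, here is a minimal sketch of how spans could be recovered from the three per-token binary labels. The pairing heuristic below (an END closes the most recently opened START, collecting any INSIDE tokens in between) is an illustrative assumption, not the paper's exact decoding algorithm.

```python
def decode_mwes(tokens, start, end, inside):
    """Recover MWE token groups from per-token 0/1 START/END/INSIDE flags.

    Illustrative decoding only: an END label closes the most recently
    opened START, skipping tokens not flagged INSIDE (discontinuity).
    """
    mwes, current = [], None
    for i, _tok in enumerate(tokens):
        if start[i]:
            current = [i]                # token opens a new MWE
        elif current is not None and inside[i]:
            current.append(i)            # token continues the open MWE
        if end[i] and current is not None:
            if i not in current:
                current.append(i)        # discontinuous: END pairs with the open START
            mwes.append([tokens[j] for j in current])
            current = None
    return mwes

# Discontinuous example from the table above
tokens = ["looked", "the", "information", "up"]
print(decode_mwes(tokens, [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 0, 0]))
# [['looked', 'up']]
```

A continuous MWE such as *kicked the bucket* decodes the same way, with the middle token carrying the INSIDE flag.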
## Results
Evaluated on the CoAM test set. Our model outperforms the previous state of the art (Qwen-72B) by +12 F1 points with 165× fewer parameters.
| Model | Parameters | F1 | Continuous F1 | Discontinuous F1 |
|---|---|---|---|---|
| Ours | 435M | 69.8% | 75.9% | 29.7% |
| Qwen-72B (prev. SOTA) | 72B | 57.8% | 57.3% | 17.1% |
## Installation

```bash
pip install transformers torch spacy networkx
python -m spacy download en_core_web_lg
```
## Usage

```python
from transformers import AutoModel

model = AutoModel.from_pretrained("DiegoRossini/mwe-detection-deberta", trust_remote_code=True)

# Continuous MWE
mwes = model.detect("They made up their minds.")
print(mwes)  # ['made up']

# Discontinuous MWE: 'looked' and 'up' are separated by the object
mwes = model.detect("They looked the information up.")
print(mwes)

# Detailed output with per-label scores
mwes = model.detect("He kicked the bucket.", return_details=True)
```
## Training
- Base model: DeBERTa-v3-large
- Dataset: CoAM (780 train / 521 test)
- Features: NP chunking + dependency distances (via spaCy)
- Augmentation: 30% oversampling
- Thresholds: τ_start=0.5, τ_end=0.6, τ_inside=0.2
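The per-label thresholds above are applied independently to each token's sigmoid probabilities. The sketch below illustrates this step; the logit values are made up for demonstration and the function names are not part of the released code.

```python
import math

# Per-label decision thresholds from the training setup above
TAU = {"start": 0.5, "end": 0.6, "inside": 0.2}

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def binarize(logits):
    """Turn raw per-token logits (dict: label -> list of floats)
    into 0/1 flags using each label's own threshold."""
    return {label: [int(sigmoid(z) >= TAU[label]) for z in scores]
            for label, scores in logits.items()}

# Hypothetical logits for the 4-token example "looked the information up"
logits = {"start":  [2.1, -3.0, -2.5, -1.0],
          "end":    [-2.0, -3.1, -2.8, 1.7],
          "inside": [-2.9, -1.5, -1.5, -2.6]}
print(binarize(logits))
# {'start': [1, 0, 0, 0], 'end': [0, 0, 0, 1], 'inside': [0, 0, 0, 0]}
```

Note that the low INSIDE threshold (0.2) makes the model more permissive about marking interior tokens, which helps recall on rarer discontinuous patterns.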
## Citation

```bibtex
@inproceedings{rossini2026binary,
  title     = "Binary Token-Level Classification with {DeBERTa} for All-Type {MWE} Identification",
  author    = "Rossini, Diego and van der Plas, Lonneke",
  booktitle = "Findings of EACL 2026",
  year      = "2026"
}
```