DeBERTa-v3 CEFR Vocabulary Classifier
A fine-tuned DeBERTa-v3-base model for predicting the CEFR proficiency level of English vocabulary items.
The model classifies words into six Common European Framework of Reference (CEFR) levels:
- A1
- A2
- B1
- B2
- C1
- C2
Typical applications include:
- Vocabulary difficulty estimation
- Language learning applications
- CEFR-aware educational tools
- Vocabulary profiling
- Adaptive learning systems
- Linguistic research
Model Details
| Item | Value |
|---|---|
| Base Model | microsoft/deberta-v3-base |
| Task | CEFR Classification |
| Labels | A1, A2, B1, B2, C1, C2 |
| Architecture | DeBERTa-v3 |
| Framework | Hugging Face Transformers |
| Language | English |
Dataset
This model was trained using CEFR annotations derived from the following dataset:
Dataset: star092304/CEFR-Annotated-WordNet
Dataset card: https://huggingface.co/datasets/star092304/CEFR-Annotated-WordNet
The dataset provides CEFR proficiency annotations for WordNet lexical entries and was created based on the following work:
Reference Paper
CEFR-Annotated WordNet: LLM-Based Proficiency-Guided Semantic Database for Language Learning
Authors:
- Masato Kikuchi
- Masatsugu Ono
- Toshioki Soga
- Tetsu Tanabe
- Tadachika Ozono
Paper:
https://arxiv.org/html/2510.18466v2
Citation
@article{kikuchi2025cefrannotatedwordnet,
  title={CEFR-Annotated WordNet: LLM-Based Proficiency-Guided Semantic Database for Language Learning},
  author={Kikuchi, Masato and Ono, Masatsugu and Soga, Toshioki and Tanabe, Tetsu and Ozono, Tadachika},
  year={2025}
}
Training Data Construction
Training examples were generated by aligning:
- SemCor sense annotations
- WordNet lexical entries
- CEFR labels from CEFR-Annotated WordNet
Additional preprocessing included:
- Lemmatization
- Sense matching
- Multi-word expression removal
- Proper noun filtering
Only single-word lexical items were retained for training.
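The filtering steps above can be illustrated with a minimal sketch. It is not the actual preprocessing code for this model: the function names, the sample records, the use of Penn Treebank tags (`NNP`, `NNPS`) to mark proper nouns, and the rule that a space or underscore signals a multi-word expression are all assumptions made for illustration.

```python
# Hypothetical sketch of the single-word / proper-noun filter described above.
# Assumptions: multi-word lemmas contain "_" (WordNet style) or a space,
# and proper nouns carry the Penn Treebank tags NNP or NNPS.

def is_single_word(lemma: str) -> bool:
    """Return True when the lemma is non-empty and has no word separators."""
    return bool(lemma) and not any(sep in lemma for sep in (" ", "_"))


def keep_for_training(lemma: str, pos_tag: str) -> bool:
    """Keep only single-word items that are not tagged as proper nouns."""
    is_proper_noun = pos_tag in {"NNP", "NNPS"}
    return is_single_word(lemma) and not is_proper_noun


# Illustrative (lemma, POS tag) pairs, not taken from the real training data.
candidates = [
    ("investigation", "NN"),
    ("ice_cream", "NN"),   # multi-word expression, dropped
    ("London", "NNP"),     # proper noun, dropped
    ("improve", "VB"),
]

kept = [lemma for lemma, tag in candidates if keep_for_training(lemma, tag)]
print(kept)
```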
Training Performance
Training Curves
Test Confusion Matrix
Evaluation Preview
Usage
from transformers import AutoTokenizer
from transformers import AutoModelForSequenceClassification
import torch

# Load the tokenizer and the fine-tuned classifier from the Hub
model_name = "star092304/cefr-level-deberta-v3-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Class indices 0-5 map to these CEFR levels
label_names = ["A1", "A2", "B1", "B2", "C1", "C2"]

word = "investigation"
inputs = tokenizer(
    word,
    return_tensors="pt",
    truncation=True
)

# Inference only, so gradient tracking is disabled
with torch.no_grad():
    outputs = model(**inputs)

# Pick the class with the highest logit and print its CEFR level
pred_id = outputs.logits.argmax(dim=-1).item()
print(label_names[pred_id])
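The usage example reports only the top class. When a confidence estimate is useful, the raw logits can be turned into a probability for each level with a softmax. The sketch below uses plain Python so it runs without downloading the model. The helper name `cefr_probabilities` is hypothetical, the logit values are made up, and the label order is assumed to match `label_names` above.

```python
import math

# Assumed to match the label order used in the usage example.
LABELS = ["A1", "A2", "B1", "B2", "C1", "C2"]


def cefr_probabilities(logits, labels=LABELS):
    """Apply a numerically stable softmax and pair each probability with its level."""
    if len(logits) != len(labels):
        raise ValueError("expected one logit per CEFR level")
    peak = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return {label: e / total for label, e in zip(labels, exps)}


# Made-up logits for illustration; these are not real model outputs.
example_logits = [-1.2, 0.3, 1.1, 2.4, 0.9, -0.5]
probs = cefr_probabilities(example_logits)
best = max(probs, key=probs.get)
print(best, round(probs[best], 3))
```

With the real model, the same helper could be called as `cefr_probabilities(outputs.logits[0].tolist())`.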
Example Predictions
| Word | Predicted CEFR |
|---|---|
| book | A1 |
| happy | A1 |
| journey | A2 |
| improve | B1 |
| investigation | B2 |
| sophisticated | C1 |
| quintessential | C2 |
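For vocabulary profiling, per-word predictions such as those in the table can be aggregated into a distribution over levels. The following is a minimal sketch: the `vocabulary_profile` helper is hypothetical, and the input dictionary simply copies the example predictions listed above.

```python
from collections import Counter

LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]

# Word-to-level pairs copied from the example predictions table.
predictions = {
    "book": "A1",
    "happy": "A1",
    "journey": "A2",
    "improve": "B1",
    "investigation": "B2",
    "sophisticated": "C1",
    "quintessential": "C2",
}


def vocabulary_profile(word_levels):
    """Return the share of words at each CEFR level (zeros for an empty input)."""
    counts = Counter(word_levels.values())
    total = sum(counts.values())
    if total == 0:
        return {level: 0.0 for level in LEVELS}
    return {level: counts.get(level, 0) / total for level in LEVELS}


profile = vocabulary_profile(predictions)
for level in LEVELS:
    print(f"{level}: {profile[level]:.2f}")
```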
Limitations
- The model predicts vocabulary difficulty at the word level.
- CEFR levels can vary depending on context and meaning.
- Polysemous words may belong to multiple CEFR levels depending on usage.
- Predictions should be interpreted as estimated proficiency levels rather than absolute ground truth.
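One way to treat predictions as estimates rather than ground truth is to report the two most likely levels whenever the model is not confident. The sketch below assumes per-level probabilities are already available, for example from a softmax over the logits. The `summarize_prediction` helper and the 0.6 threshold are hypothetical choices, and the probabilities are made up.

```python
def summarize_prediction(probs, min_confidence=0.6):
    """Return the top level, or the top two levels when confidence is low.

    `min_confidence` is an illustrative threshold, not a tuned value.
    """
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    (top_label, top_p), (second_label, _) = ranked[0], ranked[1]
    if top_p >= min_confidence:
        return top_label
    return f"{top_label} or {second_label}"


# Made-up probabilities for a word whose level is ambiguous.
ambiguous = {"A1": 0.02, "A2": 0.05, "B1": 0.40, "B2": 0.45, "C1": 0.06, "C2": 0.02}
# Made-up probabilities for a word with a clear level.
confident = {"A1": 0.90, "A2": 0.06, "B1": 0.02, "B2": 0.01, "C1": 0.005, "C2": 0.005}

print(summarize_prediction(ambiguous))
print(summarize_prediction(confident))
```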
Intended Use
This model is intended for:
- Educational research
- Language learning systems
- Vocabulary recommendation engines
- CEFR-aware NLP pipelines
The model is not intended for high-stakes educational assessment or certification decisions.
Acknowledgements
This work builds upon:
- WordNet
- SemCor
- CEFR-Annotated WordNet
- Hugging Face Transformers
- Microsoft DeBERTa-v3
Special thanks to the authors of the CEFR-Annotated WordNet dataset and paper.

