# CEFR English Level Classifier
A fine-tuned Qwen2.5-1.5B model that classifies English text into CEFR proficiency levels (A1 → C2).
| Level | Description | Recall |
|---|---|---|
| A1 | Beginner | 96.6% |
| A2 | Elementary | 90.0% |
| B1 | Intermediate | 90.0% |
| B2 | Upper-Intermediate | 86.7% |
| C1 | Advanced | 86.7% |
| C2 | Mastery | 60.0% |
Overall accuracy: 84.9% · F1 macro: 84.9%
Note: C2/C1 confusion is expected — the boundary between mastery and advanced is inherently subtle, even for human annotators.
## Quick start
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch
import torch.nn.functional as F

model_id = "yanou16/cefr-english-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

LABELS = ["A1", "A2", "B1", "B2", "C1", "C2"]

def predict(text: str) -> dict:
    """Classify one text; return the predicted level, its confidence, and the full distribution."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = F.softmax(logits, dim=-1)[0].tolist()
    pred_idx = logits.argmax().item()
    return {
        "level": LABELS[pred_idx],
        "confidence": round(probs[pred_idx], 4),
        "probabilities": {LABELS[i]: round(p, 4) for i, p in enumerate(probs)},
    }

# Examples
print(predict("I have dog. I like it very much."))
# → {"level": "A1", "confidence": 0.97, ...}

print(predict("Despite the challenging circumstances, she managed to articulate her concerns with remarkable clarity."))
# → {"level": "C1", "confidence": 0.89, ...}
```
## Training details
| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen2.5-1.5B |
| Method | QLoRA (4-bit NF4) |
| LoRA rank | 32 |
| LoRA alpha | 64 |
| Target modules | q_proj, v_proj |
| Epochs | 5 |
| Learning rate | 2e-4 |
| Scheduler | Cosine |
| Batch size | 8 |
| Max length | 256 tokens |
| Training samples | 1,605 |
| Test samples | 179 |
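The actual training script is not published with the card, but the table above maps onto a `peft` + `bitsandbytes` setup roughly like the following sketch; treat it as an illustration of the stated hyperparameters, not the author's exact code:

```python
# Sketch of the QLoRA configuration implied by the table above.
from transformers import AutoModelForSequenceClassification, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model
import torch

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",           # 4-bit NF4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base = AutoModelForSequenceClassification.from_pretrained(
    "Qwen/Qwen2.5-1.5B",
    num_labels=6,                        # A1 … C2
    quantization_config=bnb_config,
)

lora_config = LoraConfig(
    r=32,                                # LoRA rank
    lora_alpha=64,
    target_modules=["q_proj", "v_proj"],
    task_type="SEQ_CLS",
)
model = get_peft_model(base, lora_config)
```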
## Training curves
Training loss converged smoothly from 1.4 → 0.37 over 5 epochs. Best checkpoint at epoch 4 (step 804).
## Dataset
Trained on yanou16/cefr-dataset — a synthetic dataset of 1,785 English texts generated via Groq API (Llama-3.3-70b) with detailed per-level linguistic profiles.
- 1,785 samples across 6 CEFR levels (balanced, ~298 per level)
- 10 domains: email, essay, chat, product review, social media, travel diary, academic, job application, news commentary, forum reply
- Train / Test split: 90% / 10%, stratified
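Assuming the dataset ships as a single `train` split with a `text` column and a `ClassLabel`-typed `label` column (column names are my assumption, not confirmed by the card), reproducing the stratified split could look like:

```python
# Illustrative only: the split/column names below are assumptions.
from datasets import load_dataset

ds = load_dataset("yanou16/cefr-dataset", split="train")
split = ds.train_test_split(
    test_size=0.10,
    stratify_by_column="label",  # requires a ClassLabel column
    seed=42,
)
train_ds, test_ds = split["train"], split["test"]
print(len(train_ds), len(test_ds))
```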
## Intended use
- ✅ Educational platforms (adaptive content difficulty)
- ✅ Language learning apps (placement tests)
- ✅ Writing assistants (level feedback)
- ✅ NLP research on language proficiency
## Limitations
- Trained on synthetic data — may not fully capture authentic learner errors
- C2 recall is lower (60%) due to similarity with C1 texts
- Best suited for texts of 20–150 words (see the guard sketch after this list)
- English only
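Since the model was tuned on short texts, callers may want to flag inputs outside the supported length range before trusting a prediction. A minimal sketch; the guard itself is not part of the card, and `predict` is the helper from Quick start:

```python
# Hypothetical guard (not part of the card): flags inputs outside the
# 20–150 word range the model works best on.
def predict_with_guard(text: str) -> dict:
    result = predict(text)  # predict() from the Quick start section
    result["in_supported_range"] = 20 <= len(text.split()) <= 150
    return result
```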
## Author
Rayane Louzazna — AI Engineering student at CESI Algérie
HuggingFace · GitHub