CEFR English Level Classifier

A fine-tuned Qwen2.5-1.5B model that classifies English text into CEFR proficiency levels (A1 → C2).

Level  Description          Recall
A1     Beginner             96.6%
A2     Elementary           90.0%
B1     Intermediate         90.0%
B2     Upper-Intermediate   86.7%
C1     Advanced             86.7%
C2     Mastery              60.0%

Overall accuracy: 84.9% · F1 macro: 84.9%

Note: C2/C1 confusion is expected — the boundary between mastery and advanced is inherently subtle, even for human annotators.
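This kind of adjacent-level confusion is easiest to inspect with a confusion matrix. A minimal sketch with scikit-learn, where y_true and y_pred are placeholders for the test-set gold labels and model predictions (neither is published with this card):

# Hypothetical evaluation sketch; y_true / y_pred stand in for real test data.
from sklearn.metrics import confusion_matrix, classification_report

LABELS = ["A1", "A2", "B1", "B2", "C1", "C2"]
y_true = ["C2", "C2", "C1", "B2"]  # placeholder gold labels
y_pred = ["C1", "C2", "C1", "B2"]  # placeholder predictions

print(confusion_matrix(y_true, y_pred, labels=LABELS))  # rows = gold, columns = predicted
print(classification_report(y_true, y_pred, labels=LABELS, zero_division=0))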


Quick start

from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch
import torch.nn.functional as F

model_id = "yanou16/cefr-english-classifier"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model     = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

LABELS = ["A1", "A2", "B1", "B2", "C1", "C2"]

def predict(text: str) -> dict:
    # Tokenize with the same 256-token cap used during training
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
    with torch.no_grad():
        logits = model(**inputs).logits
    probs    = F.softmax(logits, dim=-1)[0].tolist()  # per-level probabilities
    pred_idx = logits.argmax().item()
    return {
        "level":         LABELS[pred_idx],
        "confidence":    round(probs[pred_idx], 4),
        "probabilities": {LABELS[i]: round(p, 4) for i, p in enumerate(probs)},
    }

# Examples
print(predict("I have dog. I like it very much."))
# → {"level": "A1", "confidence": 0.97, ...}

print(predict("Despite the challenging circumstances, she managed to articulate her concerns with remarkable clarity."))
# → {"level": "C1", "confidence": 0.89, ...}

Training details

Parameter         Value
Base model        Qwen/Qwen2.5-1.5B
Method            QLoRA (4-bit NF4)
LoRA rank         32
LoRA alpha        64
Target modules    q_proj, v_proj
Epochs            5
Learning rate     2e-4
Scheduler         Cosine
Batch size        8
Max length        256 tokens
Training samples  1,605
Test samples      179
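The table maps directly onto a standard peft + bitsandbytes setup. A minimal sketch of an equivalent configuration (the training script is not published, so treat this as an illustrative reconstruction; compute dtype and task type are assumptions not listed above):

# Sketch of a QLoRA setup matching the hyperparameters above.
import torch
from transformers import AutoModelForSequenceClassification, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # 4-bit NF4, per the table
    bnb_4bit_compute_dtype=torch.bfloat16,  # assumption: compute dtype
)

base = AutoModelForSequenceClassification.from_pretrained(
    "Qwen/Qwen2.5-1.5B", num_labels=6, quantization_config=bnb_config
)
base = prepare_model_for_kbit_training(base)

lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    target_modules=["q_proj", "v_proj"],
    task_type="SEQ_CLS",                    # assumption: classification head
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()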

Training curves

Training loss converged smoothly from 1.4 → 0.37 over 5 epochs. Best checkpoint at epoch 4 (step 804).


Dataset

Trained on yanou16/cefr-dataset — a synthetic dataset of 1,785 English texts generated via the Groq API (Llama-3.3-70b) with detailed per-level linguistic profiles.

  • 1,785 samples across 6 CEFR levels (balanced, ~298 per level)
  • 10 domains: email, essay, chat, product review, social media, travel diary, academic, job application, news commentary, forum reply
  • Train / Test split: 90% / 10%, stratified
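The dataset can be pulled straight from the Hub with the datasets library; a minimal loading sketch (split names are an assumption based on the 90/10 split above; check what the repo actually exposes):

from datasets import load_dataset

ds = load_dataset("yanou16/cefr-dataset")
print(ds)  # inspect the available splits and columns before relying on names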

Intended use

  • ✅ Educational platforms (adaptive content difficulty)
  • ✅ Language learning apps (placement tests)
  • ✅ Writing assistants (level feedback)
  • ✅ NLP research on language proficiency
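As one concrete example, a placement-test flow could bucket a learner's writing sample with the predict() helper from Quick start. A minimal sketch (the coarse tracks and confidence threshold below are illustrative, not part of the model):

# Hypothetical placement helper built on predict() from Quick start.
def place_learner(writing_sample: str) -> str:
    result = predict(writing_sample)
    level, confidence = result["level"], result["confidence"]
    if confidence < 0.5:
        return f"uncertain ({level} at {confidence:.0%}); request a longer sample"
    track = {"A1": "beginner", "A2": "beginner",
             "B1": "intermediate", "B2": "intermediate",
             "C1": "advanced", "C2": "advanced"}[level]
    return f"{track} track ({level}, {confidence:.0%} confidence)"

print(place_learner("Yesterday I go to market and buy some fruits for my family."))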

Limitations

  • Trained on synthetic data — may not fully capture authentic learner errors
  • C2 recall is lower (60%) due to similarity with C1 texts
  • Best suited for texts of 20–150 words
  • English only
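Given the 20–150 word sweet spot, callers may want to gate inputs before scoring. A minimal sketch (the helper and its messages are illustrative; only the thresholds come from the list above):

# Hypothetical input gate reflecting the 20-150 word guidance above.
def check_length(text: str) -> str | None:
    n_words = len(text.split())
    if n_words < 20:
        return f"only {n_words} words; predictions on very short texts may be unreliable"
    if n_words > 150:
        return f"{n_words} words; consider scoring ~150-word chunks separately"
    return None  # within the recommended range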

Author

Rayane Louzazna — AI Engineering student at CESI Algérie
HuggingFace · GitHub
