# CEFR English Level Classifier
A fine-tuned Qwen2.5-1.5B model that classifies English text into CEFR proficiency levels (A1 → C2).
| Level | Description | Recall |
|---|---|---|
| A1 | Beginner | 96.6% |
| A2 | Elementary | 90.0% |
| B1 | Intermediate | 90.0% |
| B2 | Upper-Intermediate | 86.7% |
| C1 | Advanced | 86.7% |
| C2 | Mastery | 60.0% |
Overall accuracy: 84.9% · F1 macro: 84.9%
Note: C2/C1 confusion is expected — the boundary between mastery and advanced is inherently subtle, even for human annotators.
## Quick start
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch
import torch.nn.functional as F

model_id = "yanou16/cefr-english-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

LABELS = ["A1", "A2", "B1", "B2", "C1", "C2"]

def predict(text: str) -> dict:
    """Classify one text; return the predicted level, its confidence, and the full distribution."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = F.softmax(logits, dim=-1)[0].tolist()
    pred_idx = logits.argmax().item()
    return {
        "level": LABELS[pred_idx],
        "confidence": round(probs[pred_idx], 4),
        "probabilities": {LABELS[i]: round(p, 4) for i, p in enumerate(probs)},
    }

# Examples
print(predict("I have dog. I like it very much."))
# → {"level": "A1", "confidence": 0.97, ...}

print(predict("Despite the challenging circumstances, she managed to articulate her concerns with remarkable clarity."))
# → {"level": "C1", "confidence": 0.89, ...}
```
## Training details
| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen2.5-1.5B |
| Method | QLoRA (4-bit NF4) |
| LoRA rank | 32 |
| LoRA alpha | 64 |
| Target modules | q_proj, v_proj |
| Epochs | 5 |
| Learning rate | 2e-4 |
| Scheduler | Cosine |
| Batch size | 8 |
| Max length | 256 tokens |
| Training samples | 1,605 |
| Test samples | 179 |
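The actual training script is not published with the card, but the table above maps onto a `peft` + `bitsandbytes` setup roughly like the following sketch; treat it as an illustration of the stated hyperparameters, not the author's exact code:

```python
# Sketch of the QLoRA configuration implied by the table above.
from transformers import AutoModelForSequenceClassification, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model
import torch

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",           # 4-bit NF4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base = AutoModelForSequenceClassification.from_pretrained(
    "Qwen/Qwen2.5-1.5B",
    num_labels=6,                        # A1 … C2
    quantization_config=bnb_config,
)

lora_config = LoraConfig(
    r=32,                                # LoRA rank
    lora_alpha=64,
    target_modules=["q_proj", "v_proj"],
    task_type="SEQ_CLS",
)
model = get_peft_model(base, lora_config)
```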
## Training curves
Training loss converged smoothly from 1.4 → 0.37 over 5 epochs. Best checkpoint at epoch 4 (step 804).
## Dataset
Trained on yanou16/cefr-dataset — a synthetic dataset of 1,785 English texts generated via Groq API (Llama-3.3-70b) with detailed per-level linguistic profiles.
- 1,785 samples across 6 CEFR levels (balanced, ~298 per level)
- 10 domains: email, essay, chat, product review, social media, travel diary, academic, job application, news commentary, forum reply
- Train / Test split: 90% / 10%, stratified
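Assuming the dataset ships as a single `train` split with a `text` column and a `ClassLabel`-typed `label` column (column names are my assumption, not confirmed by the card), reproducing the stratified split could look like:

```python
# Illustrative only: the split/column names below are assumptions.
from datasets import load_dataset

ds = load_dataset("yanou16/cefr-dataset", split="train")
split = ds.train_test_split(
    test_size=0.10,
    stratify_by_column="label",  # requires a ClassLabel column
    seed=42,
)
train_ds, test_ds = split["train"], split["test"]
print(len(train_ds), len(test_ds))
```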
## Intended use
- ✅ Educational platforms (adaptive content difficulty)
- ✅ Language learning apps (placement tests)
- ✅ Writing assistants (level feedback)
- ✅ NLP research on language proficiency
## Limitations
- Trained on synthetic data — may not fully capture authentic learner errors
- C2 recall is lower (60%) due to similarity with C1 texts
- Best suited for texts of 20–150 words (see the guard sketch after this list)
- English only
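Since the model was tuned on short texts, callers may want to flag inputs outside the supported length range before trusting a prediction. A minimal sketch; the guard itself is not part of the card, and `predict` is the helper from Quick start:

```python
# Hypothetical guard (not part of the card): flags inputs outside the
# 20–150 word range the model works best on.
def predict_with_guard(text: str) -> dict:
    result = predict(text)  # predict() from the Quick start section
    result["in_supported_range"] = 20 <= len(text.split()) <= 150
    return result
```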
## Author
Rayane Louzazna — AI Engineering student at CESI Algérie
HuggingFace · GitHub