---
tags:
- lora
- text-classification
- cefr
- en
base_model: microsoft/deberta-v3-large
license: cc-by-nc-sa-4.0
language:
- en
pipeline_tag: text-classification
datasets:
- dksysd/cefr-classification
---

# CEFR Classifier

A text classification model that predicts **CEFR (Common European Framework of Reference for Languages)** levels (A1-C2) for English texts. Fine-tuned from `microsoft/deberta-v3-large`.

## Model Performance

**Parallel Corpus Dataset**

![confusion_matrix_parallel](https://cdn-uploads.huggingface.co/production/uploads/67c124daa19ae7b9efa277a1/yWEuGel3zHSH4wf_a5uZt.png)

**Instruction Dataset**

![confusion_matrix_instruction](https://cdn-uploads.huggingface.co/production/uploads/67c124daa19ae7b9efa277a1/RRQdVcwyuo3Y9NZO9aBXN.png)

## Quick Start

### Simple Usage (Recommended)

```python
from transformers import pipeline

# Load the classifier
classifier = pipeline("text-classification", model="dksysd/cefr-classifier")

# Classify a text
text = "This is a sample sentence to classify."
result = classifier(text)
print(result)
# [{'label': 'A1', 'score': 0.535}]
```

### Get All Class Probabilities

```python
# top_k=None returns scores for every class
# (replaces the deprecated return_all_scores=True)
classifier = pipeline(
    "text-classification",
    model="dksysd/cefr-classifier",
    top_k=None
)

text = "This is a sample sentence to classify."
result = classifier(text)[0]
for item in result:
    print(f"{item['label']}: {item['score']:.4f}")
```

### Batch Processing

```python
# Uses the default classifier from Simple Usage,
# which returns only the top label per text
texts = [
    "The cat sat on the mat.",
    "Quantum entanglement represents a fundamental phenomenon in physics.",
    "I like pizza."
]

results = classifier(texts)
for text, result in zip(texts, results):
    print(f"{text} -> {result['label']} ({result['score']:.3f})")
```

## Advanced Usage

### Manual Loading with PyTorch

For more control over the inference process:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load model and tokenizer
model_name = "dksysd/cefr-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Set up the device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()

# Label mapping
id2label = {0: 'A1', 1: 'A2', 2: 'B1', 3: 'B2', 4: 'C1', 5: 'C2'}

# Inference
text = "Your text here"
inputs = tokenizer(
    text,
    padding="max_length",
    truncation=True,
    max_length=1024,
    return_tensors="pt"
).to(device)

with torch.no_grad():
    outputs = model(**inputs)

probs = torch.softmax(outputs.logits, dim=-1)[0]
pred_idx = torch.argmax(probs).item()
print(f"Predicted: {id2label[pred_idx]} (confidence: {probs[pred_idx].item():.4f})")
```

## CEFR Levels

- **A1**: Beginner
- **A2**: Elementary
- **B1**: Intermediate
- **B2**: Upper Intermediate
- **C1**: Advanced
- **C2**: Proficient

## License

This model is released under the CC-BY-NC-SA-4.0 license.
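## Tip: Soft CEFR Estimates

Because the six CEFR levels are ordered, the full probability distribution (from the all-probabilities pipeline above) can be collapsed into a single soft estimate by taking a probability-weighted average over the level indices. This is a minimal sketch, not part of the model's API; the helper name and the sample distribution below are illustrative, not real model output:

```python
# Ordered CEFR levels; index 0 = A1 ... index 5 = C2
LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]

def expected_level(scores):
    """Collapse pipeline score entries [{'label': ..., 'score': ...}, ...]
    into a probability-weighted level index, plus the nearest CEFR label."""
    prob = {s["label"]: s["score"] for s in scores}
    idx = sum(i * prob.get(lvl, 0.0) for i, lvl in enumerate(LEVELS))
    return LEVELS[round(idx)], idx

# Illustrative distribution (NOT actual classifier output)
sample = [
    {"label": "A1", "score": 0.05},
    {"label": "A2", "score": 0.10},
    {"label": "B1", "score": 0.55},
    {"label": "B2", "score": 0.20},
    {"label": "C1", "score": 0.07},
    {"label": "C2", "score": 0.03},
]

label, idx = expected_level(sample)
print(f"{label} (weighted index {idx:.2f})")
# -> B1 (weighted index 2.23)
```

A soft index like this can be more informative than the argmax label alone when a text sits between two adjacent levels.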