---
tags:
- lora
- text-classification
- cefr
- en
base_model: microsoft/deberta-v3-large
license: cc-by-nc-sa-4.0
language:
- en
pipeline_tag: text-classification
datasets:
- dksysd/cefr-classification
---
# CEFR Classifier
A text classification model that predicts **CEFR (Common European Framework of Reference for Languages)** levels (A1-C2) for English texts.
Fine-tuned from `microsoft/deberta-v3-large`.
## Model Performance
**Parallel Corpus Dataset**

**Instruction Dataset**

## Quick Start
### Simple Usage (Recommended)
```python
from transformers import pipeline
# Load the classifier
classifier = pipeline("text-classification", model="dksysd/cefr-classifier")
# Classify a text
text = "This is a sample sentence to classify."
result = classifier(text)
print(result)
# [{'label': 'A1', 'score': 0.535}]
```
### Get All Class Probabilities
```python
# `top_k=None` returns scores for all six labels
# (replaces the deprecated `return_all_scores=True`)
classifier = pipeline(
    "text-classification",
    model="dksysd/cefr-classifier",
    top_k=None,
)

result = classifier(text)[0]
for item in result:
    print(f"{item['label']}: {item['score']:.4f}")
```
### Batch Processing
```python
texts = [
    "The cat sat on the mat.",
    "Quantum entanglement represents a fundamental phenomenon in physics.",
    "I like pizza.",
]

results = classifier(texts)
for text, result in zip(texts, results):
    print(f"{text} -> {result['label']} ({result['score']:.3f})")
```
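Batch predictions can be post-processed without re-running the model. As a hedged sketch (not part of the model card), the six labels can be ranked ordinally to sort texts from easiest to hardest; the `(text, label)` pairs below are illustrative placeholders, not real model output.

```python
# Ordinal position of each CEFR label: A1 = 0 (easiest) ... C2 = 5 (hardest)
CEFR_ORDER = {lvl: i for i, lvl in enumerate(["A1", "A2", "B1", "B2", "C1", "C2"])}

# Placeholder (text, predicted label) pairs for illustration
predictions = [
    ("Quantum entanglement represents a fundamental phenomenon in physics.", "C1"),
    ("The cat sat on the mat.", "A1"),
    ("I like pizza.", "A2"),
]

# Sort texts by predicted difficulty, easiest first
ranked = sorted(predictions, key=lambda p: CEFR_ORDER[p[1]])
for text, label in ranked:
    print(f"{label}: {text}")
```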
## Advanced Usage
### Manual Loading with PyTorch
For more control over the inference process:
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
# Load model and tokenizer
model_name = "dksysd/cefr-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
# Setup device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()
# Label mapping
id2label = {0: 'A1', 1: 'A2', 2: 'B1', 3: 'B2', 4: 'C1', 5: 'C2'}
# Inference
text = "Your text here"
inputs = tokenizer(text, padding="max_length", truncation=True,
                   max_length=1024, return_tensors="pt").to(device)

with torch.no_grad():
    outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=-1)[0]

pred_idx = torch.argmax(probs).item()
print(f"Predicted: {id2label[pred_idx]} (confidence: {probs[pred_idx]:.4f})")
```
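Texts longer than `max_length` must be split into chunks before inference. As a hedged sketch under that assumption (the model card does not prescribe a long-text strategy), one simple aggregation is to average the per-chunk probability vectors and take the argmax; the chunk probabilities below are illustrative, not model output.

```python
def aggregate_chunk_probs(chunk_probs):
    """Average per-chunk probability vectors over the 6 CEFR classes.

    chunk_probs: list of 6-element probability lists (A1..C2), one per chunk.
    Returns (pred_idx, averaged_probs).
    """
    n = len(chunk_probs)
    avg = [sum(p[i] for p in chunk_probs) / n for i in range(6)]
    return max(range(6), key=lambda i: avg[i]), avg

# Illustrative per-chunk probabilities for a two-chunk document
chunks = [
    [0.10, 0.20, 0.40, 0.20, 0.05, 0.05],  # chunk 1 leans B1
    [0.00, 0.10, 0.30, 0.40, 0.15, 0.05],  # chunk 2 leans B2
]
idx, avg = aggregate_chunk_probs(chunks)
print(idx)  # 2 -> B1 wins after averaging
```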
## CEFR Levels
- **A1**: Beginner
- **A2**: Elementary
- **B1**: Intermediate
- **B2**: Upper Intermediate
- **C1**: Advanced
- **C2**: Proficient
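Because these six levels form an ordered scale, the full probability distribution (as returned with `top_k=None`) can be condensed into a single "expected level" score. This is a hedged sketch, not an official feature of the model; the probabilities below are illustrative placeholders.

```python
LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]

def expected_level(scores):
    """Probability-weighted ordinal level.

    scores: list of {'label': ..., 'score': ...} dicts for all six classes.
    Returns a float in [0, 5], where 0.0 = A1 and 5.0 = C2.
    """
    by_label = {item["label"]: item["score"] for item in scores}
    return sum(i * by_label[lvl] for i, lvl in enumerate(LEVELS))

# Illustrative distribution centred between B1 and B2
scores = [
    {"label": "A1", "score": 0.05}, {"label": "A2", "score": 0.10},
    {"label": "B1", "score": 0.40}, {"label": "B2", "score": 0.30},
    {"label": "C1", "score": 0.10}, {"label": "C2", "score": 0.05},
]
value = expected_level(scores)
print(f"Expected level: {value:.2f}")  # 2.45, i.e. between B1 and B2
```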
## License
This model is released under the CC-BY-NC-SA-4.0 license.