Update README.md
README.md CHANGED
@@ -12,68 +12,111 @@ pipeline_tag: text-classification
datasets:
- dksysd/cefr-classification
---

# CEFR Classifier

A text classification model that predicts **CEFR (Common European Framework of Reference for Languages)** levels (A1-C2) for English texts.

Fine-tuned from `microsoft/deberta-v3-large`.

## Model Performance

**Parallel Corpus Dataset**

![parallel corpus performance](parallel_corpus_performance.png)

**Instruction Dataset**

![instruction performance](instruction_performance.png)

## Quick Start

### Simple Usage (Recommended)
```python
from transformers import pipeline

# Load the classifier
classifier = pipeline("text-classification", model="dksysd/cefr-classifier")

# Classify a text
text = "This is a sample sentence to classify."
result = classifier(text)

print(result)
# [{'label': 'B2', 'score': 0.9234}]
```
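
The advanced example further down tokenizes with `max_length=1024`. If you classify long passages through the pipeline, recent `transformers` releases also let you forward tokenizer options at call time; treat this as a sketch under that assumption, not as part of the original card:

```python
# Assumption: the pipeline forwards call-time tokenizer kwargs (recent transformers versions).
long_text = "..."  # a passage longer than the model's input window
result = classifier(long_text, truncation=True, max_length=1024)
print(result)
```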

### Get All Class Probabilities
```python
classifier = pipeline(
    "text-classification",
    model="dksysd/cefr-classifier",
    return_all_scores=True
)

result = classifier(text)[0]

for item in result:
    print(f"{item['label']}: {item['score']:.4f}")
```
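
Newer `transformers` releases deprecate `return_all_scores`; `top_k=None` is the documented replacement and should produce the same per-class list. A hedged alternative, assuming a recent install:

```python
# Alternative for newer transformers versions, where return_all_scores is deprecated.
classifier = pipeline(
    "text-classification",
    model="dksysd/cefr-classifier",
    top_k=None,  # return every class score instead of only the top label
)
```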

### Batch Processing
```python
texts = [
    "The cat sat on the mat.",
    "Quantum entanglement represents a fundamental phenomenon in physics.",
    "I like pizza."
]

# Assumes the default classifier from "Simple Usage" (one top label per text),
# not the return_all_scores variant above.
results = classifier(texts)

for text, result in zip(texts, results):
    print(f"{text} -> {result['label']} ({result['score']:.3f})")
```
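
For larger workloads the same pipeline can run on a GPU and process inputs in batches; `device` and `batch_size` are standard pipeline arguments, but the concrete values below are assumptions rather than settings from this card:

```python
import torch
from transformers import pipeline

# Hypothetical setup: use the first GPU if available (device=0), otherwise CPU (device=-1).
device = 0 if torch.cuda.is_available() else -1
classifier = pipeline("text-classification", model="dksysd/cefr-classifier", device=device)

texts = ["The cat sat on the mat.", "I like pizza."]  # any list of strings
# batch_size controls how many texts are tokenized and run through the model at once.
results = classifier(texts, batch_size=8)
print(results)
```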

## Advanced Usage

### Manual Loading with PyTorch

For more control over the inference process:
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load model and tokenizer
model_name = "dksysd/cefr-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Setup device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()

# Label mapping
id2label = {0: 'A1', 1: 'A2', 2: 'B1', 3: 'B2', 4: 'C1', 5: 'C2'}

# Inference
text = "Your text here"
inputs = tokenizer(text, padding="max_length", truncation=True,
                   max_length=1024, return_tensors="pt").to(device)

with torch.no_grad():
    outputs = model(**inputs)

probs = torch.softmax(outputs.logits, dim=-1)[0]
pred_idx = torch.argmax(probs).item()

print(f"Predicted: {id2label[pred_idx]} (confidence: {probs[pred_idx]:.4f})")
```
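
An earlier revision of this card also printed the full probability distribution; the short extension below does the same, and keeps the `model.config.id2label` lookup that revision mentioned as an optional replacement for the hard-coded dict (use it only if the config carries the CEFR names rather than generic `LABEL_*` entries):

```python
# Optionally read the mapping from the checkpoint's config instead of hard-coding it:
# id2label = model.config.id2label

# Print every class probability, not just the top prediction.
for idx, p in enumerate(probs):
    print(f"{id2label[idx]}: {p.item():.4f}")
```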

## CEFR Levels

- **A1**: Beginner
- **A2**: Elementary
- **B1**: Intermediate
- **B2**: Upper Intermediate
- **C1**: Advanced
- **C2**: Proficient
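
If you want predictions reported with these descriptions, a small lookup on top of the pipeline output is enough; the helper dict and example sentence below are illustrative additions, not part of the card:

```python
from transformers import pipeline

# Hypothetical helper mapping the model's labels to the level descriptions listed above.
LEVEL_NAMES = {
    "A1": "Beginner", "A2": "Elementary", "B1": "Intermediate",
    "B2": "Upper Intermediate", "C1": "Advanced", "C2": "Proficient",
}

classifier = pipeline("text-classification", model="dksysd/cefr-classifier")
pred = classifier("The weather is nice today.")[0]
print(f"{pred['label']} ({LEVEL_NAMES[pred['label']]}), score {pred['score']:.3f}")
```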

## License

This model is released under the CC-BY-NC-SA-4.0 license.