Update README.md
README.md (CHANGED)
Before (lines removed in this commit are marked `-`):

````diff
@@ -17,9 +17,18 @@ widget:
   example_title: "Complex sentence"
 ---
 
-# CEFR
 
-
 
 ## Labels
 - **A1**: Beginner
@@ -63,33 +72,14 @@ print(f"Predicted CEFR Level: {label_map[predicted_class]}")
 print(f"Confidence: {predictions[0][predicted_class].item():.2%}")
 ```
 
-### Using Inference API
-```python
-import requests
-
-API_URL = "https://router.huggingface.co/models/theluantran/cefr-bert-classifier"
-headers = {"Authorization": f"Bearer YOUR_HF_TOKEN"}
-
-def query(payload):
-    response = requests.post(API_URL, headers=headers, json=payload)
-    return response.json()
-
-output = query({"inputs": "This is a simple sentence."})
-print(output)
-```
 
 ## Training Configuration
 - **Epochs**: 4
 - **Batch Size**: 16
 - **Learning Rate**: 2e-05
 - **Max Length**: 512
-- **Optimizer**: AdamW
 - **Weight Decay**: 0.01
 
-##
-- The model shows high accuracy on in-domain data but lower generalization to out-of-domain texts
-- Best performance on formal written English
-- May struggle with informal language, slang, or domain-specific jargon
 
-
-If you use this model, please cite appropriately.
````
After (lines added in this commit are marked `+`):

````diff
@@ -17,9 +17,18 @@ widget:
   example_title: "Complex sentence"
 ---
 
+# CEFR BERT Classifier
 
+A fine-tuned RoBERTa-based transformer model for classifying English text by CEFR (Common European Framework of Reference for Languages) proficiency levels.
+
+The source code to train this model can be found at: https://github.com/luantran/One-model-to-grade-them-all
+
+## Model Description
+
+This model is part of an ensemble CEFR text classification system that combines multiple approaches to estimate language proficiency levels. The BERT/RoBERTa classifier leverages pre-trained transformer representations fine-tuned on CEFR-labeled data to capture deep contextual and linguistic patterns characteristic of different proficiency levels.
+The other models in this ensemble are:
+- https://huggingface.co/theluantran/cefr-naive-bayes
+- https://huggingface.co/theluantran/cefr-doc2vec
 
 ## Labels
 - **A1**: Beginner
@@ -63,33 +72,14 @@ print(f"Predicted CEFR Level: {label_map[predicted_class]}")
 print(f"Confidence: {predictions[0][predicted_class].item():.2%}")
 ```
 
 
 ## Training Configuration
 - **Epochs**: 4
 - **Batch Size**: 16
 - **Learning Rate**: 2e-05
 - **Max Length**: 512
 - **Weight Decay**: 0.01
 
+## License
 
+This model is released for research and educational purposes. The training data is proprietary and not included.
````
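The Model Description above says this classifier is combined with a Naive Bayes and a Doc2Vec model in an ensemble, but neither version of the README states how the three predictions are merged. A minimal sketch of one common combination rule, assuming each member exposes a per-class probability vector and the ensemble simply averages them (the averaging rule, the function name, and the example probabilities are all assumptions, not taken from the repository):

```python
def ensemble_predict(prob_vectors):
    """Average per-class probability vectors from several classifiers.

    prob_vectors: a list of equal-length probability lists, one per
    ensemble member (e.g. BERT, Naive Bayes, Doc2Vec).
    Returns the element-wise average.
    """
    n_models = len(prob_vectors)
    n_classes = len(prob_vectors[0])
    return [sum(v[i] for v in prob_vectors) / n_models for i in range(n_classes)]

# Hypothetical outputs from the three ensemble members for one input text.
bert_probs = [0.05, 0.10, 0.50, 0.20, 0.10, 0.05]
nb_probs = [0.10, 0.15, 0.40, 0.20, 0.10, 0.05]
d2v_probs = [0.05, 0.20, 0.45, 0.15, 0.10, 0.05]

avg = ensemble_predict([bert_probs, nb_probs, d2v_probs])
predicted_class = max(range(len(avg)), key=lambda i: avg[i])
```

Averaging calibrated probabilities (rather than hard majority voting) lets a confident transformer prediction outweigh two weak lexical ones, which is one plausible reason to keep the weaker models in the ensemble.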
|
|
|
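Both usage snippets in the diff end by mapping the model's highest-probability class to a CEFR label and printing a confidence. That final step can be sketched self-contained, with dummy logits standing in for the model output; the six-label A1-C2 index order is an assumption, since only the first entry of the Labels list ("A1: Beginner") survives in this excerpt:

```python
import math

# Assumed index order; only "- **A1**: Beginner" is visible in the excerpt.
label_map = {0: "A1", 1: "A2", 2: "B1", 3: "B2", 4: "C1", 5: "C2"}

def softmax(logits):
    """Convert raw classifier logits to a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Dummy logits standing in for the classifier's output on one sentence.
logits = [0.1, 0.3, 2.5, 0.9, -0.2, -1.0]
probs = softmax(logits)
predicted_class = max(range(len(probs)), key=lambda i: probs[i])
print(f"Predicted CEFR Level: {label_map[predicted_class]}")
print(f"Confidence: {probs[predicted_class]:.2%}")
```

In the README's own snippet this corresponds to `torch.softmax` over the model logits followed by `argmax`; the sketch above just makes that arithmetic explicit without requiring the model download.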