theluantran committed
Commit a6d6c28 · verified · 1 Parent(s): ec205d2

Update README.md

Files changed (1): README.md (+13 −23)
README.md CHANGED
@@ -17,9 +17,18 @@ widget:
   example_title: "Complex sentence"
 ---
 
-# CEFR Text Classifier
-
-This model classifies English text by CEFR level (A1, A2, B1, B2, C1/C2).
+# CEFR BERT Classifier
+
+A fine-tuned RoBERTa-based transformer model for classifying English text by CEFR (Common European Framework of Reference for Languages) proficiency levels.
+
+The source code used to train this model is available at: https://github.com/luantran/One-model-to-grade-them-all
+
+## Model Description
+
+This model is part of an ensemble CEFR text classification system that combines multiple approaches to estimate language proficiency levels. The BERT/RoBERTa classifier leverages pre-trained transformer representations fine-tuned on CEFR-labeled data to capture deep contextual and linguistic patterns characteristic of different proficiency levels.
+
+The other models in this ensemble are:
+- https://huggingface.co/theluantran/cefr-naive-bayes
+- https://huggingface.co/theluantran/cefr-doc2vec
 
 ## Labels
 - **A1**: Beginner
@@ -63,33 +72,14 @@ print(f"Predicted CEFR Level: {label_map[predicted_class]}")
 print(f"Confidence: {predictions[0][predicted_class].item():.2%}")
 ```
 
-### Using Inference API
-```python
-import requests
-
-API_URL = "https://router.huggingface.co/models/theluantran/cefr-bert-classifier"
-headers = {"Authorization": "Bearer YOUR_HF_TOKEN"}
-
-def query(payload):
-    response = requests.post(API_URL, headers=headers, json=payload)
-    return response.json()
-
-output = query({"inputs": "This is a simple sentence."})
-print(output)
-```
-
 ## Training Configuration
 - **Epochs**: 4
 - **Batch Size**: 16
 - **Learning Rate**: 2e-05
 - **Max Length**: 512
-- **Optimizer**: AdamW
 - **Weight Decay**: 0.01
 
-## Limitations
-- The model shows high accuracy on in-domain data but lower generalization to out-of-domain texts
-- Best performance on formal written English
-- May struggle with informal language, slang, or domain-specific jargon
-
-## Citation
-If you use this model, please cite appropriately.
+## License
+
+This model is released for research and educational purposes. The training data is proprietary and not included.
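The README's usage snippet appears in the diff only as its final two `print` lines, which reference `predictions`, `predicted_class`, and `label_map`. The post-processing step they rely on can be sketched as follows; this is a minimal illustration with dummy logits standing in for a real `model(**inputs).logits` forward pass, and the five-way label order here is an assumption (the authoritative mapping is the `id2label` entry in the model's config):

```python
import torch

# Assumed label order matching the README's Labels section; verify against
# the model's config.json (id2label) before relying on it.
label_map = {0: "A1", 1: "A2", 2: "B1", 3: "B2", 4: "C1/C2"}

# Dummy logits for one sentence, standing in for model(**inputs).logits.
logits = torch.tensor([[0.1, 0.3, 2.5, 0.8, 0.2]])

# Softmax turns logits into class probabilities; argmax picks the top class.
predictions = torch.softmax(logits, dim=-1)
predicted_class = predictions.argmax(dim=-1).item()

print(f"Predicted CEFR Level: {label_map[predicted_class]}")  # → B1 for these logits
print(f"Confidence: {predictions[0][predicted_class].item():.2%}")
```

With a real checkpoint, `logits` would come from tokenizing the input text and calling the fine-tuned sequence-classification model; everything after that line is unchanged.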