CEFR Naive Bayes Classifier

A Multinomial Naive Bayes model for classifying English text by CEFR (Common European Framework of Reference for Languages) proficiency levels.

Model Description

This model is part of an ensemble CEFR text classification system that combines multiple approaches to estimate language proficiency levels. The Naive Bayes classifier provides fast, interpretable predictions based on word frequency patterns characteristic of different proficiency levels.
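
To make the "word frequency patterns" idea concrete, the following is a minimal from-scratch sketch of Multinomial Naive Bayes scoring. It is not the released model's code: the function names `train_multinomial_nb` and `predict`, the whitespace tokenization, and the tiny corpus are all invented for illustration. Each class is scored as its log prior plus the sum of Laplace-smoothed log likelihoods of the words in the text.

```python
import math
from collections import Counter


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Estimate log priors and Laplace-smoothed log likelihoods from tokenized docs."""
    vocab = sorted({w for doc in docs for w in doc})
    classes = sorted(set(labels))
    log_prior, log_likelihood = {}, {}
    for c in classes:
        class_docs = [doc for doc, y in zip(docs, labels) if y == c]
        log_prior[c] = math.log(len(class_docs) / len(docs))
        counts = Counter(w for doc in class_docs for w in doc)
        total = sum(counts.values()) + alpha * len(vocab)
        log_likelihood[c] = {w: math.log((counts[w] + alpha) / total) for w in vocab}
    return log_prior, log_likelihood


def predict(tokens, log_prior, log_likelihood):
    """Return the class with the highest log posterior; unseen words are skipped."""
    scores = {}
    for c in log_prior:
        scores[c] = log_prior[c] + sum(
            log_likelihood[c][w] for w in tokens if w in log_likelihood[c]
        )
    return max(scores, key=scores.get)


# Toy, invented data purely for illustration (not the proprietary training set)
docs = [
    "i like my cat".split(),
    "my dog is big".split(),
    "the committee nevertheless endorsed the controversial proposal".split(),
    "such ambiguity undermines the proposal".split(),
]
labels = ["A1", "A1", "C1/C2", "C1/C2"]

log_prior, log_likelihood = train_multinomial_nb(docs, labels)
print(predict("my cat is big".split(), log_prior, log_likelihood))
```

Because every word contributes an additive log-likelihood term per class, the words that pushed a text toward a given level can be read off directly, which is what makes this family of models easy to interpret.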

Labels

The model classifies text into five classes covering the CEFR scale, with C1 and C2 merged into a single Advanced class:

  • A1: Beginner
  • A2: Elementary
  • B1: Intermediate
  • B2: Upper Intermediate
  • C1/C2: Advanced

Model Details

  • Type: Multinomial Naive Bayes
  • Framework: scikit-learn
  • Task: Multi-class text classification
  • Input: Raw text strings, converted to feature vectors by the bundled vectorizer
  • Output: Class predictions (0-4) with probability distributions
  • Files:
    • model.pkl: Trained Naive Bayes classifier
    • vectorizer.pkl: Fitted vectorizer (TF-IDF or count-based) that converts raw text into features
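
As a rough illustration of how a model.pkl and vectorizer.pkl pair like the one listed above could be produced with scikit-learn, here is a minimal sketch. The choice of `TfidfVectorizer`, the `alpha=1.0` smoothing value, and the five-sentence corpus are assumptions made for this example; the actual training data is proprietary and the original training configuration is not documented here.

```python
import tempfile
from pathlib import Path

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Invented toy corpus; labels follow the 0-4 scheme used by this model card
texts = [
    "I have a cat.",
    "She went to the shop yesterday.",
    "I think that travelling helps you learn about other cultures.",
    "Although the plan was risky, the team decided to go ahead with it.",
    "The committee nevertheless endorsed the proposal, notwithstanding its ambiguity.",
]
labels = [0, 1, 2, 3, 4]

# Assumption: TF-IDF features; a CountVectorizer would be used the same way
vectorizer = TfidfVectorizer(lowercase=True)
features = vectorizer.fit_transform(texts)

model = MultinomialNB(alpha=1.0)
model.fit(features, labels)

# Persist both objects; the vectorizer must be saved alongside the model
with tempfile.TemporaryDirectory() as tmp:
    joblib.dump(model, Path(tmp) / "model.pkl")
    joblib.dump(vectorizer, Path(tmp) / "vectorizer.pkl")
    reloaded_model = joblib.load(Path(tmp) / "model.pkl")
    reloaded_vectorizer = joblib.load(Path(tmp) / "vectorizer.pkl")

probs = reloaded_model.predict_proba(reloaded_vectorizer.transform(["I have a dog."]))[0]
for cls, prob in zip(reloaded_model.classes_, probs):
    print(f"class {cls}: {prob:.3f}")
```

The two files have to travel together: the classifier only understands feature columns in the vocabulary order fixed when the vectorizer was fitted.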

Usage

Basic Prediction

from huggingface_hub import hf_hub_download
import joblib

# Download model files
model_path = hf_hub_download(
    repo_id="theluantran/cefr-naive-bayes",
    filename="model.pkl"
)
vectorizer_path = hf_hub_download(
    repo_id="theluantran/cefr-naive-bayes",
    filename="vectorizer.pkl"
)

# Load model and vectorizer
model = joblib.load(model_path)
vectorizer = joblib.load(vectorizer_path)

# Predict
text = "This is a sample text to classify"
features = vectorizer.transform([text])
prediction = model.predict(features)[0]
probabilities = model.predict_proba(features)[0]

# Map numeric prediction to CEFR level
level_map = {0: 'A1', 1: 'A2', 2: 'B1', 3: 'B2', 4: 'C1/C2'}
predicted_level = level_map[int(prediction)]  # cast the NumPy integer to a plain int

print(f"Predicted level: {predicted_level}")
print(f"Confidence: {max(probabilities):.2%}")
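
The snippet above reports only the top probability. To inspect the full distribution, the helper below pairs each probability with its CEFR label and sorts them. It is a small sketch: `rank_levels` is a hypothetical helper name, and the example values stand in for `model.classes_` and `model.predict_proba(features)[0]` so the block runs without downloading the model. In scikit-learn, the columns of `predict_proba` follow the order of `model.classes_`, which is why the classes are passed in explicitly.

```python
def rank_levels(probabilities, classes, level_map):
    """Pair each class probability with its CEFR label, most likely first."""
    pairs = [(level_map[int(c)], float(p)) for c, p in zip(classes, probabilities)]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


level_map = {0: 'A1', 1: 'A2', 2: 'B1', 3: 'B2', 4: 'C1/C2'}

# Hypothetical values standing in for model.classes_ and model.predict_proba(features)[0]
example_classes = [0, 1, 2, 3, 4]
example_probabilities = [0.05, 0.10, 0.50, 0.30, 0.05]

ranked = rank_levels(example_probabilities, example_classes, level_map)
for level, prob in ranked:
    print(f"{level}: {prob:.2%}")
```

With the real model, the call would be `rank_levels(probabilities, model.classes_, level_map)`. A small gap between the top two levels suggests the text sits near a boundary between adjacent levels.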

License

This model is released for research and educational purposes. The training data is proprietary and not included.
