French / Roman Urdu Language Detector

A lightweight binary classifier that distinguishes French from Roman Urdu text, built with TF-IDF features and Logistic Regression (scikit-learn).

Usage

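```python
import joblib

# Load the trained classifier and the fitted TF-IDF vectorizer it was trained with
clf = joblib.load("logistic_regression_model.joblib")
vectorizer = joblib.load("tfidf_vectorizer.joblib")

text = "Bonjour comment allez vous"
# The vectorizer expects an iterable of strings, so wrap the single text in a list
prediction = clf.predict(vectorizer.transform([text]))[0]
print(prediction)  # → French
```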
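If a confidence score is useful alongside the label, the classifier's `predict_proba` output can be read together with its `classes_` attribute. The sketch below is a minimal example under stated assumptions: `detect_language` is a hypothetical helper name, and the tiny stand-in model (its phrases and the "Roman Urdu" label string) is invented only so the code runs without the published `.joblib` files.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression


def detect_language(texts, clf, vectorizer):
    """Hypothetical helper: return a (label, probability) pair for each input text."""
    X = vectorizer.transform(texts)          # same char n-gram features used in training
    proba = clf.predict_proba(X)             # shape: (n_texts, n_classes)
    best = proba.argmax(axis=1)              # index of the most likely class per row
    return [(str(clf.classes_[i]), float(row[i])) for i, row in zip(best, proba)]


# Stand-in model (hypothetical phrases and labels) so this sketch runs on its own;
# with the real artifacts, pass the objects returned by joblib.load instead.
demo_texts = ["bonjour comment allez vous", "merci beaucoup", "aap kaise hain", "shukriya dost"]
demo_labels = ["French", "French", "Roman Urdu", "Roman Urdu"]
demo_vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))
demo_clf = LogisticRegression().fit(demo_vectorizer.fit_transform(demo_texts), demo_labels)

results = detect_language(["Bonjour mon ami", "aap ka shukriya"], demo_clf, demo_vectorizer)
for label, p in results:
    print(f"{label}: {p:.2f}")
```

With the published artifacts, pass the `clf` and `vectorizer` objects loaded in the usage snippet in place of `demo_clf` and `demo_vectorizer`. For a two-class model the reported probability is always at least 0.5, so values close to 0.5 signal an uncertain prediction.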

Performance

  • Accuracy: 100.00% on a held-out test set (20% split, i.e. 20 of the 100 phrases)
  • Features: 1913 character n-gram TF-IDF features (2–4 grams, char_wb)
  • Training data: 50 French + 50 Roman Urdu dummy phrases
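The figures above describe a character n-gram TF-IDF vectorizer (`char_wb`, 2–4 grams) feeding a Logistic Regression classifier, evaluated on a 20% hold-out. A minimal sketch of that setup follows, assuming scikit-learn defaults except where noted: `random_state=42`, `stratify=labels` and `max_iter=1000` are assumptions, `train_detector` is a hypothetical helper, and the ten phrases are hypothetical placeholders because the card's 100 dummy phrases are not reproduced here. Its printed feature count and accuracy will not match the card's 1913 features or 100.00%.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical placeholder phrases; the card's 50 + 50 dummy phrases are not reproduced here.
FRENCH = [
    "bonjour comment allez vous",
    "je suis très content",
    "merci beaucoup mon ami",
    "où est la gare",
    "il fait beau aujourd'hui",
]
ROMAN_URDU = [
    "aap kaise hain",
    "mujhe bhook lagi hai",
    "shukriya mere dost",
    "station kahan hai",
    "aaj mausam acha hai",
]


def train_detector(texts, labels, test_size=0.2, random_state=42):
    """Fit char n-gram TF-IDF + LogisticRegression; return (clf, vectorizer, test accuracy)."""
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=test_size, random_state=random_state, stratify=labels
    )
    # char_wb builds character n-grams only from text inside word boundaries.
    vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))
    X_train_vec = vectorizer.fit_transform(X_train)  # vocabulary learned from the training split only
    clf = LogisticRegression(max_iter=1000).fit(X_train_vec, y_train)
    y_pred = clf.predict(vectorizer.transform(X_test))
    return clf, vectorizer, accuracy_score(y_test, y_pred)


texts = FRENCH + ROMAN_URDU
labels = ["French"] * len(FRENCH) + ["Roman Urdu"] * len(ROMAN_URDU)
clf, vectorizer, acc = train_detector(texts, labels)
print(f"features: {len(vectorizer.vocabulary_)}, test accuracy: {acc:.2%}")
```

Fitting the vectorizer on the training split only keeps n-grams that appear solely in the test phrases out of the vocabulary; the card does not say whether its 1913 features were learned this way or from all 100 phrases.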

Files

  • logistic_regression_model.joblib: trained LogisticRegression classifier
  • tfidf_vectorizer.joblib: fitted TfidfVectorizer (character n-grams)
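Because the classifier and the vectorizer live in separate files, they only work as a matched pair: the classifier's weights line up with the vectorizer's learned vocabulary. A minimal round-trip sketch follows, assuming the files were written with `joblib.dump` (the card only shows `joblib.load`); the tiny stand-in model and its phrases are hypothetical.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Stand-in objects: a tiny fitted vectorizer/classifier pair (hypothetical phrases and labels).
texts = ["bonjour comment allez vous", "merci beaucoup", "aap kaise hain", "shukriya dost"]
labels = ["French", "French", "Roman Urdu", "Roman Urdu"]
vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))
clf = LogisticRegression().fit(vectorizer.fit_transform(texts), labels)

with tempfile.TemporaryDirectory() as out_dir:
    model_path = os.path.join(out_dir, "logistic_regression_model.joblib")
    vec_path = os.path.join(out_dir, "tfidf_vectorizer.joblib")
    # Persist each object to its own file, matching the card's file layout.
    joblib.dump(clf, model_path)
    joblib.dump(vectorizer, vec_path)

    # Reload both; the classifier expects exactly the feature columns of this vectorizer.
    clf_loaded = joblib.load(model_path)
    vec_loaded = joblib.load(vec_path)

sample = ["Bonjour comment allez vous"]
original = clf.predict(vectorizer.transform(sample))
reloaded = clf_loaded.predict(vec_loaded.transform(sample))
print(original[0], reloaded[0])
```

Joblib files are pickles, so scikit-learn's persistence guidance applies: load them only from trusted sources and, ideally, with the same scikit-learn version that saved them.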