Hindi Edu Classifier

hindi-roberta-edu-classifier is a HindRoBERTa-based model for judging the educational value of a given Hindi text string. It was trained on the Polygl0t/hindi-edu-qwen-annotations dataset.

Details

  • Dataset: hindi-edu-qwen-annotations
  • Language: Hindi
  • Number of Training Epochs: 20
  • Batch size: 256
  • Optimizer: torch.optim.AdamW
  • Learning Rate: 3e-4
  • Eval Metric: f1-score

This repository has the source code used to train this model.

Evaluation Results

Confusion Matrix

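Rows are true labels and columns are predicted labels (scores 1 to 5); this orientation is the one that reproduces the precision and recall reported below.

        1      2      3      4      5
1    8607   1661     72      1      0
2    1834   4349    580     18      0
3     120    885   1207    102      0
4       7     52    300    202      0
5       0      0      1      2      0
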
  • Precision (macro): 0.52416
  • Recall (macro): 0.47107
  • F1 Macro: 0.49048
  • Accuracy: 0.71825
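
The summary metrics can be re-derived from the confusion matrix. The sketch below is not the original evaluation code; it assumes rows are true labels, columns are predicted labels, and that a class with no predictions or no examples contributes 0 to the macro average. The function name `macro_metrics` is hypothetical.

```python
# Confusion matrix copied from the table above.
# Assumption: rows are true labels, columns are predicted labels.
matrix = [
    [8607, 1661, 72, 1, 0],
    [1834, 4349, 580, 18, 0],
    [120, 885, 1207, 102, 0],
    [7, 52, 300, 202, 0],
    [0, 0, 1, 2, 0],
]


def macro_metrics(cm):
    """Return accuracy and macro-averaged precision, recall and F1."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))
    precisions, recalls, f1s = [], [], []
    for i in range(n):
        tp = cm[i][i]
        predicted = sum(cm[r][i] for r in range(n))  # column sum
        actual = sum(cm[i])                          # row sum
        # Undefined ratios (zero denominator) are counted as 0.
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)
    return {
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1_macro": sum(f1s) / n,
        "accuracy": correct / total,
    }


metrics = macro_metrics(matrix)
print({name: round(value, 5) for name, value in metrics.items()})
```

Class 5 has only three examples in this matrix and none of them is predicted correctly, so it contributes 0 to every macro average. That is the main reason the macro scores sit well below the accuracy.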

Usage

Here's an example of how to use the Edu Classifier:

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = AutoTokenizer.from_pretrained("Polygl0t/hindi-roberta-edu-classifier")
model = AutoModelForSequenceClassification.from_pretrained("Polygl0t/hindi-roberta-edu-classifier")
model.to(device)


text = "यह एक उदाहरण है।"
encoded_input = tokenizer(text, return_tensors="pt", padding="longest", truncation=True).to(device)

with torch.no_grad():
    model_output = model(**encoded_input)
    logits = model_output.logits.squeeze(-1).float().cpu().numpy()

# Raw scores lie on a 0-4 scale; add 1 to map them to the 1-5 scale.
# A raw value can fall outside that range, so int_score clamps it before rounding.
raw_score = float(logits[0])
score = raw_score + 1

print({
    "text": text,
    "score": score,
    "int_score": int(round(max(0, min(raw_score, 4)))) + 1,
})
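
When scoring many texts, the post-processing step can be separated from the model call. The following is a minimal sketch of that step only, using the same clamp-round-shift rule as the example above; the helper name `to_scores` and the sample raw values are hypothetical and are not model outputs.

```python
def to_scores(raw_outputs):
    """Map raw classifier outputs (0-4 scale) to 1-5 scores.

    Hypothetical helper: mirrors the post-processing in the usage example.
    """
    results = []
    for raw in raw_outputs:
        raw = float(raw)
        # Clamp to the 0-4 range before rounding, then shift to 1-5.
        clamped = max(0.0, min(raw, 4.0))
        results.append({
            "score": raw + 1,                    # continuous, unclamped
            "int_score": int(round(clamped)) + 1,  # integer label in 1..5
        })
    return results


# Illustrative raw values, not real model outputs.
example = to_scores([-0.3, 2.4, 4.9])
print(example)
```

Note that Python's built-in `round` rounds exact halves to the nearest even integer, so a raw value of exactly 2.5 maps to an `int_score` of 3 rather than 4.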

Cite as 🤗

@misc{shiza2026lilmoo,
      title={{Raising Bars, Not Parameters: LilMoo Compact Language Model for Hindi}}, 
      author={Shiza Fatimah and Aniket Sen and Sophia Falk and Florian Mai and Lucie Flek and Nicholas Kluge Corr{\^e}a},
      year={2026},
      eprint={2603.03508},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2603.03508}, 
}

Acknowledgments

Polyglot is a project funded by the Federal Ministry of Education and Research (BMBF) and the Ministry of Culture and Science of the State of North Rhine-Westphalia (MWK) as part of TRA Sustainable Futures (University of Bonn) and the Excellence Strategy of the federal and state governments.

We also gratefully acknowledge access to the Marvin cluster hosted by the University of Bonn, along with the support provided by its High Performance Computing & Analytics Lab.

License

According to l3cube-pune/hindi-roberta, the model is released under cc-by-4.0. For any queries, please get in touch with the authors of the original paper tied to hindi-roberta.
