---
tags:
- model_hub_mixin
- pytorch_model_hub_mixin
---

# FineWeb2-RoEdu-Classifier

**FineWeb2-RoEdu-Classifier** is a lightweight quality classifier for the Romanian language. It is designed to distinguish high-quality educational content from generic web text. The model was trained on data annotated by [Gemma3 12B](https://huggingface.co/google/gemma-3-12b-it). More details are available in the [paper](https://arxiv.org/abs/2511.01090).

## Key Features

* **Educational Quality Scoring**: The model assigns a scalar score (typically 0-5) to a text, reflecting its educational value and coherence.
* **Topic, Format and Educational Level**: The model also predicts these additional signals, which can be used for diversity filtering.
* **Distilled Knowledge**: It is trained on Romanian web samples annotated by **Gemma3 12B**, effectively distilling the frontier model's judgment into a more efficient architecture.
* **Proven Effectiveness**: We showed that data curated with this classifier improves results on several benchmarks (ARC, HellaSwag).

## Usage

You can find a demo [here](https://github.com/VladNegoita/FineWeb2-RoEdu-ClassifierDemo/).
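As a rough illustration of how the educational score could drive data curation, the sketch below keeps only texts whose score reaches a threshold. It does not load the model: `score_fn` is a hypothetical stand-in for whatever function wraps the classifier (see the demo for the actual loading code), `dummy_score` is a placeholder used only to make the example runnable, and the threshold of `3.0` is an assumed value, not one recommended by the authors.

```python
from typing import Callable, Iterable, List, Tuple


def filter_by_edu_score(
    texts: Iterable[str],
    score_fn: Callable[[str], float],
    threshold: float = 3.0,  # assumed cut-off, tune for your own corpus
) -> List[Tuple[str, float]]:
    """Keep (text, score) pairs whose score is at or above `threshold`."""
    kept = []
    for text in texts:
        # Clamp to the 0-5 range described above, in case a raw
        # prediction falls slightly outside it.
        score = min(5.0, max(0.0, float(score_fn(text))))
        if score >= threshold:
            kept.append((text, score))
    return kept


def dummy_score(text: str) -> float:
    """Placeholder scorer (word count); replace with the real classifier."""
    return float(len(text.split()))


docs = [
    "Salut",
    "Fotosinteza este procesul prin care plantele produc glucoză",
]
selected = filter_by_edu_score(docs, dummy_score, threshold=3.0)
```

Passing the scorer in as a callable keeps the filtering logic independent of how the model is loaded or batched, so the same function can be reused once `dummy_score` is swapped for real predictions.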