metadata
tags:
- model_hub_mixin
- pytorch_model_hub_mixin
FineWeb2-RoEdu-Classifier
FineWeb2-RoEdu-Classifier is a lightweight quality classifier for the Romanian language. It is designed to distinguish high-quality educational content from generic web text. The model was trained on data annotated by Gemma3 12B. More details can be found here.
Key Features
- Educational Quality Scoring: The model assigns a scalar score (typically 0-5) to text, reflecting its educational value and coherence.
- Topic, Format and Educational Level: Besides the quality score, the model predicts the topic, format, and educational level of a text. These signals can be used for diversity filtering.
- Distilled Knowledge: It is trained on Romanian web samples annotated by Gemma3 12B, effectively distilling the frontier model's judgment into a more efficient architecture.
- Proven Effectiveness: We showed that using data curated by this classifier improved results on several benchmarks (ARC, HellaSwag).
Usage
You can find a demo here.
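Once scores are available, a typical next step is to filter a corpus by educational quality. The snippet below is a minimal sketch of that step only. It does not load the model. The helper names `to_int_score` and `filter_by_score`, the sample texts and scores, and the threshold of 3 are illustrative assumptions, not part of this repository or a recommendation from the authors.

```python
from typing import Iterable, List, Tuple


def to_int_score(raw_score: float, low: int = 0, high: int = 5) -> int:
    """Round a raw scalar prediction and clamp it to the [low, high] range."""
    return max(low, min(high, int(round(raw_score))))


def filter_by_score(
    scored_docs: Iterable[Tuple[str, float]], threshold: int = 3
) -> List[str]:
    """Keep the texts whose rounded, clamped score is at least `threshold`."""
    return [text for text, raw in scored_docs if to_int_score(raw) >= threshold]


if __name__ == "__main__":
    # Made-up (text, raw_score) pairs; real scores would come from the classifier.
    docs = [
        ("Fotosinteza este procesul prin care plantele produc glucoză.", 3.8),
        ("Cumpără acum! Reduceri de 50% la toate produsele.", 0.4),
        ("Teorema lui Pitagora descrie laturile unui triunghi dreptunghic.", 2.6),
    ]
    for text in filter_by_score(docs, threshold=3):
        print(text)
```

Raising the threshold trades corpus size for stricter quality. The right value depends on the downstream task and should be checked against a held-out sample.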