---
tags:
- model_hub_mixin
- pytorch_model_hub_mixin
---
# FineWeb2-RoEdu-Classifier
**FineWeb2-RoEdu-Classifier** is a lightweight quality classifier for Romanian text. It is designed to distinguish high-quality educational content from generic web text. The model was trained on data annotated by [Gemma3 12B](https://huggingface.co/google/gemma-3-12b-it). More details can be found in the [paper](https://arxiv.org/abs/2511.01090).
## Key Features
* **Educational Quality Scoring**: The model assigns a scalar score (typically 0-5) to text, reflecting its educational value and coherence.
* **Topic, Format and Educational Level**: The model also predicts additional signals that could be used for diversity filtering.
* **Distilled Knowledge**: It is trained on Romanian web samples annotated by **Gemma3 12B**, effectively distilling the frontier model's judgment into a more efficient architecture.
* **Proven Effectiveness**: We showed that using data curated by this classifier improved several benchmark metrics (ARC, HellaSwag).
## Usage
You can find a demo [here](https://github.com/VladNegoita/FineWeb2-RoEdu-ClassifierDemo/).
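
Once documents have been scored, a typical downstream step is to keep only those above a quality threshold. The sketch below illustrates that filtering step only. It assumes the scores were already produced by the classifier (see the demo for how to run the model); the helper name `filter_by_edu_score`, the sample texts, the scores, and the threshold of 3.0 are all hypothetical and not taken from the model or the paper.

```python
from typing import Iterable, List, Tuple


def filter_by_edu_score(
    scored_docs: Iterable[Tuple[str, float]],
    threshold: float = 3.0,  # hypothetical cut-off on the 0-5 scale
) -> List[str]:
    """Keep the texts whose educational score is at or above `threshold`.

    `scored_docs` pairs each text with a score assumed to come from the
    classifier; this helper does not run the model itself.
    """
    return [text for text, score in scored_docs if score >= threshold]


# Made-up (text, score) pairs, for illustration only.
docs = [
    ("Fotosinteza este procesul prin care plantele produc glucoză.", 4.2),
    ("Cumpără acum! Reduceri de până la 70%!", 0.7),
    ("Teorema lui Pitagora leagă laturile unui triunghi dreptunghic.", 3.0),
]

kept = filter_by_edu_score(docs, threshold=3.0)
```

The right threshold depends on how much data you can afford to discard, so it is worth inspecting the score distribution on a sample before fixing a value.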