---
tags:
- model_hub_mixin
- pytorch_model_hub_mixin
---

# FineWeb2-RoEdu-Classifier

**FineWeb2-RoEdu-Classifier** is a lightweight quality classifier for Romanian text. It is designed to distinguish high-quality educational content from generic web text. The model was trained on data annotated by [Gemma3 12B](https://huggingface.co/google/gemma-3-12b-it). More details are available in the [paper](https://arxiv.org/abs/2511.01090).

## Key Features

* **Educational Quality Scoring**: The model assigns a scalar score (typically 0-5) to text, reflecting its educational value and coherence.
* **Topic, Format and Educational Level**: The model also predicts additional signals that could be used for diversity filtering.
* **Distilled Knowledge**: It is trained on Romanian web samples annotated by **Gemma3 12B**, effectively distilling the frontier model's judgment into a more efficient architecture.
* **Proven Effectiveness**: We showed that using data curated by this classifier improved results on several benchmarks (ARC, HellaSwag).

## Usage

A demo is available in the [FineWeb2-RoEdu-ClassifierDemo repository](https://github.com/VladNegoita/FineWeb2-RoEdu-ClassifierDemo/).
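
For loading the model and running inference, refer to the demo, since this card does not document that API. As a rough illustration of how the educational quality score might be used for curation once you have it, here is a minimal sketch of threshold-based filtering. Everything in it is an assumption made for illustration: `filter_by_score`, the `threshold` value of 3.0, and especially `toy_score`, which is a stand-in heuristic and not the real classifier.

```python
from typing import Callable, Iterable, List, Tuple


def filter_by_score(
    texts: Iterable[str],
    score_fn: Callable[[str], float],
    threshold: float = 3.0,  # hypothetical cutoff on the 0-5 scale, not from the model card
) -> List[Tuple[str, float]]:
    """Keep texts whose score is at least `threshold`, paired with their scores."""
    kept = []
    for text in texts:
        score = score_fn(text)
        if score >= threshold:
            kept.append((text, score))
    return kept


def toy_score(text: str) -> float:
    # Stand-in only: NOT the real classifier. It scores by word count,
    # capped at 5, so the example runs without downloading any model.
    return min(5.0, len(text.split()) / 2.0)


docs = [
    "Fotosinteza este procesul prin care plantele transformă lumina în energie chimică.",
    "Click aici!",
]

# In practice, replace `toy_score` with a function that wraps the classifier's output.
selected = filter_by_score(docs, toy_score, threshold=3.0)
for text, score in selected:
    print(f"{score:.1f}\t{text}")
```

A suitable cutoff depends on how much data you can afford to discard, so it is worth inspecting the score distribution on a sample of your own corpus before fixing a value.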