metadata
tags:
  - model_hub_mixin
  - pytorch_model_hub_mixin

FineWeb2-RoEdu-Classifier

FineWeb2-RoEdu-Classifier is a lightweight quality classifier for the Romanian language. It is designed to distinguish high-quality educational content from generic web text. The model was trained on data annotated by Gemma3 12B. More details can be found here.

Key Features

  • Educational Quality Scoring: The model assigns a scalar score (typically 0-5) to text, reflecting its educational value and coherence.
  • Topic, Format and Educational Level: The model also predicts additional signals that could be used for diversity filtering.
  • Distilled Knowledge: It is trained on Romanian web samples annotated by Gemma3 12B, effectively distilling the frontier model's judgment into a more efficient architecture.
  • Proven Effectiveness: We showed that training on data curated by this classifier improved several benchmark metrics (ARC, HellaSwag).
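
As a rough illustration of how the quality score and the topic signal could be combined for curation, here is a minimal sketch. It does not use the model's actual API: the `Prediction` container, its field names, the `predict` callable, the 3.0 threshold, and the per-topic cap are all assumptions made for this example.

```python
from collections import defaultdict
from typing import Callable, Iterable, NamedTuple


class Prediction(NamedTuple):
    """Hypothetical container for classifier outputs (field names are assumptions)."""
    edu_score: float  # educational quality score, roughly in the 0-5 range
    topic: str        # predicted topic label


def curate(
    docs: Iterable[str],
    predict: Callable[[str], Prediction],
    min_score: float = 3.0,   # assumed threshold, tune on your own data
    max_per_topic: int = 2,   # assumed cap used for diversity filtering
) -> list[str]:
    """Keep documents that score high enough, capping how many share a topic."""
    kept: list[str] = []
    per_topic: dict[str, int] = defaultdict(int)
    for doc in docs:
        pred = predict(doc)
        if pred.edu_score < min_score:
            continue  # drop low-quality text
        if per_topic[pred.topic] >= max_per_topic:
            continue  # topic already well represented
        per_topic[pred.topic] += 1
        kept.append(doc)
    return kept
```

In practice `predict` would wrap a call to the classifier. The cap is applied in input order, so sorting documents by score first would keep the best ones per topic.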

Usage

You can find a demo here.