VladNA committed on
Commit
c6b5b59
·
verified ·
1 Parent(s): 4b4d8da

Update README.md

Files changed (1)
  1. README.md +14 -3
README.md CHANGED
@@ -4,6 +4,17 @@ tags:
  - pytorch_model_hub_mixin
  ---
 
- This model has been pushed to the Hub using the [PytorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration:
- - Library: [More Information Needed]
- - Docs: [More Information Needed]
+ # FineWeb2-RoEdu-Classifier
+
+ **FineWeb2-RoEdu-Classifier** is a lightweight quality classifier for the Romanian language. It is designed to distinguish high-quality educational content from generic web text. The model was trained on data annotated by [Gemma3 12B](https://huggingface.co/google/gemma-3-12b-it). More details can be found in the [paper](https://arxiv.org/abs/2511.01090).
+
+ ## Key Features
+
+ * **Educational Quality Scoring**: The model assigns a scalar score (typically 0-5) to a text, reflecting its educational value and coherence.
+ * **Topic, Format and Educational Level**: The model also predicts these additional signals, which can be used for diversity filtering.
+ * **Distilled Knowledge**: It is trained on Romanian web samples annotated by **Gemma3 12B**, effectively distilling the frontier model's judgment into a more efficient architecture.
+ * **Proven Effectiveness**: We showed that using data curated by this classifier improves results on several benchmarks (ARC, HellaSwag).
+
+ ## Usage
+
+ You can find a demo [here](https://github.com/VladNegoita/FineWeb2-RoEdu-ClassifierDemo/).
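
The educational quality score described under Key Features could drive a filtering step roughly as sketched below. This is not part of the commit and is not taken from the linked demo: it assumes scores on the 0-5 scale have already been computed for each document, and the function name `filter_by_score`, the threshold of 3.0, and the sample inputs are all hypothetical choices for illustration.

```python
from typing import List, Sequence, Tuple


def filter_by_score(
    docs: Sequence[str],
    scores: Sequence[float],
    threshold: float = 3.0,  # hypothetical cutoff, not a value from the README
) -> List[Tuple[str, float]]:
    """Keep (document, score) pairs whose score is at least `threshold`."""
    if len(docs) != len(scores):
        raise ValueError("docs and scores must have the same length")
    return [(doc, score) for doc, score in zip(docs, scores) if score >= threshold]


# Made-up inputs; real scores would come from running the classifier.
docs = ["lectie de fizica", "reclama", "articol enciclopedic"]
scores = [4.2, 0.7, 3.0]

kept = filter_by_score(docs, scores)
print(kept)
```

A suitable threshold depends on how much data can be discarded, so it is better treated as a tunable parameter than as a fixed constant.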