---
language: tr
license: apache-2.0
tags:
- text-classification
- educational-content
- turkish
- fineweb-edu
- encoder
- regression
datasets:
- YsK-dev/TurkWeb-Edu-AnnotationsV3
base_model: boun-tabilab/TabiBERT
pipeline_tag: text-classification
---

# TurkWeb-Edu Classifier V4 🇹🇷

A fast, accurate classifier for Turkish educational content. It predicts an educational quality score (0-5) for Turkish web text.

**This is the Turkish equivalent of [HuggingFaceFW/fineweb-edu-classifier](https://huggingface.co/HuggingFaceFW/fineweb-edu-classifier).**

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "YsK-dev/TurkWeb-Edu-Classifier-V5"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Sample input: "Photosynthesis is a biochemical process in which plants use
# sunlight to convert carbon dioxide and water into glucose and oxygen."
text = "Fotosentez, bitkilerin güneş ışığını kullanarak karbondioksit ve suyu glikoz ve oksijene dönüştürdüğü biyokimyasal bir süreçtir."

# Tokenize, truncating long documents to 1024 tokens
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)

# The regression head emits a single value, read here as the 0-5 quality score
with torch.no_grad():
    score = model(**inputs).logits.squeeze().item()

print(f"Score: {score:.2f}")
print(f"Educational: {score >= 3}")
```

## Model Details

| Component | Details |
|-----------|---------|
| Base Model | `boun-tabilab/TabiBERT` |
| Architecture | Encoder + Regression Head |
| Training Data | [YsK-dev/TurkWeb-Edu-AnnotationsV3](https://huggingface.co/datasets/YsK-dev/TurkWeb-Edu-AnnotationsV3) (660K samples) |
| Teacher | Qwen3-30B-A3B-Instruct-2507 |
| Task | Regression (0-5 educational quality score) |
| Language | Turkish (tur_Latn) |

## Evaluation

| Metric | Value |
|--------|-------|
| Pearson | 0.8407 |
| RMSE | 0.8725 |
| MAE | 0.6240 |
| F1 (edu≥3) | 0.7221 |
| Exact Accuracy | 0.5152 |

## Scoring Rubric

| Score | Meaning |
|-------|---------|
| 0 | Not Educational: spam, ads, NSFW, navigation-only |
| 1 | Low Quality: personal chat, forum posts, low-quality news |
| 2 | Medium: general knowledge, blogs, opinion pieces |
| 3 | Educational: encyclopedic, how-to guides, concept explanations |
| 4 | High Quality: well-structured, high pedagogical value, technical |
| 5 | Academic: textbook quality, sourced, in-depth analysis |

## Recommended Threshold

To filter for educational Turkish content, keep documents with `score >= 3` (following the FineWeb-Edu methodology).
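
The threshold can be applied to a batch of already-scored documents with a few lines of plain Python. The sketch below is illustrative only: the helper names `to_label` and `filter_educational` and the sample scores are hypothetical, not part of this repository. It assumes each score is the raw regression output from the usage snippet, which can fall slightly outside the 0-5 range.

```python
from typing import List, Sequence


def to_label(score: float) -> int:
    """Clamp a raw regression score to [0, 5] and round it to a rubric level.

    Hypothetical helper: useful for reporting, not required for filtering.
    """
    return int(round(max(0.0, min(score, 5.0))))


def filter_educational(
    docs: Sequence[str], scores: Sequence[float], threshold: float = 3.0
) -> List[str]:
    """Keep documents whose raw score meets the threshold (score >= 3 by default)."""
    return [doc for doc, score in zip(docs, scores) if score >= threshold]


# Made-up scores for illustration; real values come from the classifier.
docs = ["spam page", "opinion blog", "how-to guide", "textbook chapter"]
scores = [0.4, 2.7, 3.1, 4.6]

kept = filter_educational(docs, scores)
labels = [to_label(s) for s in scores]
print(kept)
print(labels)
```

Note that comparing the raw score against 3 is stricter than rounding first: a document scored 2.7 rounds to rubric level 3 but is dropped by `score >= 3`.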