# iPBL – Literary Genre Classification (HerBERT)

## Overview

This model implements the **literary genre classification** component of the iPBL (Bibliography of Polish Digital Culture) system developed at the Institute of Literary Research of the Polish Academy of Sciences.

It assigns domain-specific literary form categories to Polish web-based cultural texts. The model supports structured bibliographic description within a historically established classificatory framework derived from the Polish Literary Bibliography (PBL).

Unlike general-purpose genre classifiers, this model operates within a discipline-specific bibliographic regime.

---

## Task

Single-label multi-class text classification.

Each document is assigned one dominant literary genre category.

### Genres

- artykuł
- esej
- felieton
- kult
- nota
- opowiadanie
- proza
- recenzja
- wiersz
- wpis blogowy
- wspomnienie
- wywiad
- zgon

---

## Base Model

`allegro/herbert-base-cased`

Architecture: `BertForSequenceClassification`

---

## Training Data

The model was trained on curated bibliographic records produced in everyday bibliographic practice within the iPBL project.

Raw samples: 17,731

Final samples (after frequency filtering ≥ 100): 17,486

Data split:

- 70% Training
- 10% Validation
- 20% Test

The dataset reflects real-world class imbalance typical of web-native literary discourse.

---

## Performance (Test Set)

- Accuracy: **85.13%**
- Weighted F1-score: **0.85**

### High-performing genres

| Genre    | F1-score |
|----------|----------|
| wiersz   | 0.94     |
| wywiad   | 0.94     |
| recenzja | 0.92     |
| artykuł  | 0.85     |

Lower performance is observed for structurally hybrid and low-resource genres (e.g., esej, nota).

---

## How to Use

### Standard Transformers Usage

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="darekpe79/Literary_Genre_Classification",
    tokenizer="darekpe79/Literary_Genre_Classification"
)

text = "Przykładowy tekst artykułu literackiego..."
classifier(text)
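
# Optional follow-up: a minimal sketch, not part of the original model card.
# Assumption: the pipeline returns its usual list of {"label": ..., "score": ...}
# dicts; whether labels are genre names or LABEL_n ids depends on the model config.
# best_prediction is a hypothetical helper introduced here for illustration.
def best_prediction(predictions):
    """Return (label, score) for the highest-scoring prediction."""
    top = max(predictions, key=lambda p: p["score"])
    return top["label"], top["score"]

# Usage, reusing classifier and text from above. truncation=True is forwarded to
# the tokenizer so inputs longer than the model's maximum length are cut off:
# label, score = best_prediction(classifier(text, truncation=True))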