# iPBL – Literary Genre Classification (HerBERT)

## Overview

This model implements the **literary genre classification** component of the iPBL (Bibliography of Polish Digital Culture) system developed at the Institute of Literary Research of the Polish Academy of Sciences.

It assigns domain-specific literary form categories to Polish web-based cultural texts.

The model supports structured bibliographic description within a historically established classificatory framework derived from the Polish Literary Bibliography (PBL).

Unlike general-purpose genre classifiers, this model operates within a discipline-specific bibliographic regime.

---
## Task

Single-label multi-class text classification.

Each document is assigned one dominant literary genre category.

### Genres

- artykuł
- esej
- felieton
- kult
- nota
- opowiadanie
- proza
- recenzja
- wiersz
- wpis blogowy
- wspomnienie
- wywiad
- zgon

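
The single-label decision described above can be illustrated with a minimal sketch: the classifier produces one score (logit) per genre, and the genre with the highest probability is reported. The label order and the logit values below are assumptions made up for illustration; the actual mapping is stored in the model's configuration.

```python
import math

# The 13 genre labels from the list above. The index order is an assumption
# for illustration; the real mapping lives in the model's configuration.
GENRES = [
    "artykuł", "esej", "felieton", "kult", "nota", "opowiadanie", "proza",
    "recenzja", "wiersz", "wpis blogowy", "wspomnienie", "wywiad", "zgon",
]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_genre(logits):
    """Return the single most probable genre and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return GENRES[best], probs[best]


# Hypothetical logits for one document; index 8 ("wiersz") scores highest.
example_logits = [0.1, 0.3, -0.2, -1.0, 0.0, 0.5, 0.4, 0.2, 3.0, -0.5, 0.1, 0.0, -1.5]
label, prob = predict_genre(example_logits)
print(label, round(prob, 3))
```

Because only the top-scoring class is returned, a text that mixes several forms still receives exactly one genre.
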
---

## Base Model

`allegro/herbert-base-cased`

Architecture: `BertForSequenceClassification`

---
## Training Data

The model was trained on curated bibliographic records produced in everyday bibliographic practice within the iPBL project.

Raw samples: 17,731

Final samples (after keeping only genres with at least 100 samples): 17,486

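
The frequency filter can be sketched as follows. This is an illustrative reconstruction, not the project's preprocessing code: the `(text, genre)` record shape, the function name, and the toy data are assumptions.

```python
from collections import Counter

MIN_SAMPLES = 100  # threshold stated in this model card


def filter_rare_genres(records, min_samples=MIN_SAMPLES):
    """Keep only records whose genre occurs at least `min_samples` times."""
    counts = Counter(genre for _, genre in records)
    return [(text, genre) for text, genre in records if counts[genre] >= min_samples]


# Toy data: 120 "wiersz" records survive; 5 records of a made-up rare genre are dropped.
toy_records = [(f"tekst {i}", "wiersz") for i in range(120)]
toy_records += [(f"inny {i}", "rzadki gatunek") for i in range(5)]

kept = filter_rare_genres(toy_records)
print(len(toy_records), "->", len(kept))  # 125 -> 120
```
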

Data split:

- 70% Training
- 10% Validation
- 20% Test

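
For a sense of the resulting set sizes, here is a minimal sketch of a shuffled 70/10/20 split over the 17,486 final samples. The card does not say whether the original split was stratified or which random seed was used, so the plain shuffle and `seed=42` are assumptions.

```python
import random


def split_dataset(records, seed=42):
    """Shuffle the records and split them into 70% train, 10% validation, 20% test."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n = len(shuffled)
    n_train = n * 70 // 100
    n_val = n * 10 // 100
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # the remainder, roughly 20%
    return train, val, test


# Integer stand-ins for the 17,486 filtered records.
train, val, test = split_dataset(range(17_486))
print(len(train), len(val), len(test))  # 12240 1748 3498
```

With heavily imbalanced classes, a stratified split (one that preserves genre proportions in each subset) is a common alternative to the plain shuffle shown here.
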

The dataset reflects real-world class imbalance typical of web-native literary discourse.

---

## Performance (Test Set)

- Accuracy: **85.13%**
- Weighted F1-score: **0.85**

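
To make the two metrics concrete, the sketch below computes accuracy and support-weighted F1 on a four-document toy example. It follows the usual definition of weighted F1 (per-class F1 averaged with weights equal to each class's share of the true labels); the evaluation tooling actually used for this model is not stated here, and the toy labels are invented.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of documents whose predicted genre matches the true genre."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_for_label(y_true, y_pred, label):
    """F1 for one genre, computed from true/false positives and false negatives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def weighted_f1(y_true, y_pred):
    """Average per-genre F1, weighting each genre by its number of true samples."""
    support = Counter(y_true)
    total = len(y_true)
    return sum(f1_for_label(y_true, y_pred, g) * n for g, n in support.items()) / total


# Toy example: one "wiersz" document is misclassified as "esej".
y_true = ["wiersz", "wiersz", "wiersz", "esej"]
y_pred = ["wiersz", "wiersz", "esej", "esej"]

print(accuracy(y_true, y_pred))               # 0.75
print(round(weighted_f1(y_true, y_pred), 4))  # 0.7667
```

Because the weights follow class frequency, frequent genres dominate the weighted score, which is why the per-genre figures are worth reading alongside it.
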
### High-performing genres

| Genre    | F1-score |
|----------|----------|
| wiersz   | 0.94     |
| wywiad   | 0.94     |
| recenzja | 0.92     |
| artykuł  | 0.85     |

Lower performance is observed for structurally hybrid and low-resource genres (e.g., esej, nota).

---

## How to Use

### Standard Transformers Usage

```python
from transformers import pipeline

# Load the fine-tuned classifier and its tokenizer from the Hugging Face Hub.
classifier = pipeline(
    "text-classification",
    model="darekpe79/Literary_Genre_Classification",
    tokenizer="darekpe79/Literary_Genre_Classification",
)

# Polish input; in English: "Sample text of a literary article..."
text = "Przykładowy tekst artykułu literackiego..."

# The pipeline returns a list with one dict holding the predicted label and its score.
print(classifier(text))