iPBL – Literary Genre Classification (HerBERT)
Overview
This model implements the literary genre classification component of the iPBL (Bibliography of Polish Digital Culture) system developed at the Institute of Literary Research of the Polish Academy of Sciences.
It assigns domain-specific literary form categories to Polish web-based cultural texts.
The model supports structured bibliographic description within a historically established classificatory framework derived from the Polish Literary Bibliography (PBL).
Unlike general-purpose genre classifiers, this model uses a label set defined by discipline-specific bibliographic practice rather than a generic genre taxonomy.
Task
Single-label multi-class text classification.
Each document is assigned one dominant literary genre category.
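As a rough illustration of what single-label, multi-class prediction means in practice, the sketch below turns a vector of raw class scores into probabilities and keeps only the highest-scoring class. The three-label subset and the logit values are made up for the example; the function names are hypothetical and not part of this model's code.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_single_label(logits, labels):
    """Return the one label with the highest probability, plus that probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]

# Hypothetical three-class subset and made-up logits, for illustration only.
labels = ["esej", "recenzja", "wiersz"]
logits = [0.2, 2.5, -1.0]
label, prob = predict_single_label(logits, labels)
```

Because the probabilities compete within a single softmax, exactly one genre wins per document, which matches the "one dominant category" framing above.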
Genres
- artykuł
- esej
- felieton
- kult
- nota
- opowiadanie
- proza
- recenzja
- wiersz
- wpis blogowy
- wspomnienie
- wywiad
- zgon
Base Model
allegro/herbert-base-cased
Architecture: BertForSequenceClassification
Training Data
The model was trained on curated bibliographic records produced in everyday bibliographic practice within the iPBL project.
Raw samples: 17,731
Final samples (after keeping only genres with ≥ 100 records): 17,486
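If the threshold is read as a per-genre minimum, the filter removed 17,731 − 17,486 = 245 records. A minimal sketch of such a filter follows; it assumes records are `(text, label)` pairs, and the function name, toy data, and `rare_label` placeholder are hypothetical rather than taken from the iPBL pipeline.

```python
from collections import Counter

def filter_rare_classes(records, min_count=100):
    """Keep only records whose label occurs at least `min_count` times."""
    counts = Counter(label for _, label in records)
    return [(text, label) for text, label in records if counts[label] >= min_count]

# Toy data: 120 "recenzja" records and 5 records of a made-up rare label.
records = [(f"text {i}", "recenzja") for i in range(120)]
records += [(f"rare {i}", "rare_label") for i in range(5)]
kept = filter_rare_classes(records)
```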
Data split:
- 70% Training
- 10% Validation
- 20% Test
The dataset reflects real-world class imbalance typical of web-native literary discourse.
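With imbalanced classes, one common way to obtain a 70/10/20 split is to divide each genre separately so that rare genres appear in all three parts. The model card does not say whether the split was stratified or which seed was used, so the sketch below is only one plausible approach; the function name, seed, and toy data are assumptions.

```python
import random
from collections import defaultdict

def stratified_split(records, percents=(70, 10, 20), seed=42):
    """Split (text, label) pairs into train/val/test, one class at a time."""
    by_label = defaultdict(list)
    for record in records:
        by_label[record[1]].append(record)

    rng = random.Random(seed)
    train, val, test = [], [], []
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        # Integer arithmetic avoids floating-point rounding surprises.
        n_train = len(items) * percents[0] // 100
        n_val = len(items) * percents[1] // 100
        train += items[:n_train]
        val += items[n_train:n_train + n_val]
        test += items[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test

# Toy imbalanced data: 100 records of one genre, 20 of another.
records = [(f"a{i}", "recenzja") for i in range(100)]
records += [(f"b{i}", "esej") for i in range(20)]
train, val, test = stratified_split(records)
```

In this toy case the smaller genre still contributes records to every part, which a plain random split would not guarantee.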
Performance (Test Set)
- Accuracy: 85.13%
- Weighted F1-score: 0.85
High-performing genres
| Genre | F1-score |
|---|---|
| wiersz | 0.94 |
| wywiad | 0.94 |
| recenzja | 0.92 |
| artykuł | 0.85 |
Lower performance is observed for structurally hybrid and low-resource genres (e.g., esej, nota).
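For readers unfamiliar with the weighted F1-score, it averages the per-genre F1 values using each genre's number of true test examples as the weight, so frequent genres influence the figure more than rare ones. The sketch below shows the computation on made-up labels; it is not the evaluation script used for this model, and the function names are hypothetical.

```python
from collections import Counter

def per_class_f1(y_true, y_pred):
    """Return {label: F1} computed from true and predicted labels."""
    scores = {}
    for label in set(y_true):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores

def weighted_f1(y_true, y_pred):
    """Average per-class F1, weighting each class by its true-label count."""
    support = Counter(y_true)
    f1 = per_class_f1(y_true, y_pred)
    return sum(f1[label] * support[label] for label in support) / len(y_true)

# Made-up labels for illustration; not drawn from the iPBL test set.
y_true = ["wiersz", "wiersz", "wiersz", "esej"]
y_pred = ["wiersz", "wiersz", "esej", "esej"]
score = weighted_f1(y_true, y_pred)
```

This weighting is why a strong overall score can coexist with weaker results on low-resource genres.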
How to Use
Standard Transformers Usage
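from transformers import pipeline
# Load the fine-tuned classifier and its tokenizer from the Hugging Face Hub.
classifier = pipeline(
    "text-classification",
    model="darekpe79/Literary_Genre_Classification",
    tokenizer="darekpe79/Literary_Genre_Classification",
)
text = "Przykładowy tekst artykułu literackiego..."
# The result is a list with one dict holding the predicted genre label and its score.
print(classifier(text))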
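When the runner-up genres matter, for instance with the hybrid forms mentioned above, it can help to inspect the scores for all classes instead of only the top one. The sketch below is one way to do that; it assumes the pipeline's `top_k=None` option and tokenizer truncation behave as in recent `transformers` releases, and that a 512-token limit suits this model. The helper names `load_classifier`, `rank_genres`, and `demo` are hypothetical.

```python
def load_classifier():
    """Build the pipeline; the model is downloaded on first use."""
    # Imported lazily so that rank_genres stays usable without transformers.
    from transformers import pipeline
    return pipeline(
        "text-classification",
        model="darekpe79/Literary_Genre_Classification",
    )

def rank_genres(raw_output):
    """Sort pipeline scores from most to least likely genre."""
    # Some transformers versions wrap single-input results in an extra list.
    if raw_output and isinstance(raw_output[0], list):
        raw_output = raw_output[0]
    return sorted(raw_output, key=lambda item: item["score"], reverse=True)

def demo(text, top_n=3):
    """Print the top_n candidate genres for one text (requires network access)."""
    classifier = load_classifier()
    # top_k=None requests a score for every genre; truncation guards against
    # inputs longer than the assumed 512-token limit.
    scores = classifier(text, top_k=None, truncation=True, max_length=512)
    for item in rank_genres(scores)[:top_n]:
        print(f"{item['label']}: {item['score']:.3f}")
```

Calling `demo("Przykładowy tekst artykułu literackiego...")` would print the three most likely genres with their scores; long documents are cut off at the token limit, so only the beginning of the text influences the prediction.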