# iPBL – Literary Genre Classification (HerBERT)
## Overview
This model implements the **literary genre classification** component of the iPBL (Bibliography of Polish Digital Culture) system developed at the Institute of Literary Research of the Polish Academy of Sciences.
It assigns domain-specific literary form categories to Polish web-based cultural texts.
The model supports structured bibliographic description within a historically established classificatory framework derived from the Polish Literary Bibliography (PBL).
Unlike general-purpose genre classifiers, this model operates within a discipline-specific bibliographic regime.
---
## Task
Single-label multi-class text classification.
Each document is assigned one dominant literary genre category.
### Genres
- artykuł
- esej
- felieton
- kult
- nota
- opowiadanie
- proza
- recenzja
- wiersz
- wpis blogowy
- wspomnienie
- wywiad
- zgon
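
As a rough illustration of what "single-label" means here, the sketch below turns one vector of 13 class scores into a single genre by taking the most probable class. The index order of `GENRES` and the example logits are assumptions made for illustration; the actual mapping is whatever the model's configuration defines.

```python
import math

# Genre labels as listed above. The index order is an assumption;
# the real id-to-label mapping is stored in the model's config.
GENRES = [
    "artykuł", "esej", "felieton", "kult", "nota", "opowiadanie", "proza",
    "recenzja", "wiersz", "wpis blogowy", "wspomnienie", "wywiad", "zgon",
]


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_genre(logits):
    """Return the single most probable genre and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return GENRES[best], probs[best]


# Hypothetical logits for one document (not real model output).
example_logits = [0.1, -0.3, 0.2, -1.0, -0.5, 0.0, 0.4, 2.5, -0.2, 0.3, -0.1, 0.6, -1.2]
label, prob = predict_genre(example_logits)
print(label, round(prob, 3))
```

Because only the top class is returned, a text that mixes forms (for example, an essay-like blog post) still receives exactly one category.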
---
## Base Model
- Checkpoint: `allegro/herbert-base-cased`
- Architecture: `BertForSequenceClassification`
---
## Training Data
The model was trained on curated bibliographic records produced in everyday bibliographic practice within the iPBL project.
- Raw samples: 17,731
- Final samples (after dropping genres with fewer than 100 records): 17,486

Data split:
- 70% Training
- 10% Validation
- 20% Test
The dataset reflects real-world class imbalance typical of web-native literary discourse.
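
The preparation steps above (frequency filtering, then a 70/10/20 split) could be sketched as follows. This is a minimal illustration on made-up data, not the project's actual pipeline: the function names, the fixed seed, and the use of a plain random (non-stratified) split are all assumptions, since the card does not say how the split was drawn.

```python
import random
from collections import Counter


def filter_rare_classes(records, min_count=100):
    """Keep only records whose label occurs at least min_count times."""
    counts = Counter(label for _, label in records)
    return [(text, label) for text, label in records if counts[label] >= min_count]


def split_70_10_20(records, seed=0):
    """Shuffle and split records into train/validation/test (70/10/20)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    # Integer arithmetic avoids floating-point rounding surprises.
    n_train = len(shuffled) * 7 // 10
    n_val = len(shuffled) // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# Toy data: two frequent labels and one rare label (made up, not iPBL data).
toy = [(f"doc {i}", "wiersz") for i in range(150)]
toy += [(f"doc {i}", "recenzja") for i in range(120)]
toy += [(f"doc {i}", "rare") for i in range(30)]

kept = filter_rare_classes(toy, min_count=100)
train, val, test = split_70_10_20(kept)
print(len(kept), len(train), len(val), len(test))
```

With strongly imbalanced classes, a stratified split would keep genre proportions equal across the three subsets; whether that was done here is not stated.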
---
## Performance (Test Set)
- Accuracy: **85.13%**
- Weighted F1-score: **0.85**
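
For readers unfamiliar with the metric, weighted F1 is commonly computed as the mean of per-class F1 scores, each weighted by the number of true examples of that class. The sketch below assumes that usual definition (the card does not state which implementation was used) and runs on a tiny made-up example, not on the model's test set.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (n / total) * f1
    return score


# Made-up labels for illustration only.
y_true = ["wiersz", "wiersz", "esej", "recenzja", "recenzja", "recenzja"]
y_pred = ["wiersz", "esej", "esej", "recenzja", "recenzja", "wiersz"]

print(round(accuracy(y_true, y_pred), 3), round(weighted_f1(y_true, y_pred), 3))
```

Because the weights follow class frequency, frequent genres dominate this score, which is why it can stay high even when rare genres perform poorly.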
### High-performing genres
| Genre | F1-score |
|------------|----------|
| wiersz | 0.94 |
| wywiad | 0.94 |
| recenzja | 0.92 |
| artykuł | 0.85 |
Lower performance is observed for structurally hybrid and low-resource genres (e.g., esej, nota).
---
## How to Use
### Standard Transformers Usage
```python
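from transformers import pipeline

# Load the fine-tuned classifier and its tokenizer from the Hugging Face Hub.
classifier = pipeline(
    "text-classification",
    model="darekpe79/Literary_Genre_Classification",
    tokenizer="darekpe79/Literary_Genre_Classification",
)

# Polish input text ("Sample text of a literary article...").
text = "Przykładowy tekst artykułu literackiego..."

# Returns a list with one dict holding the predicted label and its score.
result = classifier(text)
print(result)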