# iPBL – Literary Genre Classification (HerBERT)

## Overview

This model implements the **literary genre classification** component of the iPBL (Bibliography of Polish Digital Culture) system developed at the Institute of Literary Research of the Polish Academy of Sciences.

It assigns domain-specific literary form categories to Polish web-based cultural texts.  
It supports structured bibliographic description within the classification framework of the Polish Literary Bibliography (PBL).

Unlike general-purpose genre classifiers, its label set follows PBL cataloguing practice rather than a generic genre taxonomy.

---

## Task

Single-label multi-class text classification.

Each document is assigned one dominant literary genre category.
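
To illustrate what "one dominant category" means in practice, here is a minimal sketch, assuming the classifier produces one raw score (logit) per genre: the scores are turned into probabilities with a softmax and only the top-scoring label is kept. The logits and the three-label subset below are made up for illustration and are not real model output.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_single_label(logits, labels):
    """Return the single label with the highest probability, plus that probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]

# Hypothetical scores for three of the genres; not real model output.
labels = ["esej", "recenzja", "wiersz"]
logits = [0.2, 2.5, -1.0]
label, prob = predict_single_label(logits, labels)
print(label, round(prob, 3))
```

Because only the top label is returned, a text that mixes genres still receives exactly one category.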

### Genres

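The model outputs the Polish labels; approximate English glosses are given in parentheses where the meaning is unambiguous.

- artykuł (article)
- esej (essay)
- felieton (column)
- kult
- nota (note)
- opowiadanie (short story)
- proza (prose)
- recenzja (review)
- wiersz (poem)
- wpis blogowy (blog post)
- wspomnienie (reminiscence)
- wywiad (interview)
- zgon (death notice)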

---

## Base Model

`allegro/herbert-base-cased`

Architecture: `BertForSequenceClassification`

---

## Training Data

The model was trained on curated bibliographic records produced in everyday bibliographic practice within the iPBL project.

Raw samples: 17,731  
Final samples: 17,486 (after keeping only genres with at least 100 samples, which removed 245 records)  
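
The frequency filter can be sketched as follows. This is a minimal illustration, assuming records are `(text, label)` pairs and that the threshold applies to the number of samples per genre; the function name, the toy data, and the `rare_genre` label are hypothetical and not taken from the iPBL pipeline.

```python
from collections import Counter

def filter_rare_classes(records, min_count=100):
    """Keep only records whose label occurs at least min_count times."""
    counts = Counter(label for _, label in records)
    kept = [(text, label) for text, label in records if counts[label] >= min_count]
    dropped = sorted(label for label, n in counts.items() if n < min_count)
    return kept, dropped

# Toy data: two frequent genres and one hypothetical rare label.
records = (
    [("tekst", "wiersz")] * 120
    + [("tekst", "recenzja")] * 100
    + [("tekst", "rare_genre")] * 5
)
kept, dropped = filter_rare_classes(records)
print(len(kept), dropped)
```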

Data split:

- 70% Training  
- 10% Validation  
- 20% Test  
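
A 70/10/20 split of this kind might look like the sketch below. The card does not say whether the split was stratified or which random seed was used, so the per-genre stratification and `seed=42` here are assumptions; the function name and toy records are hypothetical.

```python
import random
from collections import defaultdict

def stratified_split(records, fractions=(0.7, 0.1, 0.2), seed=42):
    """Split (text, label) records into train/val/test, genre by genre."""
    by_label = defaultdict(list)
    for record in records:
        by_label[record[1]].append(record)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, val, test = [], [], []
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        n_train = round(len(items) * fractions[0])
        n_val = round(len(items) * fractions[1])
        train += items[:n_train]
        val += items[n_train:n_train + n_val]
        test += items[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test

# Toy data: 200 poems and 100 essays with unique document ids.
records = [(f"doc{i}", "wiersz") for i in range(200)]
records += [(f"doc{i}", "esej") for i in range(200, 300)]
train, val, test = stratified_split(records)
print(len(train), len(val), len(test))
```

Splitting within each genre keeps the class proportions roughly equal across the three subsets, which matters when the data is imbalanced.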

The dataset reflects real-world class imbalance typical of web-native literary discourse.

---

## Performance (Test Set)

- Accuracy: **85.13%**
- Weighted F1-score: **0.85**
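
For readers unfamiliar with the weighted F1-score, the sketch below computes accuracy and a support-weighted F1 in plain Python: the per-genre F1 values are averaged with weights proportional to how many true samples each genre has. The card does not state which tool produced the reported numbers, so this is an illustration of the metric definitions rather than the evaluation script; the toy labels are made up.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's support."""
    support = Counter(y_true)
    total = 0.0
    for label, n in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = n - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        total += f1 * n  # weight each class by its number of true samples
    return total / len(y_true)

# Toy labels, not real evaluation data.
y_true = ["wiersz", "wiersz", "wiersz", "esej", "esej"]
y_pred = ["wiersz", "wiersz", "esej", "esej", "wiersz"]
print(accuracy(y_true, y_pred), weighted_f1(y_true, y_pred))
```

On imbalanced data the weighted average is dominated by frequent genres, so it can stay high even when rare genres score poorly.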

### High-performing genres

| Genre    | F1-score |
|----------|----------|
| wiersz   | 0.94     |
| wywiad   | 0.94     |
| recenzja | 0.92     |
| artykuł  | 0.85     |

Lower performance is observed for structurally hybrid and low-resource genres (e.g., esej, nota).

---

## How to Use

### Standard Transformers Usage

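```python
from transformers import pipeline

# Load the fine-tuned classifier and its tokenizer from the Hugging Face Hub.
classifier = pipeline(
    "text-classification",
    model="darekpe79/Literary_Genre_Classification",
    tokenizer="darekpe79/Literary_Genre_Classification",
)

text = "Przykładowy tekst artykułu literackiego..."

# truncation=True cuts inputs that exceed the model's maximum sequence length
# (added here as a precaution for long web texts).
result = classifier(text, truncation=True)

# The pipeline returns a list with one dict holding "label" and "score".
print(result)
```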