# iPBL – Literary Genre Classification (HerBERT)

## Overview

This model implements the **literary genre classification** component of the iPBL (Bibliography of Polish Digital Culture) system developed at the Institute of Literary Research of the Polish Academy of Sciences.

It assigns domain-specific literary form categories to Polish web-based cultural texts.  
The model supports structured bibliographic description within a historically established classificatory framework derived from the Polish Literary Bibliography (PBL).

Unlike general-purpose genre classifiers, this model operates within a discipline-specific bibliographic regime.

---

## Task

Single-label multi-class text classification.

Each document is assigned one dominant literary genre category.

### Genres

- artykuł  
- esej  
- felieton  
- kult  
- nota  
- opowiadanie  
- proza  
- recenzja  
- wiersz  
- wpis blogowy  
- wspomnienie  
- wywiad  
- zgon  

---

## Base Model

`allegro/herbert-base-cased`

Architecture: `BertForSequenceClassification`
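
As a rough sketch of how a 13-way classification head could be attached to the base checkpoint, the snippet below builds label mappings from the genre list above and passes them to `AutoModelForSequenceClassification`. The alphabetical index order and the helper name `build_model` are assumptions made for illustration; the `config.json` shipped with the published checkpoint is the authoritative source for the real mapping.

```python
# Genre labels as listed in this model card. The index order is an assumption.
GENRES = [
    "artykuł", "esej", "felieton", "kult", "nota", "opowiadanie", "proza",
    "recenzja", "wiersz", "wpis blogowy", "wspomnienie", "wywiad", "zgon",
]

id2label = dict(enumerate(GENRES))
label2id = {label: idx for idx, label in id2label.items()}


def build_model(base_checkpoint="allegro/herbert-base-cased"):
    """Hypothetical helper: load the base encoder with a fresh 13-way head."""
    # Imported lazily so the mappings above can be inspected without transformers.
    from transformers import AutoModelForSequenceClassification

    return AutoModelForSequenceClassification.from_pretrained(
        base_checkpoint,
        num_labels=len(GENRES),
        id2label=id2label,
        label2id=label2id,
    )
```

Calling `build_model()` downloads the base weights and returns a model whose classification head is randomly initialised, so it would still need fine-tuning before use.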

---

## Training Data

The model was trained on curated bibliographic records produced in everyday bibliographic practice within the iPBL project.

Raw samples: 17,731  
Final samples (after keeping only genres with at least 100 samples): 17,486  

Data split:

- 70% Training  
- 10% Validation  
- 20% Test  
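
The preprocessing described above (frequency filter, then a 70/10/20 split) might look roughly like the following. This is a minimal sketch on toy data, not the project's actual pipeline: the function names, the fixed seed, and the per-genre (stratified) splitting are all assumptions, since the card does not say how the split was drawn.

```python
import random
from collections import Counter, defaultdict


def filter_rare_labels(records, min_count=100):
    """Keep only records whose label occurs at least min_count times."""
    counts = Counter(label for _, label in records)
    return [(text, label) for text, label in records if counts[label] >= min_count]


def stratified_split(records, fractions=(0.7, 0.1, 0.2), seed=42):
    """Split (text, label) pairs into train/val/test, genre by genre (assumed)."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for record in records:
        by_label[record[1]].append(record)

    train, val, test = [], [], []
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        n_train = round(len(items) * fractions[0])
        n_val = round(len(items) * fractions[1])
        train += items[:n_train]
        val += items[n_train:n_train + n_val]
        test += items[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test


# Toy data: two frequent genres and one hypothetical rare label.
records = (
    [(f"poem {i}", "wiersz") for i in range(200)]
    + [(f"essay {i}", "esej") for i in range(100)]
    + [(f"rare {i}", "rare") for i in range(5)]
)

filtered = filter_rare_labels(records, min_count=100)
train, val, test = stratified_split(filtered)
print(len(filtered), len(train), len(val), len(test))
```

Splitting per genre keeps the class proportions similar across the three subsets, which matters when the data is as imbalanced as described here.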

The dataset reflects real-world class imbalance typical of web-native literary discourse.

---

## Performance (Test Set)

- Accuracy: **85.13%**
- Weighted F1-score: **0.85**
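
For readers unfamiliar with the two headline metrics, the sketch below computes them on a handful of toy predictions. It assumes "weighted" means the per-genre F1 scores are averaged with weights proportional to each genre's number of true samples (its support), which is the usual convention; the function names and the toy labels are illustrative only.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of documents whose predicted genre equals the true genre."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def weighted_f1(y_true, y_pred):
    """Per-genre F1, averaged with weights proportional to genre support."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (n / total) * f1
    return score


# Toy example: three correct predictions and one essay mislabelled as a note.
y_true = ["wiersz", "wiersz", "wywiad", "esej"]
y_pred = ["wiersz", "wiersz", "wywiad", "nota"]

print(accuracy(y_true, y_pred))
print(weighted_f1(y_true, y_pred))
```

Because the weights follow support, frequent genres dominate the weighted score, so a strong overall figure can coexist with weak results on rare genres.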

### High-performing genres

| Genre      | F1-score |
|------------|----------|
| wiersz     | 0.94 |
| wywiad     | 0.94 |
| recenzja   | 0.92 |
| artykuł    | 0.85 |

Lower performance is observed for structurally hybrid and low-resource genres (e.g., esej, nota).

---

## How to Use

### Standard Transformers Usage

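```python
from transformers import pipeline

# Load the fine-tuned classifier and its tokenizer from the Hugging Face Hub.
classifier = pipeline(
    "text-classification",
    model="darekpe79/Literary_Genre_Classification",
    tokenizer="darekpe79/Literary_Genre_Classification",
)

text = "Przykładowy tekst artykułu literackiego..."

# The pipeline returns a list with one dict per input: {"label": ..., "score": ...}.
result = classifier(text)
print(result)
```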