darekpe79 committed on
Commit
a2fb2a3
·
verified ·
1 Parent(s): eae6b40

README.md

Files changed (1)
  1. README.md +95 -0
README.md ADDED
@@ -0,0 +1,95 @@
+ # iPBL – Literary Genre Classification (HerBERT)
+
+ ## Overview
+
+ This model implements the **literary genre classification** component of the iPBL (Bibliography of Polish Digital Culture) system developed at the Institute of Literary Research of the Polish Academy of Sciences.
+
+ It assigns domain-specific literary form categories to Polish web-based cultural texts.
+ The model supports structured bibliographic description within a historically established classificatory framework derived from the Polish Literary Bibliography (PBL).
+
+ Unlike general-purpose genre classifiers, this model operates within a discipline-specific bibliographic regime.
+
+ ---
+
+ ## Task
+
+ Single-label multi-class text classification.
+
+ Each document is assigned one dominant literary genre category.
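As a rough illustration of what single-label multi-class classification means here, the sketch below turns one score per genre into a single predicted label by picking the highest softmax probability. The three-label subset and the logits are made-up values for illustration, not outputs of this model, and the helper names are hypothetical.

```python
import math

# Hypothetical subset of the genre labels and made-up logits (not real model output).
LABELS = ["artykuł", "recenzja", "wiersz"]
logits = [1.2, 0.3, 2.5]

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def predict_label(scores, labels):
    """Single-label decision: return the most probable label and its probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]

label, prob = predict_label(logits, LABELS)
print(label, round(prob, 3))
```

Because exactly one label is returned per document, a text that mixes several forms is still assigned only its dominant category.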
+
+ ### Genres
+
+ - artykuł
+ - esej
+ - felieton
+ - kult
+ - nota
+ - opowiadanie
+ - proza
+ - recenzja
+ - wiersz
+ - wpis blogowy
+ - wspomnienie
+ - wywiad
+ - zgon
+
+ ---
+
+ ## Base Model
+
+ `allegro/herbert-base-cased`
+
+ Architecture: `BertForSequenceClassification`
+
+ ---
+
+ ## Training Data
+
+ The model was trained on curated bibliographic records produced in everyday bibliographic practice within the iPBL project.
+
+ - Raw samples: 17,731
+ - Final samples (after frequency filtering ≥ 100): 17,486
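The frequency filter can be sketched as follows, assuming the threshold of 100 applies to the number of samples per genre (the README does not spell this out). The function name, the `(text, genre)` record layout, and the toy labels are hypothetical.

```python
from collections import Counter

MIN_SAMPLES = 100  # threshold from the README; assumed to be a per-genre count

def filter_rare_genres(records, min_samples=MIN_SAMPLES):
    """Keep only records whose genre label occurs at least `min_samples` times."""
    counts = Counter(genre for _, genre in records)
    return [(text, genre) for text, genre in records if counts[genre] >= min_samples]

# Toy data: 120 "wiersz" records plus 5 records of a made-up rare label.
toy = [(f"tekst {i}", "wiersz") for i in range(120)]
toy += [(f"inny {i}", "rzadki_gatunek") for i in range(5)]

kept = filter_rare_genres(toy)
print(len(toy), "->", len(kept))
```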
+
+ Data split:
+
+ - 70% Training
+ - 10% Validation
+ - 20% Test
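A minimal sketch of a 70/10/20 split is shown below. The README does not say whether the split was stratified or which seed was used, so the plain random shuffle, the seed, and the function name are assumptions.

```python
import random

def split_70_10_20(items, seed=42):
    """Shuffle a copy of `items` and cut it into train/validation/test parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility (assumed seed)
    n_train = int(0.7 * len(shuffled))
    n_val = int(0.1 * len(shuffled))
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder, roughly 20%
    return train, val, test

# Stand-in for the 17,486 filtered samples: integer ids instead of real records.
train, val, test = split_70_10_20(range(17486))
print(len(train), len(val), len(test))
```

With imbalanced classes, a stratified split would keep genre proportions similar across the three parts; this sketch does not attempt that.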
+
+ The dataset reflects real-world class imbalance typical of web-native literary discourse.
+
+ ---
+
+ ## Performance (Test Set)
+
+ - Accuracy: **85.13%**
+ - Weighted F1-score: **0.85**
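For readers unfamiliar with the two metrics, the sketch below computes accuracy and a support-weighted F1-score from scratch on a tiny made-up example. The labels and predictions are illustrative only and do not reproduce the reported figures. The weighting shown (each class's F1 weighted by its share of true labels) is the usual definition, assumed here to match what the README reports.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to each class's share of true labels."""
    support = Counter(y_true)
    total = 0.0
    for label, n in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = n - tp
        f1 = 2 * tp / (2 * tp + fp + fn)  # denominator > 0 because n > 0
        total += f1 * n / len(y_true)
    return total

# Made-up labels and predictions for illustration.
y_true = ["wiersz", "wiersz", "wiersz", "esej"]
y_pred = ["wiersz", "wiersz", "esej", "esej"]
print(accuracy(y_true, y_pred), weighted_f1(y_true, y_pred))
```

Because frequent genres carry more weight, a weighted F1 can stay high even when rare genres are classified poorly.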
+
+ ### High-performing genres
+
+ | Genre    | F1-score |
+ |----------|----------|
+ | wiersz   | 0.94     |
+ | wywiad   | 0.94     |
+ | recenzja | 0.92     |
+ | artykuł  | 0.85     |
+
+ Lower performance is observed for structurally hybrid and low-resource genres (e.g., esej, nota).
+
+ ---
+
+ ## How to Use
+
+ ### Standard Transformers Usage
+
+ ```python
+ from transformers import pipeline
+
+ # Load the fine-tuned classifier and its tokenizer from the Hugging Face Hub
+ classifier = pipeline(
+     "text-classification",
+     model="darekpe79/Literary_Genre_Classification",
+     tokenizer="darekpe79/Literary_Genre_Classification",
+ )
+
+ # Polish input text; the pipeline returns the top label with its score
+ text = "Przykładowy tekst artykułu literackiego..."
+ print(classifier(text))
+ ```