darekpe79/Literary_Genre_Classification

# iPBL – Literary Genre Classification (HerBERT)

## Overview

This model implements the **literary genre classification** component of the iPBL (Bibliography of Polish Digital Culture) system, developed at the Institute of Literary Research of the Polish Academy of Sciences.
It assigns domain-specific literary form categories to Polish web-based cultural texts.
The model supports structured bibliographic description within a historically established classification framework derived from the Polish Literary Bibliography (PBL).
Unlike general-purpose genre classifiers, its label set follows the discipline-specific conventions of bibliographic practice rather than a generic genre taxonomy.

---

## Task

Single-label multi-class text classification: each document is assigned exactly one dominant literary genre category.
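
In a single-label setup, the classifier produces one score (logit) per genre and the prediction is the genre with the highest probability. The plain-Python sketch below illustrates that decision rule only; the helper name `predict_genre` and the logits are hypothetical and do not come from the model.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Convert raw logits into probabilities that sum to 1."""
    # Subtract the largest logit before exponentiating for numerical stability
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_genre(logits: Sequence[float], labels: Sequence[str]) -> tuple[str, float]:
    """Hypothetical helper: return the single most probable genre (argmax)."""
    if len(logits) != len(labels):
        raise ValueError("expected exactly one logit per label")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Toy example: three genre labels with made-up logits
label, prob = predict_genre([0.2, 2.5, -1.0], ["esej", "wiersz", "zgon"])
print(label, round(prob, 3))
```

Because exactly one label is returned per document, a text that mixes forms is always assigned to its single most probable category.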
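
### Genres

- `artykuł` (article)
- `esej` (essay)
- `felieton` (feuilleton / column)
- `kult` (literally "cult")
- `nota` (note)
- `opowiadanie` (short story)
- `proza` (prose)
- `recenzja` (review)
- `wiersz` (poem)
- `wpis blogowy` (blog post)
- `wspomnienie` (memoir / recollection)
- `wywiad` (interview)
- `zgon` (death notice)

---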

## Base Model

- Base checkpoint: `allegro/herbert-base-cased`
- Architecture: `BertForSequenceClassification`
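
Fine-tuning `BertForSequenceClassification` for this task requires a classification head with one output per genre, plus `id2label`/`label2id` mappings so that predictions come back as genre names. The sketch below is an assumption about how such a setup could look, not the authors' training code: the alphabetical label order and the helper names `build_label_maps` and `load_base_model` are hypothetical (the published checkpoint's `config.id2label` gives the real order), and `transformers` is only imported when `load_base_model` is called.

```python
GENRES = (
    "artykuł", "esej", "felieton", "kult", "nota", "opowiadanie", "proza",
    "recenzja", "wiersz", "wpis blogowy", "wspomnienie", "wywiad", "zgon",
)


def build_label_maps(labels):
    """Map class indices to genre names and back (hypothetical ordering)."""
    id2label = {i: name for i, name in enumerate(labels)}
    label2id = {name: i for i, name in id2label.items()}
    return id2label, label2id


def load_base_model(labels=GENRES):
    """Sketch: HerBERT base with a newly initialised classification head."""
    # Imported lazily so the label-mapping code runs without transformers installed
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    id2label, label2id = build_label_maps(labels)
    tokenizer = AutoTokenizer.from_pretrained("allegro/herbert-base-cased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "allegro/herbert-base-cased",
        num_labels=len(labels),
        id2label=id2label,
        label2id=label2id,
    )
    return tokenizer, model


id2label, label2id = build_label_maps(GENRES)
print(len(id2label), id2label[0], label2id["zgon"])
```

For a BERT-type checkpoint such as HerBERT, `AutoModelForSequenceClassification` resolves to `BertForSequenceClassification`, the architecture listed above.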

---

## Training Data

The model was trained on curated bibliographic records produced in routine bibliographic work within the iPBL project.

- Raw samples: 17,731
- Final samples (after keeping only genres with at least 100 samples): 17,486

Data split:

- 70% training
- 10% validation
- 20% test

The dataset reflects the real-world class imbalance typical of web-native literary discourse.
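
The preprocessing described above (dropping genres with fewer than 100 samples, then a 70/10/20 split) can be sketched as follows. This is a minimal illustration with hypothetical names (`filter_rare_genres`, `split_70_10_20`) and an assumed per-genre (stratified) shuffle; the card does not state whether the original split was stratified or which random seed was used.

```python
import random
from collections import Counter, defaultdict


def filter_rare_genres(records, min_count=100):
    """Keep only (text, genre) records whose genre has at least min_count examples."""
    counts = Counter(genre for _, genre in records)
    return [(text, genre) for text, genre in records if counts[genre] >= min_count]


def split_70_10_20(records, seed=42):
    """Hypothetical stratified split: 70% train, 10% validation, 20% test per genre."""
    by_genre = defaultdict(list)
    for rec in records:
        by_genre[rec[1]].append(rec)

    rng = random.Random(seed)
    train, val, test = [], [], []
    for genre_records in by_genre.values():
        rng.shuffle(genre_records)
        n = len(genre_records)
        n_train = n * 70 // 100  # integer arithmetic avoids float rounding surprises
        n_val = n * 10 // 100
        train += genre_records[:n_train]
        val += genre_records[n_train:n_train + n_val]
        test += genre_records[n_train + n_val:]  # remainder (about 20%) goes to test
    return train, val, test


# Toy data: the 150 "wiersz" records survive, the 30 rare-genre records are dropped
toy = [(f"text {i}", "wiersz") for i in range(150)]
toy += [(f"text {i}", "hypothetical_rare_genre") for i in range(30)]
kept = filter_rare_genres(toy)
train, val, test = split_70_10_20(kept)
print(len(kept), len(train), len(val), len(test))
```

Splitting within each genre keeps the class proportions similar across the three subsets, which matters for an imbalanced label distribution like this one.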

---

## Performance (Test Set)

- Accuracy: **85.13%**
- Weighted F1-score: **0.85**

### High-performing genres

| Genre              | F1-score |
|--------------------|----------|
| wiersz (poem)      | 0.94     |
| wywiad (interview) | 0.94     |
| recenzja (review)  | 0.92     |
| artykuł (article)  | 0.85     |

Lower F1 scores are observed for structurally hybrid genres and for genres with fewer training examples (e.g., esej, nota).
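
Weighted F1 averages the per-genre F1 scores, weighting each genre by its number of test examples (support), so frequent genres influence the overall score more than rare ones. Below is a minimal plain-Python sketch of that standard definition for reference (scikit-learn's `f1_score(y_true, y_pred, average="weighted")` computes the same quantity); the toy labels are made up.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = sum(1 for p in y_pred if p == label)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        # F1 is defined as 0 when the class has no true positives
        f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
        score += f1 * n_true / total  # weight by the class's share of the test set
    return score


y_true = ["wiersz", "wiersz", "wiersz", "esej", "nota"]
y_pred = ["wiersz", "wiersz", "esej", "esej", "wiersz"]
print(round(weighted_f1(y_true, y_pred), 4))
```

Labels that appear only in the predictions have zero support and therefore contribute nothing to the weighted average.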

---

## How to Use

### Standard Transformers Usage
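
```python
from transformers import pipeline

# Load the fine-tuned HerBERT classifier and its tokenizer from the Hugging Face Hub
classifier = pipeline(
    "text-classification",
    model="darekpe79/Literary_Genre_Classification",
    tokenizer="darekpe79/Literary_Genre_Classification",
)

# Example Polish input ("Sample text of a literary article...")
text = "Przykładowy tekst artykułu literackiego..."

# truncation=True cuts inputs longer than the model's maximum sequence length
# Output: a list with one {"label": ..., "score": ...} dict for the top-scoring label
print(classifier(text, truncation=True))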