iPBL – Literary Genre Classification (HerBERT)

Overview

This model implements the literary genre classification component of the iPBL (Bibliography of Polish Digital Culture) system developed at the Institute of Literary Research of the Polish Academy of Sciences.

It assigns domain-specific literary form categories to Polish web-based cultural texts.
The model supports structured bibliographic description within a historically established classificatory framework derived from the Polish Literary Bibliography (PBL).

Unlike general-purpose genre classifiers, it predicts only the categories defined by this discipline-specific bibliographic scheme.


Task

Single-label multi-class text classification.

Each document is assigned one dominant literary genre category.
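As a rough illustration of what "one dominant category" means in practice: a sequence classification head produces one logit per genre, and the prediction is the genre with the highest softmax probability. The sketch below uses the genre list from this README, but the index order and the logit values are assumptions made for the example; the real id-to-label mapping is stored in the model configuration.

```python
import math

# Genre labels as listed in this README. The index order here is an
# assumption; the model's actual id-to-label mapping lives in its config.
GENRES = [
    "artykuł", "esej", "felieton", "kult", "nota", "opowiadanie", "proza",
    "recenzja", "wiersz", "wpis blogowy", "wspomnienie", "wywiad", "zgon",
]

def softmax(logits):
    """Convert raw logits to probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def dominant_genre(logits, labels=GENRES):
    """Return the single highest-probability label and its probability."""
    if len(logits) != len(labels):
        raise ValueError("expected one logit per label")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]

# Made-up logits, for illustration only.
example_logits = [0.1] * len(GENRES)
example_logits[GENRES.index("wiersz")] = 3.0

genre, prob = dominant_genre(example_logits)
print(genre, round(prob, 3))
```

Because exactly one label is returned, a text that mixes forms (for example an essayistic review) is still mapped to a single category.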

Genres

  • artykuł
  • esej
  • felieton
  • kult
  • nota
  • opowiadanie
  • proza
  • recenzja
  • wiersz
  • wpis blogowy
  • wspomnienie
  • wywiad
  • zgon

Base Model

allegro/herbert-base-cased

Architecture: BertForSequenceClassification


Training Data

The model was trained on curated bibliographic records produced in everyday bibliographic practice within the iPBL project.

Raw samples: 17,731
Final samples (after keeping only genres with at least 100 samples): 17,486
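The frequency filter removed 245 records (17,731 minus 17,486). A minimal sketch of such a filter is shown below, assuming it means "keep only genres that occur at least 100 times"; the function name, the (text, label) record layout, and the toy data are hypothetical and are not taken from the project's training code.

```python
from collections import Counter

def filter_rare_classes(records, min_count=100):
    """Keep only records whose label occurs at least min_count times.

    records is a list of (text, label) pairs (an assumed layout).
    """
    counts = Counter(label for _, label in records)
    kept = {label for label, n in counts.items() if n >= min_count}
    return [(text, label) for text, label in records if label in kept]

# Toy data: two frequent genres and one hypothetical rare label.
toy = (
    [("tekst", "recenzja")] * 120
    + [("tekst", "wiersz")] * 100
    + [("tekst", "rare_genre")] * 5
)

filtered = filter_rare_classes(toy)
print(len(toy), "->", len(filtered))  # 225 -> 220
```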

Data split:

  • 70% Training
  • 10% Validation
  • 20% Test
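The README does not say how the split was drawn (random seed, stratification). Given the class imbalance, one plausible approach is a per-genre (stratified) 70/10/20 split, sketched here with the standard library only. The function name, seed, and toy records are assumptions for illustration.

```python
import random
from collections import defaultdict

def stratified_split(records, fractions=(0.7, 0.1, 0.2), seed=42):
    """Split (text, label) records into train/val/test, genre by genre."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for rec in records:
        by_label[rec[1]].append(rec)

    train, val, test = [], [], []
    for label in sorted(by_label):  # sorted for reproducibility
        group = by_label[label]
        rng.shuffle(group)
        n_train = round(len(group) * fractions[0])
        n_val = round(len(group) * fractions[1])
        train += group[:n_train]
        val += group[n_train:n_train + n_val]
        test += group[n_train + n_val:]  # remainder goes to the test set
    return train, val, test

# Toy data: an imbalanced two-genre corpus.
toy = [(f"text {i}", "recenzja") for i in range(200)]
toy += [(f"text {i}", "esej") for i in range(100)]

train, val, test = stratified_split(toy)
print(len(train), len(val), len(test))  # 210 30 60
```

Splitting within each genre keeps rare genres represented in all three subsets, which a plain random split does not guarantee.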

The dataset reflects real-world class imbalance typical of web-native literary discourse.


Performance (Test Set)

  • Accuracy: 85.13%
  • Weighted F1-score: 0.85
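For readers unfamiliar with the weighted F1-score: it is the per-genre F1 averaged with weights proportional to each genre's number of true examples, so frequent genres dominate the figure. The README does not state which tool produced the numbers above; the following is a plain-Python sketch of the standard definition, run on made-up labels.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's support."""
    support = Counter(y_true)
    total = 0.0
    for label, n in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += f1 * n
    return total / len(y_true)

# Made-up labels, for illustration only.
y_true = ["wiersz", "wiersz", "wiersz", "esej", "esej", "nota"]
y_pred = ["wiersz", "wiersz", "esej", "esej", "nota", "nota"]

print(round(accuracy(y_true, y_pred), 4))
print(round(weighted_f1(y_true, y_pred), 4))
```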

High-performing genres

Genre       F1-score
wiersz      0.94
wywiad      0.94
recenzja    0.92
artykuł     0.85

Lower performance is observed for structurally hybrid and low-resource genres (e.g., esej, nota).


How to Use

Standard Transformers Usage

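from transformers import pipeline

# Load the fine-tuned classifier and its tokenizer from the Hugging Face Hub
classifier = pipeline(
    "text-classification",
    model="darekpe79/Literary_Genre_Classification",
    tokenizer="darekpe79/Literary_Genre_Classification"
)

# Example Polish input ("Sample text of a literary article...")
text = "Przykładowy tekst artykułu literackiego..."

# Classify the text and print the predicted genre label with its score
result = classifier(text)
print(result)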
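The pipeline returns dictionaries with "label" and "score" keys. In recent Transformers versions, passing top_k=None to the call should return a score for every genre rather than only the best one, although the exact nesting of the result can vary between versions. The helper below is a small sketch for ranking such scores; top_genres is a hypothetical name, and the scores are hand-written stand-ins, not real model output.

```python
def top_genres(scores, n=3):
    """Return the n highest-scoring (label, score) pairs.

    scores is a list of {"label": ..., "score": ...} dicts, the shape
    produced by a text-classification pipeline.
    """
    ranked = sorted(scores, key=lambda item: item["score"], reverse=True)
    return [(item["label"], item["score"]) for item in ranked[:n]]

# Hand-written stand-in for pipeline output (not real model scores).
fake_scores = [
    {"label": "artykuł", "score": 0.18},
    {"label": "recenzja", "score": 0.71},
    {"label": "felieton", "score": 0.04},
    {"label": "esej", "score": 0.07},
]

for label, score in top_genres(fake_scores, n=2):
    print(f"{label}: {score:.2f}")
```

Looking at the runner-up genres can be useful for the hybrid forms mentioned above, where the top two scores may be close.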