iPBL – Literary Genre Classification (HerBERT)

Overview

This model implements the literary genre classification component of the iPBL (Bibliography of Polish Digital Culture) system developed at the Institute of Literary Research of the Polish Academy of Sciences.

It assigns domain-specific literary form categories to Polish web-based cultural texts.
The model supports structured bibliographic description within a historically established classificatory framework derived from the Polish Literary Bibliography (PBL).

Unlike general-purpose genre classifiers, this model operates within a discipline-specific bibliographic regime.


Task

Single-label multi-class text classification.

Each document is assigned one dominant literary genre category.

Genres

  • artykuł
  • esej
  • felieton
  • kult
  • nota
  • opowiadanie
  • proza
  • recenzja
  • wiersz
  • wpis blogowy
  • wspomnienie
  • wywiad
  • zgon
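The single-label decision described under Task can be sketched as a softmax over one logit per genre followed by an argmax. This is only an illustration in plain Python: the label order in `GENRES` is an assumption (the actual index-to-label mapping is presumably stored in the model's configuration), and `predict_genre` is a hypothetical helper, not part of the released model.

```python
import math

# Assumed label order, for illustration only; the real mapping
# should be read from the model's configuration.
GENRES = [
    "artykuł", "esej", "felieton", "kult", "nota", "opowiadanie", "proza",
    "recenzja", "wiersz", "wpis blogowy", "wspomnienie", "wywiad", "zgon",
]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_genre(logits, labels=GENRES):
    """Return the single dominant genre and its probability."""
    if len(logits) != len(labels):
        raise ValueError("expected one logit per genre label")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]
```

Because exactly one label is returned per document, a text that mixes several forms is still mapped to the one genre with the highest probability.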

Base Model

allegro/herbert-base-cased

Architecture: BertForSequenceClassification


Training Data

The model was trained on curated bibliographic records produced in everyday bibliographic practice within the iPBL project.

Raw samples: 17,731
Final samples (after keeping only genres with at least 100 samples): 17,486

Data split:

  • 70% Training
  • 10% Validation
  • 20% Test

The dataset reflects real-world class imbalance typical of web-native literary discourse.
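The preparation steps above (frequency filtering, then a 70/10/20 split) might look roughly like the following sketch. It assumes that records are `(text, label)` pairs and that filtering drops whole genres with fewer than 100 samples; the function name, the fixed seed, and the plain shuffled (non-stratified) split are illustrative choices, since the card does not say how the split was drawn.

```python
import random
from collections import Counter


def filter_and_split(samples, min_count=100, seed=42):
    """Drop rare genres, then split the rest 70/10/20.

    `samples` is a list of (text, label) pairs. This is a hypothetical
    helper; the project's actual preprocessing is not published here.
    """
    counts = Counter(label for _, label in samples)
    kept = [s for s in samples if counts[s[1]] >= min_count]

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(kept)

    # Integer arithmetic avoids floating-point rounding surprises.
    n_train = len(kept) * 70 // 100
    n_val = len(kept) * 10 // 100

    train = kept[:n_train]
    val = kept[n_train:n_train + n_val]
    test = kept[n_train + n_val:]  # remainder, roughly 20%
    return train, val, test
```

With strongly imbalanced classes, a stratified split would keep per-genre proportions stable across the three sets; whether that was done here is not stated.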


Performance (Test Set)

  • Accuracy: 85.13%
  • Weighted F1-score: 0.85

High-performing genres

Genre       F1-score
wiersz      0.94
wywiad      0.94
recenzja    0.92
artykuł     0.85

Lower performance is observed for structurally hybrid and low-resource genres (e.g., esej, nota).
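For readers unfamiliar with the two reported metrics, here is a minimal plain-Python sketch of accuracy and support-weighted F1. It follows the usual definition in which each genre's F1 is weighted by its number of true samples; it is not the evaluation script used for this model, and the function names are illustrative.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of documents whose predicted genre matches the gold genre."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def weighted_f1(y_true, y_pred):
    """Per-genre F1 averaged with weights equal to each genre's support."""
    support = Counter(y_true)
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted genre gains a false positive
            fn[t] += 1  # gold genre gains a false negative

    total = 0.0
    for label, n in support.items():
        denom = 2 * tp[label] + fp[label] + fn[label]
        f1 = 2 * tp[label] / denom if denom else 0.0
        total += n * f1
    return total / len(y_true)
```

Since the weights follow class frequency, frequent genres dominate the weighted score, which is why weaker results on low-resource genres can coexist with a high overall figure.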


How to Use

Standard Transformers Usage

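from transformers import pipeline

# Load the fine-tuned classifier and its tokenizer from the Hugging Face Hub.
classifier = pipeline(
    "text-classification",
    model="darekpe79/Literary_Genre_Classification",
    tokenizer="darekpe79/Literary_Genre_Classification",
)

# Polish sample input ("Sample text of a literary article...").
text = "Przykładowy tekst artykułu literackiego..."

# Prints a list with one dict holding the predicted genre label and its score.
print(classifier(text))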
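Web texts are often longer than the model's input window, and borderline documents may be easier to judge from several candidate genres than from one. The sketch below wraps the `classifier` pipeline from the snippet above: it passes `top_k=None` to request scores for all labels and enables tokenizer truncation. The helper name `top_genres` and the `max_length=512` default are assumptions made for illustration, not settings documented for this model.

```python
def top_genres(classifier, text, k=3, max_length=512):
    """Return the k highest-scoring (label, score) pairs for one text.

    `classifier` is expected to behave like a Transformers
    text-classification pipeline; the 512-token limit is an assumption.
    """
    results = classifier(
        text,
        top_k=None,        # ask for scores of all labels, not just the best
        truncation=True,   # cut off inputs longer than max_length tokens
        max_length=max_length,
    )
    # Some library versions wrap single-text results in an extra list.
    if results and isinstance(results[0], list):
        results = results[0]

    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    return [(r["label"], r["score"]) for r in ranked[:k]]
```

Calling `top_genres(classifier, text)` would then return up to three candidate genres in descending order of score. Truncation means that only the beginning of a long document influences the prediction.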