
	# iPBL – Literary Genre Classification (HerBERT)

	## Overview

	This model implements the literary genre classification component of the iPBL (Bibliography of Polish Digital Culture) system developed at the Institute of Literary Research of the Polish Academy of Sciences.

	It assigns domain-specific literary form categories to Polish web-based cultural texts.
	The model supports structured bibliographic description within a historically established classificatory framework derived from the Polish Literary Bibliography (PBL).

	Unlike general-purpose genre classifiers, this model uses the discipline-specific categories of PBL bibliographic practice rather than a generic genre taxonomy.

	---

	## Task

	Single-label multi-class text classification.

	Each document is assigned one dominant literary genre category.

	### Genres

	- artykuł
	- esej
	- felieton
	- kult
	- nota
	- opowiadanie
	- proza
	- recenzja
	- wiersz
	- wpis blogowy
	- wspomnienie
	- wywiad
	- zgon
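Single-label classification over these genres can be pictured as taking the highest-probability class from the model's output logits. The sketch below is illustrative only: the order of `GENRES` is an assumption (the real index-to-label mapping lives in the model configuration), and `predict_single_label` is a hypothetical helper, not part of the released code.

```python
import math

# Genre names as listed in this model card. The ORDER is assumed for
# illustration; the actual index-to-label mapping is stored in the model config.
GENRES = [
    "artykuł", "esej", "felieton", "kult", "nota", "opowiadanie", "proza",
    "recenzja", "wiersz", "wpis blogowy", "wspomnienie", "wywiad", "zgon",
]


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_single_label(logits, labels=GENRES):
    """Return the single most probable label and its probability."""
    if len(logits) != len(labels):
        raise ValueError("expected one logit per label")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Toy logits (not real model output): the entry at index 8 is the largest.
example_logits = [0.1] * len(GENRES)
example_logits[8] = 3.0
label, score = predict_single_label(example_logits)
print(label, round(score, 3))
```

Because exactly one label is returned per document, a text that mixes several forms is still assigned only its dominant genre.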

	---

	## Base Model

	`allegro/herbert-base-cased`

	Architecture: `BertForSequenceClassification`

	---

	## Training Data

	The model was trained on curated bibliographic records produced during routine bibliographic work in the iPBL project.

	- Raw samples: 17,731
	- Final samples (after frequency filtering, ≥ 100 samples per genre): 17,486
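Assuming the frequency filter means keeping only genres with at least 100 samples, the step could look roughly like the sketch below. The `(text, genre)` record layout, the function name, and the toy data are hypothetical; the actual preprocessing code is not part of this card.

```python
from collections import Counter

MIN_SAMPLES_PER_GENRE = 100  # threshold stated in the card


def filter_rare_genres(records, min_count=MIN_SAMPLES_PER_GENRE):
    """Keep only records whose genre occurs at least `min_count` times.

    `records` is assumed to be an iterable of (text, genre) pairs.
    """
    records = list(records)
    counts = Counter(genre for _, genre in records)
    return [(text, genre) for text, genre in records if counts[genre] >= min_count]


# Toy data: one frequent genre and one hypothetical rare genre.
toy_records = [("tekst", "wiersz")] * 120 + [("tekst", "rzadki gatunek")] * 5
kept = filter_rare_genres(toy_records)
print(len(toy_records), "->", len(kept))
```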

	Data split:

	- 70% Training
	- 10% Validation
	- 20% Test
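A 70/10/20 split like the one above can be sketched with a seeded shuffle. This is a minimal illustration, not the project's actual procedure: the card does not say whether the split was stratified by genre or which seed was used, so the plain random shuffle, the seed value, and the function name are assumptions.

```python
import random


def split_70_10_20(items, seed=42):
    """Shuffle `items` reproducibly and cut them into train/validation/test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seed is an arbitrary choice
    n = len(shuffled)
    n_train = n * 70 // 100  # integer arithmetic avoids float rounding
    n_val = n * 10 // 100
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # the remainder, roughly 20%
    return train, val, test


# Use sample indices as stand-ins for the 17,486 filtered records.
train, val, test = split_70_10_20(range(17486))
print(len(train), len(val), len(test))
```

With imbalanced classes, a stratified split (for example per genre) keeps the rarer genres represented in every partition; whether that was done here is not stated.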

	The dataset reflects real-world class imbalance typical of web-native literary discourse.

	---

	## Performance (Test Set)

	- Accuracy: 85.13%
	- Weighted F1-score: 0.85
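For readers unfamiliar with the two metrics, the sketch below shows how accuracy and support-weighted F1 are commonly computed. It is a plain-Python illustration on toy labels, not the evaluation script behind the reported numbers, and the function names are hypothetical.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to each class's support."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += f1 * n / total
    return score


# Toy gold labels and predictions (not taken from the test set).
y_true = ["wiersz", "wiersz", "esej", "nota"]
y_pred = ["wiersz", "esej", "esej", "nota"]
print(accuracy(y_true, y_pred), weighted_f1(y_true, y_pred))
```

Because the weights follow class frequency, the weighted score is dominated by the large genres, which is why weak results on small genres lower it only slightly.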

	### High-performing genres

	| Genre    | F1-score |
	|----------|----------|
	| wiersz   | 0.94     |
	| wywiad   | 0.94     |
	| recenzja | 0.92     |
	| artykuł  | 0.85     |

	Lower performance is observed for structurally hybrid and low-resource genres (e.g., esej, nota).

	---

	## How to Use

	### Standard Transformers Usage

	```python
	from transformers import pipeline

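	# Load the fine-tuned classifier and its tokenizer from the Hugging Face Hub
	classifier = pipeline(
	    "text-classification",
	    model="darekpe79/Literary_Genre_Classification",
	    tokenizer="darekpe79/Literary_Genre_Classification",
	)

	# Polish input text; roughly "Sample text of a literary article..."
	text = "Przykładowy tekst artykułu literackiego..."
	print(classifier(text))  # a list with one dict holding the predicted label and its score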