# DistilBERT Goodreads Genre Classifier

This model is a fine-tuned version of `distilbert-base-cased` on a subset of the UCSD Book Graph (Goodreads) dataset. It classifies book reviews into one of 8 genres.
## Model Details

- Model Architecture: DistilBERT (base, cased)
- Task: Multi-class Text Classification
- Classes (8): `children`, `comics_graphic`, `fantasy_paranormal`, `history_biography`, `mystery_thriller_crime`, `poetry`, `romance`, `young_adult`
## Performance

The model was evaluated on a held-out test set of 1,600 reviews (200 per genre).
| Metric | Score |
|---|---|
| Accuracy | 60.75% |
| F1 Score (weighted) | 60.83% |
| Loss | 1.26 |
Note: performance may vary slightly because the training and test subsets are randomly sampled from the full Goodreads dataset.
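For reference, the reported metrics correspond to scikit-learn's accuracy and weighted F1. A minimal sketch of how they are computed; the label arrays below are hypothetical stand-ins, not the model's actual test-set outputs:

```python
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical true and predicted genre labels (stand-ins for real test-set outputs)
y_true = ["poetry", "romance", "children", "poetry"]
y_pred = ["poetry", "romance", "poetry", "poetry"]

accuracy = accuracy_score(y_true, y_pred)                    # fraction of exact matches
weighted_f1 = f1_score(y_true, y_pred, average="weighted")   # per-class F1, weighted by class support

print(f"Accuracy: {accuracy:.4f}, Weighted F1: {weighted_f1:.4f}")
```

The weighted average matters here because, although the test set is balanced (200 reviews per genre), weighted F1 stays meaningful if the class distribution ever changes.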
## Usage

You can use this model directly with the Hugging Face `pipeline`:

```python
from transformers import pipeline

# Load the classifier
classifier = pipeline("text-classification", model="kingkenche/distilbert-goodreads-genre-classifier")

# Make a prediction
text = "The detective followed the clues to the abandoned mansion, where a dark secret awaited."
result = classifier(text)
print(result)
# Output: [{'label': 'mystery_thriller_crime', 'score': 0.98}]
```
## Training Procedure

- Base Model: `distilbert-base-cased`
- Optimizer: AdamW
- Learning Rate: 5e-5
- Batch Size: 10 (train), 16 (eval)
- Epochs: 3
- Max Sequence Length: 512 tokens
## Deployment

This model is deployed as part of MLOps Assignment 3, which includes a Dockerized evaluation pipeline.

### Docker Usage

```bash
docker run -e HF_REPO_ID="kingkenche/distilbert-goodreads-genre-classifier" assignment3-eval
```
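The container presumably reads the `HF_REPO_ID` environment variable to decide which model to evaluate. A minimal sketch of such an entrypoint; the function name and default repo id fallback are assumptions, and the actual evaluation logic is omitted:

```python
import os

def resolve_repo_id(default="kingkenche/distilbert-goodreads-genre-classifier"):
    """Return the model repo to evaluate, preferring the HF_REPO_ID env var."""
    return os.environ.get("HF_REPO_ID", default)

if __name__ == "__main__":
    repo_id = resolve_repo_id()
    print(f"Evaluating model: {repo_id}")
    # ...download the model and run the evaluation pipeline here...
```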