DistilBERT Goodreads Genre Classifier

This model is a fine-tuned version of distilbert-base-cased on a subset of the UCSD Book Graph dataset. It classifies book reviews into 8 genres.

Model Details

  • Model Architecture: DistilBERT (Base Cased)
  • Task: Multi-class Text Classification
  • Classes (8):
    • children
    • comics_graphic
    • fantasy_paranormal
    • history_biography
    • mystery_thriller_crime
    • poetry
    • romance
    • young_adult
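
When working with raw model outputs, it helps to map the eight genres to integer ids. The following is a minimal sketch that assumes the ids follow the order listed above. The authoritative mapping is the `id2label` field in the model's config, which is not reproduced here, so treat `label2id` and `id2label` below as illustrative.

```python
# The eight genre labels listed in this model card.
GENRES = [
    "children",
    "comics_graphic",
    "fantasy_paranormal",
    "history_biography",
    "mystery_thriller_crime",
    "poetry",
    "romance",
    "young_adult",
]

# ASSUMPTION: ids follow list order. Check model.config.id2label for the real mapping.
label2id = {label: idx for idx, label in enumerate(GENRES)}
id2label = {idx: label for label, idx in label2id.items()}

# Round-trip a label through both dictionaries.
print(id2label[label2id["poetry"]])
```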

Performance

The model was evaluated on a held-out test set of 1,600 reviews (200 per genre).

Metric                 Score
Accuracy               60.75%
F1 Score (weighted)    60.83%
Loss                   1.26

Note: Performance may vary slightly between runs because the data is randomly sampled from the much larger Goodreads dataset.
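
This card does not say how the metrics were computed. As a sketch of the standard definitions, the snippet below implements accuracy and support-weighted F1 in plain Python. The function names and the toy labels are illustrative only. scikit-learn's `f1_score(..., average="weighted")` computes the same quantity. On a balanced test set like the one described above (200 reviews per genre), weighted F1 equals macro F1 because every class has the same support.

```python
from collections import Counter
from typing import Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def weighted_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Per-class F1 averaged with weights equal to each class's support."""
    support = Counter(y_true)
    total = 0.0
    for label, n in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = n - tp
        # The denominator is positive because the label occurs n > 0 times in y_true.
        f1 = 2 * tp / (2 * tp + fp + fn)
        total += f1 * n
    return total / len(y_true)


# Toy example with two of the eight genres (not real evaluation data).
y_true = ["romance", "romance", "poetry", "poetry"]
y_pred = ["romance", "poetry", "poetry", "poetry"]
print(accuracy(y_true, y_pred), weighted_f1(y_true, y_pred))
```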

Usage

You can use this model directly with the Hugging Face pipeline:

from transformers import pipeline

# Load the classifier
classifier = pipeline("text-classification", model="kingkenche/distilbert-goodreads-genre-classifier")

# Make a prediction
text = "The detective followed the clues to the abandoned mansion, where a dark secret awaited."
result = classifier(text)

print(result)
# Example output (the exact score will vary): [{'label': 'mystery_thriller_crime', 'score': 0.98}]
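
Because several genres overlap (for example fantasy_paranormal and young_adult), it can help to look at more than the single top label. The sketch below wraps the pipeline call with `top_k` and `truncation=True`, both of which the text-classification pipeline accepts as call arguments. Truncation matters because reviews longer than the model's 512-token limit can otherwise fail. The helper names `load_classifier` and `top_genres` are hypothetical, not part of this repository.

```python
from typing import Callable, List, Tuple

MODEL_ID = "kingkenche/distilbert-goodreads-genre-classifier"


def load_classifier():
    """Build the pipeline; the model is downloaded on first use."""
    # Imported lazily so the helper below can be used without transformers installed.
    from transformers import pipeline

    return pipeline("text-classification", model=MODEL_ID)


def top_genres(classifier: Callable, texts: List[str], k: int = 3) -> List[List[Tuple[str, float]]]:
    """Return the k highest-scoring (label, score) pairs for each review."""
    # top_k=k requests the k best labels per input;
    # truncation=True cuts inputs that exceed the model's maximum length.
    outputs = classifier(texts, top_k=k, truncation=True)
    return [
        [(item["label"], round(item["score"], 3)) for item in per_text]
        for per_text in outputs
    ]
```

A call would look like `top_genres(load_classifier(), ["A slow-burn love story set in a haunted castle."], k=3)`. Scores depend on the model weights, so no output is shown here.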

Training Procedure

  • Base Model: distilbert-base-cased
  • Optimizer: AdamW
  • Learning Rate: 5e-5
  • Batch Size: 10 (Train), 16 (Eval)
  • Epochs: 3
  • Max Sequence Length: 512 tokens
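
The hyperparameters above can be written down as a small config object. This is a sketch under the assumption that training used the Hugging Face `Trainer` API, which this card does not state. The keyword names are real `TrainingArguments` parameters, while `Hyperparameters`, `trainer_kwargs`, `tokenizer_kwargs`, and the `./results` output path are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class Hyperparameters:
    """Values copied from the Training Procedure list."""
    base_model: str = "distilbert-base-cased"
    learning_rate: float = 5e-5
    train_batch_size: int = 10
    eval_batch_size: int = 16
    epochs: int = 3
    max_length: int = 512


def trainer_kwargs(hp: Hyperparameters, output_dir: str = "./results") -> Dict[str, Any]:
    """Keyword arguments intended for transformers.TrainingArguments."""
    # No optimizer entry: Trainer uses an AdamW variant by default.
    return {
        "output_dir": output_dir,  # hypothetical path
        "learning_rate": hp.learning_rate,
        "per_device_train_batch_size": hp.train_batch_size,
        "per_device_eval_batch_size": hp.eval_batch_size,
        "num_train_epochs": hp.epochs,
    }


def tokenizer_kwargs(hp: Hyperparameters) -> Dict[str, Any]:
    """Keyword arguments for the tokenizer call that enforce the length limit."""
    return {"truncation": True, "max_length": hp.max_length}


hp = Hyperparameters()
print(trainer_kwargs(hp))
print(tokenizer_kwargs(hp))
```

With transformers installed, these would be used as `TrainingArguments(**trainer_kwargs(hp))` and `tokenizer(texts, **tokenizer_kwargs(hp))`.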

Deployment

This model is deployed as part of MLOps Assignment 3, which includes a Dockerized evaluation pipeline.

Docker Usage

docker run -e HF_REPO_ID="kingkenche/distilbert-goodreads-genre-classifier" assignment3-eval
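
The command above passes the model repository to the container through the `HF_REPO_ID` environment variable. The contents of the `assignment3-eval` image are not shown in this card, so the following is only a sketch of how an evaluation script might read that variable. The function name `resolve_repo_id` and the fallback default are assumptions.

```python
import os
from typing import Mapping, Optional

# ASSUMPTION: fall back to this repository when HF_REPO_ID is not set.
DEFAULT_REPO_ID = "kingkenche/distilbert-goodreads-genre-classifier"


def resolve_repo_id(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the model repo id from HF_REPO_ID, or the default if unset."""
    env = os.environ if env is None else env
    repo_id = env.get("HF_REPO_ID", DEFAULT_REPO_ID).strip()
    if not repo_id:
        raise ValueError("HF_REPO_ID is set but empty")
    return repo_id


print(resolve_repo_id({"HF_REPO_ID": "kingkenche/distilbert-goodreads-genre-classifier"}))
```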