metadata
language: en
license: apache-2.0
tags:
  - text-classification
  - pytorch
  - huggingface
  - goodreads
  - genre-classification
datasets:
  - ucsd_book_graph
metrics:
  - accuracy
  - f1
model-index:
  - name: distilbert-goodreads-genre-classifier
    results:
      - task:
          name: Text Classification
          type: text-classification
        metrics:
          - name: Accuracy
            type: accuracy
            value: 0.6075
          - name: F1 Score
            type: f1
            value: 0.6083

DistilBERT Goodreads Genre Classifier

This model is a fine-tuned version of distilbert-base-cased, trained on a subset of the UCSD Book Graph dataset. It assigns each book review to one of 8 genres.

Model Details

  • Model Architecture: DistilBERT (Base Cased)
  • Task: Multi-class Text Classification
  • Classes (8):
    • children
    • comics_graphic
    • fantasy_paranormal
    • history_biography
    • mystery_thriller_crime
    • poetry
    • romance
    • young_adult
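As a small illustration of how the 8 class names relate to integer class ids, here is a minimal sketch. The alphabetical id order is an assumption made for this example; the authoritative mapping is whatever the published model config stores in its `id2label` field.

```python
# The 8 genre labels listed above.
GENRES = [
    "children",
    "comics_graphic",
    "fantasy_paranormal",
    "history_biography",
    "mystery_thriller_crime",
    "poetry",
    "romance",
    "young_adult",
]

# ASSUMPTION: ids follow the alphabetical order of the list above.
# Check the model's config (id2label) before relying on these indices.
id2label = dict(enumerate(GENRES))
label2id = {label: idx for idx, label in id2label.items()}

print(len(GENRES), "classes")
print(label2id["poetry"])
```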

Performance

The model was evaluated on a held-out test set of 1,600 reviews (200 per genre).

| Metric              | Score  |
|---------------------|--------|
| Accuracy            | 60.75% |
| F1 Score (weighted) | 60.83% |
| Loss                | 1.26   |
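To make the two headline metrics concrete, the sketch below computes accuracy and support-weighted F1 in plain Python on a tiny made-up example. It is not the evaluation script used for this model, and the toy labels are invented for illustration. Because the test set is balanced (200 reviews per genre), every class gets the same weight, so the weighted F1 here coincides with the macro average.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights proportional to class support."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += f1 * n / total
    return score


# Toy data (invented for this example, not taken from the test set).
y_true = ["poetry", "poetry", "romance", "romance"]
y_pred = ["poetry", "romance", "romance", "romance"]

print(accuracy(y_true, y_pred))
print(round(weighted_f1(y_true, y_pred), 4))
```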

Note: Scores may vary slightly between runs because the evaluation data is randomly sampled from the much larger Goodreads dataset.

Usage

You can use this model directly with the Hugging Face pipeline:

from transformers import pipeline

# Load the classifier
classifier = pipeline("text-classification", model="kingkenche/distilbert-goodreads-genre-classifier")

# Make a prediction
text = "The detective followed the clues to the abandoned mansion, where a dark secret awaited."
result = classifier(text)

print(result)
# Example output (the exact score will vary): [{'label': 'mystery_thriller_crime', 'score': 0.98}]
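With overall accuracy around 61%, it can be useful to look at the scores for all 8 genres rather than only the top one. The helper below is a minimal sketch: `rank_genres` is a hypothetical name, and it assumes the `classifier` object is a `transformers` text-classification pipeline, which accepts `top_k=None` to return every class and `truncation=True` to cut reviews that exceed the model's maximum input length.

```python
def rank_genres(classifier, text):
    """Return (label, score) pairs for every genre, highest score first.

    `classifier` is expected to be a text-classification pipeline
    (or any callable with the same call signature).
    """
    # top_k=None requests scores for all classes; truncation=True
    # shortens inputs longer than the model's maximum sequence length.
    result = classifier(text, top_k=None, truncation=True)

    # Some transformers versions wrap a single-input result in an extra list.
    if result and isinstance(result[0], list):
        result = result[0]

    ranked = sorted(result, key=lambda r: r["score"], reverse=True)
    return [(r["label"], r["score"]) for r in ranked]
```

Given the `classifier` and `text` from the snippet above, `rank_genres(classifier, text)[:3]` would list the three most likely genres with their scores.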

Training Procedure

  • Base Model: distilbert-base-cased
  • Optimizer: AdamW
  • Learning Rate: 5e-5
  • Batch Size: 10 (Train), 16 (Eval)
  • Epochs: 3
  • Max Sequence Length: 512 tokens
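For readers who want to reproduce a similar setup, the hyperparameters above map onto the Hugging Face `Trainer` API roughly as sketched below. This is not the original training script: the output directory, the `"text"` column name, and the reading of the batch sizes as per-device values are assumptions. No optimizer is set explicitly because `Trainer` uses an AdamW variant by default.

```python
# Hyperparameters copied from the list above.
BASE_MODEL = "distilbert-base-cased"
MAX_LENGTH = 512
HYPERPARAMS = {
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 10,  # ASSUMPTION: single-device training
    "per_device_eval_batch_size": 16,
    "num_train_epochs": 3,
}


def build_training_args(output_dir="genre-classifier-out"):
    """Build TrainingArguments; output_dir is a hypothetical path."""
    # Imported lazily so the constants above can be inspected
    # without transformers installed.
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **HYPERPARAMS)


def tokenize_batch(tokenizer, batch):
    """Tokenize reviews, truncating to the 512-token limit.

    ASSUMPTION: the review text lives in a column named "text".
    """
    return tokenizer(batch["text"], truncation=True, max_length=MAX_LENGTH)
```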

Deployment

This model is deployed as part of MLOps Assignment 3, which includes a Dockerized evaluation pipeline.

Docker Usage

docker run -e HF_REPO_ID="kingkenche/distilbert-goodreads-genre-classifier" assignment3-eval
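The `-e HF_REPO_ID=...` flag passes the model repository to the container as an environment variable. How the evaluation image consumes it is not documented here; the sketch below shows one plausible way a Python entrypoint could read it. The function name `resolve_repo_id` and the fallback to this model's repository are assumptions made for illustration.

```python
import os

# ASSUMPTION: fall back to this model's repo when the variable is unset.
DEFAULT_REPO_ID = "kingkenche/distilbert-goodreads-genre-classifier"


def resolve_repo_id(env=None):
    """Return the Hub repo id from HF_REPO_ID, or the default if unset/blank."""
    env = os.environ if env is None else env
    repo_id = env.get("HF_REPO_ID", "").strip()
    return repo_id or DEFAULT_REPO_ID


print(resolve_repo_id({"HF_REPO_ID": "kingkenche/distilbert-goodreads-genre-classifier"}))
```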