# DistilBERT Goodreads Genre Classifier

This model is a fine-tuned version of `distilbert-base-cased` on a subset of the UCSD Book Graph (Goodreads) dataset. It classifies book reviews into one of 8 genres.
## Model Details

- Model Architecture: DistilBERT (base, cased)
- Task: Multi-class Text Classification
- Classes (8): `children`, `comics_graphic`, `fantasy_paranormal`, `history_biography`, `mystery_thriller_crime`, `poetry`, `romance`, `young_adult`
## Performance

The model was evaluated on a held-out test set of 1,600 reviews (200 per genre).
| Metric | Score |
|---|---|
| Accuracy | 60.75% |
| F1 Score (weighted) | 60.83% |
| Loss | 1.26 |
Note: performance may vary slightly because the training and test subsets are randomly sampled from the full Goodreads dataset.
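For reference, the reported metrics correspond to scikit-learn's accuracy and weighted F1. A minimal sketch of how they are computed; the label arrays below are hypothetical stand-ins, not the model's actual test-set outputs:

```python
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical true and predicted genre labels (stand-ins for real test-set outputs)
y_true = ["poetry", "romance", "children", "poetry"]
y_pred = ["poetry", "romance", "poetry", "poetry"]

accuracy = accuracy_score(y_true, y_pred)                    # fraction of exact matches
weighted_f1 = f1_score(y_true, y_pred, average="weighted")   # per-class F1, weighted by class support

print(f"Accuracy: {accuracy:.4f}, Weighted F1: {weighted_f1:.4f}")
```

The weighted average matters here because, although the test set is balanced (200 reviews per genre), weighted F1 stays meaningful if the class distribution ever changes.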
## Usage

You can use this model directly with the Hugging Face `pipeline`:

```python
from transformers import pipeline

# Load the classifier
classifier = pipeline("text-classification", model="kingkenche/distilbert-goodreads-genre-classifier")

# Make a prediction
text = "The detective followed the clues to the abandoned mansion, where a dark secret awaited."
result = classifier(text)
print(result)
# Output: [{'label': 'mystery_thriller_crime', 'score': 0.98}]
```
## Training Procedure

- Base Model: `distilbert-base-cased`
- Optimizer: AdamW
- Learning Rate: 5e-5
- Batch Size: 10 (train), 16 (eval)
- Epochs: 3
- Max Sequence Length: 512 tokens
## Deployment

This model is deployed as part of MLOps Assignment 3, which includes a Dockerized evaluation pipeline.

### Docker Usage

```bash
docker run -e HF_REPO_ID="kingkenche/distilbert-goodreads-genre-classifier" assignment3-eval
```
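The container presumably reads the `HF_REPO_ID` environment variable to decide which model to evaluate. A minimal sketch of such an entrypoint; the function name and default repo id fallback are assumptions, and the actual evaluation logic is omitted:

```python
import os

def resolve_repo_id(default="kingkenche/distilbert-goodreads-genre-classifier"):
    """Return the model repo to evaluate, preferring the HF_REPO_ID env var."""
    return os.environ.get("HF_REPO_ID", default)

if __name__ == "__main__":
    repo_id = resolve_repo_id()
    print(f"Evaluating model: {repo_id}")
    # ...download the model and run the evaluation pipeline here...
```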