---
language:
- en
license: mit
library_name: transformers
tags:
- text-classification
- distilbert
- book-genre-classification
- mlops
datasets:
- custom
metrics:
- accuracy
- f1
pipeline_tag: text-classification
model-index:
- name: ML_OPS_ASSIGNMENT2
  results:
  - task:
      type: text-classification
      name: Text Classification
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.5831
    - name: F1 (weighted)
      type: f1
      value: 0.5810
---
# DistilBERT Book Genre Classifier
A fine-tuned **DistilBERT** model for classifying book reviews into 8 genres.
## Model Description
This model is based on `distilbert-base-cased` and was fine-tuned on the UCSD Goodreads book reviews dataset. It classifies a given book review text into one of 8 genres.
- **Model:** distilbert-base-cased
- **Task:** Multi-class text classification (8 genres)
- **Language:** English
- **License:** MIT
## Supported Genres
| Label | Genre |
|-------|-------|
| 0 | Children |
| 1 | Comics & Graphic |
| 2 | Fantasy & Paranormal |
| 3 | History & Biography |
| 4 | Mystery, Thriller & Crime |
| 5 | Poetry |
| 6 | Romance |
| 7 | Young Adult |
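
The `text-classification` pipeline returns a label string and a score for each input. The sketch below applies the table above as a lookup, assuming the uploaded config keeps the default `LABEL_<n>` names (this card does not confirm that; check `id2label` in the model's `config.json`). `ID2GENRE` and `to_genre` are hypothetical helper names, not part of the model repository.

```python
# Label ids and genre names, copied from the Supported Genres table.
ID2GENRE = {
    0: "Children",
    1: "Comics & Graphic",
    2: "Fantasy & Paranormal",
    3: "History & Biography",
    4: "Mystery, Thriller & Crime",
    5: "Poetry",
    6: "Romance",
    7: "Young Adult",
}


def to_genre(label: str) -> str:
    """Map a pipeline label such as 'LABEL_4' to its genre name.

    Assumes the default 'LABEL_<n>' naming. Labels that are already genre
    names, or that are not recognised, are returned unchanged.
    """
    prefix = "LABEL_"
    suffix = label[len(prefix):]
    if label.startswith(prefix) and suffix.isdigit():
        return ID2GENRE.get(int(suffix), label)
    return label


# Same shape as pipeline output; the score is illustrative, not a real prediction.
prediction = [{"label": "LABEL_4", "score": 0.71}]
print(to_genre(prediction[0]["label"]))  # Mystery, Thriller & Crime
```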
## Training Details
| Parameter | Value |
|-----------|-------|
| Base model | distilbert-base-cased |
| Epochs | 3 |
| Batch size (train) | 16 |
| Batch size (eval) | 32 |
| Learning rate | 3e-5 |
| Warmup steps | 100 |
| Weight decay | 0.01 |
| Max sequence length | 512 |
| Train samples | 6,400 |
| Test samples | 1,600 |
| Platform | Kaggle (GPU T4 x2) |
| Tracking | Weights & Biases |
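
The length of the training schedule follows from the table with a little arithmetic. This is a rough sketch that assumes 16 is the effective (global) batch size and that no gradient accumulation was used. If 16 was instead the per-device batch size on the two T4 GPUs, the step counts would be halved.

```python
import math

# Values copied from the Training Details table.
train_samples = 6_400
train_batch_size = 16   # assumed to be the effective (global) batch size
epochs = 3
warmup_steps = 100

steps_per_epoch = math.ceil(train_samples / train_batch_size)
total_steps = steps_per_epoch * epochs
warmup_fraction = warmup_steps / total_steps

print(steps_per_epoch)            # 400
print(total_steps)                # 1200
print(f"{warmup_fraction:.1%}")   # 8.3%
```

Under these assumptions the 100 warmup steps cover roughly the first quarter of epoch 1.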
## Results
| Metric | Score |
|--------|-------|
| Accuracy | 0.5831 |
| F1 Score (weighted) | 0.5810 |
| Eval Loss | 2.2847 |
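
For readers unfamiliar with the weighted F1 metric, here is a minimal pure-Python sketch of how accuracy and weighted F1 are defined. It is not the evaluation code used for this model (the training notebook may use a library such as scikit-learn or `evaluate`), and the labels below are toy values for illustration only.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with each class weighted by its true-label count."""
    support = Counter(y_true)
    total = 0.0
    for cls, n_true in support.items():
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        n_pred = sum(p == cls for p in y_pred)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
        total += f1 * n_true
    return total / len(y_true)


# Toy labels for illustration only (not model output).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print(accuracy(y_true, y_pred))
print(round(weighted_f1(y_true, y_pred), 4))
```

When every class has the same number of test samples, as with 200 per genre here, the weighted average equals the plain (macro) average of the per-class F1 scores.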
### Per-Epoch Results
| Epoch | Training Loss | Validation Loss | Accuracy | F1 |
|-------|--------------|-----------------|----------|-----|
| 1 | 2.5710 | 2.5337 | 0.5525 | 0.5454 |
| 2 | 2.1273 | 2.2859 | 0.5981 | 0.5983 |
| 3 | 1.6126 | 2.2923 | 0.6094 | 0.6089 |
## How to Use
```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="sureshbabugandla/ML_OPS_ASSIGNMENT2",
)
result = classifier("This book was a thrilling mystery with unexpected twists.")
print(result)
```
Or load the model and tokenizer separately:
```python
from transformers import DistilBertTokenizerFast, DistilBertForSequenceClassification
tokenizer = DistilBertTokenizerFast.from_pretrained("sureshbabugandla/ML_OPS_ASSIGNMENT2")
model = DistilBertForSequenceClassification.from_pretrained("sureshbabugandla/ML_OPS_ASSIGNMENT2")
```
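
Once the model and tokenizer are loaded separately, a forward pass yields one logit per genre. The sketch below shows one way to turn those logits into ranked genre probabilities. `rank_genres` and `classify` are hypothetical helper names, and the code assumes the model's label ids follow the order of the Supported Genres table.

```python
import math

MODEL_ID = "sureshbabugandla/ML_OPS_ASSIGNMENT2"

# Genre names in label-id order (0-7), copied from the Supported Genres table.
GENRES = [
    "Children",
    "Comics & Graphic",
    "Fantasy & Paranormal",
    "History & Biography",
    "Mystery, Thriller & Crime",
    "Poetry",
    "Romance",
    "Young Adult",
]


def rank_genres(logits):
    """Softmax over raw logits, returned as (genre, probability) pairs, best first."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    norm = sum(exps)
    pairs = zip(GENRES, (e / norm for e in exps))
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


def classify(text):
    """Run one review through the model and rank all 8 genres."""
    # Imported lazily so rank_genres can be used without torch installed.
    import torch
    from transformers import DistilBertForSequenceClassification, DistilBertTokenizerFast

    tokenizer = DistilBertTokenizerFast.from_pretrained(MODEL_ID)
    model = DistilBertForSequenceClassification.from_pretrained(MODEL_ID)
    model.eval()

    # Truncate to the 512-token maximum sequence length used in training.
    inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0]
    return rank_genres(logits.tolist())


# Usage (downloads the model on first call):
# for genre, prob in classify("This book was a thrilling mystery with unexpected twists.")[:3]:
#     print(f"{genre}: {prob:.3f}")
```

Returning the full ranking rather than only the top label is a design choice: with overall accuracy near 0.58, the second and third candidates are often worth inspecting.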
## Dataset
The model was trained on the [UCSD Book Graph](https://mengtingwan.github.io/data/goodreads.html) dataset, which contains Goodreads book reviews across multiple genres. For each of the 8 genres, 1,000 reviews were sampled and split into 800 train and 200 test samples, giving the 6,400 train and 1,600 test samples listed under Training Details.
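
The balanced per-genre split described above can be sketched as follows. This is not the author's preprocessing code: the function name, the input format (a dict mapping genre name to a list of review texts), the seed, and the alphabetical label assignment are all assumptions made for illustration.

```python
import random


def balanced_split(reviews_by_genre, n_train=800, n_test=200, seed=42):
    """Sample n_train + n_test reviews per genre and split them into train/test lists.

    Label ids are assigned in alphabetical order of genre name (an assumption
    that happens to match the Supported Genres table).
    """
    rng = random.Random(seed)
    train, test = [], []
    for label, genre in enumerate(sorted(reviews_by_genre)):
        picked = rng.sample(reviews_by_genre[genre], n_train + n_test)
        train += [(text, label) for text in picked[:n_train]]
        test += [(text, label) for text in picked[n_train:]]
    rng.shuffle(train)  # avoid feeding the model one genre at a time
    return train, test


# Synthetic stand-in data: 8 genres with 1,500 placeholder reviews each.
toy = {f"genre_{i}": [f"review {i}-{j}" for j in range(1500)] for i in range(8)}
train, test = balanced_split(toy)
print(len(train), len(test))  # 6400 1600
```

Sampling the same number of reviews from every genre keeps the classes balanced, so accuracy is not inflated by a dominant genre.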
## Developed By
- **Name:** Suresh Babu Gandla
- **Roll Number:** G25AIT2119
## Links
- **GitHub:** https://github.com/g25ait2119/MLOpsAssignment2
- **W&B Dashboard:** https://wandb.ai/g25ait2119-sureshbabu-gandla/mlops-assignment2
- **Kaggle Notebook:** https://www.kaggle.com/code/sureshbabugandla/mlops-a2-training