---
language:
- en
license: mit
library_name: transformers
tags:
- text-classification
- distilbert
- book-genre-classification
- mlops
datasets:
- custom
metrics:
- accuracy
- f1
pipeline_tag: text-classification
model-index:
- name: ML_OPS_ASSIGNMENT2
  results:
  - task:
      type: text-classification
      name: Text Classification
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.5831
    - name: F1 (weighted)
      type: f1
      value: 0.5810
---

# DistilBERT Book Genre Classifier

A fine-tuned **DistilBERT** model for classifying book reviews into 8 genres.

## Model Description

This model is based on `distilbert-base-cased` and was fine-tuned on Goodreads book reviews from the UCSD Book Graph dataset. It assigns a book review to one of 8 genres.

- **Model:** distilbert-base-cased
- **Task:** Multi-class text classification (8 genres)
- **Language:** English
- **License:** MIT

## Supported Genres

| Label | Genre |
|-------|-------|
| 0 | Children |
| 1 | Comics & Graphic |
| 2 | Fantasy & Paranormal |
| 3 | History & Biography |
| 4 | Mystery, Thriller & Crime |
| 5 | Poetry |
| 6 | Romance |
| 7 | Young Adult |

## Training Details

| Parameter | Value |
|-----------|-------|
| Base model | distilbert-base-cased |
| Epochs | 3 |
| Batch size (train) | 16 |
| Batch size (eval) | 32 |
| Learning rate | 3e-5 |
| Warmup steps | 100 |
| Weight decay | 0.01 |
| Max sequence length | 512 |
| Train samples | 6,400 |
| Test samples | 1,600 |
| Platform | Kaggle (GPU T4 x2) |
| Tracking | Weights & Biases |
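
To put the warmup setting in context, the number of optimizer steps can be estimated from the table. This is a rough sketch that assumes the batch size of 16 is the total batch per optimizer step and that no gradient accumulation was used. Neither is stated above, and with two T4 GPUs a per-device batch of 16 would halve the step counts.

```python
import math

# Values taken from the Training Details table.
train_samples = 6_400
train_batch_size = 16
epochs = 3
warmup_steps = 100

# Assumption: 16 is the total batch per optimizer step (single device,
# no gradient accumulation). This is not stated in the table.
steps_per_epoch = math.ceil(train_samples / train_batch_size)
total_steps = steps_per_epoch * epochs
warmup_fraction = warmup_steps / total_steps

print(f"steps per epoch: {steps_per_epoch}")
print(f"total steps:     {total_steps}")
print(f"warmup fraction: {warmup_fraction:.1%}")
```

Under that assumption, the 100 warmup steps cover roughly 8% of training.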

## Results

| Metric | Score |
|--------|-------|
| Accuracy | 0.5831 |
| F1 Score (weighted) | 0.5810 |
| Eval Loss | 2.2847 |

### Per-Epoch Results

| Epoch | Training Loss | Validation Loss | Accuracy | F1 |
|-------|--------------|-----------------|----------|-----|
| 1 | 2.5710 | 2.5337 | 0.5525 | 0.5454 |
| 2 | 2.1273 | 2.2859 | 0.5981 | 0.5983 |
| 3 | 1.6126 | 2.2923 | 0.6094 | 0.6089 |
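
For reference, weighted F1 averages the per-class F1 scores, weighting each class by its number of true examples. The sketch below computes accuracy and weighted F1 in plain Python on a tiny made-up label set (not the model's predictions). The function names are illustrative, and the training code may well have used a library routine instead.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to each class's support."""
    support = Counter(y_true)
    total = 0.0
    for cls, n_true in support.items():
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        n_pred = sum(p == cls for p in y_pred)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        total += f1 * n_true
    return total / len(y_true)

# Toy labels for illustration only (label ids follow the genre table).
y_true = [0, 0, 4, 4, 6, 6]
y_pred = [0, 4, 4, 4, 6, 0]

print(accuracy(y_true, y_pred))
print(weighted_f1(y_true, y_pred))
```

Because the test split described in the Dataset section is balanced (200 reviews per genre), weighted F1 on it coincides with macro F1. For comparison, uniform random guessing over 8 genres would give an accuracy of about 0.125.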

## How to Use

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="sureshbabugandla/ML_OPS_ASSIGNMENT2"
)

result = classifier("This book was a thrilling mystery with unexpected twists.")
print(result)
```

Or load the model and tokenizer separately:

```python
from transformers import DistilBertTokenizerFast, DistilBertForSequenceClassification

tokenizer = DistilBertTokenizerFast.from_pretrained("sureshbabugandla/ML_OPS_ASSIGNMENT2")
model = DistilBertForSequenceClassification.from_pretrained("sureshbabugandla/ML_OPS_ASSIGNMENT2")
```
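
If the uploaded model config does not define genre names, the pipeline returns generic labels such as `LABEL_4`. The following sketch maps such labels back to the genres in the Supported Genres table. Whether this model's config uses the `LABEL_<id>` form is an assumption, and `to_genre` and the sample dictionary are hypothetical: the dictionary only mimics the shape of a pipeline result and is not real model output.

```python
# Genre names in label-id order, copied from the Supported Genres table.
GENRES = [
    "Children",
    "Comics & Graphic",
    "Fantasy & Paranormal",
    "History & Biography",
    "Mystery, Thriller & Crime",
    "Poetry",
    "Romance",
    "Young Adult",
]

def to_genre(prediction: dict) -> str:
    """Return the genre name for one pipeline prediction.

    Assumes labels look like "LABEL_4"; a label that does not match that
    form is returned as-is, since it may already be a readable name.
    """
    label = prediction["label"]
    prefix = "LABEL_"
    if label.startswith(prefix) and label[len(prefix):].isdigit():
        return GENRES[int(label[len(prefix):])]
    return label

# Hand-written stand-in with the same shape as one pipeline result.
example = {"label": "LABEL_4", "score": 0.5}
print(to_genre(example))
```

With the pipeline from the first snippet, this could be applied as `[to_genre(r) for r in classifier(text, truncation=True)]`. Passing `truncation=True` should keep long reviews within the 512-token maximum sequence length.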

## Dataset

The model was trained on the [UCSD Book Graph](https://mengtingwan.github.io/data/goodreads.html) dataset, which contains book reviews from Goodreads across multiple genres. 2,000 reviews were sampled from each of the 8 genres; of these, 800 per genre were used for training and 200 per genre for testing, giving 6,400 train and 1,600 test samples in total.
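
The per-genre split can be sketched as follows. This is a minimal illustration with placeholder reviews, not the notebook's actual sampling code; `split_per_genre`, the seed, and the toy data are assumptions.

```python
import random

def split_per_genre(reviews_by_genre, n_train=800, n_test=200, seed=42):
    """Draw disjoint train/test samples of fixed size from each genre."""
    rng = random.Random(seed)
    train, test = [], []
    for genre, reviews in reviews_by_genre.items():
        # Sample without replacement, then cut the sample into two parts.
        picked = rng.sample(reviews, n_train + n_test)
        train += [(text, genre) for text in picked[:n_train]]
        test += [(text, genre) for text in picked[n_train:]]
    return train, test

# Placeholder data: 2,000 fake reviews for each of two genres.
toy = {g: [f"{g} review {i}" for i in range(2000)] for g in ("Poetry", "Romance")}
train, test = split_per_genre(toy)
print(len(train), len(test))
```

Sampling once per genre and slicing the result keeps the train and test sets disjoint and equally sized across genres.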

## Developed By

- **Name:** Suresh Babu Gandla
- **Roll Number:** G25AIT2119

## Links

- **GitHub:** https://github.com/g25ait2119/MLOpsAssignment2
- **W&B Dashboard:** https://wandb.ai/g25ait2119-sureshbabu-gandla/mlops-assignment2
- **Kaggle Notebook:** https://www.kaggle.com/code/sureshbabugandla/mlops-a2-training