language:
  - en
license: mit
library_name: transformers
tags:
  - text-classification
  - distilbert
  - book-genre-classification
  - mlops
datasets:
  - custom
metrics:
  - accuracy
  - f1
pipeline_tag: text-classification
model-index:
  - name: ML_OPS_ASSIGNMENT2
    results:
      - task:
          type: text-classification
          name: Text Classification
        metrics:
          - name: Accuracy
            type: accuracy
            value: 0.5831
          - name: F1 (weighted)
            type: f1
            value: 0.581

DistilBERT Book Genre Classifier

A fine-tuned DistilBERT model for classifying book reviews into 8 genres.

Model Description

This model is based on distilbert-base-cased and was fine-tuned on the UCSD Goodreads book reviews dataset. It classifies a given book review text into one of 8 genres.

  • Base model: distilbert-base-cased
  • Task: Multi-class text classification (8 genres)
  • Language: English
  • License: MIT

Supported Genres

| Label | Genre |
|-------|-------|
| 0 | Children |
| 1 | Comics & Graphic |
| 2 | Fantasy & Paranormal |
| 3 | History & Biography |
| 4 | Mystery, Thriller & Crime |
| 5 | Poetry |
| 6 | Romance |
| 7 | Young Adult |
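
For downstream code it can help to have the label table as a lookup. The sketch below assumes the pipeline returns the default `LABEL_<n>` strings, which this card does not confirm; checking `model.config.id2label` on the loaded model is the reliable way to find out. `GENRES` and `to_genre` are illustrative names, and the example prediction is hand-written rather than real model output.

```python
# Label-to-genre lookup copied from the Supported Genres table.
GENRES = {
    0: "Children",
    1: "Comics & Graphic",
    2: "Fantasy & Paranormal",
    3: "History & Biography",
    4: "Mystery, Thriller & Crime",
    5: "Poetry",
    6: "Romance",
    7: "Young Adult",
}


def to_genre(label):
    """Map an integer id or a 'LABEL_<n>' string to a genre name.

    Assumption: predictions use the default 'LABEL_<n>' naming. Any other
    string is treated as an already readable name and returned unchanged.
    """
    if isinstance(label, int):
        return GENRES[label]
    if label.startswith("LABEL_"):
        return GENRES[int(label.split("_", 1)[1])]
    return label


# Hand-written prediction in the pipeline's output shape (illustrative values).
prediction = {"label": "LABEL_4", "score": 0.91}
print(to_genre(prediction["label"]))
```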

Training Details

| Parameter | Value |
|-----------|-------|
| Base model | distilbert-base-cased |
| Epochs | 3 |
| Batch size (train) | 16 |
| Batch size (eval) | 32 |
| Learning rate | 3e-5 |
| Warmup steps | 100 |
| Weight decay | 0.01 |
| Max sequence length | 512 |
| Train samples | 6,400 |
| Test samples | 1,600 |
| Platform | Kaggle (GPU T4 x2) |
| Tracking | Weights & Biases |
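
To put the warmup setting in context, the sketch below works out how many optimizer steps these hyperparameters imply. The dictionary keys mirror argument names of `transformers.TrainingArguments`, but whether the author used that class, whether the batch sizes are per device, and whether gradient accumulation was used are all assumptions; the card states none of them. `HPARAMS` and `schedule` are illustrative names.

```python
import math

# Hyperparameters from the Training Details table. Treating the batch sizes
# as per-device values is an assumption, not something the card states.
HPARAMS = {
    "num_train_epochs": 3,
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 32,
    "learning_rate": 3e-5,
    "warmup_steps": 100,
    "weight_decay": 0.01,
}
TRAIN_SAMPLES = 6_400


def schedule(n_devices=1):
    """Return (steps_per_epoch, total_steps, warmup_fraction).

    Assumes no gradient accumulation.
    """
    effective_batch = HPARAMS["per_device_train_batch_size"] * n_devices
    steps_per_epoch = math.ceil(TRAIN_SAMPLES / effective_batch)
    total_steps = steps_per_epoch * HPARAMS["num_train_epochs"]
    return steps_per_epoch, total_steps, HPARAMS["warmup_steps"] / total_steps


# The card lists two T4 GPUs; how they were used is not stated, so show both cases.
for n in (1, 2):
    per_epoch, total, warmup = schedule(n)
    print(f"{n} device(s): {per_epoch} steps/epoch, {total} total, warmup {warmup:.1%}")
```

Under either reading, the 100 warmup steps cover well under one epoch.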

Results

| Metric | Score |
|--------|-------|
| Accuracy | 0.5831 |
| F1 Score (weighted) | 0.5810 |
| Eval Loss | 2.2847 |
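
For readers unfamiliar with the metric, weighted F1 is the per-class F1 averaged with weights proportional to each class's number of true examples. The card does not say which library computed the scores; the pure-Python sketch below follows the same definition as scikit-learn's `f1_score(average="weighted")` and runs on toy labels, not on this model's predictions. Because the test set here has 200 reviews per genre, the weights are equal and weighted F1 coincides with macro F1.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's support."""
    support = Counter(y_true)
    total = 0.0
    for cls, n_true in support.items():
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        n_pred = sum(p == cls for p in y_pred)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        total += f1 * n_true
    return total / len(y_true)


# Toy labels for illustration only.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print(accuracy(y_true, y_pred), weighted_f1(y_true, y_pred))
```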

Per-Epoch Results

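| Epoch | Training Loss | Validation Loss | Accuracy | F1 |
|-------|---------------|-----------------|----------|----|
| 1 | 2.5710 | 2.5337 | 0.5525 | 0.5454 |
| 2 | 2.1273 | 2.2859 | 0.5981 | 0.5983 |
| 3 | 1.6126 | 2.2923 | 0.6094 | 0.6089 |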
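
The per-epoch numbers point to different "best" checkpoints depending on the criterion. The sketch below reads the rows as data and compares selection by validation loss with selection by F1; it also prints the gap between validation and training loss, which widens over the three epochs. The card does not say which checkpoint was published or how it was chosen, so this is only a way of reading the table.

```python
# Rows copied from the per-epoch table:
# (epoch, training_loss, validation_loss, accuracy, f1)
EPOCHS = [
    (1, 2.5710, 2.5337, 0.5525, 0.5454),
    (2, 2.1273, 2.2859, 0.5981, 0.5983),
    (3, 1.6126, 2.2923, 0.6094, 0.6089),
]

best_by_loss = min(EPOCHS, key=lambda row: row[2])[0]
best_by_f1 = max(EPOCHS, key=lambda row: row[4])[0]
print("lowest validation loss at epoch", best_by_loss)
print("highest F1 at epoch", best_by_f1)

# Validation loss minus training loss; a growing gap is a common overfitting signal.
for epoch, train_loss, val_loss, _, _ in EPOCHS:
    print(epoch, round(val_loss - train_loss, 4))
```

If training used the `transformers` `Trainer`, the `load_best_model_at_end` and `metric_for_best_model` arguments control this choice; whether they were set here is not stated.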

How to Use

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="sureshbabugandla/ML_OPS_ASSIGNMENT2"
)

result = classifier("This book was a thrilling mystery with unexpected twists.")
print(result)
```

Or load the model and tokenizer separately:

```python
from transformers import DistilBertTokenizerFast, DistilBertForSequenceClassification

tokenizer = DistilBertTokenizerFast.from_pretrained("sureshbabugandla/ML_OPS_ASSIGNMENT2")
model = DistilBertForSequenceClassification.from_pretrained("sureshbabugandla/ML_OPS_ASSIGNMENT2")
```

Dataset

The model was trained on the UCSD Book Graph dataset, which contains Goodreads book reviews across multiple genres. 2,000 reviews were sampled from each of the 8 genres. The reported run uses 800 training and 200 test reviews per genre, which gives the 6,400 training and 1,600 test samples listed under Training Details.
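
The balanced per-genre split can be sketched as follows. This is a minimal illustration, not the author's preprocessing code: the function name `split_per_genre`, the seed, and the placeholder corpus are all hypothetical, and loading the actual UCSD Book Graph files is left out.

```python
import random


def split_per_genre(reviews_by_genre, n_train=800, n_test=200, seed=42):
    """Draw n_train + n_test reviews per genre without replacement and split them."""
    rng = random.Random(seed)
    train, test = [], []
    for genre, reviews in reviews_by_genre.items():
        picked = rng.sample(reviews, n_train + n_test)
        train += [(text, genre) for text in picked[:n_train]]
        test += [(text, genre) for text in picked[n_train:]]
    rng.shuffle(train)  # avoid genre-ordered batches during training
    return train, test


# Placeholder corpus: 8 genres with 2,000 synthetic strings each.
corpus = {g: [f"review {g}-{i}" for i in range(2000)] for g in range(8)}
train, test = split_per_genre(corpus)
print(len(train), len(test))
```

Sampling the same number of reviews from every genre keeps the classes balanced, so plain accuracy is not skewed by a dominant genre.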

Developed By

  • Name: Suresh Babu Gandla
  • Roll Number: G25AIT2119

Links