
	---
	language:
	- en
	license: mit
	library_name: transformers
	tags:
	- text-classification
	- distilbert
	- book-genre-classification
	- mlops
	datasets:
	- custom
	metrics:
	- accuracy
	- f1
	pipeline_tag: text-classification
	model-index:
	- name: ML_OPS_ASSIGNMENT2
	  results:
	  - task:
	      type: text-classification
	      name: Text Classification
	    metrics:
	    - name: Accuracy
	      type: accuracy
	      value: 0.5831
	    - name: F1 (weighted)
	      type: f1
	      value: 0.5810
	---

	# DistilBERT Book Genre Classifier

	A fine-tuned DistilBERT model for classifying book reviews into 8 genres.

	## Model Description

	This model is based on `distilbert-base-cased` and was fine-tuned on the UCSD Goodreads book reviews dataset. Given the text of a book review, it predicts one of 8 genres.

	- Base model: distilbert-base-cased
	- Task: Multi-class text classification (8 genres)
	- Language: English
	- License: MIT

	## Supported Genres

	| Label | Genre |
	|-------|-------|
	| 0 | Children |
	| 1 | Comics & Graphic |
	| 2 | Fantasy & Paranormal |
	| 3 | History & Biography |
	| 4 | Mystery, Thriller & Crime |
	| 5 | Poetry |
	| 6 | Romance |
	| 7 | Young Adult |
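
The label IDs above can be written as the `id2label` / `label2id` dictionaries that Transformers configs use. The following is a minimal sketch, not taken from the training code: it assumes the published config may not carry genre names, in which case the pipeline returns generic labels of the form `LABEL_<id>` (the Transformers default). `GENRES` and `genre_from_label` are hypothetical names introduced for illustration.

```python
# Genre names in label-ID order, copied from the table above.
GENRES = [
    "Children",
    "Comics & Graphic",
    "Fantasy & Paranormal",
    "History & Biography",
    "Mystery, Thriller & Crime",
    "Poetry",
    "Romance",
    "Young Adult",
]

# Mappings in the shape Transformers configs use for id2label / label2id.
id2label = dict(enumerate(GENRES))
label2id = {genre: idx for idx, genre in id2label.items()}


def genre_from_label(label: str) -> str:
    """Map a pipeline label to a genre name.

    Assumes labels look like "LABEL_4" (the default when the config has no
    custom id2label). A label that is already a genre name is returned as-is.
    """
    if label in label2id:
        return label
    prefix, _, suffix = label.rpartition("_")
    if prefix != "LABEL" or not suffix.isdigit():
        raise ValueError(f"Unrecognised label: {label!r}")
    return id2label[int(suffix)]


print(genre_from_label("LABEL_4"))
```

If the model's config already stores the genre names, the pipeline output needs no conversion and the helper simply passes the name through.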

	## Training Details

	| Parameter | Value |
	|-----------|-------|
	| Base model | distilbert-base-cased |
	| Epochs | 3 |
	| Batch size (train) | 16 |
	| Batch size (eval) | 32 |
	| Learning rate | 3e-5 |
	| Warmup steps | 100 |
	| Weight decay | 0.01 |
	| Max sequence length | 512 |
	| Train samples | 6,400 |
	| Test samples | 1,600 |
	| Platform | Kaggle (GPU T4 x2) |
	| Tracking | Weights & Biases |
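
To get a feel for what these settings imply, the sketch below works out the number of optimizer steps and the learning rate over training. It rests on two assumptions that this card does not confirm: that 16 is the effective batch size (with two GPUs it may be larger, which would reduce the step count), and that the schedule is linear warmup followed by linear decay, which is the Trainer default. `lr_at` is a hypothetical helper.

```python
import math

# Values copied from the Training Details table.
TRAIN_SAMPLES = 6_400
TRAIN_BATCH_SIZE = 16  # assumed to be the effective batch size
EPOCHS = 3
PEAK_LR = 3e-5
WARMUP_STEPS = 100

steps_per_epoch = math.ceil(TRAIN_SAMPLES / TRAIN_BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step, assuming linear warmup then linear decay."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    remaining = max(0, total_steps - step)
    return PEAK_LR * remaining / max(1, total_steps - WARMUP_STEPS)


print("steps per epoch:", steps_per_epoch)
print("total steps:", total_steps)
for step in (0, 50, 100, 650, total_steps):
    print(step, lr_at(step))
```

Under these assumptions there are 400 steps per epoch and 1,200 in total, so the 100 warmup steps cover the first quarter of the first epoch.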

	## Results

	| Metric | Score |
	|--------|-------|
	| Accuracy | 0.5831 |
	| F1 Score (weighted) | 0.5810 |
	| Eval Loss | 2.2847 |
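
For readers unfamiliar with the two metrics, here is a minimal pure-Python sketch of how accuracy and weighted F1 are defined. It is an illustration only and not necessarily the code used to produce the scores above, which may have come from a library such as scikit-learn. The toy labels are made up.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights proportional to each class's support."""
    support = Counter(y_true)      # true examples per class
    predicted = Counter(y_pred)    # predictions per class
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    total = 0.0
    for label, n_true in support.items():
        tp = true_pos[label]
        precision = tp / predicted[label] if predicted[label] else 0.0
        recall = tp / n_true
        f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
        total += f1 * n_true
    return total / len(y_true)


# Made-up toy labels for three classes, two examples each.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print(accuracy(y_true, y_pred), weighted_f1(y_true, y_pred))
```

Because the weights are the per-class supports, a test set with the same number of examples in every class gives a weighted F1 equal to the macro (unweighted) average of the per-class F1 scores.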

	### Per-Epoch Results

	| Epoch | Training Loss | Validation Loss | Accuracy | F1 |
	|-------|---------------|-----------------|----------|----|
	| 1 | 2.5710 | 2.5337 | 0.5525 | 0.5454 |
	| 2 | 2.1273 | 2.2859 | 0.5981 | 0.5983 |
	| 3 | 1.6126 | 2.2923 | 0.6094 | 0.6089 |
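
The per-epoch numbers can be compared programmatically, for example to see which epoch a given selection criterion would pick. The sketch below only reads the table values; this card does not state which checkpoint was published or how it was chosen, so the criteria shown are illustrative.

```python
# (epoch, training loss, validation loss, accuracy, F1), copied from the table above.
EPOCH_ROWS = [
    (1, 2.5710, 2.5337, 0.5525, 0.5454),
    (2, 2.1273, 2.2859, 0.5981, 0.5983),
    (3, 1.6126, 2.2923, 0.6094, 0.6089),
]

# Epoch chosen under two common selection criteria.
best_by_val_loss = min(EPOCH_ROWS, key=lambda row: row[2])[0]
best_by_f1 = max(EPOCH_ROWS, key=lambda row: row[4])[0]

# A growing gap between validation and training loss is a common sign of overfitting.
for epoch, train_loss, val_loss, _acc, _f1 in EPOCH_ROWS:
    print(f"epoch {epoch}: validation - training loss = {val_loss - train_loss:+.4f}")

print("lowest validation loss at epoch", best_by_val_loss)
print("highest F1 at epoch", best_by_f1)
```

On these numbers, validation loss is lowest at epoch 2 while accuracy and F1 are highest at epoch 3, and the gap between validation and training loss widens with each epoch.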

	## How to Use

	```python
	from transformers import pipeline

	classifier = pipeline(
	    "text-classification",
	    model="sureshbabugandla/ML_OPS_ASSIGNMENT2",
	)

	result = classifier("This book was a thrilling mystery with unexpected twists.")
	print(result)
	```

	Or load the model and tokenizer separately:

	```python
	from transformers import DistilBertTokenizerFast, DistilBertForSequenceClassification

	tokenizer = DistilBertTokenizerFast.from_pretrained("sureshbabugandla/ML_OPS_ASSIGNMENT2")
	model = DistilBertForSequenceClassification.from_pretrained("sureshbabugandla/ML_OPS_ASSIGNMENT2")
	```
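
When the model is loaded this way, its raw output is a vector of 8 logits that still has to be turned into a genre. The sketch below shows that post-processing step in plain Python. It assumes the output order of the classification head matches the label IDs in the Supported Genres table. `softmax`, `top_genre` and the example logits are hypothetical, and the model call itself is left as a comment because it needs PyTorch and a download.

```python
import math

# Genre names in label-ID order (assumed to match the model's output order).
GENRES = [
    "Children",
    "Comics & Graphic",
    "Fantasy & Paranormal",
    "History & Biography",
    "Mystery, Thriller & Crime",
    "Poetry",
    "Romance",
    "Young Adult",
]


def softmax(logits):
    """Convert raw logits to probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_genre(logits):
    """Return (genre, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return GENRES[best], probs[best]


# With `tokenizer` and `model` loaded, the logits could be obtained like this:
#   inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
#   logits = model(**inputs).logits[0].tolist()
example_logits = [0.1, -0.3, 0.4, 0.0, 2.2, -1.0, 0.5, 0.3]  # made-up values
genre, prob = top_genre(example_logits)
print(genre, round(prob, 3))
```

Truncating to 512 tokens mirrors the maximum sequence length listed under Training Details, so long reviews are cut the same way at inference time.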

	## Dataset

	The model was trained on the [UCSD Book Graph](https://mengtingwan.github.io/data/goodreads.html) dataset, which contains Goodreads book reviews across multiple genres. From each of the 8 genres, 1,000 reviews were sampled and split into 800 training and 200 test samples, giving 6,400 training and 1,600 test samples in total.
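
A per-genre split of this kind can be sketched as follows. This is an illustration under stated assumptions, not the sampling code used for this model: the card does not describe the procedure, so the function name `split_per_genre`, the seed, and the rule of assigning label IDs in alphabetical order of genre name (which happens to match the Supported Genres table) are all hypothetical. The toy reviews are placeholders.

```python
import random


def split_per_genre(reviews_by_genre, n_train=800, n_test=200, seed=42):
    """Sample n_train + n_test reviews per genre and split them into train/test.

    Returns two lists of (text, label_id) pairs. Label IDs follow the
    alphabetical order of the genre names.
    """
    rng = random.Random(seed)
    train, test = [], []
    for label, genre in enumerate(sorted(reviews_by_genre)):
        reviews = reviews_by_genre[genre]
        if len(reviews) < n_train + n_test:
            raise ValueError(f"Not enough reviews for genre {genre!r}")
        sample = rng.sample(reviews, n_train + n_test)  # without replacement
        train += [(text, label) for text in sample[:n_train]]
        test += [(text, label) for text in sample[n_train:]]
    rng.shuffle(train)  # mix genres so batches are not single-class
    return train, test


# Placeholder data: 8 genres with 1,200 fake reviews each.
toy = {f"genre_{i}": [f"review {i}-{j}" for j in range(1200)] for i in range(8)}
train, test = split_per_genre(toy)
print(len(train), len(test))
```

Sampling without replacement before splitting keeps the train and test sets disjoint within each genre, and using the same counts for every genre keeps both sets class-balanced.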

	## Developed By

	- Name: Suresh Babu Gandla
	- Roll Number: G25AIT2119

	## Links

	- GitHub: https://github.com/g25ait2119/MLOpsAssignment2
	- W&B Dashboard: https://wandb.ai/g25ait2119-sureshbabu-gandla/mlops-assignment2
	- Kaggle Notebook: https://www.kaggle.com/code/sureshbabugandla/mlops-a2-training