DistilBERT Book Genre Classifier
A fine-tuned DistilBERT model for classifying book reviews into 8 genres.
Model Description
This model is based on distilbert-base-cased and was fine-tuned on Goodreads book reviews from the UCSD Book Graph dataset. Given the text of a review, it predicts one of 8 genres.
- Model: distilbert-base-cased
- Task: Multi-class text classification (8 genres)
- Language: English
- License: MIT
Supported Genres
| Label | Genre |
|---|---|
| 0 | Children |
| 1 | Comics & Graphic |
| 2 | Fantasy & Paranormal |
| 3 | History & Biography |
| 4 | Mystery, Thriller & Crime |
| 5 | Poetry |
| 6 | Romance |
| 7 | Young Adult |
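When a model's configuration does not define `id2label`, Transformers reports generic labels such as `LABEL_4` instead of genre names. This card does not say whether the repository's config sets `id2label`, so the following is a minimal sketch that handles both cases. `ID2GENRE` and `genre_from_label` are illustrative names, not part of the model.

```python
# Genre table from this model card, keyed by integer class id.
ID2GENRE = {
    0: "Children",
    1: "Comics & Graphic",
    2: "Fantasy & Paranormal",
    3: "History & Biography",
    4: "Mystery, Thriller & Crime",
    5: "Poetry",
    6: "Romance",
    7: "Young Adult",
}


def genre_from_label(label):
    """Map a class id or a pipeline label string to a genre name."""
    if isinstance(label, int):
        return ID2GENRE[label]
    if label.startswith("LABEL_"):
        # Generic label such as "LABEL_4": parse the trailing class id.
        return ID2GENRE[int(label[len("LABEL_"):])]
    # Assumption: anything else is already a human-readable genre name.
    return label


print(genre_from_label("LABEL_4"))
```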
Training Details
| Parameter | Value |
|---|---|
| Base model | distilbert-base-cased |
| Epochs | 3 |
| Batch size (train) | 16 |
| Batch size (eval) | 32 |
| Learning rate | 3e-5 |
| Warmup steps | 100 |
| Weight decay | 0.01 |
| Max sequence length | 512 |
| Train samples | 6,400 |
| Test samples | 1,600 |
| Platform | Kaggle (GPU T4 x2) |
| Tracking | Weights & Biases |
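Most rows of this table correspond to keyword arguments of `transformers.TrainingArguments`. The sketch below collects them in a plain dictionary and estimates the number of optimizer steps. It assumes the batch size of 16 is per device, that no gradient accumulation was used, and that Weights & Biases tracking was enabled through `report_to="wandb"`; the card confirms none of these. `training_kwargs` and `optimizer_steps` are hypothetical names.

```python
import math

# Values from the Training Details table; the keys are TrainingArguments
# parameter names. The max sequence length (512) is a tokenizer setting
# and is therefore not listed here.
training_kwargs = {
    "num_train_epochs": 3,
    "per_device_train_batch_size": 16,  # assumption: 16 is per device
    "per_device_eval_batch_size": 32,
    "learning_rate": 3e-5,
    "warmup_steps": 100,
    "weight_decay": 0.01,
    "report_to": "wandb",  # assumption: how tracking was enabled
}


def optimizer_steps(n_samples, per_device_batch, epochs, n_devices=1):
    """Optimizer steps for a run without gradient accumulation."""
    steps_per_epoch = math.ceil(n_samples / (per_device_batch * n_devices))
    return steps_per_epoch * epochs


single_gpu = optimizer_steps(6400, 16, 3)              # 400 steps per epoch
dual_gpu = optimizer_steps(6400, 16, 3, n_devices=2)   # 200 steps per epoch
print(single_gpu, dual_gpu)
```

Under these assumptions the 100 warmup steps cover roughly 8% of training on one GPU, or roughly 17% if both T4s share each step.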
Results
| Metric | Score |
|---|---|
| Accuracy | 0.5831 |
| F1 Score (weighted) | 0.5810 |
| Eval Loss | 2.2847 |
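The card does not state how these metrics were computed; a common choice is scikit-learn's `f1_score` with `average="weighted"`. The dependency-free sketch below illustrates what a support-weighted F1 means: the per-class F1 scores are averaged with weights proportional to each class's share of the true labels. The function names and the toy labels are illustrative only.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to class support."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, count in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += f1 * count / total
    return score


# Toy labels, not data from this model.
y_true = [0, 0, 1, 1]
y_pred = [0, 1, 1, 1]
print(accuracy(y_true, y_pred), weighted_f1(y_true, y_pred))
```

With balanced classes, as in the test set described here, the weighted average coincides with the macro average.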
Per-Epoch Results
| Epoch | Training Loss | Validation Loss | Accuracy | F1 |
|---|---|---|---|---|
| 1 | 2.5710 | 2.5337 | 0.5525 | 0.5454 |
| 2 | 2.1273 | 2.2859 | 0.5981 | 0.5983 |
| 3 | 1.6126 | 2.2923 | 0.6094 | 0.6089 |
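In this table the training loss keeps falling through epoch 3 while the validation loss flattens, so which epoch counts as best depends on the metric chosen. The sketch below makes that comparison explicit using the numbers from the table. Which checkpoint was actually published is not stated on this card, and the variable names are illustrative.

```python
# Per-epoch numbers copied from the table above.
epochs = [
    {"epoch": 1, "train_loss": 2.5710, "val_loss": 2.5337, "accuracy": 0.5525, "f1": 0.5454},
    {"epoch": 2, "train_loss": 2.1273, "val_loss": 2.2859, "accuracy": 0.5981, "f1": 0.5983},
    {"epoch": 3, "train_loss": 1.6126, "val_loss": 2.2923, "accuracy": 0.6094, "f1": 0.6089},
]

# Lowest validation loss and highest accuracy pick different epochs.
best_by_val_loss = min(epochs, key=lambda row: row["val_loss"])["epoch"]
best_by_accuracy = max(epochs, key=lambda row: row["accuracy"])["epoch"]

# A growing gap between validation and training loss hints at overfitting.
loss_gap = [round(row["val_loss"] - row["train_loss"], 4) for row in epochs]

print(best_by_val_loss, best_by_accuracy, loss_gap)
```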
How to Use
from transformers import pipeline
classifier = pipeline(
    "text-classification",
    model="sureshbabugandla/ML_OPS_ASSIGNMENT2",
)
result = classifier("This book was a thrilling mystery with unexpected twists.")
print(result)
Or load the model and tokenizer separately:
from transformers import DistilBertTokenizerFast, DistilBertForSequenceClassification
tokenizer = DistilBertTokenizerFast.from_pretrained("sureshbabugandla/ML_OPS_ASSIGNMENT2")
model = DistilBertForSequenceClassification.from_pretrained("sureshbabugandla/ML_OPS_ASSIGNMENT2")
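Once the tokenizer and model are loaded, a prediction takes three steps: tokenize the review (truncating to the 512-token limit used in training), run a forward pass, and turn the logits into probabilities. A minimal sketch follows, assuming PyTorch is installed. `softmax` and `predict_genre` are hypothetical helpers, and `torch` is imported inside the function so the helpers can be defined without it.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_genre(text, tokenizer, model):
    """Return (class_id, probability) for a single review.

    `tokenizer` and `model` are the objects loaded with from_pretrained.
    """
    import torch  # imported lazily; only needed when predicting

    inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    model.eval()
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best]


# Example call, once tokenizer and model are loaded:
# class_id, prob = predict_genre("A thrilling mystery.", tokenizer, model)
```

The returned class id can be looked up in the Supported Genres table.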
Dataset
The model was trained on the UCSD Book Graph dataset, which contains book reviews from Goodreads across multiple genres. From each of the 8 genres, 2,000 reviews were sampled; the split uses 800 train and 200 test samples per genre, which gives the 6,400 train and 1,600 test samples listed under Training Details.
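A per-genre split like the one described above can be sketched as follows. This is not the author's preprocessing code: `balanced_split`, the seed, and the `(text, genre_id)` input format are assumptions for illustration, and the demo uses synthetic placeholder reviews rather than Goodreads data.

```python
import random
from collections import defaultdict


def balanced_split(reviews, n_train=800, n_test=200, seed=42):
    """Draw n_train + n_test reviews per genre and split them.

    `reviews` is an iterable of (text, genre_id) pairs.
    """
    by_genre = defaultdict(list)
    for text, genre in reviews:
        by_genre[genre].append((text, genre))

    rng = random.Random(seed)
    train, test = [], []
    for genre in sorted(by_genre):
        items = by_genre[genre]
        if len(items) < n_train + n_test:
            raise ValueError(f"genre {genre} has only {len(items)} reviews")
        picked = rng.sample(items, n_train + n_test)  # sampled without replacement
        train.extend(picked[:n_train])
        test.extend(picked[n_train:])
    return train, test


# Synthetic demo: 8 genres with 1,200 placeholder reviews each.
toy_reviews = [(f"review {g}-{i}", g) for g in range(8) for i in range(1200)]
train_set, test_set = balanced_split(toy_reviews)
print(len(train_set), len(test_set))
```

Sampling the same number of reviews per genre keeps the classes balanced, so accuracy is not inflated by a dominant genre.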
Developed By
- Name: Suresh Babu Gandla
- Roll Number: G25AIT2119