---
language:
- en
license: mit
library_name: transformers
tags:
- text-classification
- distilbert
- book-genre-classification
- mlops
datasets:
- custom
metrics:
- accuracy
- f1
pipeline_tag: text-classification
model-index:
- name: ML_OPS_ASSIGNMENT2
  results:
  - task:
      type: text-classification
      name: Text Classification
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.5831
    - name: F1 (weighted)
      type: f1
      value: 0.5810
---

# DistilBERT Book Genre Classifier

A fine-tuned **DistilBERT** model for classifying book reviews into 8 genres.

## Model Description

This model is based on `distilbert-base-cased` and was fine-tuned on the UCSD Goodreads book reviews dataset. It classifies a given book review text into one of 8 genres.

- **Model:** distilbert-base-cased
- **Task:** Multi-class text classification (8 genres)
- **Language:** English
- **License:** MIT

## Supported Genres

| Label | Genre |
|-------|-------|
| 0 | Children |
| 1 | Comics & Graphic |
| 2 | Fantasy & Paranormal |
| 3 | History & Biography |
| 4 | Mystery, Thriller & Crime |
| 5 | Poetry |
| 6 | Romance |
| 7 | Young Adult |
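
If the hosted configuration does not store genre names, a pipeline reports generic labels such as `LABEL_4`. The sketch below turns the table above into lookup dictionaries and decodes either form. `GENRES`, `ID2LABEL`, `LABEL2ID`, and `genre_from_label` are illustrative names, not part of the released model.

```python
import re

# Genre names in label order, copied from the Supported Genres table.
GENRES = [
    "Children",
    "Comics & Graphic",
    "Fantasy & Paranormal",
    "History & Biography",
    "Mystery, Thriller & Crime",
    "Poetry",
    "Romance",
    "Young Adult",
]

ID2LABEL = dict(enumerate(GENRES))
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}


def genre_from_label(label):
    """Decode an integer id, a generic 'LABEL_<n>' string, or a genre name."""
    if isinstance(label, int):
        return ID2LABEL[label]
    match = re.fullmatch(r"LABEL_(\d+)", label)
    if match:
        return ID2LABEL[int(match.group(1))]
    if label in LABEL2ID:
        return label  # already a genre name
    raise ValueError(f"Unknown label: {label!r}")
```

The two dictionaries can also be passed as `id2label=` and `label2id=` when loading the model with `from_pretrained`, so that predictions carry genre names directly.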

## Training Details

| Parameter | Value |
|-----------|-------|
| Base model | distilbert-base-cased |
| Epochs | 3 |
| Batch size (train) | 16 |
| Batch size (eval) | 32 |
| Learning rate | 3e-5 |
| Warmup steps | 100 |
| Weight decay | 0.01 |
| Max sequence length | 512 |
| Train samples | 6,400 |
| Test samples | 1,600 |
| Platform | Kaggle (GPU T4 x2) |
| Tracking | Weights & Biases |
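
The hyperparameters above map fairly directly onto keyword arguments of `transformers.TrainingArguments`. The sketch below collects them in a plain dictionary and derives the number of optimizer steps. It assumes that the listed batch sizes are per-device values and that each optimizer step consumes a single batch of 16, neither of which the card states; with two GPUs the real step count may be lower. `TRAINING_KWARGS` and `total_steps` are hypothetical names.

```python
import math

# Hyperparameters from the table, keyed by transformers.TrainingArguments
# parameter names. Treating the batch sizes as per-device is an assumption.
TRAINING_KWARGS = {
    "num_train_epochs": 3,
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 32,
    "learning_rate": 3e-5,
    "warmup_steps": 100,
    "weight_decay": 0.01,
    "report_to": "wandb",  # Weights & Biases tracking
}

TRAIN_SAMPLES = 6_400


def total_steps(num_samples, batch_size, epochs):
    """Optimizer steps when each step consumes one batch (assumption)."""
    return math.ceil(num_samples / batch_size) * epochs


steps = total_steps(
    TRAIN_SAMPLES,
    TRAINING_KWARGS["per_device_train_batch_size"],
    TRAINING_KWARGS["num_train_epochs"],
)
warmup_fraction = TRAINING_KWARGS["warmup_steps"] / steps
print(f"{steps} steps, warmup covers {warmup_fraction:.1%} of training")
```

Under these assumptions the dictionary could be unpacked into `TrainingArguments` together with an `output_dir` of your choice. The maximum sequence length of 512 is a tokenizer setting rather than a training argument, so it is not part of the dictionary.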

## Results

| Metric | Score |
|--------|-------|
| Accuracy | 0.5831 |
| F1 Score (weighted) | 0.5810 |
| Eval Loss | 2.2847 |

### Per-Epoch Results

| Epoch | Training Loss | Validation Loss | Accuracy | F1 |
|-------|---------------|-----------------|----------|-----|
| 1 | 2.5710 | 2.5337 | 0.5525 | 0.5454 |
| 2 | 2.1273 | 2.2859 | 0.5981 | 0.5983 |
| 3 | 1.6126 | 2.2923 | 0.6094 | 0.6089 |
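
For reference, accuracy is the fraction of correct predictions, and weighted F1 averages the per-class F1 scores using each class's share of the true labels as its weight. The following is a minimal pure-Python sketch of both definitions on invented toy labels. It is not the author's evaluation code, which the card does not show, and the function names are illustrative.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's support."""
    support = Counter(y_true)
    total = 0.0
    for cls, n_true in support.items():
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        n_pred = sum(p == cls for p in y_pred)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        total += f1 * n_true  # weight by number of true samples
    return total / len(y_true)


# Toy example with three classes (labels invented for illustration).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print(accuracy(y_true, y_pred), weighted_f1(y_true, y_pred))
```

With 8 balanced classes, a classifier that guesses uniformly at random would reach an accuracy of about 0.125, which gives a baseline for reading the scores above.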

## How to Use

```python
from transformers import pipeline

# Build a text-classification pipeline backed by the fine-tuned model.
classifier = pipeline(
    "text-classification",
    model="sureshbabugandla/ML_OPS_ASSIGNMENT2"
)

# Returns a list with one {"label": ..., "score": ...} dict per input.
result = classifier("This book was a thrilling mystery with unexpected twists.")
print(result)
```

Or load the model and tokenizer separately:

```python
from transformers import DistilBertTokenizerFast, DistilBertForSequenceClassification

# Load the tokenizer and the classification model from the Hub.
tokenizer = DistilBertTokenizerFast.from_pretrained("sureshbabugandla/ML_OPS_ASSIGNMENT2")
model = DistilBertForSequenceClassification.from_pretrained("sureshbabugandla/ML_OPS_ASSIGNMENT2")
```
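
Reviews can exceed the 512-token limit listed under Training Details, and it is often useful to see more than the single best label. The helper below is a sketch that wraps any text-classification pipeline, asks it to truncate long inputs, and returns the top `k` labels with their scores. `top_predictions` is a hypothetical name, and passing `top_k`, `truncation`, and `max_length` at call time assumes a reasonably recent `transformers` release.

```python
def top_predictions(classifier, review, k=3, max_length=512):
    """Return the k highest-scoring (label, score) pairs for one review.

    `classifier` is expected to behave like a transformers
    text-classification pipeline: called with top_k=None it returns one
    {"label": ..., "score": ...} dict per class.
    """
    results = classifier(
        review,
        top_k=None,        # request scores for every class
        truncation=True,   # cut inputs longer than max_length tokens
        max_length=max_length,
    )
    # Some versions wrap the per-class list in an outer list; unwrap it.
    if results and isinstance(results[0], list):
        results = results[0]
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    return [(r["label"], r["score"]) for r in ranked[:k]]
```

Called with a pipeline object and a review string, the helper returns up to three pairs. Whether the labels appear as genre names or as `LABEL_<n>` depends on the configuration stored with the model.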

## Dataset

The model was trained on the [UCSD Book Graph](https://mengtingwan.github.io/data/goodreads.html) dataset, which contains book reviews from Goodreads across multiple genres. 2,000 reviews were sampled from each of the 8 genres, and the final split uses 800 train and 200 test samples per genre, for 6,400 training and 1,600 test samples in total.
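
A balanced per-genre split of this kind can be sketched as follows. This is an illustration under assumptions, not the author's preprocessing code: `reviews_by_genre` is a hypothetical mapping from genre name to a list of review texts, and the fixed seed and sample-then-slice strategy are choices made for the example.

```python
import random


def split_per_genre(reviews_by_genre, n_train=800, n_test=200, seed=42):
    """Draw n_train + n_test reviews per genre and split them by position."""
    rng = random.Random(seed)
    train, test = [], []
    for genre, reviews in reviews_by_genre.items():
        # Sample without replacement so no review lands in both splits.
        picked = rng.sample(reviews, n_train + n_test)
        train += [(text, genre) for text in picked[:n_train]]
        test += [(text, genre) for text in picked[n_train:]]
    return train, test


# Synthetic stand-in data: 8 genres with 2,000 placeholder reviews each.
demo = {f"genre_{g}": [f"review {g}-{i}" for i in range(2000)] for g in range(8)}
train, test = split_per_genre(demo)
print(len(train), len(test))  # prints: 6400 1600
```

Sampling each genre separately keeps the classes balanced in both splits, which matches the per-genre counts described above.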

## Developed By

- **Name:** Suresh Babu Gandla
- **Roll Number:** G25AIT2119

## Links

- **GitHub:** https://github.com/g25ait2119/MLOpsAssignment2
- **W&B Dashboard:** https://wandb.ai/g25ait2119-sureshbabu-gandla/mlops-assignment2
- **Kaggle Notebook:** https://www.kaggle.com/code/sureshbabugandla/mlops-a2-training