---
language:
- en
license: mit
library_name: transformers
tags:
- text-classification
- distilbert
- book-genre-classification
- mlops
datasets:
- custom
metrics:
- accuracy
- f1
pipeline_tag: text-classification
model-index:
- name: ML_OPS_ASSIGNMENT2
  results:
  - task:
      type: text-classification
      name: Text Classification
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.5831
    - name: F1 (weighted)
      type: f1
      value: 0.5810
---

# DistilBERT Book Genre Classifier

A fine-tuned **DistilBERT** model for classifying book reviews into 8 genres.

## Model Description

This model is based on `distilbert-base-cased` and was fine-tuned on the UCSD Goodreads book reviews dataset. It classifies a given book review text into one of 8 genres.

- **Model:** distilbert-base-cased
- **Task:** Multi-class text classification (8 genres)
- **Language:** English
- **License:** MIT

## Supported Genres

| Label | Genre |
|-------|-------|
| 0 | Children |
| 1 | Comics & Graphic |
| 2 | Fantasy & Paranormal |
| 3 | History & Biography |
| 4 | Mystery, Thriller & Crime |
| 5 | Poetry |
| 6 | Romance |
| 7 | Young Adult |

## Training Details

| Parameter | Value |
|-----------|-------|
| Base model | distilbert-base-cased |
| Epochs | 3 |
| Batch size (train) | 16 |
| Batch size (eval) | 32 |
| Learning rate | 3e-5 |
| Warmup steps | 100 |
| Weight decay | 0.01 |
| Max sequence length | 512 |
| Train samples | 6,400 |
| Test samples | 1,600 |
| Platform | Kaggle (GPU T4 x2) |
| Tracking | Weights & Biases |

## Results

| Metric | Score |
|--------|-------|
| Accuracy | 0.5831 |
| F1 Score (weighted) | 0.5810 |
| Eval Loss | 2.2847 |

### Per-Epoch Results

| Epoch | Training Loss | Validation Loss | Accuracy | F1 |
|-------|--------------|-----------------|----------|-----|
| 1 | 2.5710 | 2.5337 | 0.5525 | 0.5454 |
| 2 | 2.1273 | 2.2859 | 0.5981 | 0.5983 |
| 3 | 1.6126 | 2.2923 | 0.6094 | 0.6089 |

## How to Use

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="sureshbabugandla/ML_OPS_ASSIGNMENT2"
)

result = classifier("This book was a thrilling mystery with unexpected twists.")
print(result)
```

Or load the model and tokenizer separately:

```python
from transformers import DistilBertTokenizerFast, DistilBertForSequenceClassification

tokenizer = DistilBertTokenizerFast.from_pretrained("sureshbabugandla/ML_OPS_ASSIGNMENT2")
model = DistilBertForSequenceClassification.from_pretrained("sureshbabugandla/ML_OPS_ASSIGNMENT2")
```

## Dataset

The model was trained on the [UCSD Book Graph](https://mengtingwan.github.io/data/goodreads.html) dataset, which contains book reviews from Goodreads across multiple genres. 2,000 reviews were sampled from each of the 8 genres, split into 800 train and 200 test samples per genre.

## Developed By

- **Name:** Suresh Babu Gandla
- **Roll Number:** G25AIT2119

## Links

- **GitHub:** https://github.com/g25ait2119/MLOpsAssignment2
- **W&B Dashboard:** https://wandb.ai/g25ait2119-sureshbabu-gandla/mlops-assignment2
- **Kaggle Notebook:** https://www.kaggle.com/code/sureshbabugandla/mlops-a2-training
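
## Example: Mapping Predictions to Genre Names

The Supported Genres table lists integer label IDs, while a text-classification pipeline reports string labels. The sketch below shows one way to turn a prediction into a readable genre name. It assumes the model returns default labels of the form `LABEL_<id>`, which this card does not confirm; the names `ID2GENRE`, `label_to_genre`, and `classify` are illustrative helpers, not part of the published model or of `transformers`.

```python
# Label IDs and genre names copied from the Supported Genres table.
ID2GENRE = {
    0: "Children",
    1: "Comics & Graphic",
    2: "Fantasy & Paranormal",
    3: "History & Biography",
    4: "Mystery, Thriller & Crime",
    5: "Poetry",
    6: "Romance",
    7: "Young Adult",
}


def label_to_genre(label: str) -> str:
    """Convert a label such as "LABEL_4" (or "4") to its genre name.

    Assumption: labels follow the default "LABEL_<id>" pattern. Labels that
    do not end in a known integer ID are returned unchanged, which covers
    the case where the model already emits genre names.
    """
    suffix = label.rsplit("_", 1)[-1]
    if suffix.isdecimal() and int(suffix) in ID2GENRE:
        return ID2GENRE[int(suffix)]
    return label


def classify(text: str) -> dict:
    """Run the pipeline from "How to Use" and attach a readable genre name."""
    # Imported lazily so the mapping helper works without transformers installed.
    from transformers import pipeline

    # Builds the pipeline on every call for simplicity; for repeated use,
    # create it once and reuse it.
    classifier = pipeline(
        "text-classification",
        model="sureshbabugandla/ML_OPS_ASSIGNMENT2",
    )
    # truncation=True keeps long reviews within the model's input limit.
    prediction = classifier(text, truncation=True)[0]
    return {
        "genre": label_to_genre(prediction["label"]),
        "score": prediction["score"],
    }
```

Calling `classify("This book was a thrilling mystery with unexpected twists.")` downloads the model on first use and returns a dictionary with `genre` and `score` keys. If the model already reports genre names, `label_to_genre` passes them through unchanged.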