Model Card Metadata:

library_name: transformers
tags:
- text-classification
- distilbert
datasets:
- ucsd_goodreads
metrics:
- accuracy
- f1

# Model Card for distilbert-goodreads-genres

## Model Details

### Model Description

This model is a fine-tuned version of distilbert-base-cased designed to classify book reviews into seven distinct genres using the UCSD Goodreads reviews dataset. It was developed as part of an MLOps assignment to demonstrate experiment tracking and model deployment.

- **Developed by:** Duggirala Vnaga Ananth (G25AIT2032)
- **Funded by:** IIT Jodhpur | PGD AI Programme
- **Shared by:** Duggirala Vnaga Ananth (G25AIT2032)
- **Model type:** Transformer-based text classification
- **Language(s) (NLP):** English
- **License:** Apache 2.0 (unconfirmed; the repository may not declare a license)
- **Finetuned from model:** distilbert-base-cased

### Model Sources

- **Repository:** nagaananth/distilbert-goodreads-genres
- **W&B Dashboard:** mlops-assignment2

## Uses

### Direct Use

This model is intended for classifying short-to-medium length book reviews into one of seven genres: Poetry, Comics & Graphic, Fantasy & Paranormal, History & Biography, Mystery/Thriller/Crime, Romance, and Young Adult.

### Out-of-Scope Use

The model should not be used for high-stakes decision-making or for analyzing text outside the literary review domain.

## How to Get Started with the Model

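```python
from transformers import pipeline

# Load the fine-tuned classifier from the Hugging Face Hub.
classifier = pipeline("text-classification", model="nagaananth/distilbert-goodreads-genres")

# Classify a single review; the result is a list of {"label", "score"} dicts.
result = classifier("The imagery in these stanzas was breathtaking.")
print(result)
```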

## Training Details

### Training Data

The model was trained on a sampled version of the UCSD Goodreads Book Graph dataset, balanced across seven genres.

### Training Procedure

The pipeline was modularized into data.py, train.py, and eval.py scripts.
Training was conducted using the Hugging Face Trainer API with experiment tracking via Weights & Biases.

#### Preprocessing 

Tokenization: Reviews were tokenized using the DistilBertTokenizerFast with a maximum sequence length of 512 tokens.

Sampling: To maintain a balanced dataset and manage compute time, the dataset was sampled to a specific number of reviews per genre (e.g., 200 or 400).

Splitting: Data was split into training and evaluation sets using a standard 80/20 or similar stratified split to ensure genre representation.
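
The sampling and splitting steps above can be sketched in plain Python. This is a minimal illustration, not the project's actual `data.py`: the `(review_text, genre)` record layout, the function name `sample_and_split`, the seed, and the exact 20% evaluation share are all assumptions made for the example.

```python
import random
from collections import defaultdict


def sample_and_split(records, per_genre, eval_fraction=0.2, seed=42):
    """Balance records by genre, then split each genre into train/eval.

    records: iterable of (review_text, genre) pairs (hypothetical layout).
    """
    rng = random.Random(seed)

    # Group reviews by genre so each genre can be sampled independently.
    by_genre = defaultdict(list)
    for text, genre in records:
        by_genre[genre].append((text, genre))

    train, evaluation = [], []
    for genre in sorted(by_genre):
        items = by_genre[genre]
        if len(items) < per_genre:
            raise ValueError(f"not enough reviews for genre {genre!r}")
        chosen = rng.sample(items, per_genre)      # balanced sampling
        n_eval = round(per_genre * eval_fraction)  # same share for every genre
        evaluation.extend(chosen[:n_eval])
        train.extend(chosen[n_eval:])

    rng.shuffle(train)
    return train, evaluation


# Toy data: 50 placeholder reviews for each of three genres.
genres = ["Poetry", "Romance", "Young Adult"]
toy = [(f"review {i} about {g}", g) for g in genres for i in range(50)]
train_set, eval_set = sample_and_split(toy, per_genre=20)
print(len(train_set), len(eval_set))
```

Splitting inside each genre, rather than over the pooled data, is what keeps the genre proportions identical in the training and evaluation sets.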


#### Training Hyperparameters

- **Training regime:** [More Information Needed]

#### Speeds, Sizes, Times

Checkpoint Size: Approximately 260 MB (standard for DistilBERT).

Evaluation Runtime: 19.6347 seconds

Throughput: 71.302 samples per second
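
As a quick sanity check, the size and throughput figures can be reproduced from other numbers reported in this card. The sketch below assumes the 65.8M parameter count and F32 tensor type shown on the Hub page, and the 1,400-sample evaluation set; it ignores any file-format overhead in the checkpoint.

```python
# Figures reported elsewhere in this card.
num_params = 65.8e6       # parameter count shown on the Hub page
bytes_per_param = 4       # F32 weights use 4 bytes each
eval_samples = 1400       # size of the held-out test set
eval_runtime_s = 19.6347  # evaluation runtime in seconds

# Raw weight size in megabytes (no file-format overhead included).
checkpoint_mb = num_params * bytes_per_param / 1e6

# Evaluation throughput in samples per second.
throughput = eval_samples / eval_runtime_s

print(f"checkpoint ~ {checkpoint_mb:.0f} MB")
print(f"throughput ~ {throughput:.1f} samples/s")
```

Both values agree with the reported checkpoint size of roughly 260 MB and throughput of about 71.3 samples per second.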


## Evaluation
### Testing Data, Factors & Metrics
#### Testing Data

Evaluation was performed on a held-out test set of 1,400 samples (200 per genre).

#### Factors
The evaluation of this model is disaggregated by the following factors to ensure a granular understanding of performance:

Genre Labels: Performance is analyzed individually for each of the 7 genres (Poetry, Comics & Graphic, Fantasy & Paranormal, History & Biography, Mystery/Thriller/Crime, Romance, and Young Adult) to identify class-specific biases or difficulties.

Text Domain: The evaluation is focused specifically on the domain of book reviews from the UCSD Goodreads dataset.

Class Balance: The test set is balanced (200 samples per genre), so the macro and weighted averages give a fair picture of model quality across all categories.

#### Metrics
The following metrics were chosen to evaluate the model's performance, as they are standard for multi-class classification tasks in an MLOps pipeline:

Accuracy: Used to measure the overall percentage of correct predictions across all 1,400 test samples.

Weighted F1-Score: This is the primary metric for this project, as it balances precision and recall while accounting for the support (number of instances) of each label.

Precision and Recall: These are logged per-class to distinguish between the model's ability to avoid false positives (precision) and its ability to find all actual members of a genre (recall).

Eval Loss: Cross-entropy loss is monitored to evaluate how well the model's predicted probabilities align with the ground truth labels.
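
To make the metric definitions concrete, here is a small dependency-free sketch that computes accuracy, per-class precision/recall/F1, and the macro and weighted F1 averages. The function names and the toy labels are illustrative assumptions; the project itself reports using scikit-learn for these numbers, and this is not its evaluation code.

```python
from collections import Counter


def per_class_scores(y_true, y_pred):
    """Return precision, recall, F1 and support for every label."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but it was wrong
            fn[t] += 1  # missed an actual member of t
    scores = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = {"precision": precision, "recall": recall,
                         "f1": f1, "support": actual}
    return scores


def summarize(y_true, y_pred):
    """Return accuracy, macro F1, weighted F1 and the per-class scores."""
    scores = per_class_scores(y_true, y_pred)
    total = len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / total
    macro_f1 = sum(s["f1"] for s in scores.values()) / len(scores)
    weighted_f1 = sum(s["f1"] * s["support"] for s in scores.values()) / total
    return accuracy, macro_f1, weighted_f1, scores


# Toy labels, not the model's real predictions.
y_true = ["Poetry", "Poetry", "Romance", "Romance", "Young Adult", "Young Adult"]
y_pred = ["Poetry", "Poetry", "Romance", "Young Adult", "Romance", "Young Adult"]
accuracy, macro_f1, weighted_f1, scores = summarize(y_true, y_pred)
print(accuracy, macro_f1, weighted_f1)
```

In this toy example every class has the same support, so the macro and weighted F1 values are identical.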


### Results

W&B run logs: https://wandb.ai/g25ait2032-iit-jodhpur/mlops-assignment2/runs/mzhecp3m/logs?nw=nwuserg25ait2032

| Metric | Score |
| --- | --- |
| Eval Loss | 0.9723 |
| Accuracy | 0.6464 |
| Weighted F1 | 0.6452 |
| Macro F1 | 0.6500 |

### Per-Class Performance

| Genre | F1 Score |
| --- | --- |
| Poetry | 0.81 |
| Comics & Graphic | 0.85 |
| Young Adult | 0.38 |

#### Summary
This model provides a baseline for book genre classification using transformer-based NLP.
It performs best on genres with distinct vocabularies, such as Comics & Graphic ($F1=0.85$) and Poetry ($F1=0.81$),
and struggles with Young Adult ($F1=0.38$), whose themes overlap with other genres.
Overall, the model reaches an accuracy of 0.65 across seven categories.
The project also demonstrates an MLOps pipeline for rapid experimentation and deployment.


## Model Examination 
To ensure the model's reliability and performance, the following interpretability and monitoring steps were taken:

Experiment Tracking: Weights & Biases was used to track training/validation loss and accuracy in real time, to check that the model converged without significant overfitting.

Metric Analysis: Detailed per-class metrics (Precision, Recall, and F1-Score) were logged to identify which genres were most challenging for the model.

Error Analysis: Evaluation logs (as seen in eval_report.json) were examined to understand class confusion, particularly between linguistically similar genres like "Young Adult" and "Fantasy & Paranormal".

Artifact Logging: The final model and its associated classification report were saved as W&B Artifacts to ensure version control and reproducibility of results.
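
The error-analysis step can be illustrated by counting which (true genre, predicted genre) pairs are confused most often. This is a hedged sketch on made-up labels: the function names are hypothetical, the structure of `eval_report.json` is not assumed, and the counts below are not the model's actual errors.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count how often each (true_label, predicted_label) pair occurs."""
    return Counter(zip(y_true, y_pred))


def top_confusions(y_true, y_pred, k=3):
    """Return the k most frequent misclassified (true, predicted) pairs."""
    errors = Counter({
        pair: n
        for pair, n in confusion_counts(y_true, y_pred).items()
        if pair[0] != pair[1]  # keep only off-diagonal entries
    })
    return errors.most_common(k)


# Toy labels for illustration only.
y_true = (["Young Adult"] * 4) + (["Fantasy & Paranormal"] * 3) + (["Poetry"] * 2)
y_pred = [
    "Fantasy & Paranormal", "Fantasy & Paranormal", "Young Adult", "Romance",
    "Fantasy & Paranormal", "Young Adult", "Fantasy & Paranormal",
    "Poetry", "Poetry",
]
worst = top_confusions(y_true, y_pred, k=1)
print(worst)
```

Looking only at off-diagonal pairs makes it easy to see whether errors concentrate between two specific genres or are spread across many.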

## Environmental Impact
Carbon Emitted: < 0.01 kg CO2eq (Extremely low due to the small dataset and short runtime).
Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
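
A back-of-the-envelope check of the emissions bound is sketched below, multiplying power, time, and carbon intensity. The 0.1 hours and the T4 GPU come from this card; the 70 W figure is the T4's rated board power, used here as an upper bound on GPU draw, and the grid carbon intensity is a hypothetical placeholder rather than a measured value for the actual data center.

```python
gpu_power_watts = 70.0  # NVIDIA T4 rated board power, used as an upper bound
hours_used = 0.1        # session length reported in this card
grid_intensity = 0.5    # kg CO2eq per kWh; hypothetical placeholder value

# Energy drawn by the GPU over the session, in kilowatt-hours.
energy_kwh = gpu_power_watts * hours_used / 1000.0

# Estimated emissions for the session.
emissions_kg = energy_kwh * grid_intensity

print(f"energy ~ {energy_kwh:.4f} kWh, emissions ~ {emissions_kg:.4f} kg CO2eq")
```

Under these assumptions the estimate stays below the 0.01 kg CO2eq reported above; a different grid intensity or the inclusion of CPU and data-center overhead would change the exact value.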

## Technical Specifications

### Compute Infrastructure

Cloud Provider: Google Cloud Platform (GCP).

Compute Region: us-central1 (Standard default).

Hours used: 0.1 hours (an estimate for the full session: the evaluation run took about 20 seconds, and the training run is assumed to have taken roughly 5–6 minutes).

#### Hardware
GPU: NVIDIA T4 Tensor Core GPU (provided via Google Colab Free Tier).
VRAM: 16GB GDDR6.
System RAM: 12GB.

#### Software
Transformers: Hugging Face library for model loading and training.

PyTorch: Deep learning framework.

Weights & Biases: Experiment tracking and artifact logging.

Scikit-learn: Used for generating the classification report and metrics.



### Model Architecture and Objective
The model uses the DistilBERT architecture, a distilled version of BERT that is smaller, faster, and cheaper to run.
It consists of 6 transformer layers (compared to 12 in BERT-base).
The objective is multi-class text classification: the model takes a sequence of text (a book review) and
predicts one of seven genre labels using a sequence classification head on top of the first-token ([CLS]) representation.
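
The final step of the classification head, turning seven logits into a predicted genre and a cross-entropy loss, can be sketched without any deep learning library. The label order in `GENRES` and the example logits are assumptions for illustration; the model's real `id2label` mapping may differ.

```python
import math

# Hypothetical label order; the model's actual id2label mapping may differ.
GENRES = [
    "Poetry", "Comics & Graphic", "Fantasy & Paranormal", "History & Biography",
    "Mystery/Thriller/Crime", "Romance", "Young Adult",
]


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits):
    """Return the most probable genre and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return GENRES[best], probs[best]


def cross_entropy(logits, true_index):
    """Negative log-probability assigned to the correct class."""
    return -math.log(softmax(logits)[true_index])


example_logits = [2.0, 0.1, 0.3, -1.0, 0.0, 0.5, 1.2]  # made-up values
label, prob = predict(example_logits)
uniform_loss = cross_entropy([0.0] * 7, 0)  # loss of a uniform guess: ln(7)
print(label, round(prob, 3), round(uniform_loss, 3))
```

A model that guesses uniformly over seven classes has a cross-entropy of ln(7), about 1.946, which gives a reference point for the reported eval loss of 0.9723.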

## Citation 

**BibTeX:**
@article{Sanh2019DistilBERTAD,
  title={DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter},
  author={Victor Sanh and Lysandre Debut and Julien Chaumond and Thomas Wolf},
  journal={ArXiv},
  year={2019},
  volume={abs/1910.01108}
}

**APA:**
Sanh, V., Debut, L., Chaumond, J., & Wolf, T. (2019). DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108.

### Glossary 
Precision: The ratio of correctly predicted positive observations to the total predicted positives.

Recall: The ratio of correctly predicted positive observations to all observations in the actual class.

F1 Score: The harmonic mean of Precision and Recall.

Support: The number of actual occurrences of the class in the specified dataset.


### Run summary:
wandb:               eval/accuracy 0.64643
wandb:                     eval/f1 0.64524
wandb:                   eval/loss 0.97231
wandb: eval/model_preparation_time 0.0026
wandb:                eval/runtime 19.6347
wandb:     eval/samples_per_second 71.302
wandb:       eval/steps_per_second 2.241
wandb:              final/accuracy 0.64643
wandb:                    final/f1 0.64524
wandb:                  final/loss 0.97231


### Model Card Authors 
Duggirala Vnaga Ananth (G25AIT2032)
Student at IIT Jodhpur, PGD AI Programme.

### Model Card Contact
Email: g25ait2032@iitj.ac.in
Hugging Face: [Community tab on this model page](https://huggingface.co/nagaananth/distilbert-goodreads-genres)
GitHub: Open an issue on the project repository: https://github.com/g25ait2032-prog/iitjodhpur/

Model size: 65.8M params (Safetensors, tensor type F32)
