distilroberta-base-goodreads-genres
Model Overview
This model is a fine-tuned version of distilroberta-base for Goodreads book review / genre classification. It is designed to classify book-related text, such as book reviews, descriptions, or summaries, into genre categories.
The model was developed as part of an MLOps assignment using Hugging Face Transformers, a Kaggle Notebook, the Hugging Face Hub, and Weights & Biases (W&B) for experiment tracking.
Model Details
- Model name: distilroberta-base-goodreads-genres
- Base model: distilroberta-base
- Model type: Transformer-based sequence classification model
- Task: Text classification
- Domain: Goodreads book reviews / genre classification
- Language: English
- Library: Hugging Face Transformers
- Training platform: Kaggle Notebook
- Experiment tracking: Weights & Biases
- Model repository: https://huggingface.co/sumitp76/distilroberta-base-goodreads-genres
Important Links
- Kaggle Notebook: https://www.kaggle.com/code/sumitpiitj/gr-book-review/edit
- Hugging Face Model: https://huggingface.co/sumitp76/distilroberta-base-goodreads-genres
- W&B Project Dashboard: https://wandb.ai/sumit-k-pal-76-iitj/mlops-assignment2/table?nw=nwusersumitkpal76
Setup Instructions
1. Clone or open the project
The training was performed in a Kaggle Notebook.
Kaggle Notebook:
https://www.kaggle.com/code/sumitpiitj/gr-book-review/edit
2. Install dependencies
Install the required Python libraries:
```bash
pip install transformers datasets evaluate accelerate huggingface_hub wandb scikit-learn pandas numpy torch
```
In Kaggle, many packages may already be installed. If needed, install missing packages inside a notebook cell:
```
!pip install transformers datasets evaluate accelerate huggingface_hub wandb scikit-learn
```
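To find out which packages actually need installing, a quick check like the following reports the ones that are not importable in the current environment. This is a minimal sketch using only the standard library; the `REQUIRED` mapping and `find_missing` helper are illustrative names, and the mapping is written out by hand because a few import names differ from their pip names (for example, `scikit-learn` is imported as `sklearn`).

```python
import importlib.util

# pip package name -> import name (these differ for a few packages)
REQUIRED = {
    "transformers": "transformers",
    "datasets": "datasets",
    "evaluate": "evaluate",
    "accelerate": "accelerate",
    "huggingface_hub": "huggingface_hub",
    "wandb": "wandb",
    "scikit-learn": "sklearn",
    "pandas": "pandas",
    "numpy": "numpy",
    "torch": "torch",
}


def find_missing(packages):
    """Return the pip names whose import name cannot be found."""
    return [
        pip_name
        for pip_name, import_name in packages.items()
        if importlib.util.find_spec(import_name) is None
    ]


if __name__ == "__main__":
    missing = find_missing(REQUIRED)
    if missing:
        print("pip install " + " ".join(missing))
    else:
        print("All required packages are installed.")
```

In a Kaggle cell, the printed command can be run by prefixing it with `!`.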
3. Set up a Hugging Face token
To push the model to the Hugging Face Hub, create a Hugging Face access token with write permission.
In Kaggle:
- Go to Add-ons and open Secrets.
- Add your Hugging Face token.
- Save it under the name `HF_TOKEN`.
Then load it in the notebook:
```python
from kaggle_secrets import UserSecretsClient
from huggingface_hub import login

user_secrets = UserSecretsClient()
HF_TOKEN = user_secrets.get_secret("HF_TOKEN")
login(token=HF_TOKEN)
```
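The `kaggle_secrets` module only exists inside Kaggle. If the same notebook is run elsewhere (locally or in Colab), one option is to fall back to an `HF_TOKEN` environment variable. The `get_hf_token` helper below is a hypothetical sketch of that fallback, not part of the original notebook.

```python
import os


def get_hf_token(secret_name="HF_TOKEN"):
    """Return the Hugging Face token from Kaggle Secrets, else from the environment.

    Hypothetical helper: tries Kaggle first, then the environment variable.
    """
    try:
        # Only importable inside a Kaggle notebook
        from kaggle_secrets import UserSecretsClient
    except ImportError:
        UserSecretsClient = None

    if UserSecretsClient is not None:
        return UserSecretsClient().get_secret(secret_name)

    token = os.environ.get(secret_name)
    if not token:
        raise RuntimeError(
            f"No Hugging Face token found: add a Kaggle secret or set ${secret_name}."
        )
    return token
```

Usage: `login(token=get_hf_token())`. The same token can then be reused for `push_to_hub`.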
4. Set up W&B tracking
Log in to Weights & Biases:
```python
import wandb

wandb.login()
```
Initialize a W&B run:
```python
wandb.init(
    project="mlops-assignment2",
    name="distilroberta-base-goodreads-genres",
)
```
W&B dashboard:
https://wandb.ai/sumit-k-pal-76-iitj/mlops-assignment2/table?nw=nwusersumitkpal76
5. Train the model
The model was trained with the Hugging Face `Trainer` API.
General training flow:
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, Trainer, TrainingArguments

model_name = "distilroberta-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# num_labels must equal the number of genre classes in the dataset
# (computed earlier in the notebook)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    num_labels=num_labels,
)
```
Tokenize the dataset:
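```python
def tokenize_function(examples):
    # Pad/truncate every example to a fixed length; 256 matches the
    # max sequence length listed under Training Configuration
    return tokenizer(
        examples["text"],
        padding="max_length",
        truncation=True,
        max_length=256,
    )

# `dataset` is the Hugging Face Dataset loaded earlier in the notebook
tokenized_dataset = dataset.map(tokenize_function, batched=True)
```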
Train using Trainer:
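```python
# training_args and compute_metrics are defined elsewhere in the notebook;
# train_dataset and eval_dataset are the tokenized training and evaluation splits
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,  # newer Transformers releases name this argument processing_class
    compute_metrics=compute_metrics,
)

trainer.train()
```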
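The `Trainer` call above expects a `compute_metrics` function, which the card does not show. Since the Results table reports accuracy and F1, a minimal sketch could look like this. The F1 averaging mode is not stated in the card; this sketch assumes a support-weighted average across genres (the same definition as scikit-learn's `f1_score(..., average="weighted")`) and implements it with NumPy only.

```python
import numpy as np


def compute_metrics(eval_pred):
    """Accuracy and support-weighted F1 from Trainer's (logits, labels) pair."""
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    labels = np.asarray(labels)

    accuracy = float(np.mean(preds == labels))

    # Weighted F1: per-class F1, weighted by how often each class occurs in `labels`
    weighted_f1 = 0.0
    for cls in np.unique(labels):
        tp = np.sum((preds == cls) & (labels == cls))
        fp = np.sum((preds == cls) & (labels != cls))
        fn = np.sum((preds != cls) & (labels == cls))
        f1 = 2 * tp / (2 * tp + fp + fn)  # denominator > 0 because cls occurs in labels
        weighted_f1 += f1 * np.sum(labels == cls)
    weighted_f1 /= len(labels)

    return {"accuracy": accuracy, "f1": float(weighted_f1)}
```

When this is passed as `compute_metrics=compute_metrics`, `trainer.evaluate()` reports the values under the keys `eval_accuracy` and `eval_f1`, alongside `eval_loss`.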
6. Evaluate the model
After training, evaluate the model:
```python
results = trainer.evaluate()
print(results)
```
7. Push the model to Hugging Face Hub
```python
repo_id = "sumitp76/distilroberta-base-goodreads-genres"

# Upload the fine-tuned weights and the tokenizer files to the same Hub repo
model.push_to_hub(repo_id, token=HF_TOKEN)
tokenizer.push_to_hub(repo_id, token=HF_TOKEN)
```
Model link:
https://huggingface.co/sumitp76/distilroberta-base-goodreads-genres
How to use the model
You can use the model directly with the Hugging Face pipeline.
```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="sumitp76/distilroberta-base-goodreads-genres",
)

text = "A young wizard discovers his magical powers and enters a hidden world of adventure."
result = classifier(text)
print(result)
```
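By default the pipeline returns only the highest-scoring label. In recent Transformers versions, passing `top_k=None` to the classifier call returns a score for every genre as a list of `{"label": ..., "score": ...}` dicts, which is useful for book text that plausibly fits several genres. The `top_genres` helper below is a hypothetical sketch for ranking that output; the genre names in the example are placeholders, since the card does not list the model's labels.

```python
def top_genres(scores, k=3):
    """Return the k highest-scoring (label, score) pairs from pipeline output.

    `scores` is a list of {"label": str, "score": float} dicts, the shape
    returned by classifier(text, top_k=None) for a single input string.
    """
    # Sort explicitly so the helper does not depend on the pipeline's ordering
    ranked = sorted(scores, key=lambda item: item["score"], reverse=True)
    return [(item["label"], round(item["score"], 4)) for item in ranked[:k]]


# Placeholder scores standing in for real pipeline output (hypothetical labels)
example_scores = [
    {"label": "mystery", "score": 0.05},
    {"label": "fantasy", "score": 0.81},
    {"label": "romance", "score": 0.14},
]
print(top_genres(example_scores, k=2))  # [('fantasy', 0.81), ('romance', 0.14)]
```

With the real model, this would be called as `top_genres(classifier(text, top_k=None))`.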
Training Details
Training Platform
The model was trained in a Kaggle Notebook.
- Platform: Kaggle Notebook
- Notebook link: https://www.kaggle.com/code/sumitpiitj/gr-book-review/edit
- Framework: Hugging Face Transformers
- Experiment tracking: Weights & Biases
- Model hosting: Hugging Face Hub
Base Model
The base model used was:
distilroberta-base
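distilroberta-base is a distilled version of RoBERTa-base: it has 6 Transformer layers instead of 12 (about 82M parameters versus 125M) and is on average roughly twice as fast. This makes it suitable for text classification tasks that need a balance between accuracy and efficiency.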
Preprocessing Steps
The general preprocessing steps included:
- Loading the Goodreads book review / genre dataset
- Checking and cleaning missing values
- Preparing the input text column
- Encoding genre labels into numeric IDs
- Splitting the dataset into training and evaluation sets
- Tokenizing text using the distilroberta-base tokenizer
- Applying truncation and padding
- Training the model using Hugging Face Trainer
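The cleaning, label-encoding, and splitting steps above can be sketched in plain Python as follows. This is an illustrative version, not the notebook's code: the `prepare_records` name, the record fields (`text`, `genre`), the 80/20 split, and the seed are all assumptions. Building `label2id`/`id2label` at this stage also lets the fine-tuned model report genre names instead of generic `LABEL_n` ids, since both mappings can be passed as keyword arguments to `AutoModelForSequenceClassification.from_pretrained` along with `num_labels=len(label2id)`.

```python
import random


def prepare_records(records, eval_fraction=0.2, seed=42):
    """Clean, label-encode, and split records of the form {"text": ..., "genre": ...}.

    Field names, split fraction, and seed are illustrative assumptions.
    """
    # 1. Drop rows with missing or blank text, or a missing genre
    clean = [
        r for r in records
        if (r.get("text") or "").strip() and r.get("genre")
    ]

    # 2. Encode genre names as integer ids (sorted for a stable mapping)
    genres = sorted({r["genre"] for r in clean})
    label2id = {g: i for i, g in enumerate(genres)}
    id2label = {i: g for g, i in label2id.items()}
    encoded = [{"text": r["text"].strip(), "label": label2id[r["genre"]]} for r in clean]

    # 3. Shuffle deterministically, then split into train / eval
    random.Random(seed).shuffle(encoded)
    n_eval = int(len(encoded) * eval_fraction)
    return encoded[n_eval:], encoded[:n_eval], label2id, id2label
```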
Training Configuration
The main training settings were:
| Parameter | Value |
|---|---|
| Base model | distilroberta-base |
| Task | Text Classification |
| Optimizer | AdamW |
| Loss function | Cross-entropy loss |
| Batch size (train / eval) | 16 / 32 |
| Learning rate | 2e-5 |
| Number of epochs | 6 |
| Max sequence length | 256 |
| Evaluation strategy | steps |
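The table maps onto `TrainingArguments` roughly as follows. This is a sketch under assumptions: `build_training_args` is a hypothetical helper, `output_dir` and `run_name` are placeholders, `eval_steps` is left at its default (which falls back to `logging_steps`), and the max sequence length is applied in the tokenizer rather than here. AdamW and cross-entropy need no explicit setting: AdamW is the `Trainer` default optimizer, and cross-entropy is the model's default loss for single-label classification. The keyword values are kept in a plain dict so they can be inspected without importing Transformers.

```python
# Hyperparameters from the Training Configuration table
TRAINING_KWARGS = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 32,
    "num_train_epochs": 6,
    # Older Transformers releases (before roughly v4.41) call this `evaluation_strategy`
    "eval_strategy": "steps",
    # Log to Weights & Biases; Trainer reuses a run already started with wandb.init()
    "report_to": "wandb",
}


def build_training_args(output_dir="distilroberta-base-goodreads-genres",
                        run_name="distilroberta-base-goodreads-genres"):
    """Create TrainingArguments from TRAINING_KWARGS (output_dir/run_name are placeholders)."""
    # Imported lazily so the config above can be inspected without Transformers installed
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, run_name=run_name, **TRAINING_KWARGS)
```

The result, for example `training_args = build_training_args()`, is what the `Trainer(args=training_args, ...)` call expects.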
Results
| Metric | Score |
|---|---|
| Accuracy | 0.61583 |
| F1 Score | 0.61632 |
| Eval Loss | 2.66787 |
Result Links
- Kaggle Notebook: https://www.kaggle.com/code/sumitpiitj/gr-book-review/edit
- Hugging Face model: https://huggingface.co/sumitp76/distilroberta-base-goodreads-genres
- W&B dashboard: https://wandb.ai/sumit-k-pal-76-iitj/mlops-assignment2/table?nw=nwusersumitkpal76