Distilbert Steam Sentiment (Small)
This is a fine-tuned version of the distilbert/distilbert-base-uncased model trained on the SebastianHops/steam-reviews-english dataset. It was built for simple sentiment analysis, particularly of video game reviews.
Model Description
This model uses DistilBERT as a base and is fine-tuned on a subset of the SebastianHops/steam-reviews-english dataset. I call this the "small" version because it uses only a fraction of the dataset (100,000 rows), chosen to keep training and inference fast. Given the dataset and base model, Distilbert Steam Sentiment (Small) is well suited to sentiment analysis applications, especially within the video games and new media industries. The training data includes a lot of Gen Alpha/Z internet slang, which sets it apart from other sentiment analysis models.
- Developed by: Trevor Keay
- Model type: Custom-tuned Transformer
- Language(s) (NLP): English
- License: Apache License 2.0
- Finetuned from model: distilbert/distilbert-base-uncased
Model Sources
- Base Model https://huggingface.co/distilbert/distilbert-base-uncased
- Training Data https://huggingface.co/datasets/SebastianHops/steam-reviews-english
Uses
While DistilBERT is useful for a variety of sentence prediction and analysis applications, sentiment analysis is the primary purpose of this downstream version.
Direct Use
Primarily sentiment analysis applications involving the new media / video games industry.
Out-of-Scope Use
This model may not work as well on traditional literature or more formal text, as the training data consists of extremely informal writing littered with modern slang. I do not endorse or condone the use of this model for any malicious or illegal purpose, and I do not believe it would work well for those applications anyway!
Bias, Risks, and Limitations
This model reflects the biases present within both the base model and the training data. It is skewed towards extreme reactions: due to response bias, users who voluntarily review games are more likely to hold extreme opinions than the average player. Additionally, given cultural trends within the gaming community, racial and/or gender biases are likely present in the output.
Recommendations
Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model.
How to Get Started with the Model
Here is a really simple application of the model to get you going:
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

MODEL_NAME = "tjkeay/Distilbert_Steam_Sentiment_Small"

sentiment_classifier = pipeline(
    task="text-classification",
    model=AutoModelForSequenceClassification.from_pretrained(MODEL_NAME),
    tokenizer=AutoTokenizer.from_pretrained(MODEL_NAME),
    device=0 if torch.cuda.is_available() else -1,
)

example_text = "10/10 could not stop dying"
result = sentiment_classifier(example_text)[0]
output = result["label"]
print("output (0 should be negative):", output)
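The pipeline returns generic label ids rather than readable sentiment strings. A small helper like the sketch below (the names and the LABEL_0 = negative convention are assumptions drawn from the comment in the snippet above; verify them against the model's id2label config) can make the output friendlier:

```python
# Map the pipeline's generic label ids to human-readable sentiment.
# LABEL_0 = negative is an assumption here; check the model's
# config.json id2label mapping before relying on it.
LABEL_NAMES = {"LABEL_0": "negative", "LABEL_1": "positive"}

def readable_result(result: dict) -> str:
    """Format one pipeline result, e.g. {'label': 'LABEL_0', 'score': 0.98}."""
    name = LABEL_NAMES.get(result["label"], result["label"])
    return f"{name} ({result['score']:.2%})"

print(readable_result({"label": "LABEL_0", "score": 0.9812}))  # → negative (98.12%)
```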
Training Details
The model was trained with custom arguments focused on keeping training lightweight and efficient.
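The exact training arguments are not published. Purely as an illustration of what a "lightweight and efficient" configuration could look like (every value below is a hypothetical assumption, not the card's actual settings):

```python
# Hypothetical lightweight hyperparameters in the spirit the card
# describes; the actual training arguments are not published.
training_kwargs = {
    "output_dir": "distilbert-steam-sentiment-small",
    "num_train_epochs": 2,              # few epochs keeps training cheap
    "per_device_train_batch_size": 32,
    "learning_rate": 2e-5,              # common fine-tuning LR for DistilBERT
    "weight_decay": 0.01,
}

# These would typically be unpacked into a transformers config, e.g.:
#   args = TrainingArguments(**training_kwargs)
print(sorted(training_kwargs))
```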
Training Data
The training data contains a large number of reviews scraped directly from Steam. Only the 'game', 'review', 'voted_up', 'author_playtime_forever', and 'author_playtime_at_review' columns were kept for training. Additionally, the model was trained on a random sample of 100,000 entries from the dataset to make it faster to train and use.
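The preprocessing script itself is not published, so the helpers below are only a sketch of the steps described above: keeping the listed columns, deriving an integer label from the 'voted_up' recommendation flag, and drawing the 100,000-row random sample. Function names and the True → 1 label mapping are assumptions.

```python
import random

# Columns the card says were kept for training.
KEEP_COLUMNS = ["game", "review", "voted_up",
                "author_playtime_forever", "author_playtime_at_review"]

def prepare_example(row: dict) -> dict:
    """Keep only the training columns and derive an integer label
    from the 'voted_up' flag (assumed mapping: True -> 1, False -> 0)."""
    kept = {col: row[col] for col in KEEP_COLUMNS}
    kept["label"] = int(row["voted_up"])
    return kept

def sample_rows(rows: list, n: int = 100_000, seed: int = 42) -> list:
    """Draw the random subsample the card describes (100,000 entries)."""
    rng = random.Random(seed)
    return rng.sample(rows, min(n, len(rows)))
```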
Evaluation
Evaluation results:
- eval_train_loss: 0.1412
- eval_test_loss: 0.1387
Testing Data
A train-test split of the same steam reviews dataset was used.
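The split itself can be sketched as below; the 80/20 fraction and seed are assumptions, since the card does not state them:

```python
import random

def split_reviews(rows: list, test_fraction: float = 0.2, seed: int = 0):
    """Shuffle and split rows into train/test sets.
    The 0.2 test fraction is a hypothetical value, not the card's."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]
```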
Model Card Authors
Trevor Keay