Distilbert Steam Sentiment (Small)

This is a fine-tuned version of the distilbert/distilbert-base-uncased model trained on the SebastianHops/steam-reviews-english dataset. It was made for the purpose of simple sentiment analysis, particularly of video game reviews.

Model Description

This model uses DistilBERT as a base and is fine-tuned on a subset of the SebastianHops/steam-reviews-english dataset. I call this the "small" version because it uses only a fraction of the dataset (100,000 rows) to keep training and inference fast. Given the dataset and base model, Distilbert Steam Sentiment (Small) is well suited to sentiment analysis applications, especially in the video games and new media industries. The training data includes a lot of Gen Z/Alpha internet slang, which sets it apart from other sentiment analysis models.

  • Developed by: Trevor Keay
  • Model type: Custom-tuned Transformer
  • Language(s) (NLP): English
  • License: Apache License 2.0
  • Finetuned from model: distilbert/distilbert-base-uncased

Uses

While DistilBERT is useful for a variety of sentence prediction and analysis applications, sentiment analysis is the primary purpose of this downstream version.

Direct Use

Primarily sentiment analysis applications involving the new media / video games industry.

Out-of-Scope Use

This model may not work as well when used to analyze traditional literature or more formal text, because the training data consists of extremely informal text littered with modern slang. I do not endorse or condone the use of this model for any malicious or illegal purposes, and I do not believe it would work well for those applications anyway!

Bias, Risks, and Limitations

This model reflects the biases present within both the base model and the training data. Due to response bias, it skews toward extreme reactions: users who voluntarily review games are more likely to hold strong opinions than the average player. Additionally, given cultural trends within the gaming community, racial and/or gender biases are likely present in the output.

Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.

How to Get Started with the Model

Here is a really simple application of the model to get you going:

import torch
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    pipeline,
)

MODEL_NAME = "tjkeay/Distilbert_Steam_Sentiment_Small"

sentiment_classifier = pipeline(
    task="text-classification",
    model=AutoModelForSequenceClassification.from_pretrained(MODEL_NAME),
    tokenizer=AutoTokenizer.from_pretrained(MODEL_NAME),
    device=0 if torch.cuda.is_available() else -1,  # GPU if available, else CPU
)

example_text = "10/10 could not stop dying"
result = sentiment_classifier(example_text)[0]
print("label (0 = negative, 1 = positive):", result["label"])
print("score:", result["score"])

Training Details

The model was trained with custom arguments focused on being lightweight and efficient.
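The actual hyperparameters are not published, so the following is only a sketch of a lightweight configuration in that spirit; every value here is an assumption:

```python
from transformers import TrainingArguments

# Hypothetical lightweight training configuration (all values are
# assumptions, not the published settings for this model).
training_args = TrainingArguments(
    output_dir="distilbert-steam-sentiment-small",
    num_train_epochs=2,                # few epochs keeps training cheap
    per_device_train_batch_size=32,
    learning_rate=2e-5,
    weight_decay=0.01,
    logging_steps=100,
    # fp16=True,                       # enable mixed precision on a GPU
)
```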

Training Data

The training data contains a large volume of reviews scraped directly from Steam. Only the 'game', 'review', 'voted_up', 'author_playtime_forever', and 'author_playtime_at_review' columns were included for training. Additionally, the model was trained on only a random sample of 100,000 entries from the dataset to make it faster to train and to use.
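As a sketch of the preprocessing described above, binary labels can be derived from the dataset's `voted_up` flag. The mapping direction (True → positive class 1) and the sampling seed shown in the comment are assumptions:

```python
# Assumption: voted_up=True maps to the positive class (1),
# voted_up=False to the negative class (0).
def to_label(example: dict) -> dict:
    example["label"] = 1 if example["voted_up"] else 0
    return example

# With the Hugging Face `datasets` library this could be applied as, e.g.:
#   ds = load_dataset("SebastianHops/steam-reviews-english", split="train")
#   ds = ds.shuffle(seed=42).select(range(100_000)).map(to_label)

print(to_label({"voted_up": True, "review": "10/10"})["label"])   # 1
print(to_label({"voted_up": False, "review": "refunded"})["label"])  # 0
```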

Evaluation

Evaluation results:

  • Training loss: 0.1412
  • Test loss: 0.1387

Testing Data

A train-test split of the same steam reviews dataset was used.
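A minimal stdlib sketch of such a split over the 100,000 sampled reviews; the 80/20 ratio and the seed are assumptions, not the published settings:

```python
import random

# Toy 80/20 train-test split over index positions (ratio and seed are
# assumptions; the real split was done on the sampled Steam reviews).
random.seed(42)
indices = list(range(100_000))  # one index per sampled review
random.shuffle(indices)
cut = int(0.8 * len(indices))
train_idx, test_idx = indices[:cut], indices[cut:]
print(len(train_idx), len(test_idx))  # 80000 20000
```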

Model Card Authors

Trevor Keay

Model size: 67M parameters (F32, Safetensors)