---
library_name: transformers
tags:
- steam
- video games
- distilbert
license: apache-2.0
datasets:
- SebastianHops/steam-reviews-english
language:
- en
base_model:
- distilbert/distilbert-base-uncased
pipeline_tag: text-classification
---

# Distilbert Steam Sentiment (Small)

This is a fine-tuned version of the distilbert/distilbert-base-uncased model trained on the SebastianHops/steam-reviews-english dataset. It was made for simple sentiment analysis, particularly of video game reviews.

### Model Description

This model uses Distilbert as a base and is fine-tuned on a subset of the SebastianHops/steam-reviews-english dataset. I call this the "small" version because it uses only a fraction (100,000 lines) of the training dataset, to keep training and inference fast. Given the dataset and base model, Distilbert Steam Sentiment (Small) is well suited to sentiment analysis applications, especially within the video games and new media industries. The training data includes a lot of Gen Alpha/Gen Z internet slang, which sets it apart from other sentiment analysis models.

- **Developed by:** Trevor Keay
- **Model type:** Fine-tuned Transformer
- **Language(s) (NLP):** English
- **License:** Apache License 2.0
- **Finetuned from model:** distilbert/distilbert-base-uncased

### Model Sources

- **Base Model:** https://huggingface.co/distilbert/distilbert-base-uncased
- **Training Data:** https://huggingface.co/datasets/SebastianHops/steam-reviews-english

## Uses

While Distilbert is useful for a variety of sentence prediction and analysis applications, sentiment analysis is the primary purpose of this downstream version.
### Direct Use

Primarily sentiment analysis applications involving the new media / video games industry.

### Out-of-Scope Use

This model may not work as well on traditional literature or more formal text, as the training data consists of extremely informal text littered with modern slang. I do not endorse or condone the use of this model for any malicious or illegal purposes, and I do not believe it would work well for those applications anyway!

## Bias, Risks, and Limitations

This model reflects the biases present within both the base model and the training data. Due to response bias, users who voluntarily review games are more likely to hold extreme opinions than the average player, so the model is biased towards more extreme reactions. Additionally, due to cultural trends within the gaming community, racial and/or gender biases are likely present in the output.

### Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model.

## How to Get Started with the Model

Here is a really simple application of the model to get you going:

```python
import torch
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    pipeline,
)

MODEL_NAME = "tjkeay/Distilbert_Steam_Sentiment_Small"

# Build a text-classification pipeline from the fine-tuned checkpoint,
# using the GPU when one is available.
sentiment_classifier = pipeline(
    task="text-classification",
    model=AutoModelForSequenceClassification.from_pretrained(MODEL_NAME),
    tokenizer=AutoTokenizer.from_pretrained(MODEL_NAME),
    device=0 if torch.cuda.is_available() else -1,
)

example_text = "10/10 could not stop dying"
result = sentiment_classifier(example_text)[0]
output = result["label"]
print("output (0 should be negative):", output)
```

## Training Details

The model was trained with custom arguments focused on being lightweight and efficient.

### Training Data

The training data contains a large number of reviews scraped directly from Steam.
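The subsetting step can be sketched roughly as follows. This is an illustration only, not the actual training script: the rows below are toy stand-ins for the dataset, and only the random-sampling logic mirrors what was done for the released model.

```python
import random

# Toy stand-in rows for SebastianHops/steam-reviews-english,
# showing only the columns kept for training.
rows = [
    {
        "game": f"game {i % 50}",
        "review": f"review text {i}",
        "voted_up": i % 2 == 0,
        "author_playtime_forever": i * 10,
        "author_playtime_at_review": i * 5,
    }
    for i in range(1000)
]

SAMPLE_SIZE = 100  # the released model sampled 100,000 rows
random.seed(42)  # seed chosen here for reproducibility of the sketch
subset = random.sample(rows, SAMPLE_SIZE)
print(len(subset))  # prints 100
```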
Only the 'game', 'review', 'voted_up', 'author_playtime_forever', and 'author_playtime_at_review' columns were included for training. Additionally, the model was trained on a random sample of only 100,000 entries from the dataset to make it faster to train and use.

## Evaluation

Evaluation results:

- `eval_train_loss`: 0.14118799567222595
- `eval_test_loss`: 0.1386687308549881

### Testing Data

A train-test split of the same Steam reviews dataset was used.

## Model Card Authors

Trevor Keay