---
library_name: transformers
tags:
- steam
- video games
- distilbert
license: apache-2.0
datasets:
- SebastianHops/steam-reviews-english
language:
- en
base_model:
- distilbert/distilbert-base-uncased
pipeline_tag: text-classification
---
# Distilbert Steam Sentiment (Small)
This is a fine-tuned version of the distilbert/distilbert-base-uncased model trained on the SebastianHops/steam-reviews-english dataset.
It was made for simple sentiment analysis, particularly of video game reviews.
### Model Description
This model uses DistilBERT as a base and was fine-tuned on a subset of the SebastianHops/steam-reviews-english dataset. I call this
model the "small" version because it uses only a fraction (100,000 rows) of the training dataset, to keep training and inference fast.
Given the dataset and base model, Distilbert Steam Sentiment (Small) is well suited to sentiment analysis applications, especially in the
video games and new media industries. The training data includes a lot of Gen Z/Alpha internet slang, which sets it apart from other
sentiment analysis models.
- **Developed by:** Trevor Keay
- **Model type:** Fine-tuned Transformer (sequence classification)
- **Language(s) (NLP):** English
- **License:** Apache License 2.0
- **Finetuned from model:** distilbert/distilbert-base-uncased
### Model Sources
- **Base Model:** https://huggingface.co/distilbert/distilbert-base-uncased
- **Training Data:** https://huggingface.co/datasets/SebastianHops/steam-reviews-english
## Uses
While DistilBERT is useful for a variety of sentence prediction and analysis applications, sentiment analysis is the primary purpose of
this downstream version.
### Direct Use
Primarily sentiment analysis applications in the new media and video games industries.
### Out-of-Scope Use
This model may not work as well on traditional literature or more formal text, since the training data consists of
extremely informal text littered with modern slang. I do not endorse or condone the use of this model for any malicious or
illegal purposes, and I do not believe it would work well for those applications anyway!
## Bias, Risks, and Limitations
This model reflects the biases present within both the base model and the training data. It is biased towards more extreme reactions: due to response bias,
users who voluntarily review games tend to hold stronger opinions than the average player. Additionally, due to cultural trends within the
gaming community, racial and/or gender biases are likely present in the output.
### Recommendations
Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.
## How to Get Started with the Model
Here is a really simple application of the model to get you going:
```
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline
import torch

MODEL_NAME = "tjkeay/Distilbert_Steam_Sentiment_Small"

# Build a text-classification pipeline from the fine-tuned checkpoint,
# using the GPU if one is available.
sentiment_classifier = pipeline(
    task="text-classification",
    model=AutoModelForSequenceClassification.from_pretrained(MODEL_NAME),
    tokenizer=AutoTokenizer.from_pretrained(MODEL_NAME),
    device=0 if torch.cuda.is_available() else -1,
)

example_text = "10/10 could not stop dying"
result = sentiment_classifier(example_text)[0]
output = result["label"]
print("output (0 should be negative):", output)
```
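The print statement above implies that label 0 corresponds to negative sentiment. If you want human-readable output, a small mapping helper can be used. This is only a sketch: the exact label strings (e.g. `LABEL_0` vs `0`) depend on the model's `id2label` config, so check `model.config.id2label` before relying on it.

```python
# Hypothetical helper: map a raw pipeline label to a sentiment word.
# Assumes labels ending in "0" are negative and those ending in "1"
# are positive -- verify against model.config.id2label.
def readable_label(raw_label: str) -> str:
    """Convert a label like 'LABEL_0' or '0' into 'negative'/'positive'."""
    return "negative" if raw_label.endswith("0") else "positive"
```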
## Training Details
The model was trained with custom arguments focused on being lightweight and efficient.
### Training Data
The training data contains a multitude of reviews scraped directly from Steam. Only the 'game', 'review', 'voted_up', 'author_playtime_forever', and
'author_playtime_at_review' columns were included for training. Additionally, the model was trained on only a random sample of 100,000 entries from the dataset
to make it faster to train and use.
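The column selection described above can be sketched as follows. The column names come from the dataset itself; the helper function and the label convention (mapping the boolean `voted_up` to 1/0) are my assumptions, not the author's published training code.

```python
# Columns retained for training, per the description above.
KEPT_COLUMNS = [
    "game",
    "review",
    "voted_up",
    "author_playtime_forever",
    "author_playtime_at_review",
]

def to_training_example(row: dict) -> dict:
    """Keep only the training columns and derive an integer sentiment label.

    Hypothetical helper: maps the boolean `voted_up` field to 1 (positive)
    or 0 (negative), matching the label convention hinted at in the card.
    """
    example = {col: row[col] for col in KEPT_COLUMNS}
    example["label"] = 1 if row["voted_up"] else 0
    return example
```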
## Evaluation
Evaluation results:
- `eval_train_loss`: 0.14118799567222595
- `eval_test_loss`: 0.1386687308549881
#### Testing Data
A train-test split of the same Steam reviews dataset was used.
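The exact sampling and split procedure is not stated; a minimal sketch of drawing 100,000 random entries and holding out a test portion might look like the following. The split fraction and seed here are illustrative assumptions only.

```python
import random

def sample_and_split(rows, n=100_000, test_fraction=0.2, seed=42):
    """Randomly sample up to n rows, then split into (train, test) lists.

    Illustrative only: the card does not state the split ratio or seed.
    """
    rng = random.Random(seed)
    subset = rng.sample(rows, min(n, len(rows)))
    cut = int(len(subset) * (1 - test_fraction))
    return subset[:cut], subset[cut:]
```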
## Model Card Authors
Trevor Keay