	---
	library_name: transformers
	tags:
	- steam
	- video games
	- distilbert
	license: apache-2.0
	datasets:
	- SebastianHops/steam-reviews-english
	language:
	- en
	base_model:
	- distilbert/distilbert-base-uncased
	pipeline_tag: text-classification
	---

	# Distilbert Steam Sentiment (Small)

	This is a fine-tuned version of the distilbert/distilbert-base-uncased model trained on the SebastianHops/steam-reviews-english dataset.
	It was made for the purpose of simple sentiment analysis, particularly of video game reviews.

	### Model Description

	This model uses DistilBERT as its base and is fine-tuned on a subset of the SebastianHops/steam-reviews-english dataset. I call this
	model the "small" version because it uses only a fraction (100,000 entries) of the dataset, which keeps training and running the model fast.
	Given the dataset and base model, Distilbert Steam Sentiment (Small) is well suited to sentiment analysis applications, especially within the
	video games & new media industries. The training data includes a lot of Gen Z/Gen Alpha internet slang, which distinguishes it
	from other sentiment analysis models.

	- Developed by: Trevor Keay
	- Model type: Custom-tuned Transformer
	- Language(s) (NLP): English
	- License: Apache License 2.0
	- Finetuned from model: distilbert/distilbert-base-uncased

	### Model Sources

	- Base Model: https://huggingface.co/distilbert/distilbert-base-uncased
	- Training Data: https://huggingface.co/datasets/SebastianHops/steam-reviews-english

	## Uses

	While DistilBERT is useful for a variety of sentence prediction and analysis applications, sentiment analysis is the primary purpose of
	this downstream version.

	### Direct Use

	Primarily sentiment analysis of informal text from the video game and new media industries, such as game reviews.

	### Out-of-Scope Use

	This model may not work as well on traditional literature or more formal text, because the training data is composed of
	extremely informal text that is littered with modern slang. I do not endorse or condone the use of this model for any malicious or
	illegal purposes, and I do not believe it would work well for those applications anyway!

	## Bias, Risks, and Limitations

	This model reflects the biases present within both the base model and the training data. It is biased towards more extreme reactions: because of response bias, users
	who voluntarily review games are more likely to hold extreme opinions than the average player of a game. Additionally, due to cultural trends within the
	gaming community, racial and/or gender biases are likely present in the output.

	### Recommendations

	Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

	## How to Get Started with the Model

	Here is a really simple application of the model to get you going:
	```
	import torch
	from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

	MODEL_NAME = "tjkeay/Distilbert_Steam_Sentiment_Small"

	# Build a text-classification pipeline from the fine-tuned weights and tokenizer
	sentiment_classifier = pipeline(
	    task="text-classification",
	    model=AutoModelForSequenceClassification.from_pretrained(MODEL_NAME),
	    tokenizer=AutoTokenizer.from_pretrained(MODEL_NAME),
	    device=0 if torch.cuda.is_available() else -1,  # first GPU if available, else CPU
	)

	example_text = "10/10 could not stop dying"
	result = sentiment_classifier(example_text)[0]  # dict with "label" and "score"
	output = result["label"]
	print("output (0 should be negative):", output)
	```
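
The pipeline returns a label string rather than a bare integer. Below is a minimal sketch of turning it into a readable sentiment. It assumes the model keeps the default `LABEL_0`/`LABEL_1` naming from `transformers` and that class 1 corresponds to a positive review; this card confirms neither, and `label_to_sentiment` is a hypothetical helper, not part of the model.

```python
def label_to_sentiment(label: str) -> str:
    """Map a pipeline label such as "LABEL_0" to a readable sentiment.

    Assumes the default "LABEL_<id>" naming, with id 0 negative and
    id 1 positive; check the model's config.id2label to confirm.
    """
    class_id = int(label.rsplit("_", 1)[-1])  # "LABEL_1" -> 1
    if class_id == 0:
        return "negative"
    if class_id == 1:
        return "positive"
    raise ValueError(f"unexpected class id {class_id} in label {label!r}")


# Made-up result shaped like one pipeline output entry
example_result = {"label": "LABEL_1", "score": 0.98}
print(label_to_sentiment(example_result["label"]))
```

If the model's config defines custom label names, reading `id2label` from the config directly would be the more reliable route.
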
	## Training Details

	The model was trained with custom arguments chosen to keep it lightweight and efficient.

	### Training Data

	The training data consists of a large number of user reviews scraped directly from Steam. Only the 'game', 'review', 'voted_up', 'author_playtime_forever', and
	'author_playtime_at_review' columns were kept for training. Additionally, the model was trained on only a random sample of 100,000 entries from the dataset
	to make the model faster to train and to use.
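
The column selection and random sampling described above can be sketched as follows. This is an illustration with the standard library only, not the author's actual preprocessing: the seed, the `sample_reviews` helper, and the stand-in rows are all assumptions.

```python
import random

# Columns the card says were kept for training
KEPT_COLUMNS = [
    "game",
    "review",
    "voted_up",
    "author_playtime_forever",
    "author_playtime_at_review",
]


def sample_reviews(rows, n=100_000, seed=42):
    """Keep only KEPT_COLUMNS and draw up to n rows uniformly at random.

    The seed is an assumption; the card does not state one.
    """
    rng = random.Random(seed)
    chosen = rng.sample(rows, min(n, len(rows)))
    return [{col: row[col] for col in KEPT_COLUMNS} for row in chosen]


# Tiny made-up rows standing in for the real dataset
rows = [
    {
        "game": "Example Game",
        "review": f"review {i}",
        "voted_up": i % 2 == 0,
        "author_playtime_forever": 60 * i,
        "author_playtime_at_review": 30 * i,
        "language": "english",  # extra column that gets dropped
    }
    for i in range(10)
]
subset = sample_reviews(rows, n=4)
print(len(subset), sorted(subset[0]))
```

Fixing the seed makes the sample reproducible, which matters when comparing runs trained on "the same" 100,000 entries.
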

	## Evaluation

	Evaluation results (loss):

	- eval_train_loss: 0.14118799567222595
	- eval_test_loss: 0.1386687308549881
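
One way to read these numbers, assuming they are mean cross-entropy losses in nats (the usual default for sequence classification in `transformers`, though the card does not confirm it): `exp(-loss)` is the geometric-mean probability the model assigns to the correct label.

```python
import math

# Loss values copied from the evaluation results above
eval_results = {
    "eval_train_loss": 0.14118799567222595,
    "eval_test_loss": 0.1386687308549881,
}

for name, loss in eval_results.items():
    # exp(-mean cross-entropy) = geometric mean of p(correct label)
    geo_mean_prob = math.exp(-loss)
    print(f"{name}: loss={loss:.4f}, exp(-loss)={geo_mean_prob:.3f}")
```

Under that assumption, both values come out at roughly 0.87. This is not an accuracy figure; it only describes how much probability mass lands on the correct class on average.
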

	#### Testing Data

	A train-test split of the same steam reviews dataset was used.
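
A train-test split of this kind can be sketched with a shuffled cut. The card states neither the split ratio nor a seed, so `test_fraction=0.2` and `seed=42` are assumptions, and this `train_test_split` helper is a hypothetical stand-in rather than the code used for this model.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle rows and split them into (train, test) lists.

    test_fraction and seed are assumptions; the card states neither.
    """
    shuffled = list(rows)  # copy so the caller's list is not reordered
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


reviews = [f"review {i}" for i in range(10)]  # made-up stand-in data
train, test = train_test_split(reviews)
print(len(train), len(test))
```

When working with the Hugging Face `datasets` library, `Dataset.train_test_split` offers the same kind of split with `test_size` and `seed` arguments.
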

	## Model Card Authors

	Trevor Keay