tjkeay/Distilbert_Steam_Sentiment_Small

@@ -1,20 +1,31 @@
 ---
 library_name: transformers
-tags: []
 ---
-# Model Card for Model ID
 This is a fine-tuned version of the distilbert/distilbert-base-uncased model trained on the SebastianHops/steam-reviews-english dataset.
 It was made for the purpose of simple sentiment analysis, particularly of video game reviews.
-## Model Details
 ### Model Description
-<!-- Provide a longer summary of what this model is. -->
-This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
 - **Developed by:** Trevor Keay
 - **Model type:** Custom-tuned Transformer
@@ -22,40 +33,31 @@ This is the model card of a 🤗 transformers model that has been pushed on the
 - **License:** Apache License 2.0
 - **Finetuned from model [optional]:** distilbert/distilbert-base-uncased
-### Model Sources [optional]
-<!-- Provide the basic links for the model. -->
 - **Base Model** https://huggingface.co/distilbert/distilbert-base-uncased
 - **Training Data** https://huggingface.co/datasets/SebastianHops/steam-reviews-english
 ## Uses
-<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
 ### Direct Use
-<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-[More Information Needed]
-### Downstream Use [optional]
-<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-[More Information Needed]
 ### Out-of-Scope Use
-<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-[More Information Needed]
 ## Bias, Risks, and Limitations
-<!-- This section is meant to convey both technical and sociotechnical limitations. -->
-[More Information Needed]
 ### Recommendations
@@ -65,131 +67,43 @@ Users (both direct and downstream) should be made aware of the risks, biases and
 ## How to Get Started with the Model
-Use the code below to get started with the model.
-[More Information Needed]
 ## Training Details
-### Training Data
-<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-[More Information Needed]
-### Training Procedure
-<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-#### Preprocessing [optional]
-[More Information Needed]
-#### Training Hyperparameters
-- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-#### Speeds, Sizes, Times [optional]
-<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-[More Information Needed]
 ## Evaluation
-<!-- This section describes the evaluation protocols and provides the results. -->
 ### Testing Data, Factors & Metrics
 #### Testing Data
-<!-- This should link to a Dataset Card if possible. -->
-[More Information Needed]
-#### Factors
-<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-[More Information Needed]
-#### Metrics
-<!-- These are the evaluation metrics being used, ideally with a description of why. -->
-[More Information Needed]
-### Results
-[More Information Needed]
-#### Summary
-## Model Examination [optional]
-<!-- Relevant interpretability work for the model goes here -->
-[More Information Needed]
-## Environmental Impact
-<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-- **Hardware Type:** [More Information Needed]
-- **Hours used:** [More Information Needed]
-- **Cloud Provider:** [More Information Needed]
-- **Compute Region:** [More Information Needed]
-- **Carbon Emitted:** [More Information Needed]
-## Technical Specifications [optional]
-### Model Architecture and Objective
-[More Information Needed]
-### Compute Infrastructure
-[More Information Needed]
-#### Hardware
-[More Information Needed]
-#### Software
-[More Information Needed]
-## Citation [optional]
-<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-**BibTeX:**
-[More Information Needed]
-**APA:**
-[More Information Needed]
-## Glossary [optional]
-<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-[More Information Needed]
-## More Information [optional]
-[More Information Needed]
-## Model Card Authors [optional]
-[More Information Needed]
-## Model Card Contact
-[More Information Needed]

 ---
 library_name: transformers
+tags:
+- steam
+- video games
+- distilbert
+license: apache-2.0
+datasets:
+- SebastianHops/steam-reviews-english
+language:
+- en
+base_model:
+- distilbert/distilbert-base-uncased
+pipeline_tag: text-classification
 ---
+# Distilbert Steam Sentiment (Small)
 This is a fine-tuned version of the distilbert/distilbert-base-uncased model trained on the SebastianHops/steam-reviews-english dataset.
 It was made for the purpose of simple sentiment analysis, particularly of video game reviews.
 ### Model Description
+This model uses DistilBERT as its base and is fine-tuned on a subset of the SebastianHops/steam-reviews-english dataset. I call this
+model the "small" version because it uses only a fraction of the training dataset (a random sample of 100,000 entries) to keep training fast.
+Given its training data and base model, Distilbert Steam Sentiment (Small) is well suited to sentiment analysis applications, especially in the
+video game and new media industries. The training data contains a lot of Gen Z and Gen Alpha internet slang, so the model may handle that
+vocabulary better than sentiment models trained on more formal text.
 - **Developed by:** Trevor Keay
 - **Model type:** Custom-tuned Transformer
 - **License:** Apache License 2.0
 - **Finetuned from model [optional]:** distilbert/distilbert-base-uncased
+### Model Sources
 - **Base Model** https://huggingface.co/distilbert/distilbert-base-uncased
 - **Training Data** https://huggingface.co/datasets/SebastianHops/steam-reviews-english
 ## Uses
+While DistilBERT is useful for a variety of text prediction and analysis applications, this fine-tuned version is intended primarily for
+sentiment analysis.
 ### Direct Use
+Primarily sentiment analysis of text from the new media and video game industries, such as user reviews of games.
 ### Out-of-Scope Use
+This model may not work as well on traditional literature or other formal text, because the training data consists of
+extremely informal text full of modern slang. I do not endorse or condone the use of this model for any malicious or
+illegal purpose, and I doubt it would work well for those applications anyway!
 ## Bias, Risks, and Limitations
+This model reflects the biases present within both the base model and the training data. It is skewed toward more extreme reactions because of response bias: users
+who voluntarily review games are more likely to hold extreme opinions than the average player of a game. Additionally, given cultural trends within the
+gaming community, racial and/or gender biases are likely present in the output.
 ### Recommendations
 ## How to Get Started with the Model
+Here is a really simple application of the model to get you going:
+```python
+import torch
+from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline
+
+MODEL_NAME = "tjkeay/Distilbert_Steam_Sentiment_Small"
+
+# Build a text-classification pipeline from the fine-tuned checkpoint
+sentiment_classifier = pipeline(
+    task="text-classification",
+    model=AutoModelForSequenceClassification.from_pretrained(MODEL_NAME),
+    tokenizer=AutoTokenizer.from_pretrained(MODEL_NAME),
+    device=0 if torch.cuda.is_available() else -1,  # first GPU if available, else CPU
+)
+
+example_text = "10/10 could not stop dying"
+result = sentiment_classifier(example_text)[0]  # dict with "label" and "score" keys
+output = result["label"]
+print("output (0 should be negative):", output)
+```
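The getting-started example prints the raw pipeline label. As a hedged sketch, the helper below turns such labels into readable sentiment words. It assumes the checkpoint keeps the default `transformers` label names `LABEL_0`/`LABEL_1`, with 0 meaning negative (as the example's print statement suggests) and 1 meaning positive; `SENTIMENT_BY_INDEX`, `label_to_sentiment` and `summarize` are hypothetical names, and the mapping should be confirmed against `model.config.id2label`.

```python
# Assumption: labels follow the transformers defaults "LABEL_0"/"LABEL_1",
# with 0 = negative and 1 = positive. Verify with model.config.id2label.
SENTIMENT_BY_INDEX = {0: "negative", 1: "positive"}


def label_to_sentiment(label: str) -> str:
    """Map a pipeline label such as "LABEL_0" (or a bare "0") to a sentiment word."""
    # Take the text after the last underscore; a bare "0" is returned unchanged.
    index = int(label.rsplit("_", 1)[-1])
    return SENTIMENT_BY_INDEX[index]


def summarize(results: list) -> list:
    """Turn text-classification pipeline results into (sentiment, score) pairs."""
    return [(label_to_sentiment(r["label"]), r["score"]) for r in results]
```

For example, `summarize(sentiment_classifier(["10/10 could not stop dying", "refunded after an hour"]))` returns one `(sentiment, score)` pair per review, since the pipeline returns one top-scoring result dict per input string.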
 ## Training Details
+The model was trained with custom training arguments chosen to keep it lightweight and efficient.
+### Training Data
+The training data consists of a large number of reviews scraped directly from Steam. Only the 'game', 'review', 'voted_up', 'author_playtime_forever', and
+'author_playtime_at_review' columns were included for training. Additionally, the model was trained only on a random sample of 100,000 entries from the dataset
+to reduce training time.
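The sampling and column selection described in the training-data section can be sketched in plain Python. This is an illustration under stated assumptions, not the author's actual preprocessing code: `sample_training_rows`, `split_rows`, the seed of 0 and the 0.2 test fraction are all hypothetical, since the card states neither a seed nor a split ratio. In practice the 🤗 `datasets` library offers equivalents such as `Dataset.select_columns`, `Dataset.shuffle`, `Dataset.select` and `Dataset.train_test_split`.

```python
import random

# Columns the card says were included for training.
KEPT_COLUMNS = ("game", "review", "voted_up",
                "author_playtime_forever", "author_playtime_at_review")


def sample_training_rows(rows, n=100_000, seed=0):
    """Randomly sample up to n rows (dicts) and keep only KEPT_COLUMNS.

    The seed is a hypothetical choice; the card does not state one.
    """
    rng = random.Random(seed)
    picked = rng.sample(rows, k=min(n, len(rows)))
    return [{col: row[col] for col in KEPT_COLUMNS} for row in picked]


def split_rows(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and split it into (train, test).

    The 0.2 test fraction is hypothetical; the card does not give a ratio.
    """
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]
```

Sampling before splitting keeps the whole pipeline bounded by the 100,000-row budget, which matches the card's stated goal of faster training.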
 ## Evaluation
+Evaluation results:
+- `eval_train_loss`: 0.14118799567222595
+- `eval_test_loss`: 0.1386687308549881
 ### Testing Data, Factors & Metrics
 #### Testing Data
+A train-test split of the same Steam reviews dataset was used for evaluation.
+## Model Card Authors
+Trevor Keay