---
license: mit
language:
- en
library_name: transformers
tags:
- text-classification
- yelp-reviews
- gpt-2
- bert
---

# **Model Description**

This model predicts the star rating (1 to 5) of a Yelp review from its text. Two architectures were fine-tuned, **GPT-2** and **BERT**, with **BERT** achieving the best performance at **75%** validation accuracy. Training uses a weighted loss to address class imbalance, along with hyperparameter tuning to improve generalization.
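
As a rough illustration of how a five-class classifier output becomes a star rating, the sketch below applies a softmax to a vector of logits and picks the most likely class. It assumes that class index `i` corresponds to rating `i + 1`; that mapping and the function name `predict_stars` are illustrative and not confirmed by this card.

```python
import math


def predict_stars(logits):
    """Map five class logits to (star_rating, confidence).

    Assumption: class index i stands for rating i + 1.
    """
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Index of the highest-probability class.
    best = max(range(len(probs)), key=probs.__getitem__)
    return best + 1, probs[best]


# Made-up logits for a single review, for illustration only.
stars, confidence = predict_stars([0.1, 0.2, 0.3, 2.5, 1.0])
print(stars, round(confidence, 3))
```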

# **Training Details**

- **Dataset**: Yelp Reviews dataset (100,000 samples used)

- **Preprocessing**:

  - **GPT-2 Tokenizer** with **Byte-Pair Encoding (BPE)**, which splits rare words into subword units
  - Truncation to 128 tokens, with padding to a uniform input length

- **Models Trained**:

  - **GPT-2**: Fine-tuned with a custom classification head, achieving **67% validation accuracy**

  - **BERT**: Fine-tuned with bidirectional attention, achieving **75% validation accuracy**

- **Loss Function**: Weighted **Cross-Entropy Loss** to counteract class imbalance
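
The weighted cross-entropy idea can be sketched in plain Python. The card does not state how the class weights were chosen, so the inverse-frequency scheme below is an assumption, and the helper names and toy labels are hypothetical. The loss is normalized by the sum of the applied weights, which is also what `torch.nn.CrossEntropyLoss(weight=...)` does with its default mean reduction.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels, num_classes):
    """Weight class c by total / (num_classes * count_c).

    Assumption: inverse-frequency weighting; the card does not
    specify the actual scheme. Absent classes get weight 0.
    """
    counts = Counter(labels)
    total = len(labels)
    return [
        total / (num_classes * counts[c]) if counts[c] else 0.0
        for c in range(num_classes)
    ]


def weighted_cross_entropy(logits, labels, weights):
    """Weighted mean of per-example cross-entropy losses."""
    total_loss = 0.0
    total_weight = 0.0
    for row, label in zip(logits, labels):
        # log-sum-exp with max subtraction for numerical stability.
        m = max(row)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        nll = log_sum_exp - row[label]  # negative log-likelihood
        total_loss += weights[label] * nll
        total_weight += weights[label]
    return total_loss / total_weight


# Toy labels (0..4 standing for 1..5 stars), skewed toward 5 stars.
toy_labels = [4, 4, 4, 4, 4, 3, 3, 0, 1, 2]
weights = inverse_frequency_weights(toy_labels, num_classes=5)
print(weights)  # rare classes receive larger weights
```

With weights like these, a mistake on an underrepresented rating contributes more to the loss than a mistake on a common one, which is the intended counterweight to class imbalance.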

# **Limitations**

- Performance may degrade on **highly informal or extremely short reviews**

- **Class imbalance** still affects predictions for underrepresented ratings

- The model was trained on **English-language** reviews only