rhdang/Yelp_Review

Text Classification

rhdang committed on Feb 10, 2025

Commit 38b8857 · verified · 1 Parent(s): 6d2c94f

Update README.md

Files changed (1)

README.md +18 -11

@@ -10,24 +10,31 @@ tags:
 - bert
 ---
-Model Description
-This model predicts the star rating (1-5) of a Yelp review based on its text content. It was trained using GPT-2 and BERT, with BERT achieving the best performance at 75% validation accuracy. The model addresses class imbalance using weighted loss and optimizes hyperparameters to enhance generalization.
-Training Details
-Dataset: Yelp Reviews dataset (100,000 samples used)
-Preprocessing:
-GPT-2 Tokenizer with Byte-Pair Encoding (BPE) for rare words
-Truncation (128 tokens) and padding for uniform input size
-Models Trained:
-GPT-2: Fine-tuned with a custom classification head, achieving 67% validation accuracy
-BERT: Fine-tuned with bidirectional attention, achieving 75% validation accuracy
-Loss Function: Weighted Cross-Entropy Loss to counteract class imbalance

+# **Model Description**
+This model predicts the star rating (1 - 5) of a Yelp review based on its text content. It was trained using **GPT-2** and **BERT**, with **BERT** achieving the best performance at **75%** validation accuracy. The model addresses class imbalance using weighted loss and optimizes hyperparameters to enhance generalization.
+# **Training Details**
+- **Dataset**: Yelp Reviews dataset (100,000 samples used)
+- **Preprocessing**:
+  - **GPT-2 Tokenizer** with **Byte-Pair Encoding (BPE)** for rare words
+  - Truncation (128 tokens) and padding for uniform input size
+- **Models Trained**:
+  - **GPT-2**: Fine-tuned with a custom classification head, achieving **67% validation accuracy**
+  - **BERT**: Fine-tuned with bidirectional attention, achieving **75% validation accuracy**
+- **Loss Function**: Weighted **Cross-Entropy Loss** to counteract class imbalance
+# **Limitations**
+- Performance may degrade on **highly informal or extremely short reviews**
+- **Class imbalance** still affects predictions for underrepresented ratings
+- Model was trained on **English-language** reviews only
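
For reference, the weighted cross-entropy loss named in the README's Loss Function bullet can be sketched as below. This is a minimal, dependency-free illustration and not the author's training code: the inverse-frequency weighting scheme, the function names, the toy data, and the mapping of star ratings 1-5 to class indices 0-4 are all assumptions, since the README does not say how the weights were chosen.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels, num_classes=5):
    """Return one weight per class: total / (num_classes * count).

    Assumed scheme (not stated in the README). Classes that never
    appear get weight 0.0 to avoid division by zero.
    """
    counts = Counter(labels)
    total = len(labels)
    return [
        total / (num_classes * counts[c]) if counts[c] else 0.0
        for c in range(num_classes)
    ]


def log_softmax(logits):
    """Numerically stable log-softmax over one list of logits."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]


def weighted_cross_entropy(batch_logits, batch_labels, weights):
    """Weighted mean of per-example negative log-likelihoods.

    Each example's loss is scaled by the weight of its true class,
    and the sum is divided by the total weight in the batch.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for logits, label in zip(batch_logits, batch_labels):
        nll = -log_softmax(logits)[label]
        weighted_sum += weights[label] * nll
        weight_total += weights[label]
    return weighted_sum / weight_total


# Hypothetical toy data: class indices 0-4 stand for 1-5 stars.
train_labels = [4, 4, 4, 4, 0, 1]
weights = inverse_frequency_weights(train_labels)

# Both examples get the same logits, which favor 5 stars (index 4).
batch_logits = [[0.0, 0.0, 0.0, 0.0, 2.0], [0.0, 0.0, 0.0, 0.0, 2.0]]
batch_labels = [4, 0]  # one correct 5-star, one missed 1-star
loss = weighted_cross_entropy(batch_logits, batch_labels, weights)
print(f"weighted loss: {loss:.4f}")
```

In this toy batch the missed 1-star example carries four times the weight of the correct 5-star example, so the weighted loss is higher than the plain average would be. That is the intended effect: errors on underrepresented ratings cost more during training.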