rhdang/Yelp_Review

Text Classification


	---
	license: mit
	language:
	- en
	library_name: transformers
	tags:
	- text-classification
	- yelp-reviews
	- gpt-2
	- bert
	---

	# Model Description

	This model predicts the star rating (1 to 5) of a Yelp review from its text. Two architectures were fine-tuned, GPT-2 and BERT; BERT performed best, reaching 75% validation accuracy. Training used a weighted loss to address class imbalance, and hyperparameters were tuned to improve generalization.
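As a rough illustration of how a 5-way classifier's output becomes a star rating, the sketch below applies a softmax to five logits and picks the most likely class. It assumes class index `i` corresponds to `i + 1` stars, which this card does not state; `predict_stars` and the example logits are hypothetical and not part of the released model.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_stars(logits):
    """Return (star_rating, confidence) for one review's five logits.

    Assumption: class index i maps to i + 1 stars.
    """
    if len(logits) != 5:
        raise ValueError("expected 5 logits, one per star rating")
    probs = softmax(logits)
    best = max(range(5), key=probs.__getitem__)
    return best + 1, probs[best]


# Made-up logits for illustration only; a real run would take them
# from the classifier's output for a tokenized review.
example_logits = [-1.2, -0.3, 0.4, 2.1, 1.0]
stars, confidence = predict_stars(example_logits)
print(stars, round(confidence, 3))
```

With these made-up logits the fourth class has the largest score, so the predicted rating is 4 stars.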

	# Training Details

	- Dataset: Yelp Reviews dataset (100,000 samples used)

	- Preprocessing:

	  - GPT-2 tokenizer with Byte-Pair Encoding (BPE), which splits rare words into subword units
	  - Truncation to 128 tokens, with padding for uniform input size

	- Models Trained:

	  - GPT-2: fine-tuned with a custom classification head, achieving 67% validation accuracy

	  - BERT: fine-tuned with bidirectional attention, achieving 75% validation accuracy

	- Loss Function: Weighted Cross-Entropy Loss to counteract class imbalance
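The preprocessing and loss steps listed above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the training code behind this card: the pad id of 0, the inverse-frequency weighting scheme, and all function names are hypothetical. In a PyTorch setup, the same loss is what `torch.nn.CrossEntropyLoss(weight=...)` computes with its default mean reduction.

```python
import math
from collections import Counter

MAX_LEN = 128  # truncation length stated in the card


def pad_or_truncate(token_ids, max_len=MAX_LEN, pad_id=0):
    """Cut to max_len, then right-pad; also return an attention mask.

    pad_id=0 is a placeholder: the GPT-2 tokenizer has no pad token by
    default, so a real pipeline has to choose one explicitly.
    """
    ids = list(token_ids[:max_len])
    mask = [1] * len(ids)
    n_pad = max_len - len(ids)
    return ids + [pad_id] * n_pad, mask + [0] * n_pad


def inverse_frequency_weights(labels, num_classes=5):
    """One common weighting choice: w_c = N / (K * n_c).

    Rare classes get weights above 1, frequent ones below 1.
    """
    counts = Counter(labels)
    n = len(labels)
    return [
        n / (num_classes * counts[c]) if counts[c] else 0.0
        for c in range(num_classes)
    ]


def weighted_cross_entropy(logits_batch, labels, weights):
    """Weighted mean of -log softmax(logits)[label].

    Normalised by the sum of the per-example weights.
    """
    total, weight_sum = 0.0, 0.0
    for logits, y in zip(logits_batch, labels):
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        total += weights[y] * (log_z - logits[y])
        weight_sum += weights[y]
    return total / weight_sum


# Toy, imbalanced labels (0..4 standing for 1..5 stars); not real data.
toy_labels = [4, 4, 4, 4, 0, 3, 3, 2, 1, 4]
class_weights = inverse_frequency_weights(toy_labels)
print(class_weights)

ids, mask = pad_or_truncate([5, 6, 7], max_len=5)
print(ids, mask)
```

In the toy label set the 5-star class makes up half the examples, so it receives the smallest weight, while each single-example class receives the largest.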

	# Limitations

	- Performance may degrade on highly informal or extremely short reviews

	- Class imbalance still affects predictions for underrepresented ratings

	- Model was trained on English-language reviews only
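One way to see how class imbalance affects individual ratings is to report recall per star rating rather than a single accuracy figure. The sketch below is a hypothetical helper run on made-up labels; it assumes labels 0 to 4 stand for 1 to 5 stars and does not reproduce any result from this model.

```python
from collections import Counter


def per_class_recall(y_true, y_pred, num_classes=5):
    """Map each star rating to the fraction of its reviews predicted correctly.

    Ratings with no true examples map to None.
    Assumption: label c stands for c + 1 stars.
    """
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    support = Counter(y_true)
    return {
        c + 1: (correct[c] / support[c] if support[c] else None)
        for c in range(num_classes)
    }


# Made-up labels and predictions, for illustration only.
y_true = [4, 4, 4, 4, 0, 0, 2, 2, 3, 1]
y_pred = [4, 4, 4, 3, 0, 4, 2, 3, 3, 2]

recall = per_class_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(recall, accuracy)
```

A breakdown like this can show a rating with very low recall even when overall accuracy looks acceptable.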