---
license: mit
language:
- en
metrics:
- accuracy
pipeline_tag: text-classification
library_name: keras
---
# 📚 IMDB Sentiment Classifier (Dual-Model)
This repository contains **two deep learning models** for sentiment classification of IMDB movie reviews, each trained with a different vocabulary size and number of parameters.
---
## 📂 Dataset & Training Notes
- These models were trained on a dataset of approximately 150,000 IMDB movie reviews, which were manually scraped from the web.
- The reviews were pseudo-labeled using soft probability outputs from the `cardiffnlp/twitter-roberta-base-sentiment` model.
- This method provided probabilistic sentiment labels (Negative / Neutral / Positive) for training, allowing the models to learn from soft targets rather than hard class labels.
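
Training on soft targets usually means minimizing cross-entropy against the full probability vector instead of a one-hot label. The sketch below only illustrates that idea: the function name `soft_cross_entropy`, the class order, and the example probabilities are hypothetical, and the loss actually used to train these models is not documented here.

```python
import math

def soft_cross_entropy(target_probs, predicted_probs, eps=1e-12):
    """Cross-entropy between a soft target distribution and a prediction.

    Both arguments are sequences of class probabilities in the same
    class order (assumed here: Negative, Neutral, Positive).
    """
    if len(target_probs) != len(predicted_probs):
        raise ValueError("target and prediction must have the same length")
    # eps guards against log(0) when a class is predicted with zero probability.
    return -sum(
        t * math.log(max(p, eps))
        for t, p in zip(target_probs, predicted_probs)
    )

# Made-up teacher output for one review: mostly positive, partly neutral.
soft_target = [0.05, 0.25, 0.70]
# What a hard label would keep from the same teacher output.
hard_target = [0.0, 0.0, 1.0]
# Made-up student prediction.
prediction = [0.10, 0.20, 0.70]

loss_soft = soft_cross_entropy(soft_target, prediction)
loss_hard = soft_cross_entropy(hard_target, prediction)
```

With a soft target, the loss is smallest when the prediction matches the whole distribution, so a student model is also penalized for misjudging how neutral or mixed a review is, not only for missing the top class.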
---
## Dataset
- **Source:** [IMDB Multi-Movie Dataset](https://huggingface.co/datasets/Daksh0505/IMDB-Reviews)
## Citation (please cite if you use this dataset)
```bibtex
@misc{imdb-multimovie-reviews,
  title  = {IMDb Multi-Movie Review Dataset},
  author = {Daksh Bhardwaj},
  year   = {2025},
  url    = {https://huggingface.co/datasets/Daksh0505/IMDB-Reviews},
  note   = {Accessed: 2025-07-17}
}
```
---
## 🧠 Models
### πŸ”Ή Model A
- Filename: `sentiment_model_imdb_6.6M.keras`
- **Trainable Parameters**: ~6.6 million
- **Total Parameters**: ~13.06 million
- **Vocabulary Size**: 50,000 tokens
- Description: Lightweight and efficient; optimized for speed.
### πŸ”Ή Model B
- Filename: `sentiment_model_imdb_34M.keras`
- **Trainable Parameters**: ~34 million
- **Total Parameters**: ~99.43 million
- **Vocabulary Size**: 256,000 tokens
- Description: Larger and more expressive; higher accuracy on nuanced reviews.
---
## 🗂 Tokenizers
Each model uses its own tokenizer in Keras JSON format:
- `tokenizer_50k.json` → used with Model A
- `tokenizer_256k.json` → used with Model B
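
Because each tokenizer was built for one model's vocabulary size, pairing the wrong files would likely feed a model word indices it was not trained on. One way to keep the pairs together is a small lookup table. The filenames below come from this README; the `ARTIFACTS` dict and the `artifact_files` helper are hypothetical conveniences, not part of the repository.

```python
# Filenames are taken from this README; the dict layout and helper name
# are illustrative assumptions, not something shipped with the models.
ARTIFACTS = {
    "A": {
        "model": "sentiment_model_imdb_6.6M.keras",
        "tokenizer": "tokenizer_50k.json",
    },
    "B": {
        "model": "sentiment_model_imdb_34M.keras",
        "tokenizer": "tokenizer_256k.json",
    },
}

def artifact_files(variant):
    """Return (model_filename, tokenizer_filename) for variant 'A' or 'B'."""
    try:
        entry = ARTIFACTS[variant.upper()]
    except KeyError:
        raise ValueError(f"unknown variant {variant!r}; expected 'A' or 'B'") from None
    return entry["model"], entry["tokenizer"]

# Lookup is case-insensitive, so "a" resolves to Model A's files.
model_file, tokenizer_file = artifact_files("a")
```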
---
## 🔧 Load Models & Tokenizers (from Hugging Face Hub)
```python
from huggingface_hub import hf_hub_download
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing.text import tokenizer_from_json
import json

REPO_ID = "Daksh0505/sentiment-model-imdb"

# === Model A ===
model_path_a = hf_hub_download(repo_id=REPO_ID, filename="sentiment_model_imdb_6.6M.keras")
tokenizer_path_a = hf_hub_download(repo_id=REPO_ID, filename="tokenizer_50k.json")
with open(tokenizer_path_a, "r") as f:
    # tokenizer_from_json expects a JSON string, so the file is assumed to
    # store the tokenizer config as a JSON-encoded string.
    tokenizer_a = tokenizer_from_json(json.load(f))
model_a = load_model(model_path_a)

# === Model B ===
model_path_b = hf_hub_download(repo_id=REPO_ID, filename="sentiment_model_imdb_34M.keras")
tokenizer_path_b = hf_hub_download(repo_id=REPO_ID, filename="tokenizer_256k.json")
with open(tokenizer_path_b, "r") as f:
    tokenizer_b = tokenizer_from_json(json.load(f))
model_b = load_model(model_path_b)
```
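
Once a model is loaded, its raw output still has to be mapped to a sentiment label. The helper below is a minimal sketch under two assumptions that this README does not confirm: that each prediction row holds three class probabilities, and that they are ordered Negative, Neutral, Positive. The name `decode_prediction` is hypothetical, and the input preprocessing (for example the padded sequence length) is not covered here.

```python
# Assumed output order; verify against the model before relying on it.
LABELS = ("Negative", "Neutral", "Positive")

def decode_prediction(probs, labels=LABELS):
    """Map one row of class probabilities to (label, confidence)."""
    probs = [float(p) for p in probs]
    if len(probs) != len(labels):
        raise ValueError(f"expected {len(labels)} probabilities, got {len(probs)}")
    # Index of the highest-probability class.
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]

# Stand-in for one row of a model's predict() output; the values are made up.
example_row = [0.08, 0.17, 0.75]
label, confidence = decode_prediction(example_row)
```

Returning the confidence alongside the label makes it easy to flag low-confidence reviews instead of forcing every review into a class.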
## 🚀 Try the Live Demo
Click below to test both models live in your browser:
[![Open in Spaces](https://img.shields.io/badge/🤗%20Hugging%20Face-Sentiment%20Demo-blue?logo=streamlit&style=for-the-badge)](https://huggingface.co/spaces/Daksh0505/sentiment-model-comparison)