---
license: mit
language:
- en
metrics:
- accuracy
pipeline_tag: text-classification
library_name: keras
---

# IMDB Sentiment Classifier (Dual-Model)

This repository contains **two deep learning models** for sentiment classification of IMDB movie reviews, each trained with a different vocabulary size and parameter count.

---

## Dataset & Training Notes

- These models were trained on approximately 150,000 IMDB movie reviews that were manually scraped from the web.
- The reviews were pseudo-labeled with the soft probability outputs of the `cardiffnlp/twitter-roberta-base-sentiment` model.
- This yields a probability distribution over Negative / Neutral / Positive for each review, so the models learn from soft targets rather than hard class labels.
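
The soft-target idea in the notes above can be sketched in plain Python. This is only an illustration, not the training code used for these models: the teacher logits and student predictions are made-up values, and `softmax` and `soft_cross_entropy` are hypothetical helper names. It shows how a teacher's probabilities serve as the label and how the loss rewards a prediction that matches the whole distribution rather than a single class.

```python
import math

# Class order used by the teacher model's three sentiment outputs.
LABELS = ("Negative", "Neutral", "Positive")


def softmax(logits):
    """Turn raw teacher logits into a probability distribution (the soft label)."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_cross_entropy(soft_target, predicted, eps=1e-12):
    """Cross-entropy between a soft target and a predicted distribution."""
    return -sum(t * math.log(p + eps) for t, p in zip(soft_target, predicted))


# Hypothetical teacher logits for one review (not taken from the dataset).
teacher_logits = [-1.0, 0.5, 2.0]
soft_label = softmax(teacher_logits)

# Two hypothetical student predictions: one close to the teacher, one far from it.
student_close = [0.05, 0.20, 0.75]
student_far = [0.75, 0.20, 0.05]

loss_close = soft_cross_entropy(soft_label, student_close)
loss_far = soft_cross_entropy(soft_label, student_far)

print(dict(zip(LABELS, (round(p, 3) for p in soft_label))))
print(f"loss (close) = {loss_close:.3f}, loss (far) = {loss_far:.3f}")
```
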

---

## Dataset

- **Source:** [IMDB Multi-Movie Dataset](https://huggingface.co/datasets/Daksh0505/IMDB-Reviews)

## Citation (Please add if you use this dataset)
```bibtex
@misc{imdb-multimovie-reviews,
  title = {IMDb Multi-Movie Review Dataset},
  author = {Daksh Bhardwaj},
  year = {2025},
  url = {https://huggingface.co/datasets/Daksh0505/IMDB-Reviews},
  note = {Accessed: 2025-07-17}
}
```

---

## Models

### Model A
- Filename: `sentiment_model_imdb_6.6M.keras`
- **Trainable Parameters**: ~6.6 million
- **Total Parameters**: ~13.06 million
- **Vocabulary Size**: 50,000 tokens
- Description: Lightweight and efficient; optimized for speed.

### Model B
- Filename: `sentiment_model_imdb_34M.keras`
- **Trainable Parameters**: ~34 million
- **Total Parameters**: ~99.43 million
- **Vocabulary Size**: 256,000 tokens
- Description: Larger and more expressive; higher accuracy on nuanced reviews.

---

## Tokenizers

Each model uses its own tokenizer in Keras JSON format:

- `tokenizer_50k.json`: used with Model A
- `tokenizer_256k.json`: used with Model B

---

## Load Models & Tokenizers (from Hugging Face Hub)

```python
from huggingface_hub import hf_hub_download
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing.text import tokenizer_from_json
import json

REPO_ID = "Daksh0505/sentiment-model-imdb"

# === Model A ===
# Download (or reuse the cached copy of) the model and its matching tokenizer.
model_path_a = hf_hub_download(repo_id=REPO_ID, filename="sentiment_model_imdb_6.6M.keras")
tokenizer_path_a = hf_hub_download(repo_id=REPO_ID, filename="tokenizer_50k.json")

with open(tokenizer_path_a, "r") as f:
    tokenizer_a = tokenizer_from_json(json.load(f))

model_a = load_model(model_path_a)

# === Model B ===
model_path_b = hf_hub_download(repo_id=REPO_ID, filename="sentiment_model_imdb_34M.keras")
tokenizer_path_b = hf_hub_download(repo_id=REPO_ID, filename="tokenizer_256k.json")

with open(tokenizer_path_b, "r") as f:
    tokenizer_b = tokenizer_from_json(json.load(f))

model_b = load_model(model_path_b)
```
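
Once a model and its tokenizer are loaded, inference might look like the sketch below. Several details are assumptions, because this card does not state them: the output is taken to be three probabilities in Negative / Neutral / Positive order (mirroring the pseudo-labeling model), `max_len=200` is a placeholder for the sequence length used in training, and `decode_probs` and `predict_sentiment` are hypothetical helper names. The default padding of `pad_sequences` may also differ from what was used in training.

```python
# Assumed label order; the card does not state the models' output order.
LABELS = ("Negative", "Neutral", "Positive")


def decode_probs(probs):
    """Return (label, probability) for the highest-scoring class."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return LABELS[best], probs[best]


def predict_sentiment(model, tokenizer, texts, max_len=200):
    """Tokenize, pad, and classify a list of review strings.

    max_len=200 is a placeholder; use the sequence length from training.
    """
    # Imported lazily so decode_probs can be used without TensorFlow installed.
    from tensorflow.keras.preprocessing.sequence import pad_sequences

    sequences = tokenizer.texts_to_sequences(texts)
    padded = pad_sequences(sequences, maxlen=max_len)
    probs = model.predict(padded, verbose=0)
    return [decode_probs([float(p) for p in row]) for row in probs]
```

With the objects from the loading snippet, `predict_sentiment(model_a, tokenizer_a, ["A moving, beautifully shot film."])` would return one `(label, probability)` pair per review. Always pair each model with its own tokenizer, since the two vocabularies differ in size.
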

## Try the Live Demo

Click below to test both models live in your browser:

[Open the demo on Hugging Face Spaces](https://huggingface.co/spaces/Daksh0505/sentiment-model-comparison)