model_name: IMDb Movie Sentiment Classifier
version: 1.0.0
license: MIT
architecture: CNN (Convolutional Neural Network)
framework: TensorFlow / Keras
language: English
tags:
- sentiment-analysis
- binary-classification
- deep-learning
- movie-reviews
- keras
- streamlit-app
- tensorboard
description: |
  A convolutional neural network (CNN) based sentiment analysis model trained on the Stanford IMDb (aclImdb) dataset. The model classifies movie reviews as either positive or negative. It is integrated with a Streamlit app for real-time predictions, and logs training metrics using TensorBoard.
datasets:
  name: Stanford IMDb (aclImdb)
  source: https://ai.stanford.edu/~amaas/data/sentiment/
  size: 50,000 reviews (25,000 positive, 25,000 negative)
  split:
    train: 80%
    validation: 20%
  preprocessing:
    - Lowercasing
    - HTML tag removal
    - Punctuation and extra space stripping
    - Tokenization and padding to 500 words max
training:
  vocab_size: 30000
  max_sequence_length: 500
  embedding_dim: 128
  optimizer: Adam
  loss_function: Binary Crossentropy
  epochs: 10
  batch_size: 64
  class_weighting: true
  callbacks:
    - EarlyStopping
    - ReduceLROnPlateau
    - TensorBoard
metrics:
  accuracy: ~0.68
  precision:
    class_0: ~0.88
    class_1: ~0.62
  recall:
    class_0: ~0.42
    class_1: ~0.94
  f1_score:
    class_0: ~0.57
    class_1: ~0.75
  macro_avg_f1: ~0.66
  weighted_avg_f1: ~0.66
limitations: |
  - The model underperforms on class 0 (negative sentiment), with low recall.
  - CNNs do not capture long-range dependencies well; consider using BiLSTM or transformer-based models (e.g. BERT).
  - Performance may degrade on noisy or sarcastic text.
usage: |
  Load models/imdb_cnn_model.h5 and tokenizer.pkl to predict sentiment of new reviews.
  The Streamlit app in app/app.py provides a web interface.
  To run:
  streamlit run app/app.py
authors:
  - name: Voltsy
    role: Developer and Trainer
    contact: jorgecreiannj@gmail.com
last_updated: 2025-06-29
🎬 Movie Review Sentiment Analysis (IMDb Dataset)
This project uses a Convolutional Neural Network (CNN) to classify movie reviews from the Stanford IMDb Dataset (aclImdb) as positive or negative. The application includes:
- ✅ TensorFlow + Keras model
- ✅ Streamlit interface for real-time prediction
- ✅ TensorBoard for training monitoring
- ✅ Cleaned and preprocessed text data
- ✅ Class-weighted training to improve recall
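The class-weighted training listed above can be sketched as follows. The project does not state how its weights are derived, so this assumes the common "balanced" heuristic, `n_samples / (n_classes * class_count)`. The helper name `balanced_class_weights` and the label list are hypothetical and for illustration only.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {class: weight} using n_samples / (n_classes * class_count).

    Assumption: this "balanced" heuristic is one common choice; the
    project's train.py may compute its weights differently.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Illustrative labels only (0 = negative, 1 = positive), not project data.
labels = [0, 0, 0, 1]
weights = balanced_class_weights(labels)
print(weights)
```

A dictionary of this shape can be passed to Keras `Model.fit` through its `class_weight` argument. On a perfectly balanced label set both weights come out as 1.0, so the weighting only changes training when the classes are uneven or the weights are set by hand.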
📦 Project Structure
MovieSentimentAnalysis/
├── app/                      # Streamlit app
│   └── app.py
├── data/                     # Preprocessed CSVs
│   └── imdb_train.csv
├── logs/                     # TensorBoard logs
├── models/                   # Saved model + tokenizer
│   ├── imdb_cnn_model.h5
│   └── tokenizer.pkl
├── train.py                  # Model training script
├── evaluate.py               # Evaluation script (confusion matrix, metrics)
├── requirements.txt
├── README.md
└── modelcard.yaml            # Metadata about the model
How to Run
1. Install dependencies
pip install -r requirements.txt
2. Train the model
python train.py
3. Evaluate performance
python evaluate.py
4. Run the Streamlit app
streamlit run app/app.py
🧪 Model Details
Architecture: Embedding → Conv1D → GlobalMaxPool → Dense
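A minimal Keras sketch of this layer stack is shown below. The vocabulary size, sequence length and embedding dimension come from the model card. The filter count, kernel size, hidden Dense width and the function name `build_cnn` are assumptions for illustration and are not taken from train.py.

```python
# Hyperparameters stated in the model card.
VOCAB_SIZE = 30_000
MAX_LEN = 500
EMBEDDING_DIM = 128


def build_cnn(filters=128, kernel_size=5, dense_units=64):
    """Embedding -> Conv1D -> GlobalMaxPool -> Dense binary classifier.

    Assumptions: filters, kernel_size and dense_units are illustrative
    defaults, not values read from the project's training script.
    """
    # Imported lazily so the constants above can be read without TensorFlow.
    import tensorflow as tf

    layers = tf.keras.layers
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(MAX_LEN,)),
        layers.Embedding(input_dim=VOCAB_SIZE, output_dim=EMBEDDING_DIM),
        layers.Conv1D(filters, kernel_size, activation="relu"),
        layers.GlobalMaxPooling1D(),
        layers.Dense(dense_units, activation="relu"),
        layers.Dense(1, activation="sigmoid"),  # probability of class 1 (positive)
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model
```

Calling `build_cnn().summary()` prints the layer output shapes and parameter counts, which is a quick way to compare a sketch like this against the saved model.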
Tokenizer: Keras Tokenizer with top 30,000 words, padded to maxlen=500
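The cleaning steps named in the metadata (lowercasing, HTML tag removal, punctuation and extra space stripping) and the padding to 500 tokens can be illustrated in plain Python. This is a sketch, not the project's code: `clean_text` and `pad_ids` are hypothetical helpers, and the padding mirrors the default "pre" padding and truncating of Keras `pad_sequences`, which the project may or may not use.

```python
import re
import string

MAX_LEN = 500  # from the model card

_TAG_RE = re.compile(r"<[^>]+>")
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_text(text):
    """Lowercase, drop HTML tags, strip punctuation, collapse whitespace."""
    text = text.lower()
    text = _TAG_RE.sub(" ", text)
    text = text.translate(_PUNCT_TABLE)
    return " ".join(text.split())


def pad_ids(ids, max_len=MAX_LEN):
    """Left-pad with 0, or keep only the last max_len ids.

    Assumption: this copies the default "pre" behaviour of Keras
    pad_sequences; the training script may be configured differently.
    """
    ids = ids[-max_len:]
    return [0] * (max_len - len(ids)) + ids


print(clean_text("This movie was <br /> GREAT!!!  Loved it."))
# this movie was great loved it
```

In the real pipeline the token ids would come from the fitted Keras Tokenizer stored in tokenizer.pkl. The same cleaning has to be applied at prediction time, otherwise the model sees text that differs from what it was trained on.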
Metrics:
Accuracy: ~68–75%
F1 (Positive): ~0.75
F1 (Negative): ~0.57
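The F1 scores follow from the per-class precision and recall reported in the metadata (class 0 is negative, class 1 is positive). The short check below recomputes them from those approximate values; the helper name `f1` is only for illustration.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Approximate per-class values reported in the model card metadata.
f1_negative = f1(0.88, 0.42)  # class 0
f1_positive = f1(0.62, 0.94)  # class 1
macro_f1 = (f1_negative + f1_positive) / 2

print(f"{f1_negative:.2f} {f1_positive:.2f} {macro_f1:.2f}")  # 0.57 0.75 0.66
```

The gap between the two classes comes from recall: the model labels most reviews as positive, so it rarely misses a positive review but misses more than half of the negative ones.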
Limitations:
May underperform on sarcastic/ambiguous reviews
CNN alone struggles with long-range dependencies
TensorBoard
tensorboard --logdir logs
Access it at: http://localhost:6006
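The logs that TensorBoard reads are written during training by a Keras callback. The sketch below wires up the three callbacks named in the metadata (EarlyStopping, ReduceLROnPlateau, TensorBoard). The monitored metric, patience and factor values, and the function name `make_callbacks`, are assumptions for illustration and are not taken from train.py.

```python
LOG_DIR = "logs"  # matches `tensorboard --logdir logs`


def make_callbacks(log_dir=LOG_DIR):
    """Return EarlyStopping, ReduceLROnPlateau and TensorBoard callbacks.

    Assumptions: monitor, patience and factor are illustrative values.
    """
    # Imported lazily so this module can be inspected without TensorFlow.
    import tensorflow as tf

    callbacks = tf.keras.callbacks
    return [
        # Stop when validation loss stops improving and keep the best weights.
        callbacks.EarlyStopping(monitor="val_loss", patience=3, restore_best_weights=True),
        # Halve the learning rate after two epochs without improvement.
        callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=2),
        # Write scalars and graphs for the TensorBoard UI.
        callbacks.TensorBoard(log_dir=log_dir),
    ]
```

The returned list would be passed to `model.fit(..., callbacks=make_callbacks())`, together with validation data so that `val_loss` is available to monitor.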
Dataset
Source: aclImdb Dataset
Size: 25k positive + 25k negative reviews (50k total)
Balanced: Yes
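The metadata lists an 80% / 20% train/validation split. One way to make such a split while keeping both parts balanced is sketched below. Shuffling and splitting per class (stratification) is an assumption here, as are the function name `split_train_val`, the seed and the toy rows; the project may split differently.

```python
import random


def split_train_val(rows, val_fraction=0.2, seed=42):
    """Split (text, label) rows per class so both parts stay balanced."""
    rng = random.Random(seed)
    by_label = {}
    for row in rows:
        by_label.setdefault(row[1], []).append(row)

    train, val = [], []
    for group in by_label.values():
        rng.shuffle(group)
        n_val = round(len(group) * val_fraction)
        val.extend(group[:n_val])
        train.extend(group[n_val:])

    # Mix the classes again so batches are not ordered by label.
    rng.shuffle(train)
    rng.shuffle(val)
    return train, val


# Toy rows for illustration only: 50 negative (0) and 50 positive (1).
rows = [(f"review {i}", i % 2) for i in range(100)]
train, val = split_train_val(rows)
print(len(train), len(val))  # 80 20
```

Applied to 50,000 balanced reviews, the same fractions give 40,000 training and 10,000 validation reviews, each half positive and half negative.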
License
MIT License