model_name: IMDb Movie Sentiment Classifier
version: 1.0.0
license: MIT
architecture: CNN (Convolutional Neural Network)
framework: TensorFlow / Keras
language: English
tags:
  - sentiment-analysis
  - binary-classification
  - deep-learning
  - movie-reviews
  - keras
  - streamlit-app
  - tensorboard

description: |
  A CNN-based sentiment analysis model trained on the Stanford IMDb (aclImdb) dataset. It classifies movie reviews as positive or negative, is integrated with a Streamlit app for real-time predictions, and logs training metrics to TensorBoard.

datasets:
  name: Stanford IMDb (aclImdb)
  source: https://ai.stanford.edu/~amaas/data/sentiment/
  size: 50,000 reviews (25,000 positive, 25,000 negative)
  split:
    train: 80%
    validation: 20%
  preprocessing:
    - Lowercasing
    - HTML tag removal
    - Punctuation and extra whitespace stripping
    - Tokenization and padding to a maximum of 500 words

training:
  vocab_size: 30000
  max_sequence_length: 500
  embedding_dim: 128
  optimizer: Adam
  loss_function: Binary Crossentropy
  epochs: 10
  batch_size: 64
  class_weighting: true
  callbacks:
    - EarlyStopping
    - ReduceLROnPlateau
    - TensorBoard

metrics:
  accuracy: ~0.68
  precision:
    class_0: ~0.88
    class_1: ~0.62
  recall:
    class_0: ~0.42
    class_1: ~0.94
  f1_score:
    class_0: ~0.57
    class_1: ~0.75
  macro_avg_f1: ~0.66
  weighted_avg_f1: ~0.66

limitations: |
  - The model underperforms on class 0 (negative sentiment), where recall is low (~0.42).
  - CNNs do not capture long-range dependencies well; consider BiLSTM or transformer-based models (e.g., BERT).
  - Performance may degrade on noisy or sarcastic text.

usage: |
  Load models/imdb_cnn_model.h5 and models/tokenizer.pkl to predict the sentiment of new reviews.
  The Streamlit app in app/app.py provides a web interface; run it with: streamlit run app/app.py

authors:
  - name: Voltsy
    role: Developer and Trainer
    contact: jorgecreiannj@gmail.com

last_updated: 2025-06-29

🎬 Movie Review Sentiment Analysis (IMDb Dataset)

This project uses a Convolutional Neural Network (CNN) to classify movie reviews from the Stanford IMDb Dataset (aclImdb) as positive or negative. The application includes:

✅ TensorFlow + Keras model
✅ Streamlit interface for real-time prediction
✅ TensorBoard for training monitoring
✅ Cleaned and preprocessed text data
✅ Class-weighted training to improve recall
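
The model card sets `class_weighting: true` but does not say how the weights are computed. A minimal sketch, assuming the common "balanced" heuristic (the same formula scikit-learn uses for `class_weight="balanced"`); the helper name `balanced_class_weights` is hypothetical:

```python
from collections import Counter


def balanced_class_weights(labels):
    """weight_c = n_samples / (n_classes * count_c): rarer classes get larger weights."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * n) for c, n in sorted(counts.items())}


print(balanced_class_weights([0, 1] * 10))        # {0: 1.0, 1: 1.0}
print(balanced_class_weights([0] * 2 + [1] * 6))  # {0: 2.0, 1: 0.6666666666666666}
```

The resulting dict has the shape Keras expects for `model.fit(..., class_weight=...)`. On a 50/50 split such as aclImdb's, this scheme yields weights of about 1.0 for both classes, so on its own it changes training very little.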

📦 Project Structure

MovieSentimentAnalysis/
├── app/ # Streamlit app
│ └── app.py
├── data/ # Preprocessed CSVs
│ └── imdb_train.csv
├── logs/ # TensorBoard logs
├── models/ # Saved model + tokenizer
│ ├── imdb_cnn_model.h5
│ └── tokenizer.pkl
├── train.py # Model training script
├── evaluate.py # Evaluation script (confusion matrix, metrics)
├── requirements.txt
├── README.md
└── modelcard.yaml # Metadata about the model

🚀 How to Run

1. Install dependencies

pip install -r requirements.txt

2. Train the model

python train.py

3. Evaluate performance

python evaluate.py

4. Run the Streamlit app

streamlit run app/app.py

🧪 Model Details

Architecture: Embedding → Conv1D → GlobalMaxPool → Dense
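
To illustrate what each layer in this stack does, here is a toy forward pass in plain Python with random weights. It is not the trained model: the sizes are tiny, and the 'valid' convolution, the ReLU activation and the single sigmoid output unit are assumptions, since the README only names the layer types (in Keras these correspond to `Embedding`, `Conv1D`, `GlobalMaxPooling1D` and `Dense`).

```python
import math
import random

# Toy sizes for illustration only. The card uses vocab 30,000 and embedding_dim 128;
# the filter count and kernel size of the real model are not stated.
VOCAB, EMBED, FILTERS, KERNEL = 10, 4, 3, 3
rng = random.Random(0)


def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


embedding = rand_matrix(VOCAB, EMBED)                           # row t = vector for token id t
kernels = [rand_matrix(KERNEL, EMBED) for _ in range(FILTERS)]  # each filter spans KERNEL tokens
conv_bias = [0.0] * FILTERS
dense_w = [rng.uniform(-0.5, 0.5) for _ in range(FILTERS)]
dense_b = 0.0


def forward(token_ids):
    """Embedding -> Conv1D ('valid', ReLU) -> GlobalMaxPool -> Dense(1, sigmoid)."""
    if len(token_ids) < KERNEL:
        raise ValueError("sequence shorter than the convolution kernel")
    x = [embedding[t] for t in token_ids]                        # (len, EMBED)
    feature_map = []                                            # (len - KERNEL + 1, FILTERS)
    for start in range(len(x) - KERNEL + 1):
        window = x[start:start + KERNEL]
        feature_map.append([
            max(0.0, conv_bias[f] + sum(kernels[f][k][d] * window[k][d]
                                        for k in range(KERNEL) for d in range(EMBED)))
            for f in range(FILTERS)
        ])
    pooled = [max(column) for column in zip(*feature_map)]       # (FILTERS,): strongest response per filter
    logit = dense_b + sum(w * p for w, p in zip(dense_w, pooled))
    return 1.0 / (1.0 + math.exp(-logit))                        # estimated P(positive)


print(0.0 < forward([0, 0, 3, 5, 7, 2]) < 1.0)  # True
```

Because each filter sees only `KERNEL` consecutive tokens and global max pooling keeps just its strongest response anywhere in the review, this kind of model detects local phrases regardless of position but cannot relate words that are far apart, which is the long-range limitation listed under Limitations.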

Tokenizer: Keras Tokenizer with top 30,000 words, padded to maxlen=500
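
A dependency-free sketch of the text-to-input pipeline: the cleaning steps listed in the model card, then a vocabulary capped at `NUM_WORDS` and padding to `MAXLEN`. The regular expressions, the absence of an out-of-vocabulary token, and pre-padding/pre-truncation (the Keras `pad_sequences` defaults) are assumptions, and all helper names are hypothetical; real predictions should use the project's fitted `models/tokenizer.pkl`. As with Keras's `Tokenizer(num_words=...)`, only ids below `num_words` are kept, i.e. the `num_words - 1` most frequent words.

```python
import re
from collections import Counter

NUM_WORDS = 30_000  # card: vocab_size
MAXLEN = 500        # card: max_sequence_length

# Assumed cleaning rules; the card lists the steps but not the exact patterns.
_HTML_TAG = re.compile(r"<[^>]+>")
_NON_ALNUM = re.compile(r"[^a-z0-9\s]")
_SPACES = re.compile(r"\s+")


def clean_review(text):
    """Lowercase, strip HTML tags (e.g. '<br />'), drop punctuation, collapse spaces."""
    text = _HTML_TAG.sub(" ", text.lower())
    text = _NON_ALNUM.sub(" ", text)
    return _SPACES.sub(" ", text).strip()


def build_word_index(texts):
    """Rank words by frequency (ties keep first-seen order); id 0 is reserved for padding."""
    counts = Counter(w for t in texts for w in t.split())
    ranked = sorted(counts, key=lambda w: -counts[w])
    return {w: i + 1 for i, w in enumerate(ranked)}


def to_sequence(text, word_index, num_words=NUM_WORDS):
    """Map words to ids, dropping unknown words and ids >= num_words (no OOV token)."""
    ids = (word_index.get(w, num_words) for w in text.split())
    return [i for i in ids if i < num_words]


def pad(seq, maxlen=MAXLEN, value=0):
    """Keep the last maxlen ids, then left-pad with `value` up to maxlen."""
    seq = seq[max(len(seq) - maxlen, 0):]
    return [value] * (maxlen - len(seq)) + seq


reviews = [clean_review(r) for r in ["Good, GOOD movie!", "Bad movie.<br />"]]
index = build_word_index(reviews)  # {'good': 1, 'movie': 2, 'bad': 3}
print(pad(to_sequence(reviews[0], index, num_words=3), maxlen=5))  # [0, 0, 1, 1, 2]
```

Replacing punctuation with a space turns "don't" into "don t"; deleting it instead would give "dont". Either way the same rule must be applied at training and prediction time.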

Metrics:

Accuracy: ~68–75%

F1 (Positive): ~0.75

F1 (Negative): ~0.56
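
The per-class figures in the model card can be cross-checked with a few lines of arithmetic. A short sketch using the card's precision and recall values, assuming the evaluation set is class-balanced (required for the accuracy identity in the comment):

```python
def f1(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Per-class precision/recall from the model card (0 = negative, 1 = positive).
precision = {0: 0.88, 1: 0.62}
recall = {0: 0.42, 1: 0.94}

f1_scores = {c: f1(precision[c], recall[c]) for c in (0, 1)}
macro_f1 = sum(f1_scores.values()) / 2
# On a class-balanced evaluation set, accuracy equals the mean per-class recall.
accuracy = (recall[0] + recall[1]) / 2

print({c: round(v, 2) for c, v in f1_scores.items()}, round(macro_f1, 2), round(accuracy, 2))
# {0: 0.57, 1: 0.75} 0.66 0.68
```

The negative-class F1 comes out at about 0.569, so the ~0.56 above and the card's ~0.57 are the same value truncated and rounded, respectively.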

Limitations:

May underperform on sarcastic/ambiguous reviews

CNN alone struggles with long-range dependencies

📈 TensorBoard

tensorboard --logdir logs

Access it at: http://localhost:6006

📊 Dataset

Source: aclImdb Dataset (https://ai.stanford.edu/~amaas/data/sentiment/)

Size: 50k reviews (25k positive + 25k negative), split 80% train / 20% validation

Balanced: Yes
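
A minimal loading sketch, assuming the archive from the source URL is extracted to `aclImdb/` with its standard `train/` and `test/` folders, each holding `pos/` and `neg/` review files. Pooling both halves into a single 80/20 split is inferred from the card's 50,000-review total and is an assumption, as are the helper names and `seed=42`:

```python
import random
from pathlib import Path


def load_aclimdb(root):
    """Collect (text, label) pairs from <root>/{train,test}/{neg,pos}/*.txt.

    Label 0 = negative, 1 = positive; the unlabeled train/unsup folder is skipped.
    """
    root = Path(root)
    samples = []
    for split in ("train", "test"):
        for folder, label in (("neg", 0), ("pos", 1)):
            for path in sorted((root / split / folder).glob("*.txt")):
                samples.append((path.read_text(encoding="utf-8"), label))
    return samples


def train_val_split(samples, val_fraction=0.2, seed=42):
    """Shuffle a copy with a fixed seed, then hold out val_fraction for validation."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]
```

On the full dataset, `train_val_split(load_aclimdb("aclImdb"))` should return 40,000 training and 10,000 validation reviews. The shuffle is not stratified, so each part is only approximately 50/50; a stratified split would preserve the balance exactly.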

🔒 License

MIT License
