model_name: IMDb Movie Sentiment Classifier
version: 1.0.0
license: MIT
architecture: CNN (Convolutional Neural Network)
framework: TensorFlow / Keras
language: English
tags:

  • sentiment-analysis
  • binary-classification
  • deep-learning
  • movie-reviews
  • keras
  • streamlit-app
  • tensorboard

description: |
  A convolutional neural network (CNN) based sentiment analysis model trained on the Stanford IMDb (aclImdb) dataset. The model classifies movie reviews as either positive or negative. It is integrated with a Streamlit app for real-time predictions, and logs training metrics using TensorBoard.

datasets:
  name: Stanford IMDb (aclImdb)
  source: https://ai.stanford.edu/~amaas/data/sentiment/
  size: 50,000 reviews (25,000 positive, 25,000 negative)
  split:
    train: 80%
    validation: 20%
  preprocessing:
    - Lowercasing
    - HTML tag removal
    - Punctuation and extra space stripping
    - Tokenization and padding to 500 words max

training:
  vocab_size: 30000
  max_sequence_length: 500
  embedding_dim: 128
  optimizer: Adam
  loss_function: Binary Crossentropy
  epochs: 10
  batch_size: 64
  class_weighting: true
  callbacks:
    - EarlyStopping
    - ReduceLROnPlateau
    - TensorBoard

metrics:
  accuracy: ~0.68
  precision:
    class_0: ~0.88
    class_1: ~0.62
  recall:
    class_0: ~0.42
    class_1: ~0.94
  f1_score:
    class_0: ~0.57
    class_1: ~0.75
  macro_avg_f1: ~0.66
  weighted_avg_f1: ~0.66

limitations: |

  • The model underperforms on class 0 (negative sentiment), with low recall.
  • CNNs do not capture long-range dependencies well; consider using BiLSTM or transformer-based models (e.g. BERT).
  • Performance may degrade on noisy or sarcastic text.

usage: |
  Load models/imdb_cnn_model.h5 and models/tokenizer.pkl to predict the sentiment of new reviews. The Streamlit app in app/app.py provides a web interface. To run:
  streamlit run app/app.py

authors:

last_updated: 2025-06-29

🎬 Movie Review Sentiment Analysis (IMDb Dataset)

This project uses a Convolutional Neural Network (CNN) to classify movie reviews from the Stanford IMDb Dataset (aclImdb) as positive or negative. The application includes:

  • ✅ TensorFlow + Keras model
  • ✅ Streamlit interface for real-time prediction
  • ✅ TensorBoard for training monitoring
  • ✅ Cleaned and preprocessed text data
  • ✅ Class-weighted training to improve recall
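
The card does not say how the class weights were derived. As a minimal sketch, assuming the common "balanced" heuristic (weight = n_samples / (n_classes × class_count)), the weights could be computed as below. The helper name `balanced_class_weights` and the label counts are hypothetical and only illustrate the arithmetic.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {class: n_samples / (n_classes * class_count)} for each class."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical label counts for an 80% training split of 50,000 reviews
# (0 = negative, 1 = positive). The real split counts are not given in the card.
labels = [0] * 19_800 + [1] * 20_200
class_weight = balanced_class_weights(labels)
print(class_weight)
```

A dictionary of this shape can be passed to Keras through the `class_weight` argument of `model.fit`, which scales each sample's loss by the weight of its class.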

📦 Project Structure

MovieSentimentAnalysis/
├── app/                  # Streamlit app
│   └── app.py
├── data/                 # Preprocessed CSVs
│   └── imdb_train.csv
├── logs/                 # TensorBoard logs
├── models/               # Saved model + tokenizer
│   ├── imdb_cnn_model.h5
│   └── tokenizer.pkl
├── train.py              # Model training script
├── evaluate.py           # Evaluation script (confusion matrix, metrics)
├── requirements.txt
├── README.md
└── modelcard.yaml        # Metadata about the model

🚀 How to Run

1. Install dependencies

pip install -r requirements.txt

2. Train the model

python train.py

3. Evaluate performance

python evaluate.py

4. Run the Streamlit app

streamlit run app/app.py
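
Outside the Streamlit app, the saved artifacts can also be loaded directly from Python. The sketch below is not taken from app/app.py. It assumes that tokenizer.pkl is a pickled Keras Tokenizer, that the model ends in a single sigmoid unit, and that class 1 means positive. The function names and the 0.5 threshold are hypothetical, and the default padding side of `pad_sequences` may differ from what train.py used.

```python
import pickle

MAX_LEN = 500  # from the model card: reviews are padded to 500 tokens


def label_from_score(score, threshold=0.5):
    """Map a sigmoid output to a label (assumes class 1 = positive)."""
    return "positive" if score >= threshold else "negative"


def predict_sentiment(text,
                      model_path="models/imdb_cnn_model.h5",
                      tokenizer_path="models/tokenizer.pkl"):
    """Return (label, score) for one review string."""
    # Imported lazily so label_from_score can be used without TensorFlow.
    import tensorflow as tf

    model = tf.keras.models.load_model(model_path)
    with open(tokenizer_path, "rb") as fh:
        tokenizer = pickle.load(fh)  # assumed: a pickled Keras Tokenizer

    # Convert the text to word indices, then pad to the training length.
    sequences = tokenizer.texts_to_sequences([text])
    padded = tf.keras.preprocessing.sequence.pad_sequences(sequences, maxlen=MAX_LEN)

    # Assumed output shape (1, 1): one sigmoid probability per review.
    score = float(model.predict(padded, verbose=0)[0][0])
    return label_from_score(score), score
```

After training, a call such as `label, score = predict_sentiment("A wonderful film.")` would return the predicted label and the raw probability. New text should be cleaned the same way as the training data before it is tokenized.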

🧪 Model Details

Architecture: Embedding → Conv1D → GlobalMaxPool → Dense
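
The layer sequence above can be sketched in Keras as follows. The vocabulary size, embedding dimension, optimizer, and loss come from the metadata. The filter count, kernel size, and single sigmoid output unit are assumptions chosen for illustration, so train.py may differ.

```python
VOCAB_SIZE = 30_000   # from the model card
EMBEDDING_DIM = 128   # from the model card
FILTERS = 128         # hypothetical: not stated in the card
KERNEL_SIZE = 5       # hypothetical: not stated in the card


def build_model():
    """Build an Embedding -> Conv1D -> GlobalMaxPool -> Dense classifier."""
    # Imported lazily so the constants above can be inspected without TensorFlow.
    import tensorflow as tf

    model = tf.keras.Sequential([
        # Map each word index to a dense vector.
        tf.keras.layers.Embedding(VOCAB_SIZE, EMBEDDING_DIM),
        # Slide 1D filters over the sequence to detect local n-gram patterns.
        tf.keras.layers.Conv1D(FILTERS, KERNEL_SIZE, activation="relu"),
        # Keep the strongest response of each filter across the whole review.
        tf.keras.layers.GlobalMaxPooling1D(),
        # Single probability output (assumed: class 1 = positive).
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model
```

Global max pooling keeps only the strongest activation per filter, which is one reason this kind of model is insensitive to where in a review a phrase appears and to relationships between distant phrases.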

Tokenizer: Keras Tokenizer with top 30,000 words, padded to maxlen=500
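
Before tokenization, the metadata lists lowercasing, HTML tag removal, and punctuation and extra-space stripping. A minimal standard-library sketch of those steps is shown below. The function name `clean_text` and the choice to replace tags and punctuation with spaces (rather than delete them) are assumptions, not the project's actual code.

```python
import re
import string

_TAG_RE = re.compile(r"<[^>]+>")
# Replace each punctuation character with a space so adjacent words do not merge.
_PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def clean_text(text):
    """Lowercase, strip HTML tags and punctuation, and collapse whitespace."""
    text = text.lower()
    text = _TAG_RE.sub(" ", text)         # e.g. the "<br />" tags common in aclImdb
    text = text.translate(_PUNCT_TABLE)
    return " ".join(text.split())         # collapse runs of whitespace


print(clean_text("Great film!<br /><br />Loved   it."))  # great film loved it
```

The cleaned string would then go through the tokenizer and be padded to 500 tokens.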

Metrics:

  • Accuracy: ~68–75%
  • F1 (Positive): ~0.75
  • F1 (Negative): ~0.57
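
The F1 scores follow from the per-class precision and recall reported in the metadata (negative: 0.88 / 0.42, positive: 0.62 / 0.94). This short check recomputes them with the standard harmonic-mean formula. The inputs are the card's approximate values, so the results are approximate as well.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


f1_negative = f1(0.88, 0.42)   # class 0
f1_positive = f1(0.62, 0.94)   # class 1
macro_f1 = (f1_negative + f1_positive) / 2

print(round(f1_negative, 2), round(f1_positive, 2), round(macro_f1, 2))
# 0.57 0.75 0.66
```

The gap between the two classes comes mainly from the low negative-class recall (0.42): most errors are negative reviews predicted as positive.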

Limitations:

  • May underperform on sarcastic or ambiguous reviews
  • A CNN alone struggles with long-range dependencies

📈 TensorBoard

tensorboard --logdir logs

Access it at: http://localhost:6006

📊 Dataset

  • Source: aclImdb Dataset
  • Size: 50,000 reviews (25k positive + 25k negative), split 80% training / 20% validation
  • Balanced: Yes

🔒 License

MIT License
