---
language: en
license: apache-2.0
pipeline_tag: text-classification
library_name: tf-keras
tags:
  - nlp
  - text-classification
  - sentiment-analysis
  - imdb
  - simplernn
  - tensorflow
  - keras
  - streamlit
---

# 🎬 IMDB Movie Review Sentiment (SimpleRNN | Keras)

A lightweight **SimpleRNN** model trained on the **Keras IMDB** dataset to predict **movie review sentiment** (positive vs. negative).
This Hugging Face repo hosts the trained model artifact used by a Streamlit inference app.

## Training → Model → Inference

- **Training notebook (Colab):** https://colab.research.google.com/drive/14A_qc4aLvx5I0cFsK9lJYHRymGjzZIyK
- **Inference app (Streamlit):** https://github.com/sparklerz/Deep-Learning-Fundamentals-Suite
  (page: `pages/03_IMDB_Sentiment_SimpleRNN.py`)

## What’s in this repo

- `artifacts/simple_rnn_imdb.h5`: trained Keras model
- `artifacts/config.json`: key inference settings:
  - `max_features` (vocabulary size cap)
  - `max_len` (sequence length)
  - `threshold_default` (classification threshold)
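
A minimal sketch of loading these settings, assuming `config.json` is a flat JSON object holding the three keys above. It mirrors the Quickstart's handling (integers for `max_features` and `max_len`, a `0.5` fallback for `threshold_default`); `load_inference_config` and its range checks are hypothetical additions, and the values in the usage example are placeholders, not this repo's real settings.

```python
import json
import tempfile
from pathlib import Path


def load_inference_config(path):
    """Read max_features, max_len and threshold_default from a JSON file.

    Hypothetical helper: max_features and max_len are required integers,
    threshold_default falls back to 0.5 when the key is absent.
    """
    cfg = json.loads(Path(path).read_text(encoding="utf-8"))
    max_features = int(cfg["max_features"])  # raises KeyError if missing
    max_len = int(cfg["max_len"])
    threshold = float(cfg.get("threshold_default", 0.5))
    if max_features <= 0 or max_len <= 0:
        raise ValueError("max_features and max_len must be positive")
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold_default must lie in [0, 1]")
    return max_features, max_len, threshold


# Placeholder values for illustration only; use the real artifacts/config.json.
with tempfile.TemporaryDirectory() as tmp:
    cfg_file = Path(tmp) / "config.json"
    cfg_file.write_text(json.dumps({"max_features": 10000, "max_len": 500}), encoding="utf-8")
    print(load_inference_config(cfg_file))  # (10000, 500, 0.5)
```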

## Inputs

- A short English movie review (free text).

## Preprocessing (same as the Streamlit app)

- Lowercase, then tokenize with the regex `[a-z']+`
- Convert tokens to integer IDs using the **Keras IMDB word index** (`tensorflow.keras.datasets.imdb.get_word_index()`)
- Apply the standard Keras IMDB offsets:
  - start token = `1`
  - unknown token = `2`
  - word indices are shifted by `+3`
- Map words missing from the word index, and any shifted ID `>= max_features`, to `2` (unknown)
- Pad/truncate to `max_len` using `pad_sequences` (`padding="pre"`, `truncating="post"`): shorter reviews are left-padded with `0`, longer reviews keep their first `max_len` IDs
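
The steps above can be sketched in pure Python, which makes the ID arithmetic easy to check without TensorFlow. `encode_review` is a hypothetical helper: it emulates `pad_sequences(..., padding="pre", truncating="post")` by hand and returns a plain list rather than the 2-D NumPy array that `pad_sequences` produces. The toy word index uses made-up ranks, not values from the real IMDB index.

```python
import re


def encode_review(text, word_index, max_features, max_len):
    """Hypothetical pure-Python version of the preprocessing steps.

    word_index maps word -> frequency rank, in the format returned by
    tensorflow.keras.datasets.imdb.get_word_index().
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    seq = [1]  # start token
    for w in tokens:
        rank = word_index.get(w)
        if rank is None:
            seq.append(2)  # not in the word index -> unknown
            continue
        idx = rank + 3  # standard Keras IMDB offset
        seq.append(idx if idx < max_features else 2)  # beyond vocab cap -> unknown
    seq = seq[:max_len]  # truncating="post": keep the first max_len IDs
    return [0] * (max_len - len(seq)) + seq  # padding="pre": left-pad with 0


toy_index = {"the": 1, "movie": 17, "great": 84}  # made-up ranks for illustration
print(encode_review("The movie was GREAT!", toy_index, max_features=50, max_len=8))
# [0, 0, 0, 1, 4, 20, 2, 2]
```

With `max_features=50`, `"great"` (rank 84, shifted to 87) exceeds the vocabulary cap and becomes `2`, the same ID as `"was"`, which is missing from the toy index.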

## Output

- A single probability, **P(positive)**, in `[0, 1]`.
- Decision rule:
  - `Positive` if `P(positive) >= threshold`
  - `Negative` otherwise
- The default threshold is read from `artifacts/config.json` (`threshold_default`, typically `0.5`).
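
The decision rule can be applied to a whole batch at once. This sketch assumes the model returns probabilities shaped `(batch, 1)`, as the `reshape(-1)` in the Quickstart suggests; `label_predictions` is a hypothetical helper, and the probabilities in the example are invented.

```python
import numpy as np


def label_predictions(probs, threshold=0.5):
    """Hypothetical helper: Positive iff P(positive) >= threshold."""
    probs = np.asarray(probs, dtype=float).reshape(-1)  # (batch, 1) -> (batch,)
    return np.where(probs >= threshold, "Positive", "Negative").tolist()


# Invented probabilities; note that exactly 0.5 counts as Positive (>=).
print(label_predictions([[0.91], [0.5], [0.12]]))
# ['Positive', 'Positive', 'Negative']
```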

## Quickstart (load + predict)

```python
import json
import re

import numpy as np
import tensorflow as tf
from huggingface_hub import hf_hub_download
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing.sequence import pad_sequences

REPO_ID = "ash001/imdb-sentiment-simple-rnn"

# Download the model and its inference config from the Hub
model_path = hf_hub_download(REPO_ID, "artifacts/simple_rnn_imdb.h5")
cfg_path = hf_hub_download(REPO_ID, "artifacts/config.json")
with open(cfg_path, "r", encoding="utf-8") as f:
    cfg = json.load(f)

model = tf.keras.models.load_model(model_path, compile=False)
word_index = imdb.get_word_index()  # word -> frequency rank (downloaded once)

max_features = int(cfg["max_features"])
max_len = int(cfg["max_len"])
threshold = float(cfg.get("threshold_default", 0.5))


def text_to_sequence(text: str) -> np.ndarray:
    """Encode one review into a (1, max_len) array of IMDB token IDs."""
    tokens = re.findall(r"[a-z']+", text.lower())
    seq = [1]  # start token
    for w in tokens:
        rank = word_index.get(w)
        # Words missing from the index and IDs beyond the vocab cap both map to 2 (unknown)
        idx = rank + 3 if rank is not None else 2
        if idx >= max_features:
            idx = 2
        seq.append(idx)
    return pad_sequences([seq], maxlen=max_len, padding="pre", truncating="post")


text = "This movie was surprisingly good, with great acting and a strong ending."
X = text_to_sequence(text)
prob_pos = float(model.predict(X, verbose=0).reshape(-1)[0])
label = "Positive" if prob_pos >= threshold else "Negative"
print("P(positive) =", prob_pos, "|", label)
```