# Hamlet Next-Word Prediction (LSTM | Keras)
A lightweight LSTM language model trained to predict the next word from a short text prompt (Hamlet-style).
This Hugging Face repo hosts the trained model + tokenizer used by a Streamlit inference app.
## Training → Model → Inference
- Training notebook (Colab): https://colab.research.google.com/drive/1Hh7BKYroKbbZnMxAQ8R6mzsIW1xawaP6
- Inference app (Streamlit): https://github.com/sparklerz/Deep-Learning-Fundamentals-Suite (page: pages/04_Hamlet_Next_Word_LSTM.py)
## What's in this repo
- `artifacts/next_word_lstm.h5` — trained Keras model
- `artifacts/tokenizer.pickle` — fitted Keras Tokenizer
- `artifacts/config.json` — generation/config values (e.g., `max_sequence_len`, vocab cap)
- `hamlet.txt` — training text used in the notebook
## Inputs
- Seed text (English).
- The text is tokenized with the saved tokenizer and padded to a fixed context length:
  `max_sequence_len = 40` (from `artifacts/config.json`)
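To make the padding step concrete, here is a plain-Python sketch of what Keras's `pad_sequences(..., padding="pre")` does to the tokenized seed (the real code in the Quickstart below uses Keras directly):

```python
# Sketch of the pre-padding step, without Keras.
# pad_sequences(..., maxlen=L, padding="pre") left-pads with zeros and
# keeps only the last L tokens when the sequence is longer than L.
def pad_pre(token_ids, maxlen):
    trimmed = token_ids[-maxlen:]              # keep the most recent tokens
    return [0] * (maxlen - len(trimmed)) + trimmed

# With max_sequence_len = 40, the model's context window is 39 tokens:
print(pad_pre([5, 17, 2], 39)[:5])   # [0, 0, 0, 0, 0] — mostly padding
print(pad_pre([5, 17, 2], 39)[-3:])  # [5, 17, 2] — seed tokens at the end
```

Pre-padding (rather than post-padding) keeps the seed tokens adjacent to the prediction position, which is what the LSTM was trained on.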
## Output
- Next-word probabilities over the vocabulary.
- In the Streamlit app you can:
- show top-k next-word suggestions
- generate multiple words using top-k + top-p (nucleus) sampling with temperature and a small repeat penalty
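The app's exact sampler lives in the Streamlit repo; the sketch below illustrates the same ingredients (temperature, top-k, top-p/nucleus filtering, and a repeat penalty) over a probability vector. Parameter defaults here are illustrative assumptions, not the app's values.

```python
import numpy as np

def sample_next(probs, k=10, p=0.9, temperature=0.8,
                recent_ids=(), repeat_penalty=1.2, rng=None):
    """Sample one token id from next-word probabilities (illustrative)."""
    if rng is None:
        rng = np.random.default_rng()
    logits = np.log(np.clip(probs, 1e-12, None))
    for i in set(recent_ids):                   # penalize recently used tokens
        logits[i] -= np.log(repeat_penalty)
    logits /= temperature                       # temperature scaling
    order = np.argsort(logits)[::-1][:k]        # top-k candidates
    cand = np.exp(logits[order] - logits[order].max())
    cand /= cand.sum()
    keep = np.cumsum(cand) <= p                 # top-p (nucleus) cut
    keep[0] = True                              # always keep the best token
    cand, order = cand[keep], order[keep]
    cand /= cand.sum()
    return int(rng.choice(order, p=cand))
```

Low temperature sharpens the distribution toward the argmax; the repeat penalty divides the probability of recently generated tokens so short loops ("the the the") become less likely.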
## Quickstart (load + predict next word)
```python
import json
import pickle

import numpy as np
import tensorflow as tf
from huggingface_hub import hf_hub_download
from tensorflow.keras.preprocessing.sequence import pad_sequences

REPO_ID = "ash001/hamlet-nextword-lstm"

# Download artifacts
model_path = hf_hub_download(REPO_ID, "artifacts/next_word_lstm.h5")
tok_path = hf_hub_download(REPO_ID, "artifacts/tokenizer.pickle")
cfg_path = hf_hub_download(REPO_ID, "artifacts/config.json")

model = tf.keras.models.load_model(model_path, compile=False)

with open(tok_path, "rb") as f:
    tokenizer = pickle.load(f)

with open(cfg_path, "r") as f:
    cfg = json.load(f)
max_sequence_len = int(cfg["max_sequence_len"])

def next_word_topk(seed_text: str, k: int = 10):
    """Return the k most likely next words with their probabilities."""
    token_list = tokenizer.texts_to_sequences([seed_text])[0]
    # The model sees max_sequence_len - 1 tokens of context, pre-padded.
    token_list = pad_sequences([token_list], maxlen=max_sequence_len - 1, padding="pre")
    probs = model.predict(token_list, verbose=0)[0]
    top_idx = np.argsort(probs)[-k:][::-1]  # indices of the k highest probabilities
    return [(tokenizer.index_word.get(int(i), ""), float(probs[i])) for i in top_idx]

print(next_word_topk("what a piece of work", k=10))
```
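Generating multiple words is just a loop that appends the predicted word and re-predicts. The sketch below abstracts the model behind any `predict_top1` callable (a hypothetical stub stands in for the LSTM here, so the loop's mechanics are clear without downloading the model):

```python
# Sketch of greedy multi-word generation on top of a next-word predictor.
# `predict_top1` is any callable mapping seed text -> (word, prob).
def generate(seed_text, n_words, predict_top1):
    text = seed_text
    for _ in range(n_words):
        word, _prob = predict_top1(text)
        if not word:               # unknown index -> stop early
            break
        text += " " + word
    return text

# Hypothetical stub predictor for illustration (the real app uses the LSTM):
stub = lambda t: ("that" if t.endswith("work") else "is", 1.0)
print(generate("what a piece of work", 3, stub))
```

With the real model you would pass a wrapper around `next_word_topk` (e.g., taking the top-1 entry), or plug in the sampler described under Output for more varied text.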
License: apache-2.0