📜 Hamlet Next-Word Prediction (LSTM | Keras)

A lightweight LSTM language model trained to predict the next word from a short text prompt (Hamlet-style).
This Hugging Face repo hosts the trained model + tokenizer used by a Streamlit inference app.

Training → Model → Inference

What's in this repo

  • artifacts/next_word_lstm.h5 — trained Keras model
  • artifacts/tokenizer.pickle — fitted Keras Tokenizer
  • artifacts/config.json — generation settings (e.g., max_sequence_len, vocabulary cap)
  • hamlet.txt — training text used in the notebook

Inputs

  • Seed text (English).
  • The text is tokenized with the saved tokenizer and pre-padded to the model's fixed context length of max_sequence_len - 1 = 39 tokens:
    • max_sequence_len = 40 (from artifacts/config.json)
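The pre-padding step can be illustrated without Keras. This toy helper (the name `pad_pre` is mine, not from the repo) mirrors what `pad_sequences(..., padding="pre")` does with the default pad id 0:

```python
def pad_pre(tokens, maxlen):
    # Keep only the last `maxlen` tokens, then left-pad with 0 (the Keras pad id)
    tokens = tokens[-maxlen:]
    return [0] * (maxlen - len(tokens)) + tokens

print(pad_pre([12, 7, 99], 5))         # [0, 0, 12, 7, 99]
print(pad_pre([1, 2, 3, 4, 5, 6], 5))  # [2, 3, 4, 5, 6]
```

Left-padding keeps the most recent words adjacent to the prediction position, which is what the LSTM saw during training.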

Output

  • Next-word probabilities over the vocabulary.
  • In the Streamlit app you can:
    • show top-k next-word suggestions
    • generate multiple words using top-k + top-p (nucleus) sampling with temperature and a small repeat penalty
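A minimal sketch of such a sampler, assuming the model has already produced a probability vector `probs` over the vocabulary. The function name, parameter defaults, and the simple log-penalty form are illustrative, not taken from the Streamlit app:

```python
import numpy as np

def sample_next(probs, k=10, p=0.9, temperature=1.0,
                recent_ids=(), repeat_penalty=1.2, rng=None):
    """Pick one token id from `probs` via top-k + top-p (nucleus) sampling."""
    rng = rng or np.random.default_rng()
    # Temperature rescales the distribution in log space
    logits = np.log(np.asarray(probs, dtype=np.float64) + 1e-12) / temperature
    for i in set(recent_ids):                # discourage recently emitted words
        logits[i] -= np.log(repeat_penalty)
    top = np.argsort(logits)[-k:]            # top-k: keep the k most likely ids
    q = np.exp(logits[top] - logits[top].max())
    q /= q.sum()
    order = np.argsort(q)[::-1]              # nucleus: smallest set with mass >= p
    keep = order[: int(np.searchsorted(np.cumsum(q[order]), p)) + 1]
    q = q[keep] / q[keep].sum()
    return int(rng.choice(top[keep], p=q))
```

With a very low `p` the nucleus collapses to the single most likely token, which makes the sampler deterministic; larger `p` and higher temperature trade fluency for variety.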

Quickstart (load + predict next word)

import pickle
import numpy as np
import tensorflow as tf
from huggingface_hub import hf_hub_download
from tensorflow.keras.preprocessing.sequence import pad_sequences

REPO_ID = "ash001/hamlet-nextword-lstm"

# Download artifacts
model_path = hf_hub_download(REPO_ID, "artifacts/next_word_lstm.h5")
tok_path   = hf_hub_download(REPO_ID, "artifacts/tokenizer.pickle")
cfg_path   = hf_hub_download(REPO_ID, "artifacts/config.json")

model = tf.keras.models.load_model(model_path, compile=False)
with open(tok_path, "rb") as f:
    tokenizer = pickle.load(f)

import json
with open(cfg_path, "r") as f:
    cfg = json.load(f)
max_sequence_len = int(cfg["max_sequence_len"])

def next_word_topk(seed_text: str, k: int = 10):
    # Tokenize with the fitted tokenizer, then pre-pad to the model's context length
    token_list = tokenizer.texts_to_sequences([seed_text])[0]
    token_list = pad_sequences([token_list], maxlen=max_sequence_len - 1, padding="pre")
    probs = model.predict(token_list, verbose=0)[0]
    # Return the k highest-probability (word, probability) pairs, most likely first
    top_idx = np.argsort(probs)[-k:][::-1]
    return [(tokenizer.index_word.get(int(i), ""), float(probs[i])) for i in top_idx]

print(next_word_topk("what a piece of work", k=10))
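Multi-word generation is the same prediction in a loop. This greedy sketch works over token ids; `predict_probs` stands in for a wrapper around `model.predict`, and the toy stub below replaces the real model so the loop's shape is visible (a real app would map ids back to words via `tokenizer.index_word`):

```python
def generate_ids(seed_ids, predict_probs, n_words=5, context_len=39):
    # Greedy loop: pre-pad the context, predict, append the argmax token id
    ids = list(seed_ids)
    for _ in range(n_words):
        ctx = ids[-context_len:]
        ctx = [0] * (context_len - len(ctx)) + ctx
        probs = predict_probs(ctx)
        ids.append(int(max(range(len(probs)), key=probs.__getitem__)))
    return ids

# Toy stand-in for the model: always favours id 3
stub = lambda ctx: [0.1, 0.2, 0.1, 0.6]
print(generate_ids([5, 7], stub, n_words=3))  # [5, 7, 3, 3, 3]
```

Swapping the argmax for a sampler (top-k/top-p, as in the Streamlit app) avoids the repetitive loops that greedy decoding tends to produce.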

License: apache-2.0
