---
language: en
tags:
- text-generation
- keras
- tensorflow
- lstm
- educational
license: other
datasets:
- epicurious-recipes-kaggle
---

# LSTM Recipe Next-Token Generator

## Model Summary

This model is a next-token language model based on a Long Short-Term Memory (LSTM) network trained on the Epicurious Recipes dataset. It was developed as part of an academic assignment for **MSAI 630 – Generative AI with LLMs** and demonstrates classical sequence modeling for text generation prior to transformer-based architectures.

The model predicts the next word token given a sequence of previous tokens and can be used iteratively to generate long-form recipe-style text.

## Architecture

- Embedding size: 100
- LSTM hidden units: 128
- Vocabulary size: 10,000
- Maximum sequence length: 200 tokens
- Output: softmax over the vocabulary (next-token prediction)

Framework: TensorFlow / Keras
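Under these hyperparameters, the architecture could be sketched in Keras roughly as follows. This is a minimal sketch, not the exact training script; in particular, the per-timestep output head (`return_sequences=True`) is an assumption about how the model was set up:

```python
from tensorflow import keras

VOCAB_SIZE = 10_000   # vocabulary size
EMBED_DIM = 100       # embedding size
LSTM_UNITS = 128      # LSTM hidden units

# Variable-length integer token ids in, a softmax over the vocabulary
# at every timestep out (next-token prediction at each position).
model = keras.Sequential([
    keras.Input(shape=(None,), dtype="int32"),
    keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM),
    keras.layers.LSTM(LSTM_UNITS, return_sequences=True),
    keras.layers.Dense(VOCAB_SIZE, activation="softmax"),
])
```

Leaving the sequence axis as `None` lets the same model score prompts shorter than the 200-token training length during generation.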

## Training Details

- Dataset: Epicurious Recipes (Kaggle)
- Objective: next-token prediction using sparse categorical cross-entropy
- Optimizer: Adam
- Epochs: 25
- Batch size: 32
- Random seed: 42
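The configuration above might translate to Keras along these lines. This is a hedged sketch on dummy data: the shifted-by-one construction of (input, target) pairs is an assumption about how the next-token examples were built, and only one epoch is run here for illustration:

```python
import numpy as np
from tensorflow import keras

keras.utils.set_random_seed(42)  # the reported random seed

VOCAB_SIZE = 10_000

model = keras.Sequential([
    keras.layers.Embedding(VOCAB_SIZE, 100),
    keras.layers.LSTM(128, return_sequences=True),
    keras.layers.Dense(VOCAB_SIZE, activation="softmax"),
])
# Adam optimizer with sparse categorical cross-entropy, as reported.
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Dummy token-id batch standing in for vectorized recipes: inputs are
# tokens 0..n-1 of each sequence, targets the same tokens shifted by one.
seqs = np.random.randint(1, VOCAB_SIZE, size=(32, 21))
inputs, targets = seqs[:, :-1], seqs[:, 1:]

history = model.fit(inputs, targets, batch_size=32, epochs=1, verbose=0)
```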

Text preprocessing includes:

- Lowercasing
- Separating punctuation into explicit tokens
- Integer tokenization via `TextVectorization`
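These preprocessing steps might look like the following sketch; the exact punctuation-splitting regex used in training is an assumption:

```python
import re
import string
import tensorflow as tf

def prepare_text(text):
    # Lowercase, then surround each punctuation mark with spaces so it
    # becomes its own token.
    text = text.lower()
    text = re.sub(f"([{re.escape(string.punctuation)}])", r" \1 ", text)
    return re.sub(r"\s+", " ", text).strip()

# Integer tokenization: punctuation is already separated above, so no
# further standardization is applied inside the layer.
vectorize_layer = tf.keras.layers.TextVectorization(
    standardize=None,
    max_tokens=10_000,
    output_sequence_length=200,
)
vectorize_layer.adapt([prepare_text("Preheat the oven to 180 degrees.")])
```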

The original dataset is **not redistributed** in this repository.

## Usage

This model is intended for educational and experimental purposes.

Typical usage:

1. Tokenize an input prompt using the same vocabulary as in training
2. Feed the token sequence to the model
3. Sample the next token from the output distribution
4. Append the sampled token to the sequence and repeat

Temperature sampling can be applied to control randomness during generation.
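The loop above, with temperature sampling, can be sketched framework-independently; `predict_fn` below is a hypothetical stand-in for a call to the trained model:

```python
import numpy as np

def sample_next(probs, temperature=1.0):
    # Reweight the softmax distribution: temperature < 1 sharpens it,
    # temperature > 1 flattens it; then sample one token id.
    logits = np.log(np.asarray(probs, dtype=np.float64) + 1e-9) / temperature
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    return int(np.random.choice(len(weights), p=weights))

def generate(predict_fn, token_ids, n_tokens, temperature=1.0):
    # predict_fn: maps the current token-id list to a probability
    # distribution over the vocabulary for the next token.
    token_ids = list(token_ids)
    for _ in range(n_tokens):
        token_ids.append(sample_next(predict_fn(token_ids), temperature))
    return token_ids
```

At very low temperatures the sampler approaches greedy argmax decoding; at 1.0 it samples from the model's raw distribution.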

## Limitations

- LSTM-based models have limited long-range coherence
- Outputs may become repetitive over long generations
- Not suitable for real-world recipe advice or food safety guidance
- Trained on a static dataset with no factual grounding

## Intended Use

- Educational demonstrations of sequence modeling
- Classical NLP comparison against transformer models
- Coursework and portfolio showcase

## Author

Zane Graper