---
language: en
tags:
- decoder-transformer
- recipe-generation
- scraps-llm
- onnx
license: mit
library_name: pytorch
pipeline_tag: text-generation
---
## 🧠 Training Configuration

This model was trained with the architecture and optimization settings below.

### ⚙️ Model Architecture
| Parameter | Value | Description |
|---|---|---|
| `d_model` | 768 | Hidden dimension size |
| `n_layers` | 12 | Number of Transformer decoder layers |
| `n_heads` | 12 | Attention heads per layer |
| `max_len` | 256 | Maximum token sequence length |
| `dropout` | 0.1 | Dropout rate |
| `use_rope` | `True` | Rotary positional embeddings enabled |
| `tie_weights` | `True` | Output and embedding weights are shared |
| **Total parameters** | ≈ 138 million | Comparable to GPT-Neo-125M |
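For reference, the hyperparameters above can be collected into a small configuration object. This is a hypothetical sketch (the name `ScrapsConfig` is illustrative, not the actual training code):

```python
from dataclasses import dataclass


@dataclass
class ScrapsConfig:
    """Hyperparameters from the table above (class name is illustrative)."""
    d_model: int = 768        # hidden dimension size
    n_layers: int = 12        # Transformer decoder layers
    n_heads: int = 12         # attention heads per layer
    max_len: int = 256        # maximum token sequence length
    dropout: float = 0.1      # dropout rate
    use_rope: bool = True     # rotary positional embeddings
    tie_weights: bool = True  # share output and embedding weights

    @property
    def head_dim(self) -> int:
        # per-head dimension; n_heads must divide d_model evenly
        return self.d_model // self.n_heads


cfg = ScrapsConfig()
print(cfg.head_dim)  # 64
```

With `d_model = 768` and `n_heads = 12`, each attention head works in a 64-dimensional subspace.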
### 🧾 Optimization & Training

| Parameter | Value | Description |
|---|---|---|
| `optimizer` | AdamW | Standard optimizer for Transformers |
| `lr` | 3e-4 | Learning rate |
| `weight_decay` | 0.01 | L2 regularization |
| `betas` | (0.9, 0.95) | Momentum coefficients |
| `grad_clip` | 1.0 | Gradient clipping threshold |
| `epochs` | 15 | Training epochs |
| `scheduler` | Cosine | Learning rate scheduler |
| `warmup_steps` | auto | Automatically computed warmup steps |
| `mixed_precision` | `True` | FP16 mixed precision training |
| `grad_accum_steps` | 1 | Gradient accumulation steps |
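The cosine schedule with warmup listed above follows a standard shape: the learning rate ramps linearly to its peak over the warmup steps, then decays along a cosine curve to zero. A plain-Python sketch of that shape (the step counts here are arbitrary; in the actual setup `warmup_steps` is computed automatically):

```python
import math


def cosine_lr(step: int, total_steps: int, warmup_steps: int,
              base_lr: float = 3e-4) -> float:
    """Linear warmup to base_lr, then cosine decay to zero."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps  # linear warmup
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))  # cosine decay


# peak at the end of warmup, back to ~0 at the end of training
print(cosine_lr(100, 1000, 100))   # 0.0003
print(cosine_lr(1000, 1000, 100))  # 0.0
```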
## 📚 Dataset

| Item | Details |
|---|---|
| Source | RecipeNLG CSV: a cleaned and structured dataset of recipes containing titles, ingredients, and step-by-step instructions |
| Tokens processed | ~2 million |
| Pre-processing | Lowercasing, ingredient quantity normalization, Unicode cleaning, removal of duplicates, and BPE tokenization (vocab ≈ 8k) |
## 🔢 Tokenizer

- Type: Byte-Pair Encoding (BPE)
- Path: `tokenizer/bpe.json`
- Vocabulary size: matches the training tokenizer (≈ 8k)
- Special tokens: `<bos>` and `<eos>`, handled explicitly during generation
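"Handled explicitly during generation" typically means the loop seeds the sequence with `<bos>`, stops when the model emits `<eos>`, and strips both before decoding. A minimal sketch with a stub in place of the model (`next_token_fn` is a hypothetical callable, not part of this repo):

```python
def generate(next_token_fn, max_len: int = 256):
    """Start from <bos>, append predicted tokens, stop at <eos> or max_len."""
    tokens = ["<bos>"]
    while len(tokens) < max_len:
        nxt = next_token_fn(tokens)
        if nxt == "<eos>":
            break  # explicit end-of-sequence handling
        tokens.append(nxt)
    return tokens[1:]  # strip <bos> before decoding


# stub "model" that emits a short fixed sequence, then stops
script = iter(["1.", "Boil", "pasta", "<eos>"])
out = generate(lambda toks: next(script))
print(out)  # ['1.', 'Boil', 'pasta']
```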
## 🧮 Compute Environment
| Setting | Value |
|---|---|
| Platform | Google Colab Pro+ |
| GPU | NVIDIA A100 (80 GB) |
| Runtime | PyTorch 2.x |
| Training time | ~18-22 hours |
| Mixed precision | Enabled (AMP) |
## 🧮 Hardware & Reproducibility

- Seed: 42
- Device: auto-selected (GPU if available)
- Mixed precision: enabled (FP16/AMP)
- Trained for 15 epochs on Colab Pro+ with an A100 GPU (see Compute Environment above).
## 🧩 Summary
Scraps-LLM is a 138 M-parameter decoder-only Transformer trained to generate complete cooking recipes from a list of input ingredients.
The model learns via causal language modeling (next-token prediction) on RecipeNLG data, producing structured, human-readable recipes that include titles and numbered steps.
It was later exported to ONNX for lightweight CPU inference and integrated into a Hugging Face Space demo.
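Causal language modeling means the model outputs a score (logit) for every vocabulary token at each step, and generation samples the next token from the resulting distribution. A plain-Python sketch of temperature-scaled sampling (the temperature value is illustrative, not the demo's setting):

```python
import math
import random


def sample_next(logits, temperature: float = 0.8) -> int:
    """Sample an index from temperature-scaled softmax over next-token logits."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    r = random.random() * sum(exps)
    acc = 0.0
    for i, e in enumerate(exps):
        acc += e
        if r < acc:
            return i
    return len(exps) - 1
```

Lower temperatures sharpen the distribution toward the highest-logit token; higher temperatures make recipe text more varied.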
## 📈 Evaluation (dev set)
| Metric | Value |
|---|---|
| Validation Perplexity | ~7.1 |
| BLEU-1 | 37.1 |
| ROUGE-L | 6.7 |
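Validation perplexity is the exponential of the mean per-token cross-entropy loss, so it can be recovered from logged losses. A quick check of that relationship (the loss values below are made up for illustration):

```python
import math


def perplexity(nll_per_token: list) -> float:
    """exp of the mean negative log-likelihood (in nats) over tokens."""
    return math.exp(sum(nll_per_token) / len(nll_per_token))


# a mean token loss of ~1.96 nats corresponds to perplexity ~7.1
print(round(perplexity([1.96, 1.96, 1.96]), 2))  # 7.1
```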
## 🧰 Files Included

| File | Description |
|---|---|
| `export/scraps.onnx` | ONNX-optimized inference graph |
| `tokenizer/bpe.json` | BPE vocabulary used for encoding/decoding |
| `best_model.pt` | PyTorch checkpoint (~138 M parameters) |
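Given the two inference files above, CPU inference can be sketched with ONNX Runtime and the `tokenizers` library. This is a hedged sketch: the input tensor name and shape are assumptions and should be checked against the exported graph (hence the `get_inputs()` lookup rather than a hard-coded name):

```python
import numpy as np
import onnxruntime as ort
from tokenizers import Tokenizer

# load the exported graph and the BPE tokenizer shipped with the model
session = ort.InferenceSession("export/scraps.onnx",
                               providers=["CPUExecutionProvider"])
tokenizer = Tokenizer.from_file("tokenizer/bpe.json")

# inspect the actual input name rather than guessing it
input_name = session.get_inputs()[0].name

ids = tokenizer.encode("2 eggs, flour, butter").ids
# assumes the graph takes a (batch, seq) int64 id tensor and returns logits
logits = session.run(None, {input_name: np.array([ids], dtype=np.int64)})[0]
next_token_id = int(logits[0, -1].argmax())  # greedy next-token pick
print(tokenizer.decode([next_token_id]))
```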