## 🧠 Training Configuration
This model was trained with the following architecture and optimization settings:
### ⚙️ Model Architecture
| Parameter | Value | Description |
|---|---|---|
| `d_model` | 768 | Hidden dimension size |
| `n_layers` | 12 | Number of Transformer decoder layers |
| `n_heads` | 12 | Attention heads per layer |
| `max_len` | 256 | Maximum token sequence length |
| `dropout` | 0.1 | Dropout rate |
| `use_rope` | `True` | Rotary positional embeddings enabled |
| `tie_weights` | `True` | Output and embedding weights are shared |
| **Total parameters** | ≈ 136 million | Comparable to GPT-Neo-125M |
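The architecture values in the table can be mirrored in a small config object. The class name and field layout below are a sketch based on the listed hyperparameters, not the repository's actual code:

```python
from dataclasses import dataclass

@dataclass
class ModelConfig:
    """Hypothetical config mirroring the architecture table above."""
    d_model: int = 768        # hidden dimension
    n_layers: int = 12        # Transformer decoder layers
    n_heads: int = 12         # attention heads per layer
    max_len: int = 256        # maximum token sequence length
    dropout: float = 0.1
    use_rope: bool = True     # rotary positional embeddings
    tie_weights: bool = True  # share input/output embedding weights

cfg = ModelConfig()
head_dim = cfg.d_model // cfg.n_heads  # 768 / 12 = 64 dims per head
```

Note that `d_model` must divide evenly by `n_heads`, which 768 / 12 does.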
### 🧾 Optimization & Training
| Parameter | Value | Description |
|---|---|---|
| `optimizer` | AdamW | Standard optimizer for Transformers |
| `lr` | 3e-4 | Peak learning rate |
| `weight_decay` | 0.01 | Decoupled weight decay (L2-style regularization) |
| `betas` | (0.9, 0.95) | Adam momentum coefficients |
| `grad_clip` | 1.0 | Gradient clipping threshold |
| `epochs` | 15 | Training epochs |
| `scheduler` | Cosine | Learning rate scheduler |
| `warmup_steps` | auto | Automatically computed warmup steps |
| `mixed_precision` | `True` | FP16 mixed precision training |
| `grad_accum_steps` | 1 | Gradient accumulation steps |
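The cosine schedule with linear warmup can be sketched in a few lines of pure Python. Here `warmup_steps` is an explicit argument for clarity, whereas the actual training run computed it automatically ("auto" above):

```python
import math

def lr_at(step: int, max_steps: int, base_lr: float = 3e-4,
          warmup_steps: int = 100) -> float:
    """Cosine learning-rate schedule with linear warmup (illustrative)."""
    if step < warmup_steps:
        # linear ramp from 0 to base_lr over the warmup window
        return base_lr * step / warmup_steps
    # cosine decay from base_lr down to 0 over the remaining steps
    progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
```

With gradient clipping at 1.0 and `grad_accum_steps = 1`, each optimizer step consumes one batch at the learning rate this schedule returns.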
## 📚 Dataset
| Item | Description |
|---|---|
| Source | RecipeNLG CSV: a cleaned and structured dataset of recipes containing titles, ingredients, and step-by-step instructions |
| Tokens processed | ~2 million |
| Pre-processing | Lowercasing, ingredient-quantity normalization, Unicode cleaning, duplicate removal, and BPE tokenization (vocab ≈ 8k) |
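A minimal sketch of the text-cleaning steps (lowercasing, Unicode normalization, whitespace collapsing); the actual pipeline, which also normalizes ingredient quantities and removes duplicates, is more involved:

```python
import re
import unicodedata

def clean_text(text: str) -> str:
    """Lowercase, NFKC-normalize Unicode, and collapse whitespace."""
    text = unicodedata.normalize("NFKC", text).lower()
    return re.sub(r"\s+", " ", text).strip()
```

NFKC normalization folds typographic variants (e.g. fraction glyphs and full-width characters) into canonical forms, which keeps the BPE vocabulary compact.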
## 🔢 Tokenizer
- Type: Byte-Pair Encoding (BPE)
- Path: `tokenizer/bpe.json`
- Vocabulary size: matches the training tokenizer
- Special tokens: `<bos>`, `<eos>` handled explicitly during generation
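Handling `<bos>`/`<eos>` explicitly at generation time might look like the following; the token ids used here are hypothetical placeholders, not the ids in `tokenizer/bpe.json`:

```python
def strip_special(token_ids: list[int], bos_id: int = 1, eos_id: int = 2) -> list[int]:
    """Drop a leading <bos> and truncate at the first <eos> (ids are illustrative)."""
    if token_ids and token_ids[0] == bos_id:
        token_ids = token_ids[1:]          # remove the start-of-sequence marker
    if eos_id in token_ids:
        token_ids = token_ids[:token_ids.index(eos_id)]  # cut at end-of-sequence
    return token_ids
```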
## 🧮 Compute Environment
| Setting | Value |
|---|---|
| Platform | Google Colab Pro+ |
| GPU | NVIDIA A100 (80 GB) |
| Runtime | PyTorch 2.x |
| Training time | ~18-22 hours |
| Mixed precision | Enabled (AMP) |
## 🧮 Hardware & Reproducibility
- Seed: 42
- Device: Auto (GPU if available)
- Mixed precision: Enabled (FP16)
- Trained on Colab Pro+ with a T4 or A100 GPU.
## 🧩 Summary
Scraps-LLM is a 138 M-parameter decoder-only Transformer trained to generate complete cooking recipes from a list of input ingredients.
The model learns via causal language modeling (next-token prediction) on RecipeNLG data, producing structured, human-readable recipes that include titles and numbered steps.
It was later exported to ONNX for lightweight CPU inference and integrated into a Hugging Face Space demo.
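Causal language modeling means generation reduces to repeatedly picking a next token until `<eos>`. A toy greedy-decoding loop illustrates this; the "model" here is any callable from token ids to vocabulary scores, an assumption for illustration only:

```python
from typing import Callable

def greedy_decode(next_token_logits: Callable[[list[int]], list[float]],
                  prompt_ids: list[int], eos_id: int = 2,
                  max_len: int = 256) -> list[int]:
    """Greedy autoregressive decoding: append the argmax token until <eos>."""
    ids = list(prompt_ids)
    while len(ids) < max_len:
        logits = next_token_logits(ids)                      # score every vocab token
        nxt = max(range(len(logits)), key=logits.__getitem__)  # argmax
        if nxt == eos_id:
            break                                            # stop at end-of-sequence
        ids.append(nxt)
    return ids
```

The demo Space samples instead of taking the argmax, but the loop structure is the same.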
## 📊 Evaluation (dev set)
| Metric | Value |
|---|---|
| Validation perplexity | ~7.1 |
| BLEU-1 | 37.1 |
| ROUGE-L | 6.7 |
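Perplexity is the exponential of the mean per-token cross-entropy loss, so the reported ~7.1 corresponds to a validation loss of roughly 1.96 nats per token. As a sanity check:

```python
import math

def perplexity(mean_nll: float) -> float:
    """Perplexity from mean negative log-likelihood (nats per token)."""
    return math.exp(mean_nll)

loss_for_reported_ppl = math.log(7.1)  # ≈ 1.96 nats per token
```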
## 🧰 Files Included
| File | Description |
|---|---|
| `export/scraps.onnx` | ONNX-optimized inference graph |
| `tokenizer/bpe.json` | BPE vocabulary used for encoding/decoding |
| `best_model.pt` | PyTorch checkpoint (~138 M parameters) |