## 🧠 Training Configuration

This model was trained with the architecture and hyperparameters summarized below.

## ⚙️ Model Architecture

| Parameter | Value | Description |
|---|---|---|
| `d_model` | 768 | Hidden dimension size |
| `n_layers` | 12 | Number of Transformer decoder layers |
| `n_heads` | 12 | Attention heads per layer |
| `max_len` | 256 | Maximum token sequence length |
| `dropout` | 0.1 | Dropout rate |
| `use_rope` | True | Rotary positional embeddings (RoPE) enabled |
| `tie_weights` | True | Output and embedding weights are shared |

Total parameters: ≈ 136 million (comparable to GPT-Neo-125M)
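As a minimal sketch, the table above could be captured in a config object like the following. The class and field names are illustrative only; they are not taken from the actual training code.

```python
from dataclasses import dataclass

# Hypothetical config mirroring the architecture table above.
@dataclass
class ModelConfig:
    d_model: int = 768      # hidden dimension
    n_layers: int = 12      # Transformer decoder layers
    n_heads: int = 12       # attention heads per layer
    max_len: int = 256      # maximum sequence length
    dropout: float = 0.1
    use_rope: bool = True   # rotary positional embeddings
    tie_weights: bool = True  # share output and embedding weights

cfg = ModelConfig()
head_dim = cfg.d_model // cfg.n_heads  # 768 / 12 = 64 per head
```

A per-head dimension of 64 is the usual choice for models of this size, which is why `d_model` must divide evenly by `n_heads`.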

## 🧾 Optimization & Training

| Parameter | Value | Description |
|---|---|---|
| `optimizer` | AdamW | Standard optimizer for Transformers |
| `lr` | 3e-4 | Peak learning rate |
| `weight_decay` | 0.01 | Decoupled L2 regularization |
| `betas` | (0.9, 0.95) | Adam momentum coefficients |
| `grad_clip` | 1.0 | Gradient-norm clipping threshold |
| `epochs` | 15 | Training epochs |
| `scheduler` | Cosine | Learning-rate schedule |
| `warmup_steps` | auto | Automatically computed warmup steps |
| `mixed_precision` | True | FP16 mixed-precision training |
| `grad_accum_steps` | 1 | Gradient accumulation steps |
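The cosine schedule with warmup can be sketched in a few lines of plain Python. The `warmup_steps=100` and `max_steps=1000` values below are placeholders standing in for the "auto" computed warmup and the real step count, not values from the training run:

```python
import math

def lr_at(step: int, max_steps: int,
          base_lr: float = 3e-4, warmup_steps: int = 100) -> float:
    """Learning rate at a given step: linear warmup, then cosine decay to 0."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps  # linear warmup from 0
    progress = (step - warmup_steps) / (max_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))  # cosine decay
```

The LR rises linearly to the 3e-4 peak over the warmup window, then follows a half-cosine down to zero at the final step.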

## 📚 Dataset

| Item | Details |
|---|---|
| Source | RecipeNLG CSV: a cleaned and structured dataset of recipes containing titles, ingredients, and step-by-step instructions. |
| Tokens processed | ~2 million |
| Pre-processing | Lowercasing, ingredient quantity normalization, Unicode cleaning, duplicate removal, and BPE tokenization (vocab ≈ 8k). |
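The actual pre-processing pipeline is not published with the model; a minimal stdlib-only sketch of the listed steps might look like this (the fraction mappings are illustrative examples of quantity normalization):

```python
import re
import unicodedata

def clean_text(s: str) -> str:
    # Quantity normalization (illustrative), Unicode cleanup via NFKC,
    # lowercasing, and whitespace collapsing.
    s = s.replace("½", "1/2").replace("¼", "1/4").replace("¾", "3/4")
    s = unicodedata.normalize("NFKC", s).lower()
    return re.sub(r"\s+", " ", s).strip()

def dedupe(recipes):
    # Duplicate removal, keyed on the cleaned text so near-identical
    # entries (case/whitespace variants) collapse to one.
    seen, out = set(), []
    for r in recipes:
        key = clean_text(r)
        if key not in seen:
            seen.add(key)
            out.append(r)
    return out
```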

## 🔢 Tokenizer

- Type: Byte-Pair Encoding (BPE)
- Path: `tokenizer/bpe.json`
- Vocabulary size: ≈ 8k (matches the training tokenizer)
- Special tokens: `<bos>` and `<eos>`, handled explicitly during generation
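Explicit `<bos>`/`<eos>` handling during generation can be sketched as a greedy decode loop. Here `next_token_fn` is a stand-in for the real model's next-token prediction; the function name and structure are assumptions for illustration:

```python
BOS, EOS = "<bos>", "<eos>"

def generate(next_token_fn, max_len=256):
    # Start from <bos>; stop when the model emits <eos> or the
    # max_len window (256 tokens, per the architecture table) is full.
    tokens = [BOS]
    for _ in range(max_len - 1):
        nxt = next_token_fn(tokens)
        if nxt == EOS:
            break
        tokens.append(nxt)
    return tokens[1:]  # drop <bos> from the returned sequence
```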

## 🧮 Compute Environment

| Setting | Value |
|---|---|
| Platform | Google Colab Pro+ |
| GPU | NVIDIA A100 (80 GB) |
| Runtime | PyTorch 2.x |
| Training time | ~18–22 hours |
| Mixed precision | Enabled (AMP) |

## 🧮 Hardware & Reproducibility

- Seed: 42
- Device: auto (GPU if available)
- Mixed precision: enabled (FP16)
- Trained for 3 epochs on Colab Pro+ with a T4/A100 GPU.
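A seed of 42 typically means all RNGs are seeded before training. A stdlib-only sketch (the real script would also call `torch.manual_seed` and seed NumPy, omitted here):

```python
import random

def set_seed(seed: int = 42) -> None:
    # Seed the stdlib RNG for reproducibility; add torch.manual_seed(seed)
    # and numpy seeding in the actual training environment.
    random.seed(seed)
```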

## 🧩 Summary

Scraps-LLM is a 138 M-parameter decoder-only Transformer trained to generate complete cooking recipes from a list of input ingredients.
The model learns via causal language modeling (next-token prediction) on RecipeNLG data, producing structured, human-readable recipes that include titles and numbered steps.
It was later exported to ONNX for lightweight CPU inference and integrated into a Hugging Face Space demo.


## 📈 Evaluation (dev set)

| Metric | Value |
|---|---|
| Validation perplexity | ~7.1 |
| BLEU-1 | 37.1 |
| ROUGE-L | 6.7 |
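Perplexity is the exponential of the mean per-token negative log-likelihood, so the reported ~7.1 corresponds to a mean cross-entropy of about 1.96 nats per token:

```python
import math

def perplexity(nll_per_token):
    """Perplexity = exp(mean negative log-likelihood per token)."""
    return math.exp(sum(nll_per_token) / len(nll_per_token))

# A model with ~1.96 nats of loss per token scores ~7.1 perplexity.
```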

## 🧰 Files Included

| File | Description |
|---|---|
| `export/scraps.onnx` | ONNX-optimized inference graph |
| `tokenizer/bpe.json` | BPE vocabulary used for encoding/decoding |
| `best_model.pt` | PyTorch checkpoint (~138 M parameters) |