---
language: en
tags:
- decoder-transformer
- recipe-generation
- scraps-llm
- onnx
license: mit
library_name: pytorch
pipeline_tag: text-generation
---
## 🧠 Training Configuration

This model was trained with the architecture and optimization settings below.

### ⚙️ Model Architecture
| Parameter | Value | Description |
|---|---|---|
| `d_model` | 768 | Hidden dimension size |
| `n_layers` | 12 | Number of Transformer decoder layers |
| `n_heads` | 12 | Attention heads per layer |
| `max_len` | 256 | Maximum token sequence length |
| `dropout` | 0.1 | Dropout rate |
| `use_rope` | `True` | Rotary positional embeddings enabled |
| `tie_weights` | `True` | Output and embedding weights are shared |
| **Total parameters** | ≈ 138 million | Comparable to GPT-Neo-125M |
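For reference, the hyperparameters above can be collected into a small configuration object. This is a hypothetical sketch (the name `ScrapsConfig` is illustrative, not the actual training code):

```python
from dataclasses import dataclass


@dataclass
class ScrapsConfig:
    """Hyperparameters from the table above (class name is illustrative)."""
    d_model: int = 768        # hidden dimension size
    n_layers: int = 12        # Transformer decoder layers
    n_heads: int = 12         # attention heads per layer
    max_len: int = 256        # maximum token sequence length
    dropout: float = 0.1      # dropout rate
    use_rope: bool = True     # rotary positional embeddings
    tie_weights: bool = True  # share output and embedding weights

    @property
    def head_dim(self) -> int:
        # per-head dimension; n_heads must divide d_model evenly
        return self.d_model // self.n_heads


cfg = ScrapsConfig()
print(cfg.head_dim)  # 64
```

With `d_model = 768` and `n_heads = 12`, each attention head works in a 64-dimensional subspace.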
### 🧾 Optimization & Training

| Parameter | Value | Description |
|---|---|---|
| `optimizer` | AdamW | Standard optimizer for Transformers |
| `lr` | 3e-4 | Learning rate |
| `weight_decay` | 0.01 | L2 regularization |
| `betas` | (0.9, 0.95) | Momentum coefficients |
| `grad_clip` | 1.0 | Gradient clipping threshold |
| `epochs` | 15 | Training epochs |
| `scheduler` | Cosine | Learning rate scheduler |
| `warmup_steps` | auto | Automatically computed warmup steps |
| `mixed_precision` | `True` | FP16 mixed precision training |
| `grad_accum_steps` | 1 | Gradient accumulation steps |
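The cosine schedule with warmup listed above follows a standard shape: the learning rate ramps linearly to its peak over the warmup steps, then decays along a cosine curve to zero. A plain-Python sketch of that shape (the step counts here are arbitrary; in the actual setup `warmup_steps` is computed automatically):

```python
import math


def cosine_lr(step: int, total_steps: int, warmup_steps: int,
              base_lr: float = 3e-4) -> float:
    """Linear warmup to base_lr, then cosine decay to zero."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps  # linear warmup
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))  # cosine decay


# peak at the end of warmup, back to ~0 at the end of training
print(cosine_lr(100, 1000, 100))   # 0.0003
print(cosine_lr(1000, 1000, 100))  # 0.0
```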
## 📚 Dataset

| Item | Details |
|---|---|
| Source | RecipeNLG CSV: a cleaned and structured dataset of recipes containing titles, ingredients, and step-by-step instructions |
| Tokens processed | ~2 million |
| Pre-processing | Lowercasing, ingredient quantity normalization, Unicode cleaning, removal of duplicates, and BPE tokenization (vocab ≈ 8k) |
## 🔢 Tokenizer

- Type: Byte-Pair Encoding (BPE)
- Path: `tokenizer/bpe.json`
- Vocabulary size: matches the training tokenizer (≈ 8k)
- Special tokens: `<bos>` and `<eos>`, handled explicitly during generation
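"Handled explicitly during generation" typically means the loop seeds the sequence with `<bos>`, stops when the model emits `<eos>`, and strips both before decoding. A minimal sketch with a stub in place of the model (`next_token_fn` is a hypothetical callable, not part of this repo):

```python
def generate(next_token_fn, max_len: int = 256):
    """Start from <bos>, append predicted tokens, stop at <eos> or max_len."""
    tokens = ["<bos>"]
    while len(tokens) < max_len:
        nxt = next_token_fn(tokens)
        if nxt == "<eos>":
            break  # explicit end-of-sequence handling
        tokens.append(nxt)
    return tokens[1:]  # strip <bos> before decoding


# stub "model" that emits a short fixed sequence, then stops
script = iter(["1.", "Boil", "pasta", "<eos>"])
out = generate(lambda toks: next(script))
print(out)  # ['1.', 'Boil', 'pasta']
```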
## 🧮 Compute Environment
| Setting | Value |
|---|---|
| Platform | Google Colab Pro+ |
| GPU | NVIDIA A100 (80 GB) |
| Runtime | PyTorch 2.x |
| Training time | ~18-22 hours |
| Mixed precision | Enabled (AMP) |
## 🧮 Hardware & Reproducibility

- Seed: 42
- Device: auto-selected (GPU if available)
- Mixed precision: enabled (FP16/AMP)
- Trained for 15 epochs on Colab Pro+ with an A100 GPU (see Compute Environment above).
## 🧩 Summary
Scraps-LLM is a 138 M-parameter decoder-only Transformer trained to generate complete cooking recipes from a list of input ingredients.
The model learns via causal language modeling (next-token prediction) on RecipeNLG data, producing structured, human-readable recipes that include titles and numbered steps.
It was later exported to ONNX for lightweight CPU inference and integrated into a Hugging Face Space demo.
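Causal language modeling means the model outputs a score (logit) for every vocabulary token at each step, and generation samples the next token from the resulting distribution. A plain-Python sketch of temperature-scaled sampling (the temperature value is illustrative, not the demo's setting):

```python
import math
import random


def sample_next(logits, temperature: float = 0.8) -> int:
    """Sample an index from temperature-scaled softmax over next-token logits."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    r = random.random() * sum(exps)
    acc = 0.0
    for i, e in enumerate(exps):
        acc += e
        if r < acc:
            return i
    return len(exps) - 1
```

Lower temperatures sharpen the distribution toward the highest-logit token; higher temperatures make recipe text more varied.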
## 📈 Evaluation (dev set)
| Metric | Value |
|---|---|
| Validation Perplexity | ~7.1 |
| BLEU-1 | 37.1 |
| ROUGE-L | 6.7 |
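Validation perplexity is the exponential of the mean per-token cross-entropy loss, so it can be recovered from logged losses. A quick check of that relationship (the loss values below are made up for illustration):

```python
import math


def perplexity(nll_per_token: list) -> float:
    """exp of the mean negative log-likelihood (in nats) over tokens."""
    return math.exp(sum(nll_per_token) / len(nll_per_token))


# a mean token loss of ~1.96 nats corresponds to perplexity ~7.1
print(round(perplexity([1.96, 1.96, 1.96]), 2))  # 7.1
```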
## 🧰 Files Included

| File | Description |
|---|---|
| `export/scraps.onnx` | ONNX-optimized inference graph |
| `tokenizer/bpe.json` | BPE vocabulary used for encoding/decoding |
| `best_model.pt` | PyTorch checkpoint (~138 M parameters) |
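Given the two inference files above, CPU inference can be sketched with ONNX Runtime and the `tokenizers` library. This is a hedged sketch: the input tensor name and shape are assumptions and should be checked against the exported graph (hence the `get_inputs()` lookup rather than a hard-coded name):

```python
import numpy as np
import onnxruntime as ort
from tokenizers import Tokenizer

# load the exported graph and the BPE tokenizer shipped with the model
session = ort.InferenceSession("export/scraps.onnx",
                               providers=["CPUExecutionProvider"])
tokenizer = Tokenizer.from_file("tokenizer/bpe.json")

# inspect the actual input name rather than guessing it
input_name = session.get_inputs()[0].name

ids = tokenizer.encode("2 eggs, flour, butter").ids
# assumes the graph takes a (batch, seq) int64 id tensor and returns logits
logits = session.run(None, {input_name: np.array([ids], dtype=np.int64)})[0]
next_token_id = int(logits[0, -1].argmax())  # greedy next-token pick
print(tokenizer.decode([next_token_id]))
```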