| Parameter | Value |
| --- | --- |
| name | tiny_64x2 |
| device | cuda |
| compile | True |
| data_dir | data/tiny_stories_10m |
| should_randomize | True |
| log_interval | 10 |
| eval_interval | 250 |
| eval_steps | 100 |
| batch_size | 128 |
| gradient_accumulation_steps | 8 |
| learning_rate | 0.001 |
| warmup_steps | 0 |
| max_steps | 5000 |
| decay_lr | False |
| min_lr | 0 |
| weight_decay | 0.1 |
| grad_clip | 1.0 |
| gpt_config | {'name': 'tiktoken_64x2', 'device': device(type='cuda'), 'compile': True, 'block_size': 128, 'vocab_size': 50257, 'n_layer': 2, 'n_head': 16, 'n_embd': 64} |
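
The configuration above implies a few derived quantities that are easy to misread from the raw values. The sketch below works them out under stated assumptions: that `batch_size` counts sequences per micro-batch, that every sequence is `block_size` tokens long, and that each transformer block holds roughly `12 * n_embd**2` weights (attention plus a 4x MLP, ignoring biases and layer norms). None of these conventions is confirmed by the dump itself, and the helper names are hypothetical.

```python
# Values copied from the configuration dump above.
config = {
    "batch_size": 128,
    "gradient_accumulation_steps": 8,
    "max_steps": 5000,
    "eval_interval": 250,
    "block_size": 128,
    "vocab_size": 50257,
    "n_layer": 2,
    "n_head": 16,
    "n_embd": 64,
}


def tokens_per_step(cfg):
    # Assumption: batch_size is sequences per micro-batch and each
    # sequence is block_size tokens long.
    return cfg["batch_size"] * cfg["gradient_accumulation_steps"] * cfg["block_size"]


def head_dim(cfg):
    # n_embd must divide evenly across the attention heads.
    assert cfg["n_embd"] % cfg["n_head"] == 0
    return cfg["n_embd"] // cfg["n_head"]


def approx_params(cfg):
    # Assumption: GPT-2 style blocks with about 12 * d^2 weights each;
    # biases and layer norms are ignored, and the output head is
    # assumed to share weights with the token embedding.
    d = cfg["n_embd"]
    token_emb = cfg["vocab_size"] * d
    pos_emb = cfg["block_size"] * d
    blocks = cfg["n_layer"] * 12 * d * d
    return token_emb + pos_emb + blocks


per_step = tokens_per_step(config)
print("tokens per optimizer step:", per_step)                    # 131072
print("tokens over the full run:", per_step * config["max_steps"])  # 655360000
print("head dimension:", head_dim(config))                       # 4
print("evaluations:", config["max_steps"] // config["eval_interval"])  # 20
print("approx. parameters:", approx_params(config))              # 3322944
```

Under these assumptions the token embedding (50257 x 64 = 3,216,448 weights) accounts for nearly all of the roughly 3.3M parameters, while the two transformer blocks contribute only about 98K. Each attention head is also just 4 dimensions wide.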