Model weights for a GPT-like decoder-only Transformer.
Configuration used for gpt_64_bs128_5000epochs_lr1.0e-03_8heads_emb512.pt:
BATCH_SIZE = 64
SEED = 42
BLOCK_SIZE = 128
EPOCHS = 5000
TRAIN_SUBSET_LENGTH = None # 10_000_000
TRAIN_PERC = 0.99
EVAL_PERIOD = 500
EVAL_ITERS = 100
EMBED_SIZE = 512
NUM_HEADS = 8
LEARNING_RATE = 1e-3
BLOCK_NUMBER = 8
Code repo: nikiandr/gpt_ua.
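
The checkpoint is a regular PyTorch `.pt` file. As a minimal sketch for inspecting it (assuming the file stores a state_dict rather than a pickled model object; the actual model classes live in nikiandr/gpt_ua), you could do something like:

```python
import torch

# Hyperparameters from the configuration above; they are needed to rebuild
# the model from the repo's classes before loading these weights.
BLOCK_SIZE = 128
EMBED_SIZE = 512
NUM_HEADS = 8
BLOCK_NUMBER = 8

# Load on CPU so the checkpoint does not require the GPU it was trained on.
state = torch.load(
    "gpt_64_bs128_5000epochs_lr1.0e-03_8heads_emb512.pt",
    map_location="cpu",
)

# If the file is a state_dict, print a few parameter names and shapes as a
# sanity check against EMBED_SIZE / NUM_HEADS / BLOCK_NUMBER.
if isinstance(state, dict):
    for name, tensor in list(state.items())[:8]:
        print(name, tuple(getattr(tensor, "shape", ())))
```

This is only an inspection sketch; to run inference, instantiate the model definition from the repo with the configuration above and call `load_state_dict` on it.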