license: apache-2.0
This file stores only the initial weights (checkpoint) of the JiRack GPT model. The model is "clean": it has not been exposed to any training data and has never undergone pre-training.
It is designed to be as safe and robust a base as possible for training specialized, smaller models from scratch, such as:
SPAM Detection Systems
FRAUD Detection Models
Background Check (BG Check) Models
A product of CMS Manhattan.
File explanations

************* Model: 12 attention heads *****************
VOCAB_SIZE = 50257
MODEL_DIM = 768
NUM_HEADS = 12
NUM_LAYERS = 6
MAX_SEQ_LEN = 8192
FFN_HIDDEN_DIM = 4 * MODEL_DIM
HEAD_DIM = MODEL_DIM // NUM_HEADS
JiRack_H12_L6_V50257_D768_MSL8192_FF768x4.pt
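As a rough sanity check on model size, the sketch below estimates the parameter count of the 12-head configuration from the constants above. It assumes a GPT-2-style decoder layout (learned absolute position embeddings of length MAX_SEQ_LEN, biases on every linear layer, two LayerNorms per block plus a final one, and an output head tied to the token embedding). None of these details are stated in this file, so the real JiRack checkpoint may differ; `estimate_params` is a hypothetical helper.

```python
# Rough parameter-count estimate for the 12-head JiRack configuration.
# ASSUMPTION: GPT-2-style decoder (learned absolute position embeddings,
# biases on all linear layers, LayerNorm with weight + bias, output head
# tied to the token embedding). The actual JiRack layout may differ.

VOCAB_SIZE = 50257
MODEL_DIM = 768
NUM_HEADS = 12
NUM_LAYERS = 6
MAX_SEQ_LEN = 8192
FFN_HIDDEN_DIM = 4 * MODEL_DIM
HEAD_DIM = MODEL_DIM // NUM_HEADS


def estimate_params(vocab, d, n_layers, max_len, ffn):
    """Hypothetical helper: estimated parameters under the GPT-2-style assumption."""
    token_emb = vocab * d                       # token embedding (tied with output head)
    pos_emb = max_len * d                       # learned absolute position embedding
    attn = (d * 3 * d + 3 * d) + (d * d + d)    # fused Q/K/V projection + output projection
    mlp = (d * ffn + ffn) + (ffn * d + d)       # feed-forward up- and down-projection
    norms = 2 * (2 * d)                         # two LayerNorms (weight + bias) per block
    per_layer = attn + mlp + norms
    final_norm = 2 * d                          # final LayerNorm before the output head
    return token_emb + pos_emb + n_layers * per_layer + final_norm


total = estimate_params(VOCAB_SIZE, MODEL_DIM, NUM_LAYERS, MAX_SEQ_LEN, FFN_HIDDEN_DIM)
print(f"HEAD_DIM = {HEAD_DIM}")                 # HEAD_DIM = 64
print(f"estimated parameters: {total:,}")       # estimated parameters: 87,417,600
```

NUM_HEADS does not appear in the estimate: in standard multi-head attention, changing the head count only changes how MODEL_DIM is split (HEAD_DIM), not the size of the projection matrices, so a 6-head variant with the same other constants would give the same estimate under these assumptions.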
************* Model: 6 attention heads *****************
VOCAB_SIZE = 50257
MODEL_DIM = 768
NUM_HEADS = 6
NUM_LAYERS = 6
MAX_SEQ_LEN = 8192
FFN_HIDDEN_DIM = 4 * MODEL_DIM
HEAD_DIM = MODEL_DIM // NUM_HEADS
JiRack_H6_L6_V50257_D768_MSL8192_FF768x4.pt
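The checkpoint file names appear to encode the configuration: H is the number of heads, L the number of layers, V the vocabulary size, D the model dimension, MSL the maximum sequence length, and FF the feed-forward width written as MODEL_DIM x multiplier. The sketch below uses a hypothetical `JiRackConfig` class to rebuild those names from the constants and derive HEAD_DIM; the naming pattern is inferred from the two examples listed here, not from a documented convention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JiRackConfig:
    # Hypothetical config class; field defaults mirror the constants listed above.
    vocab_size: int = 50257
    model_dim: int = 768
    num_heads: int = 12
    num_layers: int = 6
    max_seq_len: int = 8192
    ffn_multiplier: int = 4          # FFN_HIDDEN_DIM = 4 * MODEL_DIM

    @property
    def ffn_hidden_dim(self) -> int:
        return self.ffn_multiplier * self.model_dim

    @property
    def head_dim(self) -> int:
        # Standard multi-head attention splits MODEL_DIM evenly across heads,
        # so HEAD_DIM = MODEL_DIM // NUM_HEADS only makes sense without a remainder.
        if self.model_dim % self.num_heads != 0:
            raise ValueError(
                f"MODEL_DIM={self.model_dim} is not divisible by NUM_HEADS={self.num_heads}"
            )
        return self.model_dim // self.num_heads

    def checkpoint_name(self) -> str:
        # Pattern inferred from the file names in this README (an assumption).
        return (
            f"JiRack_H{self.num_heads}_L{self.num_layers}_V{self.vocab_size}"
            f"_D{self.model_dim}_MSL{self.max_seq_len}"
            f"_FF{self.model_dim}x{self.ffn_multiplier}.pt"
        )


for heads in (12, 6):
    cfg = JiRackConfig(num_heads=heads)
    print(cfg.checkpoint_name(), "head_dim =", cfg.head_dim)
# JiRack_H12_L6_V50257_D768_MSL8192_FF768x4.pt head_dim = 64
# JiRack_H6_L6_V50257_D768_MSL8192_FF768x4.pt head_dim = 128
```

Raising an error on a non-divisible head count is a design choice of this sketch: integer division would otherwise silently drop part of MODEL_DIM.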
************* Model: 18 attention heads *****************