JiRack_empty / README.md
license: apache-2.0

This repository stores only the initial weights (checkpoints) of the JiRack GPT model. The model is "clean": it has never been pre-trained, so its weights encode no training data.

It is designed to be a maximally safe and robust starting point for training specialized, smaller models from scratch, such as:

SPAM Detection Systems

FRAUD Detection Models

Background Check (BG Check) Models

A product of CMS Manhattan.

File explanations

************* model 12 heads attention *****************

VOCAB_SIZE = 50257 MODEL_DIM = 768 NUM_HEADS = 12 NUM_LAYERS = 6 MAX_SEQ_LEN = 8192 FFN_HIDDEN_DIM = 4 * MODEL_DIM HEAD_DIM = MODEL_DIM // NUM_HEADS

JiRack_H12_L6_V50257_D768_MSL8192_FF768x4.pt
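The hyperparameters and the checkpoint filename appear to follow one pattern, so the derived values and the filename can be computed from a single config object. The sketch below is illustrative only: the `JiRackConfig` class, its field names, and the `checkpoint_name` helper are hypothetical and not part of the JiRack code, and the filename pattern is inferred from the names listed in this README.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JiRackConfig:
    """Hypothetical container for the hyperparameters listed in this README."""

    vocab_size: int = 50257
    model_dim: int = 768
    num_heads: int = 12
    num_layers: int = 6
    max_seq_len: int = 8192
    ffn_multiplier: int = 4  # FFN_HIDDEN_DIM = 4 * MODEL_DIM

    def __post_init__(self):
        # Every attention head gets an equal slice of the model dimension.
        if self.model_dim % self.num_heads != 0:
            raise ValueError("MODEL_DIM must be divisible by NUM_HEADS")

    @property
    def head_dim(self) -> int:
        # HEAD_DIM = MODEL_DIM // NUM_HEADS
        return self.model_dim // self.num_heads

    @property
    def ffn_hidden_dim(self) -> int:
        return self.ffn_multiplier * self.model_dim

    def checkpoint_name(self) -> str:
        # Naming pattern inferred from the filenames in this README (assumption).
        return (
            f"JiRack_H{self.num_heads}_L{self.num_layers}_V{self.vocab_size}"
            f"_D{self.model_dim}_MSL{self.max_seq_len}"
            f"_FF{self.model_dim}x{self.ffn_multiplier}.pt"
        )


cfg = JiRackConfig(num_heads=12)
print(cfg.head_dim, cfg.ffn_hidden_dim, cfg.checkpoint_name())
```

For the 12-head configuration this gives a head dimension of 768 // 12 = 64 and an FFN hidden dimension of 4 * 768 = 3072.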

************* model 6 heads attention *****************

VOCAB_SIZE = 50257 MODEL_DIM = 768 NUM_HEADS = 6 NUM_LAYERS = 6 MAX_SEQ_LEN = 8192 FFN_HIDDEN_DIM = 4 * MODEL_DIM HEAD_DIM = MODEL_DIM // NUM_HEADS

JiRack_H6_L6_V50257_D768_MSL8192_FF768x4.pt
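Because MODEL_DIM stays at 768, changing NUM_HEADS only changes the width of each head: 768 // 6 = 128 for this variant versus 768 // 12 = 64 for the 12-head one. As a rough sketch, the hyperparameters can be read back from a checkpoint filename before loading it. The `parse_checkpoint_name` helper and the regular expression are hypothetical, based only on the filenames shown in this README.

```python
import re

# Filename pattern inferred from the names in this README (assumption).
_NAME_RE = re.compile(
    r"^JiRack_H(?P<num_heads>\d+)_L(?P<num_layers>\d+)_V(?P<vocab_size>\d+)"
    r"_D(?P<model_dim>\d+)_MSL(?P<max_seq_len>\d+)"
    r"_FF(?P<ffn_base>\d+)x(?P<ffn_mult>\d+)\.pt$"
)


def parse_checkpoint_name(name: str) -> dict:
    """Recover hyperparameters from a checkpoint filename (hypothetical helper)."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized checkpoint name: {name}")
    cfg = {key: int(value) for key, value in match.groupdict().items()}
    # "FF768x4" is read as FFN_HIDDEN_DIM = 768 * 4.
    cfg["ffn_hidden_dim"] = cfg.pop("ffn_base") * cfg.pop("ffn_mult")
    # HEAD_DIM = MODEL_DIM // NUM_HEADS
    cfg["head_dim"] = cfg["model_dim"] // cfg["num_heads"]
    return cfg


for name in (
    "JiRack_H12_L6_V50257_D768_MSL8192_FF768x4.pt",
    "JiRack_H6_L6_V50257_D768_MSL8192_FF768x4.pt",
):
    cfg = parse_checkpoint_name(name)
    print(name, "->", cfg["num_heads"], "heads of width", cfg["head_dim"])
```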

************* model 18 heads attention *****************