kgrabko committed on
Commit cd6e897 · verified · 1 Parent(s): afaa3c6

Update README.md

Files changed (1):
  1. README.md +58 -26

README.md CHANGED
---
license: apache-2.0
---
# JiRack GPT Initial Weights

This file is strictly intended for saving the **initial weights (checkpoint)** of the JiRack GPT model.
The model is **"clean"**: it contains no data and has never undergone any pre-training.

It is engineered to be a maximally safe and robust base for **training from scratch** for specialized, smaller models, such as:

- **SPAM Detection Systems**
- **FRAUD Detection Models**
- **Background Check (BG Check) Models**

_A product of CMS Manhattan._

---
## Tokenizer Choices

- For English: the **GPT-2 tokenizer** from the Hugging Face library
- For multilingual use: the **BERT tokenizer** from the Hugging Face library

---
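The tokenizer guidance above can be sketched with the Hugging Face `transformers` library. This is an illustrative snippet, not part of the repository: the Hub names `gpt2` and `bert-base-multilingual-cased` are assumed choices for "the GPT-2 tokenizer" and "the multilingual BERT tokenizer".

```python
# Illustrative tokenizer selection, following the guidance above.
# Assumption: the `transformers` library is installed; the Hub names
# below are common defaults, not names prescribed by this repository.

TOKENIZER_FOR = {
    "english": "gpt2",                               # GPT-2 BPE tokenizer
    "multilingual": "bert-base-multilingual-cased",  # BERT WordPiece tokenizer
}

def load_tokenizer(language: str):
    """Return the recommended tokenizer for `language` (downloads on first call)."""
    from transformers import AutoTokenizer  # imported lazily; needs network access once
    return AutoTokenizer.from_pretrained(TOKENIZER_FOR[language])
```

Note that the GPT-2 tokenizer's vocabulary size (50,257) matches the `VOCAB_SIZE` used by both checkpoints below.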
 
 
 
 
 
 
## Model Architecture Details

### GPT-2 Architecture (Classic, Transformer-like)

```
CustomEmbedding
FrozenSignatureLayer
LearnedPositionalEmbedding
[TransformerBlock] × NUM_LAYERS
├── MultiHeadAttention
├── LayerNorm
├── LayerNorm
└── FFN
    ├── Linear
    ├── Activation: GELU
    └── Linear
LayerNorm
Linear
```
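A rough parameter count follows from the layer stack above and the hyperparameters listed below. This is a sketch only: it ignores bias terms, assumes the output projection is not tied to the embedding, and excludes the `FrozenSignatureLayer`, whose size is not documented here.

```python
# Rough parameter-count sketch for the 12-head configuration below.
# Assumptions: biases ignored, untied output projection, and the
# undocumented FrozenSignatureLayer excluded.

VOCAB_SIZE = 50257
MODEL_DIM = 768
NUM_HEADS = 12
NUM_LAYERS = 6
MAX_SEQ_LEN = 8192
FFN_HIDDEN_DIM = 4 * MODEL_DIM
HEAD_DIM = MODEL_DIM // NUM_HEADS

token_embedding = VOCAB_SIZE * MODEL_DIM         # CustomEmbedding
positional_embedding = MAX_SEQ_LEN * MODEL_DIM   # LearnedPositionalEmbedding
attention = 4 * MODEL_DIM * MODEL_DIM            # Q, K, V, and output projections
ffn = 2 * MODEL_DIM * FFN_HIDDEN_DIM             # two Linear layers around the GELU
layer_norms = 2 * 2 * MODEL_DIM                  # scale + shift, two norms per block
per_block = attention + ffn + layer_norms
output_head = 2 * MODEL_DIM + MODEL_DIM * VOCAB_SIZE  # final LayerNorm + Linear

total = token_embedding + positional_embedding + NUM_LAYERS * per_block + output_head
print(f"HEAD_DIM = {HEAD_DIM}, approx. parameters = {total:,}")
```

At roughly 126M parameters this is GPT-2-small scale; the 6-head variant has the same count, since `MODEL_DIM` is unchanged and only the head split differs.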
---

## Model Checkpoint File Explanations

### 12-head Attention Model

**Parameters:**
- `VOCAB_SIZE = 50257`
- `MODEL_DIM = 768`
- `NUM_HEADS = 12`
- `NUM_LAYERS = 6`
- `MAX_SEQ_LEN = 8192`
- `FFN_HIDDEN_DIM = 4 * MODEL_DIM`
- `HEAD_DIM = MODEL_DIM // NUM_HEADS`

**File:**
`JiRack_H12_L6_V50257_D768_MSL8192_FF768x4.pt`

---
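The checkpoint filename encodes the hyperparameters (heads, layers, vocabulary, model dimension, max sequence length, FFN expansion). A small parser, inferred from the two filenames in this README and not part of the repository, can recover them:

```python
import re

# Decode a JiRack checkpoint filename such as
# JiRack_H12_L6_V50257_D768_MSL8192_FF768x4.pt into its hyperparameters.
# The field layout is inferred from the filenames in this README.
PATTERN = re.compile(
    r"JiRack_H(?P<heads>\d+)_L(?P<layers>\d+)_V(?P<vocab>\d+)"
    r"_D(?P<dim>\d+)_MSL(?P<msl>\d+)_FF(?P<ff_base>\d+)x(?P<ff_mult>\d+)\.pt"
)

def parse_checkpoint_name(filename: str) -> dict:
    m = PATTERN.fullmatch(filename)
    if m is None:
        raise ValueError(f"unrecognized checkpoint name: {filename}")
    g = {k: int(v) for k, v in m.groupdict().items()}
    return {
        "NUM_HEADS": g["heads"],
        "NUM_LAYERS": g["layers"],
        "VOCAB_SIZE": g["vocab"],
        "MODEL_DIM": g["dim"],
        "MAX_SEQ_LEN": g["msl"],
        "FFN_HIDDEN_DIM": g["ff_base"] * g["ff_mult"],  # e.g. 768 x 4 = 3072
    }
```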
### 6-head Attention Model

**Parameters:**
- `VOCAB_SIZE = 50257`
- `MODEL_DIM = 768`
- `NUM_HEADS = 6`
- `NUM_LAYERS = 6`
- `MAX_SEQ_LEN = 8192`
- `FFN_HIDDEN_DIM = 4 * MODEL_DIM`
- `HEAD_DIM = MODEL_DIM // NUM_HEADS`

**File:**
`JiRack_H6_L6_V50257_D768_MSL8192_FF768x4.pt`

---
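To start training from one of these checkpoints, the weights can be read back with PyTorch. This is a minimal sketch under two assumptions: `torch` is installed, and the structure of the saved object (plain state dict vs. wrapped checkpoint) matches whatever the JiRack model class, not shown in this README, expects.

```python
def checkpoint_name(num_heads: int) -> str:
    # The two published checkpoints differ only in the head count (12 or 6).
    return f"JiRack_H{num_heads}_L6_V50257_D768_MSL8192_FF768x4.pt"

def load_initial_weights(num_heads: int = 12):
    import torch  # assumption: PyTorch installed
    # map_location="cpu" keeps loading device-independent; move to GPU later.
    return torch.load(checkpoint_name(num_heads), map_location="cpu")
```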