---
license: gpl-3.0
---


# JiRack GPT-2 Initial Weights

This repository is intended strictly for storing the **initial weights (checkpoint)** of the JiRack GPT model.  
The model is **"clean"**: it contains no learned data and has never undergone any pre-training.
- Powered by CMS Manhattan’s cutting-edge Vision-BERT architecture.

It is designed as a safe, robust base for **training from scratch** specialized, smaller models, such as:

- **SPAM Detection Systems**
- **FRAUD Detection Models**
- **Background Check (BG Check) Models**

_A product of CMS Manhattan._

---

## Tokenizer Choices

- For English: **GPT-2 Hugging Face tokenizer**
- For multilingual use: **BERT tokenizer** from the Hugging Face library

---

## Model Architecture Details

### GPT-2 Architecture (Classic, Transformer-like)

```
CustomEmbedding
FrozenSignatureLayer
LearnedPositionalEmbedding
[TransformerBlock]
    ├── MultiHeadAttention
    ├── LayerNorm
    ├── LayerNorm
    ├── FFN
          ├── Linear
          ├── Activation: GELU
          └── Linear
LayerNorm
Linear
```

---

## Model Checkpoint File Explanations

### **12-head Attention Model**

**Parameters:**
- `VOCAB_SIZE = 50257`
- `MODEL_DIM = 768`
- `NUM_HEADS = 12`
- `NUM_LAYERS = 6`
- `MAX_SEQ_LEN = 8192`
- `FFN_HIDDEN_DIM = 4 * MODEL_DIM`
- `HEAD_DIM = MODEL_DIM // NUM_HEADS`

**File:**  
`JiRack_H12_L6_V50257_D768_MSL8192_FF768x4.pt`

---

### **6-head Attention Model**

**Parameters:**
- `VOCAB_SIZE = 50257`
- `MODEL_DIM = 768`
- `NUM_HEADS = 6`
- `NUM_LAYERS = 6`
- `MAX_SEQ_LEN = 8192`
- `FFN_HIDDEN_DIM = 4 * MODEL_DIM`
- `HEAD_DIM = MODEL_DIM // NUM_HEADS`

**File:**  
`JiRack_H6_L6_V50257_D768_MSL8192_FF768x4.pt`
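
The derived values `HEAD_DIM` and `FFN_HIDDEN_DIM` listed for the two checkpoints can be checked with a few lines of Python. This is a minimal sketch: the `JiRackConfig` class and the `attention_weight_count` helper are hypothetical names introduced here, not part of the JiRack code, and the weight count assumes a standard multi-head attention layer with four `MODEL_DIM x MODEL_DIM` projections and no biases.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JiRackConfig:
    """Hyperparameters as listed for the two checkpoints (hypothetical class)."""
    vocab_size: int = 50257
    model_dim: int = 768
    num_heads: int = 12
    num_layers: int = 6
    max_seq_len: int = 8192

    @property
    def head_dim(self) -> int:
        # HEAD_DIM = MODEL_DIM // NUM_HEADS; the division must be exact.
        if self.model_dim % self.num_heads != 0:
            raise ValueError("MODEL_DIM must be divisible by NUM_HEADS")
        return self.model_dim // self.num_heads

    @property
    def ffn_hidden_dim(self) -> int:
        # FFN_HIDDEN_DIM = 4 * MODEL_DIM
        return 4 * self.model_dim


def attention_weight_count(cfg: JiRackConfig) -> int:
    """Weights in one attention layer, ASSUMING query, key, value and output
    projections of shape MODEL_DIM x MODEL_DIM and ignoring biases."""
    return 4 * cfg.model_dim * cfg.model_dim


h12 = JiRackConfig(num_heads=12)
h6 = JiRackConfig(num_heads=6)

print(h12.head_dim, h6.head_dim)   # 64 128
print(h12.ffn_hidden_dim)          # 3072
# Under the assumption above, the head count changes how MODEL_DIM is split,
# not how many attention weights there are.
print(attention_weight_count(h12) == attention_weight_count(h6))  # True
```

Under that assumption, the 12-head and 6-head variants differ only in how the 768-dimensional representation is split across heads (64 versus 128 dimensions per head).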



- About PyTorch script (JIT): you can use the PyTorch script export for AI classification tasks.
- Do not use JIT for chatbot tasks. Use only the plain PyTorch state dict for GPT (chatbot) tasks.


---

See the other models, whose file names follow the same pattern, to read their parameters.
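
Because the checkpoint file names encode the hyperparameters, they can be read back programmatically. The sketch below is an assumption based on matching the two file names above against their parameter lists (`H` = heads, `L` = layers, `V` = vocabulary size, `D` = model dimension, `MSL` = maximum sequence length, `FF768x4` = FFN hidden size as 768 times 4). The function name `parse_checkpoint_name` is hypothetical.

```python
import re

# Pattern inferred from the two file names in this README (an assumption,
# not an official naming specification).
_CHECKPOINT_PATTERN = re.compile(
    r"^JiRack_H(?P<num_heads>\d+)_L(?P<num_layers>\d+)_V(?P<vocab_size>\d+)"
    r"_D(?P<model_dim>\d+)_MSL(?P<max_seq_len>\d+)"
    r"_FF(?P<ffn_base>\d+)x(?P<ffn_mult>\d+)\.pt$"
)


def parse_checkpoint_name(name: str) -> dict:
    """Return the hyperparameters encoded in a JiRack checkpoint file name."""
    match = _CHECKPOINT_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized checkpoint name: {name!r}")
    fields = {key: int(value) for key, value in match.groupdict().items()}
    # "FF768x4" is read as FFN_HIDDEN_DIM = 768 * 4.
    fields["ffn_hidden_dim"] = fields.pop("ffn_base") * fields.pop("ffn_mult")
    # HEAD_DIM = MODEL_DIM // NUM_HEADS, as in the parameter lists.
    fields["head_dim"] = fields["model_dim"] // fields["num_heads"]
    return fields


config = parse_checkpoint_name("JiRack_H12_L6_V50257_D768_MSL8192_FF768x4.pt")
print(config["num_heads"], config["head_dim"], config["ffn_hidden_dim"])  # 12 64 3072
```

A name that does not follow the pattern raises `ValueError` rather than returning partial values.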

# Install the tokenizer before running
---
- `mkdir -p tokenizer`
- `wget -O tokenizer/tokenizer.json https://huggingface.co/gpt2/resolve/main/tokenizer.json`
- `wget -O tokenizer/vocab.json https://huggingface.co/gpt2/resolve/main/vocab.json`
- `wget -O tokenizer/merges.txt https://huggingface.co/gpt2/resolve/main/merges.txt`
- `wget -O tokenizer/tokenizer_config.json https://huggingface.co/gpt2/resolve/main/tokenizer_config.json`
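
Before running anything that depends on the tokenizer, it can help to confirm that all four downloaded files are in place. The following is a small pre-flight sketch. The helper name `missing_tokenizer_files` is hypothetical, and it only checks that the files exist, not that their contents are valid.

```python
from pathlib import Path

# The four files fetched by the commands above.
REQUIRED_TOKENIZER_FILES = (
    "tokenizer.json",
    "vocab.json",
    "merges.txt",
    "tokenizer_config.json",
)


def missing_tokenizer_files(tokenizer_dir: str = "tokenizer") -> list:
    """Return the required file names that are not present in tokenizer_dir."""
    root = Path(tokenizer_dir)
    return [name for name in REQUIRED_TOKENIZER_FILES
            if not (root / name).is_file()]


missing = missing_tokenizer_files()
if missing:
    print("Missing tokenizer files:", ", ".join(missing))
else:
    print("All tokenizer files are present.")
```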

---
### JiRack RAG System
- It is a microservice architecture with an API Gateway and Service Discovery.
- Framework: Spring Boot with a Google embeddings model. The JiRack RAG System includes a chatbot and deploys the JiRack model with a Docker script.
- Video: https://www.youtube.com/watch?v=vHClQu76kMc
- RAG System: https://bitbucket.org/cmsmanhattan/rag/src/main/


# Copyright Office
 
- From:
- cop-rc@loc.gov
- To:
- konstantin.grabko@yahoo.com

- Mon, Dec 15 at 7:31 AM

- THIS IS AN AUTOMATED EMAIL. PLEASE DO NOT REPLY.

- Thank you for submitting your registration claim using the Electronic Copyright Office (ECO) System.

- The following files were successfully uploaded for service request 1-15058193231

- File Name :jirack_gpt2_class_pytorch.zip
- File Size :2993 KB
- Date/Time :12/15/2025 7:27:48 AM

- [THREAD ID: 1-6X1C895]

- United States Copyright Office

---
You are welcome to ask us to design your corporate model with 33B, 70B, or more parameters.

CMS Manhattan  
Copyright © 2002–2026