Update README.md

---
license: apache-2.0
---

# JiRack GPT Initial Weights

This file is strictly intended for saving the **initial weights (checkpoint)** of the JiRack GPT model.
The model is **"clean"**: it contains no data and has never undergone any pre-training.

It is engineered to be a safe and robust base for **training from scratch** of specialized, smaller models, such as:

- **SPAM Detection Systems**
- **FRAUD Detection Models**
- **Background Check (BG Check) Models**

_A product of CMS Manhattan._

---

## Tokenizer Choices

- For English: the **GPT-2 tokenizer** from Hugging Face
- For multilingual use: a **BERT tokenizer** from the Hugging Face library
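
A minimal usage sketch for these choices, assuming the standard public `transformers` checkpoints `gpt2` and `bert-base-multilingual-cased` (this card does not pin specific checkpoint names):

```python
# Sketch only: the checkpoint names below are assumptions, not pinned by this card.
from transformers import AutoTokenizer

# English: GPT-2's byte-level BPE tokenizer; its vocabulary size (50257)
# matches VOCAB_SIZE in the parameter lists below.
gpt2_tok = AutoTokenizer.from_pretrained("gpt2")

# Multilingual: a BERT WordPiece tokenizer. Its vocabulary size differs
# from 50257, so the embedding layer must be resized if you train with it.
bert_tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")

print(gpt2_tok("JiRack GPT initial weights").input_ids)
print(bert_tok("JiRack GPT initial weights").input_ids)
```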

---

## Model Architecture Details

### GPT-2 Architecture (Classic, Transformer-like)

```
CustomEmbedding
FrozenSignatureLayer
LearnedPositionalEmbedding
[TransformerBlock]
├── MultiHeadAttention
├── LayerNorm
├── LayerNorm
└── FFN
    ├── Linear
    ├── Activation: GELU
    └── Linear
LayerNorm
Linear
```
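
For orientation, here is a minimal PyTorch sketch of one `TransformerBlock` as drawn above. It is a sketch under assumptions, not the shipped implementation: it guesses a post-norm residual layout, and it omits the project-specific `CustomEmbedding` and `FrozenSignatureLayer`.

```python
# Minimal sketch of one TransformerBlock; layout assumptions are not
# confirmed by this card. CustomEmbedding and FrozenSignatureLayer are
# project-specific and not reproduced here.
import torch
import torch.nn as nn

MODEL_DIM = 768
NUM_HEADS = 12                 # 6 for the 6-head checkpoint
FFN_HIDDEN_DIM = 4 * MODEL_DIM

class TransformerBlock(nn.Module):
    def __init__(self):
        super().__init__()
        self.attn = nn.MultiheadAttention(MODEL_DIM, NUM_HEADS, batch_first=True)
        self.ln1 = nn.LayerNorm(MODEL_DIM)
        self.ln2 = nn.LayerNorm(MODEL_DIM)
        self.ffn = nn.Sequential(
            nn.Linear(MODEL_DIM, FFN_HIDDEN_DIM),
            nn.GELU(),
            nn.Linear(FFN_HIDDEN_DIM, MODEL_DIM),
        )

    def forward(self, x, attn_mask=None):
        # Attention sublayer with residual connection, then LayerNorm.
        a, _ = self.attn(x, x, x, attn_mask=attn_mask, need_weights=False)
        x = self.ln1(x + a)
        # Feed-forward sublayer with residual connection, then LayerNorm.
        x = self.ln2(x + self.ffn(x))
        return x

x = torch.randn(2, 16, MODEL_DIM)   # (batch, sequence, model dim)
print(TransformerBlock()(x).shape)  # torch.Size([2, 16, 768])
```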

---

## Model Checkpoint File Explanations

### **12-head Attention Model**

**Parameters:**

- `VOCAB_SIZE = 50257`
- `MODEL_DIM = 768`
- `NUM_HEADS = 12`
- `NUM_LAYERS = 6`
- `MAX_SEQ_LEN = 8192`
- `FFN_HIDDEN_DIM = 4 * MODEL_DIM` (= 3072)
- `HEAD_DIM = MODEL_DIM // NUM_HEADS` (= 64)

**File:**
`JiRack_H12_L6_V50257_D768_MSL8192_FF768x4.pt`
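
A hedged loading sketch, assuming the file was written with `torch.save()` as a plain `state_dict` (the card does not say; adapt if the checkpoint wraps the weights in another object):

```python
import torch

# Inspect the checkpoint's tensors without instantiating the model.
state = torch.load(
    "JiRack_H12_L6_V50257_D768_MSL8192_FF768x4.pt",
    map_location="cpu",
)
for name, tensor in state.items():
    print(name, tuple(tensor.shape))
```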

---

### **6-head Attention Model**

**Parameters:**

- `VOCAB_SIZE = 50257`
- `MODEL_DIM = 768`
- `NUM_HEADS = 6`
- `NUM_LAYERS = 6`
- `MAX_SEQ_LEN = 8192`
- `FFN_HIDDEN_DIM = 4 * MODEL_DIM` (= 3072)
- `HEAD_DIM = MODEL_DIM // NUM_HEADS` (= 128)

**File:**
`JiRack_H6_L6_V50257_D768_MSL8192_FF768x4.pt`
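
Note that the head count does not change the checkpoint size: the attention projections are `MODEL_DIM × MODEL_DIM` however they are split into heads, so both files hold the same number of weights. A back-of-the-envelope count under the architecture assumptions sketched above (ignoring `FrozenSignatureLayer` internals and the final output `Linear`):

```python
# Rough parameter count for the listed configuration; a sketch under the
# same layout assumptions as the TransformerBlock example above.
VOCAB_SIZE = 50257
MODEL_DIM = 768
NUM_LAYERS = 6
MAX_SEQ_LEN = 8192
FFN_HIDDEN_DIM = 4 * MODEL_DIM

embed = VOCAB_SIZE * MODEL_DIM                    # token embedding
pos = MAX_SEQ_LEN * MODEL_DIM                     # learned positional embedding
attn = 4 * (MODEL_DIM * MODEL_DIM + MODEL_DIM)    # q/k/v/out projections + biases
ffn = 2 * MODEL_DIM * FFN_HIDDEN_DIM + FFN_HIDDEN_DIM + MODEL_DIM
norms = 2 * 2 * MODEL_DIM                         # two LayerNorms per block
final_norm = 2 * MODEL_DIM

total = embed + pos + NUM_LAYERS * (attn + ffn + norms) + final_norm
print(f"~{total / 1e6:.1f}M parameters")          # ~87.4M under these assumptions
```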

---