JonusNattapong committed on
Commit 7fcfacb · verified · 1 Parent(s): 90cdffd

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +28 -0
README.md ADDED
@@ -0,0 +1,28 @@
+
+ # GPT-OSS mini (RoPE + GQA + MoE + SwiGLU)
+
+ This is a mini version of the GPT-OSS model trained on a Thai chatbot conversation dataset.
+
+ ## Model Details
+ - Architecture: GPT-OSS mini with RoPE, GQA, MoE, and SwiGLU
+ - Vocab Size: 48000
+ - Hidden Size: 512
+ - Number of Layers: 6
+ - Number of Attention Heads: 8
+ - Number of Key-Value Heads: 2
+ - Intermediate Size: 1536
+ - Number of Experts (MoE): 8
+ - Top-k (MoE): 2
+ - Max Position Embeddings: 8192
+
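As a rough sanity check on the hyperparameters above, the sketch below derives the per-head dimension, the GQA sharing ratio, and an approximate parameter count. `MiniConfig` and the helper functions are hypothetical names, not part of the released code. The estimate assumes bias-free linear layers, tied input/output embeddings, that the intermediate size applies to each expert, and it ignores normalization weights; the actual model may differ on any of these points.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MiniConfig:
    """Hypothetical container for the hyperparameters listed in Model Details."""
    vocab_size: int = 48000
    hidden_size: int = 512
    num_layers: int = 6
    num_attention_heads: int = 8
    num_key_value_heads: int = 2
    intermediate_size: int = 1536
    num_experts: int = 8
    top_k: int = 2
    max_position_embeddings: int = 8192

    @property
    def head_dim(self) -> int:
        # Width of one attention head.
        return self.hidden_size // self.num_attention_heads

    @property
    def queries_per_kv_head(self) -> int:
        # GQA: how many query heads share one key/value head.
        return self.num_attention_heads // self.num_key_value_heads


def attention_params(cfg: MiniConfig) -> int:
    """Q and O projections are full width; K and V are shrunk by GQA."""
    kv_width = cfg.num_key_value_heads * cfg.head_dim
    q_and_o = 2 * cfg.hidden_size * cfg.hidden_size
    k_and_v = 2 * cfg.hidden_size * kv_width
    return q_and_o + k_and_v


def expert_params(cfg: MiniConfig) -> int:
    """One SwiGLU expert: gate, up, and down projections (assumed, no biases)."""
    return 3 * cfg.hidden_size * cfg.intermediate_size


def estimate_params(cfg: MiniConfig, experts_counted: int) -> int:
    """Embedding + per-layer attention, router, and the given number of experts."""
    router = cfg.hidden_size * cfg.num_experts
    per_layer = attention_params(cfg) + router + experts_counted * expert_params(cfg)
    embedding = cfg.vocab_size * cfg.hidden_size  # assumed tied with the output head
    return embedding + cfg.num_layers * per_layer


cfg = MiniConfig()
total = estimate_params(cfg, cfg.num_experts)  # all experts stored
active = estimate_params(cfg, cfg.top_k)       # experts used per token
print(f"head_dim={cfg.head_dim}, queries per KV head={cfg.queries_per_kv_head}")
print(f"approx. total params:  {total:,}")
print(f"approx. active params: {active:,}")
```

Under these assumptions the model stores roughly 142M parameters but touches only about 57M per token, since each token is routed to 2 of the 8 experts.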
+ ## Training Details
+ - Dataset: ZombitX64/ThaiChatbotConversation (or fallback mini-dataset)
+ - Training Epochs: 3
+ - Learning Rate: 3e-4
+ - Optimizer: AdamW
+ - Gradient Accumulation Steps: 8
+ - Batch Size: 2 per device
+ - Gradient Checkpointing: Enabled
+ - Precision: FP32 (mixed precision not used)
+
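The batch size and gradient accumulation settings above combine into an effective batch size per optimizer update. The sketch below works through that arithmetic. The function names are hypothetical, the 1,000-example dataset size is purely illustrative (the real dataset size is not stated here), and it assumes a final partial accumulation window still triggers an update, which depends on the training loop actually used.

```python
import math


def effective_batch_size(per_device_batch: int, grad_accum_steps: int,
                         num_devices: int = 1) -> int:
    """Number of examples contributing to one optimizer update."""
    return per_device_batch * grad_accum_steps * num_devices


def optimizer_steps(num_examples: int, per_device_batch: int,
                    grad_accum_steps: int, epochs: int,
                    num_devices: int = 1) -> int:
    """Total optimizer updates, assuming a trailing partial window still steps."""
    per_update = effective_batch_size(per_device_batch, grad_accum_steps, num_devices)
    return math.ceil(num_examples / per_update) * epochs


# Values from Training Details: batch size 2 per device, 8 accumulation steps.
print(effective_batch_size(2, 8))  # 16 on a single device

# Illustrative only: a made-up dataset of 1,000 examples over 3 epochs.
print(optimizer_steps(1000, 2, 8, epochs=3))
```

On a single device this gives 16 examples per update; with more devices the effective batch size scales linearly, which may call for revisiting the 3e-4 learning rate.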
+ ## Usage