ZombitX64/gpt-oss-mini-sft

JonusNattapong committed on Aug 25, 2025

Commit 7fcfacb (verified)

1 Parent(s): 90cdffd

Upload README.md with huggingface_hub

Files changed (1)

README.md ADDED

+# GPT-OSS mini (RoPE + GQA + MoE + SwiGLU)
+This is a mini version of the GPT-OSS model trained on a Thai chatbot conversation dataset.
+## Model Details
+- Architecture: GPT-OSS mini with RoPE, GQA, MoE, and SwiGLU
+- Vocab Size: 48000
+- Hidden Size: 512
+- Number of Layers: 6
+- Number of Attention Heads: 8
+- Number of Key-Value Heads: 2
+- Intermediate Size: 1536
+- Number of Experts (MoE): 8
+- Top-k (MoE): 2
+- Max Position Embeddings: 8192
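The numbers in the Model Details list imply a few derived quantities (per-head dimension, how many query heads share each key-value head under GQA, and a rough parameter count). The sketch below works them out. It is not the repository's code: the `MiniConfig` class and its method names are hypothetical, and the parameter estimate assumes bias-free projections, SwiGLU experts made of three matrices (gate, up, down), a linear router, tied input/output embeddings, and it ignores normalization weights. None of these assumptions are confirmed by the README.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MiniConfig:
    """Hypothetical container; field values are copied from the Model Details list."""
    vocab_size: int = 48000
    hidden_size: int = 512
    num_layers: int = 6
    num_attention_heads: int = 8
    num_key_value_heads: int = 2
    intermediate_size: int = 1536
    num_experts: int = 8
    top_k: int = 2
    max_position_embeddings: int = 8192

    @property
    def head_dim(self) -> int:
        # Each attention head gets an equal slice of the hidden size.
        return self.hidden_size // self.num_attention_heads

    @property
    def queries_per_kv_head(self) -> int:
        # GQA: several query heads share one key-value head.
        return self.num_attention_heads // self.num_key_value_heads

    def attention_params(self) -> int:
        # ASSUMPTION: bias-free Q, K, V, O projections.
        kv_dim = self.num_key_value_heads * self.head_dim
        q_and_o = 2 * self.hidden_size * self.hidden_size
        k_and_v = 2 * self.hidden_size * kv_dim
        return q_and_o + k_and_v

    def expert_params(self) -> int:
        # ASSUMPTION: one SwiGLU expert = gate, up, and down matrices, no biases.
        return 3 * self.hidden_size * self.intermediate_size

    def estimate_params(self, active_only: bool = False) -> int:
        # With active_only=True, count only the top-k experts used per token.
        experts = self.top_k if active_only else self.num_experts
        router = self.hidden_size * self.num_experts  # ASSUMPTION: linear router
        per_layer = self.attention_params() + experts * self.expert_params() + router
        embeddings = self.vocab_size * self.hidden_size  # ASSUMPTION: tied embeddings
        return self.num_layers * per_layer + embeddings


cfg = MiniConfig()
print("head_dim:", cfg.head_dim)
print("query heads per KV head:", cfg.queries_per_kv_head)
print("estimated total params:", cfg.estimate_params())
print("estimated active params per token:", cfg.estimate_params(active_only=True))
```

Under these assumptions, the experts account for most of the weights, while only the top-2 experts per layer contribute to the compute for any single token.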
+## Training Details
+- Dataset: ZombitX64/ThaiChatbotConversation (or fallback mini-dataset)
+- Training Epochs: 3
+- Learning Rate: 3e-4
+- Optimizer: AdamW
+- Gradient Accumulation Steps: 8
+- Batch Size: 2 per device
+- Gradient Checkpointing: Enabled
+- Precision: FP32 (mixed precision not used)
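The per-device batch size and gradient accumulation steps listed above combine into an effective batch size per optimizer step. The following is a minimal sketch of that arithmetic. The function names, the device count, and the dataset size are hypothetical (the README states neither how many devices were used nor how many examples the dataset contains), and training frameworks may round the step count differently.

```python
import math


def effective_batch_size(per_device_batch: int, grad_accum_steps: int,
                         num_devices: int = 1) -> int:
    """Sequences contributing to one optimizer update."""
    return per_device_batch * grad_accum_steps * num_devices


def approx_optimizer_steps(num_examples: int, epochs: int, effective_batch: int) -> int:
    """Rough estimate only; real trainers may round per epoch in other ways."""
    return epochs * math.ceil(num_examples / effective_batch)


# Values from the Training Details list; num_devices=1 is an ASSUMPTION.
batch = effective_batch_size(per_device_batch=2, grad_accum_steps=8, num_devices=1)
print("effective batch size:", batch)

# HYPOTHETICAL dataset size, used only to illustrate the calculation.
steps = approx_optimizer_steps(num_examples=10_000, epochs=3, effective_batch=batch)
print("approximate optimizer steps:", steps)
```

With a single device, the listed settings give 16 sequences per optimizer update; the count scales linearly with the number of devices.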
+## Usage