# GPT-OSS mini (RoPE + GQA + MoE + SwiGLU)
This is a mini version of the GPT-OSS model trained on a Thai chatbot conversation dataset.
## Model Details
- Architecture: GPT-OSS mini with RoPE, GQA, MoE, and SwiGLU
- Vocab Size: 48000
- Hidden Size: 512
- Number of Layers: 6
- Number of Attention Heads: 8
- Number of Key-Value Heads: 2
- Intermediate Size: 1536
- Number of Experts (MoE): 8
- Top-k (MoE): 2
- Max Position Embeddings: 8192
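Taken together, these hyperparameters map onto a configuration roughly like the sketch below. The field names are illustrative, not the model's actual config class; the values come straight from the list above (note that 8 query heads over 2 KV heads means each KV head is shared by 4 query heads, and the per-head dimension is 512 / 8 = 64):

```python
from dataclasses import dataclass

@dataclass
class GPTOSSMiniConfig:
    """Illustrative config mirroring the listed hyperparameters."""
    vocab_size: int = 48_000
    hidden_size: int = 512
    num_hidden_layers: int = 6
    num_attention_heads: int = 8          # query heads (head_dim = 512 // 8 = 64)
    num_key_value_heads: int = 2          # GQA: 4 query heads per KV head
    intermediate_size: int = 1536         # SwiGLU feed-forward width
    num_experts: int = 8                  # MoE experts per layer
    num_experts_per_tok: int = 2          # top-k routing (k = 2)
    max_position_embeddings: int = 8192   # RoPE context length
```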
## Training Details
- Dataset: ZombitX64/ThaiChatbotConversation (or fallback mini-dataset)
- Training Epochs: 3
- Learning Rate: 3e-4
- Optimizer: AdamW
- Gradient Accumulation Steps: 8
- Batch Size: 2 per device
- Gradient Checkpointing: Enabled
- Mixed Precision: None (full FP32)
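Assuming the standard Hugging Face Trainer was used (the card does not say), the listed settings correspond to something like the following; `output_dir` is a placeholder:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="gpt-oss-mini",       # placeholder path
    num_train_epochs=3,
    learning_rate=3e-4,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,   # effective batch size: 2 x 8 = 16 per device
    gradient_checkpointing=True,
    optim="adamw_torch",             # AdamW optimizer
    fp16=False,                      # no mixed precision; training runs in FP32
    bf16=False,
)
```

With gradient accumulation over 8 steps, the effective per-device batch size is 16 despite the small per-step batch of 2, which keeps memory usage low alongside gradient checkpointing.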
## Usage
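The card does not include usage code; a minimal sketch with the `transformers` library might look like the following. The repo id `ZombitX64/gpt-oss-mini` is a placeholder (replace it with this repository's actual id), and `trust_remote_code=True` assumes the custom mini architecture ships its own modeling code:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "ZombitX64/gpt-oss-mini"  # placeholder: use this repo's actual id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

prompt = "สวัสดีครับ"  # "Hello" in Thai
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```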