
GPT-OSS mini (RoPE + GQA + MoE + SwiGLU)

This is a mini version of the GPT-OSS model trained on a Thai chatbot conversation dataset.

Model Details

  • Architecture: GPT-OSS mini with RoPE, GQA, MoE, and SwiGLU
  • Vocab Size: 48000
  • Hidden Size: 512
  • Number of Layers: 6
  • Number of Attention Heads: 8
  • Number of Key-Value Heads: 2
  • Intermediate Size: 1536
  • Number of Experts (MoE): 8
  • Top-k (MoE): 2
  • Max Position Embeddings: 8192

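The configuration above can be sanity-checked with a little arithmetic. A minimal sketch (not from the repository; the variable names are illustrative) showing what the GQA and MoE numbers imply per token:

```python
# Config values copied from the Model Details list above.
hidden_size = 512
n_heads = 8
n_kv_heads = 2
n_experts = 8
top_k = 2

# GQA: query heads are grouped so each key-value head serves several query heads.
head_dim = hidden_size // n_heads      # 64
gqa_group = n_heads // n_kv_heads      # 4 query heads share each KV head

# KV-cache width per layer per token (K and V, ignoring dtype):
# with full multi-head attention this would be 2 * 8 * 64 = 1024 values.
kv_width = 2 * n_kv_heads * head_dim   # 256 values

# MoE: only top_k of n_experts run per token, so the active expert
# compute is top_k / n_experts of the total expert parameters.
active_ffn_fraction = top_k / n_experts  # 0.25

print(head_dim, gqa_group, kv_width, active_ffn_fraction)
```

So GQA cuts the KV cache to a quarter of the full-attention size, and the MoE layer activates only 2 of its 8 experts per token.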
Training Details

  • Dataset: ZombitX64/ThaiChatbotConversation (or fallback mini-dataset)
  • Training Epochs: 3
  • Learning Rate: 3e-4
  • Optimizer: AdamW
  • Gradient Accumulation Steps: 8
  • Batch Size: 2 per device
  • Gradient Checkpointing: Enabled
  • Precision: FP32 (mixed precision disabled)
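With a per-device batch of 2 and 8 accumulation steps, the effective batch size is 16. A toy pure-Python sketch (not the training code; a one-parameter least-squares model used only for illustration) showing why accumulating scaled micro-batch gradients matches a full-batch step:

```python
# Values from the Training Details list above.
per_device_batch = 2
accum_steps = 8  # effective batch = 2 * 8 = 16

# Toy dataset for y = 2x, fit with y_pred = w * x.
data = [(float(i), 2.0 * i) for i in range(per_device_batch * accum_steps)]

def grad(w, batch):
    # d/dw of the mean squared error over the batch.
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

w = 0.0

# Gradient accumulation: average each micro-batch gradient,
# scaled by 1 / accum_steps, instead of one big backward pass.
acc = 0.0
for s in range(accum_steps):
    micro = data[s * per_device_batch:(s + 1) * per_device_batch]
    acc += grad(w, micro) / accum_steps

# Gradient over the whole effective batch in one pass.
full = grad(w, data)
print(acc, full)  # identical up to floating-point error
```

This is why gradient accumulation (together with gradient checkpointing) lets a small GPU train with a 16-sample effective batch while holding only 2 samples in memory at a time.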

Usage
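The card does not include a loading snippet. A minimal sketch, assuming the checkpoint is published in a Transformers-compatible layout; the repo id below is a placeholder (not confirmed by the card), and a custom GPT-OSS-mini architecture would need `trust_remote_code=True`:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id; replace with the actual Hub repository.
repo_id = "ZombitX64/gpt-oss-mini"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

prompt = "สวัสดีครับ"  # "Hello" in Thai, matching the training data
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```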
