---
license: apache-2.0
language:
  - ko
  - en
tags:
  - moe
  - mixture-of-experts
  - custom
  - aether
  - latent-thought
  - multi-token-prediction
library_name: transformers
pipeline_tag: text-generation
---

# AETHER-Micro 0.5B (Phase 1 Checkpoint)

AETHER-Micro is an experimental Mixture-of-Experts (MoE) language model that combines a big.LITTLE expert layout, a Latent Thought Layer, and multi-token prediction. This repository hosts the Phase 1 checkpoint at training step 57,000.

## Model Details

| Item | Value |
|---|---|
| Architecture | MoE big.LITTLE + LTL + MTP |
| Total Parameters | 2.08B |
| Active Parameters | ~0.5B per token |
| Hidden Size | 1024 |
| Layers | 24 |
| Attention | GQA (16 query heads, 4 KV heads) |
| Experts | 5 big + 15 small + 2 shared |
| Vocab Size | 64,000 (Korean + English + code) |
| Context Length | 8,192 (RoPE) |
| Training Step | 57,000 / 100,000 |
| Training Loss | ~3.54 |

## Architecture Features

- **big.LITTLE MoE**: 5 large experts (intermediate size 2048) + 15 small experts (intermediate size 1024) + 2 shared experts that are always active
- **Latent Thought Layer (LTL)**: K-step latent reasoning (K = 0, 1, 2), with the step count selected via Gumbel-Softmax
- **Multi-Token Prediction (MTP)**: 4-step-ahead prediction replacing the standard next-token-prediction (NTP) loss
- **Wu-Xing Router**: expert routing inspired by the five elements (Wu Xing)
- **Quality Head**: 4-dimensional quality assessment
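The big.LITTLE MoE combine step can be sketched as follows. This is a minimal illustration, not the model's actual router: the card does not state how many routed experts are selected per token, so `top_k=2` and the uniform weighting of the shared experts are assumptions.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def route(gate_logits, expert_outputs, shared_outputs, top_k=2):
    """Sketch of a big.LITTLE-style MoE layer: select top_k of the
    routed (big or small) experts by gate score, mix their outputs
    with renormalized gate weights, then add the always-active
    shared experts."""
    probs = softmax(gate_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    z = sum(probs[i] for i in top)  # renormalize over selected experts
    out = [0.0] * len(expert_outputs[0])
    for i in top:
        w = probs[i] / z
        out = [o + w * e for o, e in zip(out, expert_outputs[i])]
    # shared experts contribute unconditionally (uniform weight here)
    for s in shared_outputs:
        out = [o + x for o, x in zip(out, s)]
    return out, top
```

With 5 + 15 = 20 routed experts but only `top_k` of them plus the 2 shared experts evaluated per token, the active parameter count stays far below the 2.08B total, which is the mechanism behind the ~0.5B active-parameter figure.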
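The MTP objective can be illustrated with a toy loss: one prediction head per horizon, each scored against the token that many steps ahead. Averaging the per-horizon cross-entropies is an assumption for this sketch; the card does not specify how the four horizon losses are combined.

```python
import math

def cross_entropy(probs, target):
    """Negative log-likelihood of the target token."""
    return -math.log(probs[target])

def mtp_loss(pred_heads, tokens, t):
    """Toy multi-token-prediction loss at position t: head k predicts
    token t + k + 1, and the 4 horizon losses are averaged.
    pred_heads[k] is a probability distribution over the vocabulary."""
    losses = [cross_entropy(pred_heads[k], tokens[t + k + 1])
              for k in range(len(pred_heads))]
    return sum(losses) / len(losses)
```

Standard NTP is the special case with a single head and horizon 1; the 4-step variant gives the model a denser training signal per position.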

## Training

- **Phase**: 1 of 3 (57% complete)
- **Data**: 13.1B tokens (Korean 22%, English 25%, code 21%, math 24%, dialogue 8%)
- **Optimizer**: AdamW (lr = 1e-4, cosine decay)
- **Precision**: FP32
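The listed schedule (peak lr 1e-4, cosine decay over 100,000 steps) can be sketched as below. The linear warmup length and the zero learning-rate floor are assumptions for illustration; the card states only the peak rate and the decay shape.

```python
import math

def cosine_lr(step, total_steps=100_000, peak_lr=1e-4, warmup=1_000, min_lr=0.0):
    """Cosine-decay learning-rate schedule with linear warmup.
    warmup and min_lr are assumed values, not from the model card."""
    if step < warmup:
        return peak_lr * step / warmup  # linear ramp to the peak
    progress = (step - warmup) / (total_steps - warmup)  # 0 -> 1
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

At the current checkpoint (step 57,000) this schedule would put the learning rate a little below half of its peak, partway down the cosine curve.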

## Usage

Because this is a custom architecture, loading requires `trust_remote_code=True`:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "Be2Jay/AETHER-Micro-0.5B",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("Be2Jay/AETHER-Micro-0.5B")

inputs = tokenizer("안녕하세요", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

> **Note:** This is a Phase 1 training checkpoint. The model is still in early training and not yet suitable for production use.

## License

Apache 2.0