AETHER-Pilot-14B-5Attn (prev)

This model is a deliverable produced under the Government Advanced GPU Support Project and is currently undergoing advancement (고도화).

The -prev suffix marks this as a pre-advancement (preview) checkpoint. Additional English and math training (SFT v4) is in progress separately.


Overview

AETHER is a Korean-developed foundation model built on a hybrid architecture designed from scratch. Its core idea is to use five different attention types, arranged in a 5×5 Latin square, rather than a single attention mechanism.

| Item | Value |
|---|---|
| Total params | ~14.7B |
| Active params | ~3–4B (MoE top-5) |
| Layers | 25 (5×5 Latin Square) |
| Hidden size | 4096 |
| Intermediate size | 12288 |
| Attention heads | 32 (GQA, 8 KV heads) |
| Experts | 25 routed (top-5) + 1 shared |
| Vocab | 151,936 (Qwen tokenizer) |
| Context | 4096 tokens |
| dtype | bfloat16 |
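The "Experts" and "Active params" rows describe a top-5 mixture-of-experts layer with one always-on shared expert. The following is a minimal pure-Python sketch of that routing pattern, not AETHER's actual router: the function names `route_token` and `moe_output`, the softmax-over-selected-logits normalization, and the scalar toy experts are all assumptions made for illustration.

```python
import math
import random

NUM_EXPERTS = 25   # routed experts, from the table
TOP_K = 5          # experts selected per token, from the table


def route_token(router_logits, top_k=TOP_K):
    """Return (expert_index, weight) pairs for the top_k highest-scoring experts.

    Weights are a softmax over the selected logits only, so they sum to 1.
    This normalization is an assumption, not a documented AETHER detail.
    """
    if len(router_logits) < top_k:
        raise ValueError("need at least top_k router logits")
    # Indices of the top_k largest logits (ties keep their original order).
    chosen = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)[:top_k]
    # Numerically stable softmax restricted to the chosen experts.
    peak = max(router_logits[i] for i in chosen)
    exps = [math.exp(router_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


def moe_output(x, router_logits, experts, shared_expert):
    """Add the shared expert's output to the weighted top-k routed outputs."""
    y = shared_expert(x)  # the shared expert runs for every token
    for idx, weight in route_token(router_logits):
        y += weight * experts[idx](x)
    return y


# Toy usage: scalar "experts" that scale their input by a fixed factor.
rng = random.Random(0)
toy_experts = [lambda x, s=i: x * s for i in range(NUM_EXPERTS)]
toy_logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
routes = route_token(toy_logits)
print(len(routes) + 1, "of", NUM_EXPERTS + 1, "experts run for this token")
```

Under this pattern only 6 of the 26 expert blocks execute per token, which is consistent with the active parameter count being a fraction of the total.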

Core Architecture

  1. Five-type hybrid attention (5×5 Latin Square): MLA / Full / Slide / GDN / Mamba2, each placed exactly once in every row, column, and diagonal
  2. Dual Hamiltonian cycle (Oheng/오행, the Five Phases): generation (生) and restraint (克) gating
  3. Spectral attention: frequency selector (F=12)
  4. MoE: 5 elements × 5 specializations = 25 experts (top-5) + 1 Taegeuk (shared) expert
  5. Metacognition head: identifies five reasoning types
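To make the Latin-square constraint in item 1 concrete, the sketch below builds one 5×5 grid in which every attention type appears exactly once per row, column, and main diagonal, then checks that property. The model card does not specify the layout AETHER actually uses, so the formula `(r + 2c) mod 5`, the helper names, and the row-major mapping of the grid onto 25 layers are illustrative assumptions only.

```python
ATTENTION_TYPES = ["MLA", "Full", "Slide", "GDN", "Mamba2"]  # names from the list above
N = len(ATTENTION_TYPES)


def diagonal_latin_square(n=N):
    """Build an n x n grid of type indices using cell(r, c) = (r + 2c) mod n.

    For n = 5 every index appears once per row, column, and main diagonal.
    """
    return [[(r + 2 * c) % n for c in range(n)] for r in range(n)]


def is_diagonal_latin(grid):
    """Check rows, columns, the main diagonal, and the anti-diagonal."""
    n = len(grid)
    full = set(range(n))
    rows = all(set(row) == full for row in grid)
    cols = all({grid[r][c] for r in range(n)} == full for c in range(n))
    main = {grid[i][i] for i in range(n)} == full
    anti = {grid[i][n - 1 - i] for i in range(n)} == full
    return rows and cols and main and anti


grid = diagonal_latin_square()
assert is_diagonal_latin(grid)

# Hypothetical row-major flattening of the grid into a 25-layer schedule.
layer_types = [ATTENTION_TYPES[k] for row in grid for k in row]

for row in grid:
    print(" ".join(f"{ATTENTION_TYPES[k]:<6}" for k in row))
```

A plain cyclic shift, `(r + c) mod 5`, satisfies the row and column constraints but repeats one type along the anti-diagonal, which is why the sketch uses a step of 2.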

Status

  • Research preview. In this checkpoint, Korean instruction-following and AI self-recognition have been established through SFT.
  • Advancement in progress: English and math reinforcement (SFT v4) and preference optimization are under way.
  • This model is an intermediate research deliverable and may have limitations in areas such as factuality and arithmetic.

Usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "FINAL-Bench/Aether-14B-5Attn-prev"

# trust_remote_code=True is required because the architecture ships as custom code.
tok = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo, trust_remote_code=True, torch_dtype=torch.bfloat16, device_map="auto"
)

# Korean prompt: "Hello, please introduce yourself."
msgs = [{"role": "user", "content": "안녕하세요, 자기소개 해주세요."}]
ids = tok.apply_chat_template(msgs, add_generation_prompt=True, return_tensors="pt").to(model.device)
out = model.generate(ids, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.9, repetition_penalty=1.15)

# Decode only the newly generated tokens, skipping the prompt.
print(tok.decode(out[0][ids.shape[1]:], skip_special_tokens=True))

Requirements

pip install torch transformers
pip install flash-linear-attention   # GatedDeltaNet / Mamba2 (required)

⚠️ λ³Έ λͺ¨λΈμ€ custom architectureμ΄λ―€λ‘œ trust_remote_code=True κ°€ ν•„μš”ν•˜λ©°, GDNΒ·Mamba2 λ ˆμ΄μ–΄λŠ” flash-linear-attention(fla) νŒ¨ν‚€μ§€μ— μ˜μ‘΄ν•©λ‹ˆλ‹€.


제곡/ν¬λ ˆλ”§ (Provenance)

  • Project type: deliverable of the Government Advanced GPU Support Project
  • Architecture: AETHER 5×5 Latin Square Hybrid-Attention MoE
  • Tokenizer: Qwen tokenizer compatible (vocab 151,936)

The checkpoint and model card may be updated as advancement proceeds.
