Instructions to use FINAL-Bench/Aether-14B-5Attn-prev with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use FINAL-Bench/Aether-14B-5Attn-prev with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="FINAL-Bench/Aether-14B-5Attn-prev", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("FINAL-Bench/Aether-14B-5Attn-prev", trust_remote_code=True, dtype="auto")

Local Apps

vLLM

How to use FINAL-Bench/Aether-14B-5Attn-prev with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "FINAL-Bench/Aether-14B-5Attn-prev"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "FINAL-Bench/Aether-14B-5Attn-prev",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/FINAL-Bench/Aether-14B-5Attn-prev

SGLang

How to use FINAL-Bench/Aether-14B-5Attn-prev with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "FINAL-Bench/Aether-14B-5Attn-prev" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "FINAL-Bench/Aether-14B-5Attn-prev",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "FINAL-Bench/Aether-14B-5Attn-prev" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "FINAL-Bench/Aether-14B-5Attn-prev",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
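
The curl calls above can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `chat` are hypothetical, and the default port 30000 assumes the SGLang server shown above (pass `base_url="http://localhost:8000"` for the vLLM example). The response parsing assumes the usual OpenAI-style chat completion shape.

```python
import json
import urllib.request

MODEL_ID = "FINAL-Bench/Aether-14B-5Attn-prev"


def build_chat_request(prompt, base_url="http://localhost:30000", model=MODEL_ID):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the request and return the assistant's reply text.

    Assumes an OpenAI-style response body: choices[0].message.content.
    """
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Example (needs a running server):
#   print(chat("What is the capital of France?"))
```

Splitting request construction from sending keeps the payload inspectable without a running server.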

Docker Model Runner
How to use FINAL-Bench/Aether-14B-5Attn-prev with Docker Model Runner:
```
docker model run hf.co/FINAL-Bench/Aether-14B-5Attn-prev
```

AETHER-Pilot-14B-5Attn (prev)

This model is a deliverable produced under the Government Advanced GPU Support Project (정부 첨단 GPU 지원 사업) and is still being refined.

The -prev suffix marks a preview checkpoint taken before that refinement. Additional English and math training (SFT v4) is in progress separately.

Overview

AETHER is a Korean-developed foundation model built on a hybrid architecture designed from scratch. Its core idea is to use five different attention types, arranged in a 5×5 Latin square, instead of a single attention mechanism.

Item	Value
Total params	~14.7B
Active params	~3–4B (MoE top-5)
Layers	25 (5×5 Latin Square)
Hidden size	4096
Intermediate	12288
Attention heads	32 (GQA, KV 8)
Experts	25 (top-5) + 1 shared
Vocab	151,936 (Qwen tokenizer)
Context	4096
dtype	bfloat16
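
To illustrate what "25 (top-5) + 1 shared" implies per token, here is a minimal pure-Python sketch of top-k expert routing. It is not AETHER's actual router: the function name `route_token`, the random logits, and the choice to renormalize with a softmax over the selected experts are all assumptions made for illustration. Only the counts (25 routed experts, top-5, 1 shared) come from the table.

```python
import math
import random

NUM_EXPERTS = 25  # routed experts, from the table
TOP_K = 5         # experts selected per token, from the table


def route_token(router_logits, top_k=TOP_K):
    """Return {expert_index: weight} for the top_k highest-scoring experts.

    Weights are a softmax restricted to the selected experts. This
    normalization is an assumption; the card does not describe the gate.
    """
    order = sorted(range(len(router_logits)), key=lambda e: router_logits[e], reverse=True)
    top = order[:top_k]
    peak = max(router_logits[e] for e in top)  # subtract the max for numerical stability
    exps = {e: math.exp(router_logits[e] - peak) for e in top}
    total = sum(exps.values())
    return {e: v / total for e, v in exps.items()}


rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]  # stand-in router scores
weights = route_token(logits)

# The 5 routed experts plus the always-on shared expert run for this token.
active_expert_count = len(weights) + 1
print(sorted(weights), active_expert_count)
```

Under these assumptions, 6 of the 26 expert blocks run for each token, which is why the active parameter count is much smaller than the total.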

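Core architecture

Five-type hybrid attention (5×5 Latin Square): MLA / Full / Slide / GDN / Mamba2, each placed exactly once in every row, column, and diagonal
Dual Hamiltonian cycles (Oheng/오행, the Five Phases): generating (生) and overcoming (克) gating
Spectral attention: frequency selector (F=12)
MoE: 5 elements × 5 specialists = 25 experts (top-5) plus 1 shared Taegeuk (태극) expert
Metacognition head: identifies five reasoning types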
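
The Latin-square placement can be sketched in a few lines of Python. The card does not publish the actual layer layout, so the rule used here, type index `(2*row + col) mod 5`, is only one hypothetical arrangement that satisfies the stated property (each type once per row, column, and both main diagonals). The row-by-row mapping from grid cells to the 25 layers is likewise an assumption.

```python
ATTENTION_TYPES = ["MLA", "Full", "Slide", "GDN", "Mamba2"]
N = len(ATTENTION_TYPES)


def build_layout():
    """Hypothetical layout: cell (row, col) gets type index (2*row + col) mod 5."""
    return [[ATTENTION_TYPES[(2 * r + c) % N] for c in range(N)] for r in range(N)]


def is_diagonal_latin(grid):
    """Check that every row, column, and both main diagonals hold each type once."""
    n = len(grid)
    full = set(ATTENTION_TYPES)
    lines = [list(row) for row in grid]                          # rows
    lines += [[grid[r][c] for r in range(n)] for c in range(n)]  # columns
    lines.append([grid[i][i] for i in range(n)])                 # main diagonal
    lines.append([grid[i][n - 1 - i] for i in range(n)])         # anti-diagonal
    return all(set(line) == full for line in lines)


layout = build_layout()
# Flatten row by row to get one attention type per layer (assumed ordering).
layer_types = [t for row in layout for t in row]

for row in layout:
    print(" ".join(f"{t:<6}" for t in row))
print("valid:", is_diagonal_latin(layout))
```

With 25 layers and five types, any such arrangement uses each attention type in exactly five layers.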

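Status

Research preview. In this checkpoint, Korean instruction-following and AI self-recognition have been established through SFT.
Under active development: English and math reinforcement (SFT v4) and preference optimization are in progress.
This model is an interim research artifact and may have limitations in areas such as factuality and arithmetic.

Usage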

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "FINAL-Bench/Aether-14B-5Attn-prev"
tok = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo, trust_remote_code=True, torch_dtype=torch.bfloat16, device_map="auto"
)

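# Korean prompt: "Hello, please introduce yourself."
msgs = [{"role": "user", "content": "안녕하세요, 자기소개 해주세요."}]
# return_dict=True returns input_ids and attention_mask together, regardless of the library default.
inputs = tok.apply_chat_template(msgs, add_generation_prompt=True, return_tensors="pt", return_dict=True).to(model.device)
out = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.9, repetition_penalty=1.15)
# Decode only the newly generated tokens, skipping the prompt.
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))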

Requirements

pip install torch transformers
pip install flash-linear-attention   # GatedDeltaNet / Mamba2 (required)

⚠️ This model uses a custom architecture, so trust_remote_code=True is required. The GDN and Mamba2 layers depend on the flash-linear-attention (fla) package.
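
Because loading fails late if a dependency is absent, it can help to check the environment before downloading weights. The sketch below is an optional pre-flight check, not part of the model's own code; the helper name `missing_modules` is hypothetical, and it assumes the flash-linear-attention distribution is imported under the module name `fla`, as the note above suggests.

```python
import importlib.util

# Import names to look for; `fla` is the module provided by flash-linear-attention.
REQUIRED_MODULES = ("torch", "transformers", "fla")


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

`find_spec` only locates the modules without importing them, so the check stays fast even for heavy packages such as torch.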

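Credits (Provenance)

Project type: deliverable of the Government Advanced GPU Support Project
Architecture: AETHER 5×5 Latin Square Hybrid-Attention MoE
Tokenizer: compatible with the Qwen tokenizer (vocab 151,936)

Checkpoints and the model card may be updated as development progresses.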

Safetensors

Model size: 15B params
Tensor type: BF16