SOVYN-85M

์ฒ˜์Œ๋ถ€ํ„ฐ ํ•™์Šตํ•œ 85M ํŒŒ๋ผ๋ฏธํ„ฐ ํ•œ๊ตญ์–ด ์ถ”๋ก  ๋ชจ๋ธ.

์ˆ˜ํ•™, ์ฝ”๋“œ ํŠธ๋ ˆ์ด์‹ฑ, ๋…ผ๋ฆฌ, ๋ฌผ๋ฆฌ, ํ™”ํ•™, ์ƒ๋ฌผ, ์ง€๊ตฌ๊ณผํ•™, ํ•œ๊ตญ์‚ฌ, ๋ฏธ์ ๋ถ„ ๋“ฑ 119๊ฐœ ์นดํ…Œ๊ณ ๋ฆฌ์˜ ๋ฌธ์ œ๋ฅผ ๋‹จ๊ณ„๋ณ„๋กœ ํ’€์ดํ•œ๋‹ค.

์ŠคํŽ™

ํŒŒ๋ผ๋ฏธํ„ฐ 85.4M
์•„ํ‚คํ…์ฒ˜ GPT (Decoder-only)
๋ ˆ์ด์–ด 12
์–ดํ…์…˜ ํ—ค๋“œ 12
์ž„๋ฒ ๋”ฉ ์ฐจ์› 768
์ปจํ…์ŠคํŠธ ๊ธธ์ด 512
์–ดํœ˜ ํฌ๊ธฐ 16,384 (BPE)
์–ดํ…์…˜ Flash Attention (SDPA)
์ •๋ฐ€๋„ float16

ํ•™์Šต

  • ๋ฐ์ดํ„ฐ: 591,261๊ฐœ ํ•ฉ์„ฑ ์ถ”๋ก  ๋ฌธ์ œ (119 ์นดํ…Œ๊ณ ๋ฆฌ), 27.97M ํ† ํฐ
  • ์˜ตํ‹ฐ๋งˆ์ด์ €: AdamW (lr=3e-4, weight_decay=0.1)
  • ์Šค์ผ€์ค„: Cosine decay + warmup 500 steps
  • ๋ฐฐ์น˜: 16 x 4 grad_accum = effective 64
  • ์Šคํ…: 20,000
  • GPU: RTX 5080 16GB
  • ํ•™์Šต ์‹œ๊ฐ„: ~4์‹œ๊ฐ„

๋ฒค์น˜๋งˆํฌ

์ž์ฒด ๋ฒค์น˜๋งˆํฌ 52๋ฌธ์ œ, 10๊ฐœ ์นดํ…Œ๊ณ ๋ฆฌ.

์นดํ…Œ๊ณ ๋ฆฌ ์ •ํ™•๋„
์‚ฐ์ˆ  100%
์ฝ”๋“œ ํŠธ๋ ˆ์ด์‹ฑ 100%
์ˆซ์ž ์„ฑ์งˆ 100%
์„œ์ˆ ํ˜• 100%
์—ฐ์‚ฐ ์šฐ์„ ์ˆœ์œ„ 88%
๋ฆฌ์ŠคํŠธ ์—ฐ์‚ฐ 83%
๊ด„ํ˜ธ ์—ฐ์‚ฐ 80%
๋ฐฉ์ •์‹ 80%
๋…ผ๋ฆฌ 80%
์ˆ˜์—ด 33%
์ „์ฒด 86.5%

์‚ฌ์šฉ๋ฒ•

pip install torch safetensors tokenizers huggingface_hub

import torch
from safetensors.torch import load_file
from tokenizers import Tokenizer
from huggingface_hub import hf_hub_download

# ๋‹ค์šด๋กœ๋“œ
model_path = hf_hub_download("SOVYN/SOVYN-85M", "model.safetensors")
tok_path = hf_hub_download("SOVYN/SOVYN-85M", "tokenizer.json")
code_path = hf_hub_download("SOVYN/SOVYN-85M", "model.py")

# ์•„ํ‚คํ…์ฒ˜ ๋กœ๋“œ
import importlib.util
spec = importlib.util.spec_from_file_location("model", code_path)
mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)

# ๋ชจ๋ธ ๋กœ๋“œ
model = mod.SOVYN85M()
state_dict = load_file(model_path)
state_dict = {k: v.float() for k, v in state_dict.items()}
model.load_state_dict(state_dict)
model.eval()

tokenizer = Tokenizer.from_file(tok_path)

# ์ถ”๋ก 
prompt = "๋ฌธ์ œ: 3x + 7 = 22์ผ ๋•Œ, x์˜ ๊ฐ’์„ ๊ตฌํ•˜์‹œ์˜ค.\nํ’€์ด:\n"
ids = torch.tensor([tokenizer.encode(prompt).ids])
out = model.generate(ids, max_new_tokens=200, temperature=0.3)
print(tokenizer.decode(out[0].tolist()))

ํ”„๋กฌํ”„ํŠธ ํ˜•์‹

๋ฌธ์ œ: {๋‚ด์šฉ}
ํ’€์ด:

"ํ’€์ด:" ์ดํ›„๋ฅผ ์ƒ์„ฑ. ๋‹จ๊ณ„๋ณ„ ํ’€์ด + "๋‹ต: {์ •๋‹ต}" ํ˜•ํƒœ๋กœ ์ถœ๋ ฅ.

์ œํ•œ์‚ฌํ•ญ

  • ํ•ฉ์„ฑ ๋ฐ์ดํ„ฐ๋กœ๋งŒ ํ•™์Šต. ์ž์œ  ๋Œ€ํ™” ๋ถˆ๊ฐ€.
  • ์ˆ˜์—ด(๋“ฑ๋น„/ํ”ผ๋ณด๋‚˜์น˜) ์•ฝํ•จ.
  • ์ปจํ…์ŠคํŠธ 512 ํ† ํฐ ์ œํ•œ.

๋ผ์ด์„ ์Šค

Apache-2.0
