Use from the Model2Vec library:
from model2vec import StaticModel

model = StaticModel.from_pretrained("kekeappa/kor-static-embedding-128")

kor-static-embedding-128

Ultra-lightweight static embedding model specialized for Korean: 17MB, 128 dimensions.

A variant of kekeappa/kor-static-embedding-512, produced with Matryoshka training and truncated to 128 dimensions. The same model family comes in four dimensionalities; choose the one that fits your use case:

Dimensions   Size   Use case
64           9MB    🌐 Browser · mobile · edge
128          17MB   ⚡ Lightweight search and classification
256          34MB   ⚖️ Balanced cost and performance
512          68MB   🎯 Highest accuracy
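Because all dimensionalities were optimized together with Matryoshka training, the leading components of a larger embedding are meant to remain usable on their own. The sketch below assumes that a smaller variant corresponds to the first `dim` components of the full vector followed by L2 re-normalization. The card says the 128-dim model was truncated but does not spell out the exact procedure. `truncate_and_normalize` is a hypothetical helper, written in pure Python so it has no dependencies.

```python
import math


def truncate_and_normalize(vec: list[float], dim: int) -> list[float]:
    """Keep the first `dim` components of a Matryoshka embedding and L2-normalize them."""
    if not 0 < dim <= len(vec):
        raise ValueError(f"dim must be in 1..{len(vec)}, got {dim}")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    # An all-zero prefix cannot be normalized; return it unchanged
    return [x / norm for x in head] if norm > 0 else head


# Toy 8-dim vector standing in for a full 512-dim embedding
full = [3.0, 4.0, 0.5, -0.5, 0.1, 0.2, -0.1, 0.0]
print(truncate_and_normalize(full, 2))  # [0.6, 0.8]
```

In sentence-transformers, the `truncate_dim` argument of `SentenceTransformer` applies the same prefix truncation at encode time.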

Performance (KorSTS / KLUE-STS)

๋ฒค์น˜๋งˆํฌ Pearson Spearman
KorSTS-test 0.7569 0.7521
KorSTS-valid โ€” 0.8082
KLUE-STS-val โ€” 0.6656
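STS benchmarks such as KorSTS and KLUE-STS are usually scored by correlating the model's cosine similarity for each sentence pair with the human gold score. Below is a minimal pure-Python sketch of the Spearman correlation, computed as Pearson correlation on ranks with tied values sharing their average rank. The toy scores are hypothetical and do not come from the benchmarks.

```python
import math


def ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x: list[float], y: list[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x: list[float], y: list[float]) -> float:
    return pearson(ranks(x), ranks(y))


# Hypothetical model cosine similarities vs. gold scores (0-5 scale)
predicted = [0.91, 0.12, 0.55, 0.73]
gold = [5.0, 0.0, 2.0, 4.2]
print(round(spearman(predicted, gold), 4))  # 1.0
```

In practice, `scipy.stats.spearmanr` computes the same statistic.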

Usage

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("kekeappa/kor-static-embedding-128")
emb = model.encode(["한국어 문장", "임베딩 테스트"], normalize_embeddings=True)
print(emb.shape)  # (2, 128)
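With `normalize_embeddings=True`, the returned vectors have unit length. Cosine similarity therefore reduces to a plain dot product, which is enough for a small semantic search. The sketch below ranks documents that way. The two-dimensional vectors are hypothetical stand-ins for real 128-dim `model.encode` outputs, and `top_k` is a hypothetical helper.

```python
def top_k(query: list[float], docs: list[list[float]], k: int = 2) -> list[tuple[int, float]]:
    """Return (doc_index, score) pairs for the k highest dot products.

    Assumes every vector is already L2-normalized, so dot product == cosine similarity.
    """
    scores = [(i, sum(q * d for q, d in zip(query, doc))) for i, doc in enumerate(docs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]


query = [1.0, 0.0]
docs = [[0.0, 1.0], [0.6, 0.8], [1.0, 0.0]]
print(top_k(query, docs))  # [(2, 1.0), (1, 0.6)]
```

With NumPy arrays, the same ranking is a matrix-vector product `emb @ query` followed by `argsort`.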

Features

  • Architecture: StaticEmbedding (model2vec family), with no transformer attention
  • Inference: optimized for CPU; no GPU required
  • Speed: under 1 ms per single query (fast even in the browser)
  • Korean-English compatibility: trained cross-lingually, so Korean queries can retrieve English documents
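A static embedding model replaces the transformer with a per-token lookup table. At its core, a sentence vector is the average of its tokens' vectors, which is why CPU inference is cheap. The toy sketch below shows that pooling under stated assumptions: whitespace tokens and a hand-written two-dimensional table. The real model uses its own subword tokenizer and 128-dim vectors, and `static_encode` is a hypothetical helper.

```python
def static_encode(tokens: list[str], table: dict[str, list[float]], dim: int) -> list[float]:
    """Mean-pool token vectors from a lookup table; tokens missing from the table are skipped."""
    vectors = [table[t] for t in tokens if t in table]
    if not vectors:
        return [0.0] * dim  # no known tokens: return a zero vector
    return [sum(v[j] for v in vectors) / len(vectors) for j in range(dim)]


# Hypothetical 2-dim table; no attention, just lookups and additions
table = {"한국어": [1.0, 0.0], "문장": [0.0, 1.0]}
print(static_encode("한국어 문장".split(), table, dim=2))  # [0.5, 0.5]
```

The cost is one lookup per token plus additions, with no token-to-token interaction. The trade-off is that word order is ignored.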

Training

Training runs in four stages:

  1. Distillation initialization: vocabulary embeddings from the teacher BM-K/KoSimCSE-roberta-multitask → PCA + Zipf weighting
  2. KorNLI MNRL (MultipleNegativesRankingLoss): kakaobrain/kor_nli (multi_nli + snli), 277K triplets
  3. Cross-lingual MNRL: OPUS-100 ko-en parallel corpus, 200K pairs
  4. Matryoshka regression: KorSTS + KLUE-STS + English STS-B translated with NLLB
    • The 64/128/256/512 dimensions are optimized simultaneously (MatryoshkaLoss)
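Stages 2 and 3 use MNRL: each anchor should score its own positive higher than every other positive in the batch, so in-batch examples double as negatives. Stage 4 wraps a regression loss in MatryoshkaLoss so that the same objective applies to every prefix length. The pure-Python sketch below illustrates both. Several details are assumptions because the card does not name the exact hyperparameters or inner loss: the scale of 20, the cosine-MSE regression loss, and equal weighting across dimensions.

```python
import math


def cos(u: list[float], v: list[float]) -> float:
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def mnrl_loss(anchors, positives, negatives=(), scale=20.0):
    """Cross-entropy where anchor i's target is positive i; all other positives
    in the batch (plus optional hard negatives, e.g. NLI contradictions) are negatives."""
    candidates = list(positives) + list(negatives)
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cos(anchor, c) for c in candidates]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_z - logits[i]
    return total / len(anchors)


def cosine_mse(us, vs, gold):
    """Regression loss: squared error between cosine similarity and a gold score in [0, 1]."""
    return sum((cos(u, v) - g) ** 2 for u, v, g in zip(us, vs, gold)) / len(gold)


def matryoshka_loss(loss_fn, dims, batches, *extra):
    """Sum loss_fn over embedding prefixes of each length in dims (equal weights assumed)."""
    return sum(loss_fn(*[[v[:d] for v in b] for b in batches], *extra) for d in dims)


pairs = [[1.0, 0.0], [0.0, 1.0]]
print(mnrl_loss(pairs, pairs) < 1e-6)  # True: each anchor matches its own positive
# The 1-dim prefix wrongly looks identical, so only that prefix is penalized
print(matryoshka_loss(cosine_mse, (1, 2), [[[1.0, 1.0]], [[1.0, -1.0]]], [0.0]))  # 1.0
```

In sentence-transformers, the corresponding classes are `losses.MultipleNegativesRankingLoss` and `losses.MatryoshkaLoss(model, inner_loss, matryoshka_dims=[512, 256, 128, 64])`.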

Training code: https://github.com/johunsang/kor-static-embedding-512
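Stage 1 down-weights frequent tokens before training, so that function words contribute less to the averaged sentence vector. The card does not give the exact formula, so the sketch below assumes one common formulation: SIF-style weights a / (a + p), where each token's probability p is estimated from Zipf's law by its frequency rank. This assumes the vocabulary is sorted from most to least frequent. `zipf_sif_weights`, `apply_weights`, and the value a = 1e-3 are hypothetical, and the PCA step is not shown.

```python
def zipf_sif_weights(vocab_size: int, a: float = 1e-3) -> list[float]:
    """Weight for each rank r = 1..vocab_size: a / (a + p(r)), with the Zipf
    estimate p(r) = (1 / r) / H, where H is the harmonic normalizer."""
    harmonic = sum(1.0 / r for r in range(1, vocab_size + 1))
    return [a / (a + (1.0 / r) / harmonic) for r in range(1, vocab_size + 1)]


def apply_weights(embeddings: list[list[float]], weights: list[float]) -> list[list[float]]:
    """Scale each token's embedding row by its weight."""
    return [[w * x for x in row] for row, w in zip(embeddings, weights)]


weights = zipf_sif_weights(5)
# Rank 1 (most frequent) gets the smallest weight; rarer tokens get larger weights
print([round(w, 4) for w in weights])
```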

๋ผ์ด์„ ์Šค

Apache 2.0

Format: Safetensors · Model size: 4.1M params · Tensor type: F32
