Use from the Model2Vec library:
from model2vec import StaticModel

model = StaticModel.from_pretrained("kekeappa/kor-static-embedding-512")

kor-static-embedding-512

Ultra-lightweight static embedding model specialized for Korean: 68MB, 512 dimensions.

This variant of kekeappa/kor-static-embedding-512 was built with Matryoshka training and truncated to 512 dimensions. The same model family comes in four dimensions; pick the one that fits your use case:

Dimensions | Size | Use case
-----------|------|---------
64         | 9MB  | 🌐 Browser · mobile · edge
128        | 17MB | ⚡ Lightweight search · classification
256        | 34MB | ⚖️ Best cost/performance balance
512        | 68MB | 🎯 Highest accuracy
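The four sizes above are published as separate checkpoints. Because Matryoshka training optimizes the leading coordinates of each vector to be useful on their own, a common pattern is to keep a prefix of a full-size embedding and re-normalize it. The sketch below assumes that prefix truncation behaves well for this checkpoint; `truncate_and_normalize` is a hypothetical helper, and random vectors stand in for real `model.encode` output.

```python
import numpy as np

def truncate_and_normalize(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` coordinates of each row, then rescale each row to unit length."""
    if not 0 < dim <= embeddings.shape[1]:
        raise ValueError(f"dim must be in 1..{embeddings.shape[1]}, got {dim}")
    prefix = embeddings[:, :dim]
    norms = np.linalg.norm(prefix, axis=1, keepdims=True)
    # Clip to avoid division by zero for all-zero rows.
    return prefix / np.clip(norms, 1e-12, None)

# Hypothetical stand-in for model.encode(...) output: 3 sentences x 512 dims.
rng = np.random.default_rng(0)
full = rng.normal(size=(3, 512)).astype(np.float32)

for d in (64, 128, 256, 512):
    small = truncate_and_normalize(full, d)
    print(d, small.shape)
```

sentence-transformers also accepts a `truncate_dim` argument when constructing `SentenceTransformer`, which applies the same kind of prefix cut inside `encode`.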

Performance (KorSTS / KLUE-STS)

๋ฒค์น˜๋งˆํฌ Pearson Spearman
KorSTS-test 0.7760 0.7718
KorSTS-valid โ€” 0.8330
KLUE-STS-val โ€” 0.7033
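STS scores like these are typically computed by embedding both sentences of each pair, taking their cosine similarity, and correlating those similarities with the human gold scores (0 to 5 for KorSTS). The card does not say which evaluation script produced the numbers, so the NumPy sketch below is an assumption about the procedure rather than a reproduction; sentence-transformers' `EmbeddingSimilarityEvaluator` reports cosine Pearson/Spearman figures in the same spirit. `rank_average` and `sts_scores` are hypothetical names.

```python
import numpy as np

def rank_average(x: np.ndarray) -> np.ndarray:
    """1-based ranks; tied values share their average rank, as Spearman's rho requires."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    _, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    sums = np.bincount(inverse, weights=ranks)
    return (sums / counts)[inverse]

def sts_scores(emb_a: np.ndarray, emb_b: np.ndarray, gold) -> tuple:
    """Pearson and Spearman correlation between pairwise cosine similarity and gold scores."""
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    cos = np.sum(a * b, axis=1)  # one cosine similarity per sentence pair
    gold = np.asarray(gold, dtype=float)
    pearson = float(np.corrcoef(cos, gold)[0, 1])
    spearman = float(np.corrcoef(rank_average(cos), rank_average(gold))[0, 1])
    return pearson, spearman

# Toy check: gold scores on a 0-5 scale that track cosine similarity exactly.
thetas = np.arccos([1.0, 0.8, 0.5, 0.0])
emb_a = np.tile([1.0, 0.0], (4, 1))
emb_b = np.stack([np.cos(thetas), np.sin(thetas)], axis=1)
print(sts_scores(emb_a, emb_b, 5 * np.cos(thetas)))
```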

Usage

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("kekeappa/kor-static-embedding-512")
# Inputs: "Korean sentence", "embedding test"
emb = model.encode(["한국어 문장", "임베딩 테스트"], normalize_embeddings=True)
print(emb.shape)  # (2, 512)
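Because `normalize_embeddings=True` returns unit-length vectors, the cosine similarity between a query and each document reduces to a dot product. A minimal retrieval sketch under that assumption follows; `top_k`, `corpus`, and `query_text` are hypothetical names, and small hand-made unit vectors stand in for real model output so the snippet runs offline.

```python
import numpy as np

def top_k(query_emb: np.ndarray, doc_embs: np.ndarray, k: int = 3):
    """Return (index, score) pairs for the k documents most similar to the query.

    Assumes all vectors are already L2-normalized, so the dot product equals cosine similarity.
    """
    scores = doc_embs @ query_emb
    k = min(k, len(scores))
    best = np.argsort(-scores)[:k]  # highest scores first
    return [(int(i), float(scores[i])) for i in best]

# With the model loaded as above, the inputs would come from:
#   docs = model.encode(corpus, normalize_embeddings=True)
#   q = model.encode([query_text], normalize_embeddings=True)[0]
docs = np.array([[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]])
q = np.array([0.0, 1.0])
print(top_k(q, docs, k=2))  # [(2, 1.0), (1, 0.8)]
```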

Features

  • Architecture: StaticEmbedding (model2vec family), with no transformer attention
  • Inference: optimized for CPU; no GPU required
  • Speed: under 1 ms per single query (fast even in the browser)
  • Korean-English compatibility: trained cross-lingually, so Korean queries can retrieve English documents

Training method

Four-stage training:

  1. Distillation initialization: vocabulary embeddings from the BM-K/KoSimCSE-roberta-multitask teacher → PCA + Zipf weighting
  2. KorNLI MNRL (MultipleNegativesRankingLoss): 277K triplets from kakaobrain/kor_nli (multi_nli + snli)
  3. Cross-lingual MNRL: 200K Korean-English parallel pairs from OPUS-100
  4. Matryoshka regression: KorSTS + KLUE-STS + English STS-B translated with NLLB
    • 64/128/256/512 dimensions optimized jointly (MatryoshkaLoss)
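Stage 1 follows the model2vec recipe in outline: obtain one teacher vector per vocabulary token, reduce dimensionality with PCA, then reweight rows with a Zipf-based prior so that very frequent tokens count less. The card does not give the exact weighting; the sketch assumes rows sorted from most to least frequent token and a weight of log(2 + rank), and uses random vectors in place of real BM-K/KoSimCSE-roberta-multitask token embeddings. `pca_reduce` and `zipf_weight` are hypothetical helpers.

```python
import numpy as np

def pca_reduce(emb: np.ndarray, dim: int) -> np.ndarray:
    """Project rows onto the top `dim` principal components (SVD of the centered data)."""
    centered = emb - emb.mean(axis=0, keepdims=True)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:dim].T

def zipf_weight(emb: np.ndarray) -> np.ndarray:
    """Scale row r by log(2 + r); assumes rows are sorted from most to least frequent token."""
    ranks = np.arange(emb.shape[0])
    return emb * np.log(2 + ranks)[:, None]

# Hypothetical stand-in for per-token teacher embeddings (vocab_size x teacher_dim).
rng = np.random.default_rng(0)
teacher_vocab_emb = rng.normal(size=(1000, 768))
init_table = zipf_weight(pca_reduce(teacher_vocab_emb, 512))
print(init_table.shape)  # (1000, 512)
```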

Training code: https://github.com/johunsang/kor-static-embedding-512

๋ผ์ด์„ ์Šค

Apache 2.0
