๐Ÿฅ Vet KM-BERT Cross-Encoder

์ˆ˜์˜ํ•™ ๋„๋ฉ”์ธ์— ํŠนํ™”๋œ ํ•œ๊ตญ์–ด Cross-Encoder ๋ชจ๋ธ์ž…๋‹ˆ๋‹ค. RAG ์‹œ์Šคํ…œ์˜ Reranking ๋‹จ๊ณ„์—์„œ ์‚ฌ์šฉ๋ฉ๋‹ˆ๋‹ค.

๋ชจ๋ธ ์ •๋ณด

  • Base Model: madatnlp/km-bert
  • Task: Binary Classification (query–document relevance)
  • Language: Korean
  • Domain: Veterinary Medicine

ํ•™์Šต ๋ฐ์ดํ„ฐ

  • ๋ฐ์ดํ„ฐ์…‹: ์ˆ˜์˜ํ•™ ๋ฌธ์„œ 213๊ฐœ (5๊ฐœ ์ง„๋ฃŒ๊ณผ)
    • ๋‚ด๊ณผ, ์•ˆ๊ณผ, ์™ธ๊ณผ, ์น˜๊ณผ, ํ”ผ๋ถ€๊ณผ
  • ์งˆ๋ฌธ ์ˆ˜: 600๊ฐœ (ํ•™์Šต 420๊ฐœ, ํ‰๊ฐ€ 180๊ฐœ)
  • ํ๋ ˆ์ด์…˜ ๋ฐฉ๋ฒ•: LLM Scoring + Graph Refinement (LightGCN)

Performance

  • Accuracy: ~68%
  • F1-Score: ~0.72
  • Precision: ~0.71
  • Recall: ~0.73
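As a quick sanity check, the reported F1 is consistent with the reported precision and recall, since F1 is their harmonic mean:

```python
# F1 = 2PR / (P + R), using the precision and recall reported above
precision, recall = 0.71, 0.73
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 2))  # 0.72, matching the reported F1-Score
```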

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# ๋ชจ๋ธ ๋กœ๋“œ
model_name = "JOhyeongi/vet-kmbert-cross-encoder"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# ์ถ”๋ก 
query = "๊ฐ•์•„์ง€๊ฐ€ ๊ตฌํ† ๋ฅผ ํ•ด์š”."
document = "๊ฐ•์•„์ง€ ๊ตฌํ† ์˜ ์›์ธ์€ ๋‹ค์–‘ํ•ฉ๋‹ˆ๋‹ค..."

inputs = tokenizer(
    [[query, document]], 
    padding=True, 
    truncation=True, 
    return_tensors="pt",
    max_length=512
)

with torch.no_grad():
    logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=1)
    score = probs[0][1].item()  # Relevance score

print(f"Relevance Score: {score:.4f}")
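In the reranking stage, the same scoring call is applied to every (query, candidate) pair and candidates are sorted by score. A minimal sketch of that loop; the `keyword_overlap` scorer below is a hypothetical stand-in for the cross-encoder call shown above, used only to keep the example self-contained:

```python
def rerank(query, documents, score_fn, top_k=3):
    """Score each (query, document) pair and return the top_k documents by relevance."""
    scored = [(doc, score_fn(query, doc)) for doc in documents]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # highest score first
    return scored[:top_k]

# Hypothetical scorer standing in for the cross-encoder's softmax probability
def keyword_overlap(query, doc):
    q_tokens = set(query.split())
    return len(q_tokens & set(doc.split())) / max(len(q_tokens), 1)

docs = ["dog vomiting causes", "cat dental care", "dog vomiting treatment"]
for doc, score in rerank("dog vomiting", docs, keyword_overlap):
    print(f"{score:.2f}  {doc}")
```

To use the actual model, replace `keyword_overlap` with a function that tokenizes the pair and returns `probs[0][1].item()` as in the snippet above.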

Full RAG Pipeline

์ด ๋ชจ๋ธ์€ ๋‹ค์Œ ํ”„๋กœ์ ํŠธ์˜ ์ผ๋ถ€์ž…๋‹ˆ๋‹ค:

  • Repository: catholic_retreival
  • Pipeline: Rationale Generation → Retrieval → Reranking → Answer Generation
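The four stages above can be wired together as a simple chain. The stage functions below are hypothetical stubs sketching the data flow only; the real implementations live in the catholic_retreival repository:

```python
def generate_rationale(question):
    # Stub: an LLM would expand the question into a retrieval rationale
    return f"Key clinical concepts for: {question}"

def retrieve(rationale, corpus, k=5):
    # Stub: a retriever would return the top-k candidate documents
    return corpus[:k]

def rerank(question, candidates):
    # Stub: this is where the cross-encoder above scores each pair
    return sorted(candidates, key=len, reverse=True)

def generate_answer(question, context):
    # Stub: an LLM would answer grounded in the reranked context
    return f"Answer to '{question}' using {len(context)} documents"

corpus = ["doc about vomiting", "doc about dental disease", "doc about dermatitis"]
question = "Why is my dog vomiting?"
context = rerank(question, retrieve(generate_rationale(question), corpus))
print(generate_answer(question, context))
```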

Training Configuration

  • Epochs: 3
  • Batch Size: 8
  • Learning Rate: 2e-5
  • Max Length: 512
  • Optimizer: AdamW
  • Weight Decay: 0.01
  • Warmup Steps: 500
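For binary relevance classification, training examples are (query, document, label) pairs. A minimal sketch of the setup, assuming labeled positive/negative pairs (the actual training script and pair-mining details are not published here, and `make_pairs` is a hypothetical helper):

```python
# Hyperparameters as listed above
TRAINING_CONFIG = {
    "num_train_epochs": 3,
    "per_device_train_batch_size": 8,
    "learning_rate": 2e-5,
    "max_length": 512,
    "optimizer": "adamw",
    "weight_decay": 0.01,
    "warmup_steps": 500,
}

def make_pairs(questions, relevant_docs, irrelevant_docs):
    """Build (query, document, label) examples: label 1 = relevant, 0 = not relevant."""
    pairs = [(q, d, 1) for q, d in zip(questions, relevant_docs)]
    pairs += [(q, d, 0) for q, d in zip(questions, irrelevant_docs)]
    return pairs

pairs = make_pairs(["강아지가 구토를 해요."], ["relevant doc"], ["irrelevant doc"])
print(len(pairs))  # 2
```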

๋ผ์ด์„ ์Šค

MIT License

Citation

@misc{vet-kmbert-cross-encoder,
  title={Vet KM-BERT Cross-Encoder: Korean Veterinary RAG System},
  author={Catholic University},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/JOhyeongi/vet-kmbert-cross-encoder}
}

์—ฐ๋ฝ์ฒ˜

Downloads last month
11
Safetensors
Model size
98.7M params
Tensor type
F32
ยท
Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for JOhyeongi/vet-kmbert-cross-encoder

Base model

madatnlp/km-bert
Finetuned
(2)
this model