---
language:
- ko
license: mit
tags:
- sentence-transformers
- cross-encoder
- veterinary
- medical
- korean
base_model: madatnlp/km-bert
datasets:
- custom
metrics:
- accuracy
- f1
pipeline_tag: text-classification
---
# Vet KM-BERT Cross-Encoder

A Korean cross-encoder model specialized for the veterinary domain. It is used in the reranking stage of a RAG (retrieval-augmented generation) system.
## Model Information

- **Base Model**: [madatnlp/km-bert](https://huggingface.co/madatnlp/km-bert)
- **Task**: Binary classification (question–document relevance)
- **Language**: Korean
- **Domain**: Veterinary medicine
## Training Data

- **Dataset**: 213 veterinary documents across 5 clinical departments
  - Internal medicine, ophthalmology, surgery, dentistry, and dermatology
- **Questions**: 600 (420 for training, 180 for evaluation)
- **Curation Method**: LLM Scoring + Graph Refinement (LightGCN)
## Performance

| Metric | Score |
|--------|-------|
| Accuracy | ~68% |
| F1-Score | ~0.72 |
| Precision | ~0.71 |
| Recall | ~0.73 |
## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the model and tokenizer
model_name = "JOhyeongi/vet-kmbert-cross-encoder"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Inference on a single query-document pair
query = "강아지가 구토를 해요."  # "My dog is vomiting."
document = "강아지 구토의 원인은 다양합니다..."  # "There are many causes of vomiting in dogs..."

inputs = tokenizer(
    [[query, document]],
    padding=True,
    truncation=True,
    return_tensors="pt",
    max_length=512,
)

with torch.no_grad():
    logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=1)
    score = probs[0][1].item()  # probability of the "relevant" class

print(f"Relevance Score: {score:.4f}")
```
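In a RAG reranker, the score above is computed for every retrieved candidate and the candidates are re-sorted by it. Below is a minimal sketch of that step; the `rerank` helper and the word-overlap `dummy_score` are illustrative stand-ins, not part of this model's API. In practice, wrap the cross-encoder scoring code above in a function and pass it as `score_fn`.

```python
from typing import Callable, List, Tuple

def rerank(query: str,
           documents: List[str],
           score_fn: Callable[[str, str], float],
           top_k: int = 3) -> List[Tuple[str, float]]:
    """Score every (query, document) pair with score_fn and return the
    top_k documents sorted by descending relevance score."""
    scored = [(doc, score_fn(query, doc)) for doc in documents]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Stand-in scorer for illustration only: fraction of query words that
# appear in the document. Replace with the cross-encoder score above.
def dummy_score(query: str, document: str) -> float:
    query_words = set(query.split())
    shared = query_words & set(document.split())
    return len(shared) / (len(query_words) or 1)

candidates = ["dog vomiting causes", "cat dental care", "dog diet and vomiting"]
top = rerank("dog vomiting", candidates, dummy_score, top_k=2)
print(top)  # [('dog vomiting causes', 1.0), ('dog diet and vomiting', 1.0)]
```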
## Full RAG Pipeline

This model is part of the following project:

- **Repository**: [catholic_retreival](https://github.com/jasonhk24/catholic_retreival)
- **Pipeline**: Rationale Generation → Retrieval → **Reranking** → Answer Generation
## Training Configuration

```yaml
Epochs: 3
Batch Size: 8
Learning Rate: 2e-5
Max Length: 512
Optimizer: AdamW
Weight Decay: 0.01
Warmup Steps: 500
```
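For reference, the settings above map onto Hugging Face `TrainingArguments` roughly as follows. This is a sketch under the assumption that the HF `Trainer` was used (its default optimizer is AdamW); `Max Length: 512` is applied in the tokenizer call rather than here.

```python
# Hyperparameters from the table above, expressed as TrainingArguments
# keyword arguments (assumption: training used the Hugging Face Trainer,
# whose default optimizer is AdamW; max_length=512 belongs to tokenization).
training_kwargs = {
    "num_train_epochs": 3,
    "per_device_train_batch_size": 8,
    "learning_rate": 2e-5,
    "weight_decay": 0.01,
    "warmup_steps": 500,
}

# from transformers import TrainingArguments
# args = TrainingArguments(output_dir="vet-kmbert-cross-encoder", **training_kwargs)
print(training_kwargs["learning_rate"])  # 2e-05
```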
## License

MIT License
## Citation

```bibtex
@misc{vet-kmbert-cross-encoder,
  title={Vet KM-BERT Cross-Encoder: Korean Veterinary RAG System},
  author={Catholic University},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/JOhyeongi/vet-kmbert-cross-encoder}
}
```
## Contact

- **GitHub**: [jasonhk24/catholic_retreival](https://github.com/jasonhk24/catholic_retreival)
- **Issues**: [GitHub Issues](https://github.com/jasonhk24/catholic_retreival/issues)