---
language:
- ko
license: mit
tags:
- sentence-transformers
- cross-encoder
- veterinary
- medical
- korean
base_model: madatnlp/km-bert
datasets:
- custom
metrics:
- accuracy
- f1
pipeline_tag: text-classification
---
# ๐Ÿฅ Vet KM-BERT Cross-Encoder
์ˆ˜์˜ํ•™ ๋„๋ฉ”์ธ์— ํŠนํ™”๋œ ํ•œ๊ตญ์–ด Cross-Encoder ๋ชจ๋ธ์ž…๋‹ˆ๋‹ค. RAG ์‹œ์Šคํ…œ์˜ Reranking ๋‹จ๊ณ„์—์„œ ์‚ฌ์šฉ๋ฉ๋‹ˆ๋‹ค.
## ๋ชจ๋ธ ์ •๋ณด
- **Base Model**: [madatnlp/km-bert](https://huggingface.co/madatnlp/km-bert)
- **Task**: Binary Classification (์งˆ๋ฌธ-๋ฌธ์„œ ์—ฐ๊ด€์„ฑ ํŒ๋‹จ)
- **Language**: Korean (ํ•œ๊ตญ์–ด)
- **Domain**: Veterinary Medicine (์ˆ˜์˜ํ•™)
## Training Data
- **Dataset**: 213 veterinary documents across 5 clinical departments
  - Internal medicine, ophthalmology, surgery, dentistry, dermatology
- **Questions**: 600 (420 for training, 180 for evaluation)
- **Curation method**: LLM scoring + graph refinement (LightGCN)
## Performance
| Metric | Score |
|--------|-------|
| Accuracy | ~68% |
| F1-Score | ~0.72 |
| Precision | ~0.71 |
| Recall | ~0.73 |
## Usage
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
# ๋ชจ๋ธ ๋กœ๋“œ
model_name = "JOhyeongi/vet-kmbert-cross-encoder"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
# ์ถ”๋ก 
query = "๊ฐ•์•„์ง€๊ฐ€ ๊ตฌํ† ๋ฅผ ํ•ด์š”."
document = "๊ฐ•์•„์ง€ ๊ตฌํ† ์˜ ์›์ธ์€ ๋‹ค์–‘ํ•ฉ๋‹ˆ๋‹ค..."
inputs = tokenizer(
[[query, document]],
padding=True,
truncation=True,
return_tensors="pt",
max_length=512
)
with torch.no_grad():
logits = model(**inputs).logits
probs = torch.softmax(logits, dim=1)
score = probs[0][1].item() # Relevance score
print(f"Relevance Score: {score:.4f}")
```
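In the reranking stage the model typically scores every retrieved candidate rather than a single pair. A minimal batch-reranking sketch building on the snippet above; the `rerank` helper and its `top_k` parameter are illustrative, not part of the released code:

```python
import torch


def rerank(query, documents, tokenizer, model, top_k=3):
    """Score each (query, document) pair with the cross-encoder and
    return the top_k documents sorted by relevance probability."""
    inputs = tokenizer(
        [[query, doc] for doc in documents],
        padding=True,
        truncation=True,
        max_length=512,
        return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(**inputs).logits
    scores = torch.softmax(logits, dim=1)[:, 1]  # relevance probability per pair
    ranked = sorted(zip(documents, scores.tolist()), key=lambda x: x[1], reverse=True)
    return ranked[:top_k]
```

With the tokenizer and model loaded as above, `rerank(query, retrieved_docs, tokenizer, model)` returns `(document, score)` pairs ready to pass to answer generation.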
## Full RAG Pipeline
This model is part of the following project:
- **Repository**: [catholic_retreival](https://github.com/jasonhk24/catholic_retreival)
- **Pipeline**: Rationale Generation → Retrieval → **Reranking** → Answer Generation
## Training Configuration
```yaml
Epochs: 3
Batch Size: 8
Learning Rate: 2e-5
Max Length: 512
Optimizer: AdamW
Weight Decay: 0.01
Warmup Steps: 500
```
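These settings map onto `transformers.TrainingArguments` field names; the correspondence below is an assumption for illustration, not taken from the released training script. `Max Length` is applied at tokenization time rather than through `TrainingArguments`, and AdamW is already the `Trainer` default optimizer:

```python
# Training configuration from the card, keyed by the corresponding
# transformers.TrainingArguments field names (mapping assumed, not
# taken from the original training script).
train_config = {
    "num_train_epochs": 3,             # Epochs
    "per_device_train_batch_size": 8,  # Batch Size
    "learning_rate": 2e-5,             # Learning Rate
    "weight_decay": 0.01,              # Weight Decay
    "warmup_steps": 500,               # Warmup Steps
}
# e.g. TrainingArguments(output_dir="out", **train_config)
```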
## ๋ผ์ด์„ ์Šค
MIT License
## Citation
```bibtex
@misc{vet-kmbert-cross-encoder,
  title={Vet KM-BERT Cross-Encoder: Korean Veterinary RAG System},
  author={Catholic University},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/JOhyeongi/vet-kmbert-cross-encoder}
}
```
## ์—ฐ๋ฝ์ฒ˜
- **GitHub**: [jasonhk24/catholic_retreival](https://github.com/jasonhk24/catholic_retreival)
- **Issues**: [GitHub Issues](https://github.com/jasonhk24/catholic_retreival/issues)