---
language:
- ko
license: mit
tags:
- sentence-transformers
- cross-encoder
- veterinary
- medical
- korean
base_model: madatnlp/km-bert
datasets:
- custom
metrics:
- accuracy
- f1
pipeline_tag: text-classification
---

# πŸ₯ Vet KM-BERT Cross-Encoder

A Korean cross-encoder model specialized for the veterinary domain, used in the reranking stage of a RAG system.

## Model Details

- **Base Model**: [madatnlp/km-bert](https://huggingface.co/madatnlp/km-bert)
- **Task**: Binary classification (query-document relevance)
- **Language**: Korean
- **Domain**: Veterinary medicine

## Training Data

- **Dataset**: 213 veterinary documents across 5 clinical departments
  - Internal medicine, ophthalmology, surgery, dentistry, dermatology
- **Queries**: 600 (420 train, 180 eval)
- **Curation**: LLM scoring + graph refinement (LightGCN)

## Performance

| Metric | Score |
|--------|-------|
| Accuracy | ~68% |
| F1-Score | ~0.72 |
| Precision | ~0.71 |
| Recall | ~0.73 |

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the model
model_name = "JOhyeongi/vet-kmbert-cross-encoder"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Inference
query = "κ°•μ•„μ§€κ°€ ꡬ토λ₯Ό ν•΄μš”."
document = "κ°•μ•„μ§€ κ΅¬ν† μ˜ 원인은 λ‹€μ–‘ν•©λ‹ˆλ‹€..."
inputs = tokenizer(
    [[query, document]],
    padding=True,
    truncation=True,
    return_tensors="pt",
    max_length=512,
)

with torch.no_grad():
    logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=1)
    score = probs[0][1].item()  # relevance score (probability of class 1)

print(f"Relevance Score: {score:.4f}")
```

## Full RAG Pipeline

This model is part of the following project:

- **Repository**: [catholic_retreival](https://github.com/jasonhk24/catholic_retreival)
- **Pipeline**: Rationale Generation β†’ Retrieval β†’ **Reranking** β†’ Answer Generation

## Training Setup

```yaml
Epochs: 3
Batch Size: 8
Learning Rate: 2e-5
Max Length: 512
Optimizer: AdamW
Weight Decay: 0.01
Warmup Steps: 500
```

## License

MIT License

## Citation

```bibtex
@misc{vet-kmbert-cross-encoder,
  title={Vet KM-BERT Cross-Encoder: Korean Veterinary RAG System},
  author={Catholic University},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/JOhyeongi/vet-kmbert-cross-encoder}
}
```

## Contact

- **GitHub**: [jasonhk24/catholic_retreival](https://github.com/jasonhk24/catholic_retreival)
- **Issues**: [GitHub Issues](https://github.com/jasonhk24/catholic_retreival/issues)
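## Batch Reranking Example

In the reranking stage, the single-pair scoring shown in the usage section is applied to every retrieved candidate, and candidates are sorted by score. A minimal sketch of that step (the `rerank` helper, the example documents, and the example scores are illustrative, not part of the released code):

```python
from typing import List, Tuple

def rerank(docs: List[str], scores: List[float], top_k: int = 3) -> List[Tuple[str, float]]:
    """Sort candidate documents by cross-encoder relevance score, highest first."""
    ranked = sorted(zip(docs, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]

# With the model and tokenizer loaded as above, scores would come from
# batched inference over all (query, doc) pairs, e.g.:
#   inputs = tokenizer([[query, d] for d in docs], padding=True,
#                      truncation=True, max_length=512, return_tensors="pt")
#   with torch.no_grad():
#       scores = torch.softmax(model(**inputs).logits, dim=1)[:, 1].tolist()

docs = ["κ°•μ•„μ§€ κ΅¬ν† μ˜ 원인은 λ‹€μ–‘ν•©λ‹ˆλ‹€...", "고양이 ν”ΌλΆ€ 질환 κ°œμš”", "κ°•μ•„μ§€ ꡬ토 μΉ˜λ£Œλ²•"]
scores = [0.91, 0.12, 0.78]  # placeholder relevance scores for illustration
print(rerank(docs, scores, top_k=2))
```

The top-ranked documents are then passed to the answer-generation stage of the pipeline.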