---
language:
- ko
license: mit
tags:
- sentence-transformers
- cross-encoder
- veterinary
- medical
- korean
base_model: madatnlp/km-bert
datasets:
- custom
metrics:
- accuracy
- f1
pipeline_tag: text-classification
---
# Vet KM-BERT Cross-Encoder

A Korean cross-encoder model specialized for the veterinary domain. It is used in the reranking stage of a RAG (retrieval-augmented generation) system.
## Model Information

- **Base Model**: [madatnlp/km-bert](https://huggingface.co/madatnlp/km-bert)
- **Task**: Binary classification (question–document relevance)
- **Language**: Korean
- **Domain**: Veterinary medicine
## Training Data

- **Dataset**: 213 veterinary documents across 5 clinical departments
  - Internal medicine, ophthalmology, surgery, dentistry, and dermatology
- **Questions**: 600 (420 for training, 180 for evaluation)
- **Curation Method**: LLM Scoring + Graph Refinement (LightGCN)
## Performance

| Metric | Score |
|--------|-------|
| Accuracy | ~68% |
| F1-Score | ~0.72 |
| Precision | ~0.71 |
| Recall | ~0.73 |
## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the model and tokenizer
model_name = "JOhyeongi/vet-kmbert-cross-encoder"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Inference on a single query-document pair
query = "강아지가 구토를 해요."  # "My dog is vomiting."
document = "강아지 구토의 원인은 다양합니다..."  # "There are many causes of vomiting in dogs..."

inputs = tokenizer(
    [[query, document]],
    padding=True,
    truncation=True,
    return_tensors="pt",
    max_length=512,
)

with torch.no_grad():
    logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=1)
    score = probs[0][1].item()  # probability of the "relevant" class

print(f"Relevance Score: {score:.4f}")
```
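In a RAG reranker, the score above is computed for every retrieved candidate and the candidates are re-sorted by it. Below is a minimal sketch of that step; the `rerank` helper and the word-overlap `dummy_score` are illustrative stand-ins, not part of this model's API. In practice, wrap the cross-encoder scoring code above in a function and pass it as `score_fn`.

```python
from typing import Callable, List, Tuple

def rerank(query: str,
           documents: List[str],
           score_fn: Callable[[str, str], float],
           top_k: int = 3) -> List[Tuple[str, float]]:
    """Score every (query, document) pair with score_fn and return the
    top_k documents sorted by descending relevance score."""
    scored = [(doc, score_fn(query, doc)) for doc in documents]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Stand-in scorer for illustration only: fraction of query words that
# appear in the document. Replace with the cross-encoder score above.
def dummy_score(query: str, document: str) -> float:
    query_words = set(query.split())
    shared = query_words & set(document.split())
    return len(shared) / (len(query_words) or 1)

candidates = ["dog vomiting causes", "cat dental care", "dog diet and vomiting"]
top = rerank("dog vomiting", candidates, dummy_score, top_k=2)
print(top)  # [('dog vomiting causes', 1.0), ('dog diet and vomiting', 1.0)]
```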
## Full RAG Pipeline

This model is part of the following project:

- **Repository**: [catholic_retreival](https://github.com/jasonhk24/catholic_retreival)
- **Pipeline**: Rationale Generation → Retrieval → **Reranking** → Answer Generation
## Training Configuration

```yaml
Epochs: 3
Batch Size: 8
Learning Rate: 2e-5
Max Length: 512
Optimizer: AdamW
Weight Decay: 0.01
Warmup Steps: 500
```
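For reference, the settings above map onto Hugging Face `TrainingArguments` roughly as follows. This is a sketch under the assumption that the HF `Trainer` was used (its default optimizer is AdamW); `Max Length: 512` is applied in the tokenizer call rather than here.

```python
# Hyperparameters from the table above, expressed as TrainingArguments
# keyword arguments (assumption: training used the Hugging Face Trainer,
# whose default optimizer is AdamW; max_length=512 belongs to tokenization).
training_kwargs = {
    "num_train_epochs": 3,
    "per_device_train_batch_size": 8,
    "learning_rate": 2e-5,
    "weight_decay": 0.01,
    "warmup_steps": 500,
}

# from transformers import TrainingArguments
# args = TrainingArguments(output_dir="vet-kmbert-cross-encoder", **training_kwargs)
print(training_kwargs["learning_rate"])  # 2e-05
```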
## License

MIT License
## Citation

```bibtex
@misc{vet-kmbert-cross-encoder,
  title={Vet KM-BERT Cross-Encoder: Korean Veterinary RAG System},
  author={Catholic University},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/JOhyeongi/vet-kmbert-cross-encoder}
}
```
## Contact

- **GitHub**: [jasonhk24/catholic_retreival](https://github.com/jasonhk24/catholic_retreival)
- **Issues**: [GitHub Issues](https://github.com/jasonhk24/catholic_retreival/issues)