---
language:
- ko
license: mit
tags:
- sentence-transformers
- cross-encoder
- veterinary
- medical
- korean
base_model: madatnlp/km-bert
datasets:
- custom
metrics:
- accuracy
- f1
pipeline_tag: text-classification
---

# πŸ₯ Vet KM-BERT Cross-Encoder

A Korean cross-encoder model specialized for the veterinary domain, used in the reranking stage of a RAG system.

## Model Details

- **Base Model**: [madatnlp/km-bert](https://huggingface.co/madatnlp/km-bert)
- **Task**: Binary classification (query-document relevance)
- **Language**: Korean
- **Domain**: Veterinary medicine

## Training Data

- **Dataset**: 213 veterinary documents across 5 clinical departments
  - Internal medicine, ophthalmology, surgery, dentistry, dermatology
- **Queries**: 600 (420 train, 180 eval)
- **Curation**: LLM scoring + graph refinement (LightGCN)

## Performance

| Metric | Score |
|--------|-------|
| Accuracy | ~68% |
| F1-Score | ~0.72 |
| Precision | ~0.71 |
| Recall | ~0.73 |

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the model
model_name = "JOhyeongi/vet-kmbert-cross-encoder"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Inference
query = "κ°•μ•„μ§€κ°€ ꡬ토λ₯Ό ν•΄μš”."
document = "κ°•μ•„μ§€ κ΅¬ν† μ˜ 원인은 λ‹€μ–‘ν•©λ‹ˆλ‹€..."
inputs = tokenizer(
    [[query, document]],
    padding=True,
    truncation=True,
    return_tensors="pt",
    max_length=512,
)

with torch.no_grad():
    logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=1)
    score = probs[0][1].item()  # relevance score (probability of class 1)

print(f"Relevance Score: {score:.4f}")
```

## Full RAG Pipeline

This model is part of the following project:

- **Repository**: [catholic_retreival](https://github.com/jasonhk24/catholic_retreival)
- **Pipeline**: Rationale Generation β†’ Retrieval β†’ **Reranking** β†’ Answer Generation

## Training Setup

```yaml
Epochs: 3
Batch Size: 8
Learning Rate: 2e-5
Max Length: 512
Optimizer: AdamW
Weight Decay: 0.01
Warmup Steps: 500
```

## License

MIT License

## Citation

```bibtex
@misc{vet-kmbert-cross-encoder,
  title={Vet KM-BERT Cross-Encoder: Korean Veterinary RAG System},
  author={Catholic University},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/JOhyeongi/vet-kmbert-cross-encoder}
}
```

## Contact

- **GitHub**: [jasonhk24/catholic_retreival](https://github.com/jasonhk24/catholic_retreival)
- **Issues**: [GitHub Issues](https://github.com/jasonhk24/catholic_retreival/issues)
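## Batch Reranking Example

In the reranking stage, the single-pair scoring shown in the usage section is applied to every retrieved candidate, and candidates are sorted by score. A minimal sketch of that step (the `rerank` helper, the example documents, and the example scores are illustrative, not part of the released code):

```python
from typing import List, Tuple

def rerank(docs: List[str], scores: List[float], top_k: int = 3) -> List[Tuple[str, float]]:
    """Sort candidate documents by cross-encoder relevance score, highest first."""
    ranked = sorted(zip(docs, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]

# With the model and tokenizer loaded as above, scores would come from
# batched inference over all (query, doc) pairs, e.g.:
#   inputs = tokenizer([[query, d] for d in docs], padding=True,
#                      truncation=True, max_length=512, return_tensors="pt")
#   with torch.no_grad():
#       scores = torch.softmax(model(**inputs).logits, dim=1)[:, 1].tolist()

docs = ["κ°•μ•„μ§€ κ΅¬ν† μ˜ 원인은 λ‹€μ–‘ν•©λ‹ˆλ‹€...", "고양이 ν”ΌλΆ€ 질환 κ°œμš”", "κ°•μ•„μ§€ ꡬ토 μΉ˜λ£Œλ²•"]
scores = [0.91, 0.12, 0.78]  # placeholder relevance scores for illustration
print(rerank(docs, scores, top_k=2))
```

The top-ranked documents are then passed to the answer-generation stage of the pipeline.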