---
license: apache-2.0
language:
- ko
- en
base_model:
- Qwen/Qwen2.5-72B
tags:
- medical
- clinical
- QA
- benchmark
- healthcare
- korean
---
🧠 **Korean Medical LLM (QA-Finetuned) by Healthcare AI Research Institute of Seoul National University Hospital**
Welcome to the official repository of the **Korean Medical Large Language Model (LLM)** developed by the **Healthcare AI Research Institute (HARI)** at **Seoul National University Hospital (SNUH)**.
This model is **fine-tuned on Korean medical question-answering (QA) style data**, enabling robust performance in clinical reasoning, educational Q&A, and domain-specific medical inference.
---
## 🚀 Model Overview
* **Model Name**: `snuh/hari-q2.5-thinking`
* **Architecture**: Large Language Model (LLM), fine-tuned from `Qwen/Qwen2.5-72B`
* **Fine-tuning Objective**: Medical QA (Question-Answer) style generation
* **Primary Languages**: Korean, English
* **Domain**: Clinical Medicine
* **Performance**: Achieves **89.2% accuracy** on the **Korean Medical Licensing Examination (KMLE)** benchmark (KorMedMCQA Doctor subset, 5-shot)
* **Key Applications**:
  * Clinical decision support (QA-style)
  * Medical education and self-assessment tools
  * Automated medical reasoning and documentation aid
---
## 📊 Training Data & Benchmark
This model was fine-tuned using a curated corpus of Korean medical QA-style data derived from **publicly available, de-identified sources**. The training data includes clinical guidelines, academic publications, exam-style questions, and synthetic prompts reflecting real-world clinical reasoning.
* **Training Data Characteristics**:
  - Focused on Korean-language question-answering formats relevant to clinical settings.
  - Includes guideline-derived questions, de-identified case descriptions, and physician-crafted synthetic queries.
  - Designed to reflect realistic diagnostic, therapeutic, and decision-making scenarios.
* **Benchmark Evaluation**:
  - **KMLE QA benchmark (KorMedMCQA, 5-shot)**
    - Doctor: 89.20%
    - Nurse: 90.99%
    - Pharm: 90.94%
    - Dentist: 72.96%
  - **USMLE QA benchmark (MedQA-USMLE, 0-shot)**
    - 88.36%
  - All evaluations were conducted on de-identified, non-clinical test sets, with no real patient data involved.

> ⚠️ These benchmarks are provided for research purposes only and do not imply clinical safety or efficacy.
---
## 🔐 Privacy & Ethical Compliance
We strictly adhere to ethical AI development and privacy protection:
* ✅ The model was trained exclusively on **publicly available and de-identified data**.
* 🔒 It does **not include any real patient data or personally identifiable information (PII)**.
* ⚖️ Designed for **safe, responsible, and research-oriented** use in healthcare AI.

> ⚠️ This model is intended for **research and educational purposes only** and should **not** be used to make clinical decisions.
---
## ๐Ÿฅ About HARI โ€“ Healthcare AI Research Institute
The **Healthcare AI Research Institute (HARI)** is a pioneering research group within **Seoul National University Hospital**, driving innovation in medical AI.
### 🌍 Vision & Mission
* **Vision**: Shaping a sustainable and healthy future through pioneering AI research.
* **Mission**:
  * Develop clinically useful, trustworthy AI technologies.
  * Foster cross-disciplinary collaboration in medicine and AI.
  * Lead global healthcare AI commercialization and policy frameworks.
  * Educate the next generation of AI-powered medical professionals.
---
## 🧪 Research Platforms & Infrastructure
* **Platforms**: SUPREME, SNUHUB, DeView, VitalDB, KHDP
* **Computing**: NVIDIA B200 / H100 / A100 GPUs
* **Projects**:
  * Clinical note summarization
  * AI-powered diagnostics
  * EHR automation
  * Real-time monitoring via AI pipelines
---
## 🎓 AI Education Programs
* **Basic AI for Healthcare**: Designed for clinicians and students
* **Advanced AI Research**: Targeting senior researchers and specialists in clinical AI validation and deep learning
---
## 🤝 Collaborate with Us
We welcome collaboration with:
* AI research institutions and medical universities
* Healthcare startups and technology partners
* Policymakers shaping AI regulation in medicine
📧 **Contact**: [hhoon@snu.ac.kr](mailto:hhoon@snu.ac.kr)

🌐 **Website**: [Seoul National University Hospital](https://www.snuh.org)
---
## 🤗 Model Usage Example
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the model and tokenizer; device_map="auto" places the weights on the available devices
model_name = "snuh/hari-q2.5-thinking"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Korean prompt. In English, the instruction reads: "You are a capable, trustworthy
# Korean-language medical assistant with clinical knowledge. Suggest possible diagnoses
# based on accurate, careful clinical reasoning. Consider every clue (age, symptoms,
# test results, pain location) and present your reasoning and the diagnosis. Use precise
# medical terms, adding lay terms where helpful."
# The question reads: "A 60-year-old man presents with abdominal pain and fever. Blood
# tests show an elevated white blood cell count, and right lower quadrant tenderness
# is found. What is the most likely diagnosis?"
prompt = '''
### Instruction:
당신은 임상 지식을 갖춘 유능하고 신뢰할 수 있는 한국어 기반 의료 어시스턴트입니다.
사용자의 질문에 대해 정확하고 신중한 임상 추론을 바탕으로 진단 가능성을 제시해 주세요.
반드시 환자의 연령, 증상, 검사 결과, 통증 부위 등 모든 단서를 종합적으로 고려하여 추론 과정과 진단명을 제시해야 합니다.
의학적으로 정확한 용어를 사용하되, 필요하다면 일반인이 이해하기 쉬운 용어도 병행해 설명해 주세요.
### Question:
60세 남성이 복통과 발열을 호소하며 내원하였습니다.
혈액 검사 결과 백혈구 수치가 상승했고, 우측 하복부 압통이 확인되었습니다.
가장 가능성이 높은 진단명은 무엇인가요?
'''.strip()

# Wrap the prompt in the chat template expected by the model
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate, then drop the prompt tokens so only the newly generated tokens are decoded
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
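
The example prompt uses an `### Instruction:` / `### Question:` layout. As a small convenience sketch, the helper below assembles that layout from two strings so the same instruction can be reused across questions. The function names `build_prompt` and `to_messages` are hypothetical and are not part of the model's or the `transformers` API; whether the model requires exactly this layout is not stated in this card, so treat it as an assumption carried over from the example.

```python
from typing import Dict, List


def build_prompt(instruction: str, question: str) -> str:
    """Assemble the Instruction/Question layout used in the usage example."""
    return (
        "### Instruction:\n"
        f"{instruction.strip()}\n"
        "### Question:\n"
        f"{question.strip()}"
    )


def to_messages(instruction: str, question: str) -> List[Dict[str, str]]:
    """Wrap the prompt as a single user turn, ready for tokenizer.apply_chat_template."""
    return [{"role": "user", "content": build_prompt(instruction, question)}]


# Placeholder strings; substitute a real instruction and clinical question.
messages = to_messages(
    "You are a careful medical assistant.",
    "What is the most likely diagnosis?",
)
print(messages[0]["content"])
```

The resulting `messages` list can be passed to `tokenizer.apply_chat_template` in place of the hand-written list in the example.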
---
## 📄 License
**Apache 2.0 License** – Free for research and commercial use with attribution.
---
## 📢 Citation
If you use this model in your work, please cite:
```
@misc{hari-q2.5-thinking,
title = {hari-q2.5-thinking},
url = {https://huggingface.co/snuh/hari-q2.5-thinking},
author = {Healthcare AI Research Institute (HARI) of Seoul National University Hospital (SNUH)},
month = {December},
year = {2025}
}
```
---
## 🚀 Together, we are shaping the future of AI-driven healthcare.
---
## Acknowledgments
This work was supported by an Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (RS-2025-02653113, High-Performance Research AI Computing Infrastructure Support at the 2 PFLOPS Scale).