	---
	license: apache-2.0
	language:
	- zh
	- en
	metrics:
	- accuracy
	base_model:
	- Qwen/Qwen3-32B
	pipeline_tag: text-generation
	library_name: transformers
	tags:
	- medical
	model-index:
	- name: Med-Go-32B
	  results:
	  - task:
	      type: text-generation
	    dataset:
	      type: medical_eval_hle
	      name: Medical-Eval-HLE
	    metrics:
	    - name: accuracy
	      type: accuracy
	      value: 19.4
	      verified: false
	  - task:
	      type: text-generation
	    dataset:
	      type: supergpqa
	      name: SuperGPQA
	    metrics:
	    - name: accuracy
	      type: accuracy
	      value: 37.2
	      verified: false
	  - task:
	      type: text-generation
	    dataset:
	      type: medbullets
	      name: Medbullets
	    metrics:
	    - name: accuracy
	      type: accuracy
	      value: 57.8
	      verified: false
	  - task:
	      type: text-generation
	    dataset:
	      type: mmlu_pro
	      name: MMLU-pro
	    metrics:
	    - name: accuracy
	      type: accuracy
	      value: 64.3
	      verified: false
	  - task:
	      type: text-generation
	    dataset:
	      type: afrimedqa
	      name: AfrimedQA
	    metrics:
	    - name: accuracy
	      type: accuracy
	      value: 74.7
	      verified: false
	  - task:
	      type: text-generation
	    dataset:
	      type: medmcqa
	      name: MedMCQA
	    metrics:
	    - name: accuracy
	      type: accuracy
	      value: 68.3
	      verified: false
	  - task:
	      type: text-generation
	    dataset:
	      type: medqa_usmle
	      name: MedQA-USMLE
	    metrics:
	    - name: accuracy
	      type: accuracy
	      value: 76.8
	      verified: false
	  - task:
	      type: text-generation
	    dataset:
	      type: cmb
	      name: CMB
	    metrics:
	    - name: accuracy
	      type: accuracy
	      value: 92.5
	      verified: false
	  - task:
	      type: text-generation
	    dataset:
	      type: cmexam
	      name: CMExam
	    metrics:
	    - name: accuracy
	      type: accuracy
	      value: 87.4
	      verified: false
	  - task:
	      type: text-generation
	    dataset:
	      type: pubmedqa
	      name: PubMedQA
	    metrics:
	    - name: accuracy
	      type: accuracy
	      value: 76.6
	      verified: false
	  - task:
	      type: text-generation
	    dataset:
	      type: medexqa
	      name: MedExQA
	    metrics:
	    - name: accuracy
	      type: accuracy
	      value: 81.5
	      verified: false
	  - task:
	      type: text-generation
	    dataset:
	      type: explaincpe
	      name: ExplainCPE
	    metrics:
	    - name: accuracy
	      type: accuracy
	      value: 89.5
	      verified: false
	  - task:
	      type: text-generation
	    dataset:
	      type: mmlu_med
	      name: MMLU-Med
	    metrics:
	    - name: accuracy
	      type: accuracy
	      value: 87.4
	      verified: false
	  - task:
	      type: text-generation
	    dataset:
	      type: medxperqa
	      name: MedXperQA
	    metrics:
	    - name: accuracy
	      type: accuracy
	      value: 20.7
	      verified: false
	  - task:
	      type: text-generation
	    dataset:
	      type: anesbench
	      name: AnesBench
	    metrics:
	    - name: accuracy
	      type: accuracy
	      value: 53.1
	      verified: false
	  - task:
	      type: text-generation
	    dataset:
	      type: diagnosisarena
	      name: DiagnosisArena
	    metrics:
	    - name: accuracy
	      type: accuracy
	      value: 64.4
	      verified: false
	  - task:
	      type: text-generation
	    dataset:
	      type: clinbench_hbp
	      name: Clinbench-HBP
	    metrics:
	    - name: accuracy
	      type: accuracy
	      value: 80.6
	      verified: false
	  - task:
	      type: text-generation
	    dataset:
	      type: medpair
	      name: MedPAIR
	    metrics:
	    - name: accuracy
	      type: accuracy
	      value: 32.3
	      verified: false
	  - task:
	      type: text-generation
	    dataset:
	      type: amqa
	      name: AMQA
	    metrics:
	    - name: accuracy
	      type: accuracy
	      value: 72.7
	      verified: false
	  - task:
	      type: text-generation
	    dataset:
	      type: medethicaleval
	      name: MedethicalEval
	    metrics:
	    - name: accuracy
	      type: accuracy
	      value: 92.2
	      verified: false
	---

	# MedGo: Medical Large Language Model Based on Qwen3-32B

	<div align="center">

	[![HuggingFace](https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace-Model-yellow)](https://huggingface.co/OpenMedZoo/MedGo)
	[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE)
	[![Python](https://img.shields.io/badge/Python-3.8+-blue.svg)](https://www.python.org/)


	English | [简体中文](./README_CN.md)

	</div>

	## 📋 Table of Contents

	- [Introduction](#introduction)
	- [Key Features](#key-features)
	- [Performance](#performance)
	- [Quick Start](#quick-start)
	- [Training Details](#training-details)
	- [Use Cases](#use-cases)
	- [Limitations & Risks](#limitations--risks)
	- [Citation](#citation)
	- [License](#license)
	- [Contributing](#contributing)
	- [Contact](#contact)

	## 🎯 Introduction

	MedGo is a general-purpose medical large language model fine-tuned from Qwen3-32B and designed for clinical medicine and research scenarios. It is trained on large-scale, multi-source medical corpora augmented with complex case data, and supports medical Q&A, clinical summarization, clinical reasoning, multi-turn dialogue, and scientific text generation.

	### 🌟 Core Capabilities

	- 📚 Medical Knowledge Q&A: Professional responses based on authoritative medical literature and clinical guidelines
	- 📝 Clinical Documentation: Automated medical record summaries, diagnostic reports, and medical documentation
	- 🔍 Clinical Reasoning: Differential diagnosis, examination recommendations, and treatment suggestions
	- 💬 Multi-turn Dialogue: Patient-doctor interaction simulation and complex case discussions
	- 🔬 Research Support: Literature summarization, research idea generation, and quality control review

	## ✨ Key Features

	| Feature | Details |
	|---------|---------|
	| Base Architecture | Qwen3-32B |
	| Parameters | 32B |
	| Domain | Clinical Medicine, Research Support, Healthcare System Integration |
	| Fine-tuning Method | SFT + Preference Alignment (DPO/KTO) |
	| Data Sources | Authoritative medical literature, clinical guidelines, real cases (anonymized) |
	| Deployment | Local deployment, HIS/EMR system integration |
	| License | Apache 2.0 |

	## 📊 Performance

	MedGo is evaluated on a range of medical and general benchmarks and is competitive among 32B-parameter models. Accuracy on each benchmark is listed in the model card metadata.

	### Key Benchmark Results

	- AIMedQA: Medical question answering comprehension
	- CME: Clinical reasoning evaluation
	- DiagnosisArena: Diagnostic capability assessment
	- MedQA / MedMCQA: Medical multiple-choice questions
	- PubMedQA: Biomedical literature Q&A
	- MMLU-Pro: Comprehensive capability evaluation

	![Performance Comparison](./main_results.png)

	Performance Highlights:
	- ✅ Average Score: ~70 points (excellent performance in the 32B parameter class)
	- ✅ Strong Tasks: Clinical reasoning (DiagnosisArena, CME) and multi-turn medical Q&A
	- ✅ Balanced Capability: Good performance in medical semantic understanding and multi-task generalization
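
As a rough cross-check of the average above, the unweighted mean of the 20 accuracy values listed in this model card's metadata can be computed directly. This is only a sketch: the `scores` dictionary copies the metadata values, `mean_accuracy` is a hypothetical helper, and the source does not say which benchmarks or weighting the ~70 figure uses.

```python
# Accuracy values copied from the model-index metadata of this model card.
scores = {
    "Medical-Eval-HLE": 19.4, "SuperGPQA": 37.2, "Medbullets": 57.8,
    "MMLU-pro": 64.3, "AfrimedQA": 74.7, "MedMCQA": 68.3,
    "MedQA-USMLE": 76.8, "CMB": 92.5, "CMExam": 87.4, "PubMedQA": 76.6,
    "MedExQA": 81.5, "ExplainCPE": 89.5, "MMLU-Med": 87.4,
    "MedXperQA": 20.7, "AnesBench": 53.1, "DiagnosisArena": 64.4,
    "Clinbench-HBP": 80.6, "MedPAIR": 32.3, "AMQA": 72.7,
    "MedethicalEval": 92.2,
}


def mean_accuracy(results):
    """Unweighted mean: every benchmark counts equally."""
    return sum(results.values()) / len(results)


print(f"{len(scores)} benchmarks, mean accuracy = {mean_accuracy(scores):.1f}")
```

Over all 20 listed benchmarks the unweighted mean comes out at roughly 66.5, so the ~70 highlight presumably refers to a different subset or weighting, such as the benchmarks plotted in the chart.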


	## 🚀 Quick Start

	### Requirements

	- Python >= 3.8
	- PyTorch >= 2.0
	- Transformers >= 4.51.0 (earlier releases do not include the Qwen3 architecture)
	- CUDA >= 11.8 (for GPU inference)
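
For hardware planning, the memory needed just to hold the weights can be estimated from the parameter count and the bytes per parameter. The sketch below assumes the nominal 32B parameter count stated in this README and a few common precisions. It ignores the KV cache, activations, and framework overhead, so real usage will be higher. `weight_memory_gb` is a hypothetical helper, not part of the MedGo repository.

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


NUM_PARAMS = 32e9  # nominal parameter count stated in this README

# Bytes per parameter for a few common precisions (illustrative choices).
for name, nbytes in [("bf16/fp16", 2), ("int8", 1), ("4-bit", 0.5)]:
    print(f"{name:>9}: ~{weight_memory_gb(NUM_PARAMS, nbytes):.0f} GB")
```

At 16-bit precision this gives about 64 GB for the weights, which is why `device_map="auto"` across several GPUs is a common way to load a model of this size.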

	### Installation

	```bash
	# Clone the repository
	git clone https://github.com/OpenMedZoo/MedGo.git
	cd MedGo

	# Install dependencies
	pip install -r requirements.txt
	```

	### Model Download

	Download model weights from HuggingFace:

	```bash
	# Using huggingface-cli
	huggingface-cli download OpenMedZoo/MedGo --local-dir ./models/MedGo

	# Or using git-lfs
	git lfs install
	git clone https://huggingface.co/OpenMedZoo/MedGo
	```

	### Basic Inference

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer

	# Load model and tokenizer
	model_path = "OpenMedZoo/MedGo"
	tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
	model = AutoModelForCausalLM.from_pretrained(
	    model_path,
	    device_map="auto",
	    trust_remote_code=True,
	    torch_dtype="auto",
	)

	# Medical Q&A example
	messages = [
	    {"role": "system", "content": "You are a professional medical assistant. Please answer questions based on medical knowledge."},
	    {"role": "user", "content": "What is hypertension and what are the common treatment methods?"},
	]

	# Generate response
	inputs = tokenizer.apply_chat_template(
	    messages,
	    tokenize=True,
	    add_generation_prompt=True,
	    return_tensors="pt",
	).to(model.device)

	outputs = model.generate(
	    inputs,
	    max_new_tokens=512,
	    temperature=0.7,
	    top_p=0.9,
	    do_sample=True,
	)

	response = tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True)
	print(response)
	```
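
The example above is single-turn. For multi-turn dialogue, a common pattern is to keep the whole conversation in the `messages` list, append each model reply as an `assistant` turn, and pass the list to `apply_chat_template` again. The sketch below shows only the history bookkeeping in plain Python. `add_turn` and `trim_history` are hypothetical helpers, and the `max_turns` cutoff is an arbitrary assumption for keeping prompts short.

```python
from typing import Dict, List

Message = Dict[str, str]


def add_turn(history: List[Message], role: str, content: str) -> List[Message]:
    """Return a new history with one more message appended."""
    if role not in {"system", "user", "assistant"}:
        raise ValueError(f"unsupported role: {role}")
    return history + [{"role": role, "content": content}]


def trim_history(history: List[Message], max_turns: int = 6) -> List[Message]:
    """Keep all system messages plus the most recent `max_turns` other messages."""
    system = [m for m in history if m["role"] == "system"]
    dialogue = [m for m in history if m["role"] != "system"]
    recent = dialogue[-max_turns:] if max_turns > 0 else []
    return system + recent


history: List[Message] = [
    {"role": "system", "content": "You are a professional medical assistant."}
]
history = add_turn(history, "user", "What is hypertension?")
# In real use, this content would be the decoded `response` from generate().
history = add_turn(history, "assistant", "<model reply goes here>")
history = add_turn(history, "user", "Which lifestyle changes help?")

prompt_messages = trim_history(history, max_turns=6)
print([m["role"] for m in prompt_messages])
```

The resulting `prompt_messages` list can be passed to `tokenizer.apply_chat_template` in place of `messages` in the example above.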

	### Batch Inference

	```bash
	# Use the provided inference script
	python scripts/inference.py \
	    --model_path OpenMedZoo/MedGo \
	    --input_file examples/medical_qa.jsonl \
	    --output_file results/predictions.jsonl \
	    --batch_size 4
	```

	### Accelerated Inference with vLLM

	```python
	from vllm import LLM, SamplingParams

	# Initialize vLLM
	llm = LLM(model="OpenMedZoo/MedGo", trust_remote_code=True)
	sampling_params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=512)

	# Batch inference
	prompts = [
	    "What are the symptoms and treatment methods for diabetes?",
	    "What dietary precautions should hypertensive patients take?",
	]

	outputs = llm.generate(prompts, sampling_params)
	for output in outputs:
	    print(output.outputs[0].text)
	```

	## 🔧 Training Details

	MedGo employs a two-stage fine-tuning strategy to balance general medical knowledge with clinical task adaptation.

	### Stage I: General Medical Alignment

	Objective: Establish a solid foundation of medical knowledge and improve Q&A standardization

	- Data Sources:
	  - Authoritative medical literature (PubMed, medical textbooks)
	  - Clinical guidelines and diagnostic standards
	  - Medical encyclopedia entries and terminology databases

	- Training Methods:
	  - Supervised Fine-Tuning (SFT)
	  - Chain-of-Thought (CoT) guided samples
	  - Medical terminology alignment and safety constraints

	### Stage II: Clinical Task Enhancement

	Objective: Enhance complex case reasoning and multi-task processing capabilities

	- Data Sources:
	  - Real medical records (fully anonymized)
	  - Outpatient and emergency records with complex multi-diagnosis samples
	  - Research articles and quality control cases

	- Data Augmentation Techniques:
	  - Semantic paraphrasing and multi-perspective expansion
	  - Complex case synthesis
	  - Doctor-patient interaction simulation

	- Training Methods:
	  - Multi-Task Learning (medical record summary, differential diagnosis, examination suggestions, etc.)
	  - Preference Alignment (DPO/KTO)
	  - Expert feedback iterative optimization
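
For readers unfamiliar with DPO, the standard per-example objective can be written in a few lines. This is a generic sketch, not MedGo's training code: the README does not specify the loss variant, the value of `beta`, or the reference model, so the function name, the default `beta=0.1`, and the log-probability inputs are all illustrative assumptions.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair.

    Each argument is the summed log-probability of a full response under
    the policy being trained or under the frozen reference model.
    """
    # Implicit rewards: how much more the policy likes a response than the reference does.
    chosen_reward = policy_chosen_logp - ref_chosen_logp
    rejected_reward = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_reward - rejected_reward)
    # -log(sigmoid(margin)) written as a numerically stable softplus(-margin).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# No preference learned yet: policy and reference agree, so the loss is log(2).
print(dpo_loss(-5.0, -5.0, -5.0, -5.0))
# Policy favors the chosen response more than the reference does: lower loss.
print(dpo_loss(-10.0, -20.0, -12.0, -15.0))
```

The loss falls as the policy raises the likelihood of the preferred response relative to the rejected one, measured against the reference model. KTO uses a different objective that does not require paired responses and is not shown here.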

	### Training Optimization Focus

	- ✅ Strengthen information extraction and cross-evidence reasoning for complex cases
	- ✅ Improve medical consistency and interpretability of outputs
	- ✅ Improve the compliance and safety of generated wording
	- ✅ Continuous iteration through expert samples and automated evaluation

	## 💡 Use Cases

	### ✅ Suitable Scenarios

	| Scenario | Description |
	|----------|-------------|
	| Clinical Assistance | Preliminary diagnosis suggestions, medical record writing, formatted report generation |
	| Research Support | Literature summarization, research idea generation, data analysis assistance |
	| Quality Control | Medical document compliance checking, clinical process quality control |
	| System Integration | Embedded in HIS/EMR systems to provide intelligent decision support |
	| Medical Education | Case discussions, medical knowledge Q&A, clinical reasoning training |

	### 🚫 Unsuitable Scenarios

	- ❌ Replacing Doctors: MedGo is an auxiliary tool only and must not serve as the sole basis for a diagnosis
	- ❌ High-Risk Operations: Not recommended for surgical decisions or other high-risk medical procedures
	- ❌ Rare Diseases: May perform poorly on rare diseases not covered by the training data
	- ❌ Emergency Care: Not suitable for scenarios that require immediate decisions

	## ⚠️ Limitations & Risks

	### Model Limitations

	1. Misinterpretation: Despite broad coverage of medical knowledge, the model may still misinterpret inputs or give incorrect recommendations
	2. Complex Cases: The risk of error is higher for cases with complex conditions, severe complications, or missing information
	3. Knowledge Currency: Medical knowledge is updated continuously, so the training data may lag behind current practice
	4. Language Limitation: Primarily designed for Chinese medical scenarios; performance in other languages may vary

	### Usage Recommendations

	- ⚠️ Use in controlled environments with clinical expert review of generated results
	- ⚠️ Treat model outputs as auxiliary references, not final diagnostic conclusions
	- ⚠️ For sensitive cases or high-risk scenarios, expert consultation is mandatory
	- ⚠️ Deployment requires internal validation, security review, and clinical testing

	### Data Privacy & Compliance

	- 🔒 Training data is fully anonymized
	- 🔒 Protect patient privacy when using the model
	- 🔒 Production deployment must comply with healthcare data security regulations (e.g., HIPAA, GDPR)
	- 🔒 Local deployment recommended to avoid sensitive data transmission
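
Before a prompt leaves a controlled environment, obvious identifiers can be masked as one layer of privacy protection. The sketch below is illustrative only: the `redact` helper and the regular expressions (email addresses, 11-digit mainland-China mobile numbers, 18-character resident ID numbers) are assumptions chosen for the example, and pattern matching alone does not amount to de-identification under HIPAA or GDPR.

```python
import re

# (placeholder, pattern) pairs; all patterns are illustrative assumptions.
PATTERNS = [
    ("[EMAIL]", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    ("[ID]", re.compile(r"(?<!\d)\d{17}[\dXx](?!\d)")),
    ("[PHONE]", re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)")),
]


def redact(text: str) -> str:
    """Replace every match of each pattern with its placeholder."""
    for placeholder, pattern in PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


note = (
    "Patient contact: 13812345678, zhang.san@example.com. "
    "Chief complaint: headache for 3 days."
)
print(redact(note))
```

Names, addresses, dates, and free-text identifiers are not covered by these patterns and would need a dedicated de-identification step plus human review.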

	## 📚 Citation

	If MedGo is helpful for your research or project, please cite our work:

	```bibtex
	@misc{openmedzoo_2025,
	  author = { OpenMedZoo },
	  title = { MedGo (Revision 640a2e2) },
	  year = 2025,
	  url = { https://huggingface.co/OpenMedZoo/MedGo },
	  doi = { 10.57967/hf/7024 },
	  publisher = { Hugging Face }
	}
	```

	## 📄 License

	This project is licensed under the [Apache License 2.0](LICENSE).

	Commercial Use Notice:
	- ✅ Commercial use and modification allowed
	- ✅ Original license and copyright notice must be retained
	- ✅ Contact us for technical support when integrating into healthcare systems

	## 🤝 Contributing

	We welcome community contributions! Here's how to participate:

	### Contribution Types

	- 🐛 Submit bug reports
	- 💡 Propose new features
	- 📝 Improve documentation
	- 🔧 Submit code fixes or optimizations
	- 📊 Share evaluation results and use cases


	## 🙏 Acknowledgments

	Thanks to all contributors to the MedGo project:

	- Model development and fine-tuning algorithm team
	- Data annotation and quality control team
	- Clinical expert guidance and review team
	- Open-source community support and feedback

	Special thanks to:
	- [Qwen Team](https://github.com/QwenLM/Qwen) for providing excellent foundation models
	- All healthcare institutions that provided data and feedback

	## 📧 Contact

	- HuggingFace: [Model Homepage](https://huggingface.co/OpenMedZoo/MedGo)

	## Copyright
	- Publisher: Tongji University Affiliated East Hospital — Sole Corresponding Author
	- Co-developer / Technical Support: Shanghai Shuole Technology Co., Ltd.
	- Contact: dongfyy@pudong.gov.cn
	- Version: v1.0
	- Attribution (required):
	“Powered by Med-Go 32B, released by Tongji University Affiliated East Hospital (v1.0).”
