---
language:
- en
- ko
license: apache-2.0
library_name: peft
pipeline_tag: visual-question-answering
tags:
- vision
- visual-question-answering
- multimodal
- qwen
- lora
- tcm
- traditional-chinese-medicine
---

# ViTCM_LLM - Traditional Chinese Medicine Diagnosis Model

This is a LoRA (Low-Rank Adaptation) adapter for the Qwen2.5-VL-32B-Instruct model, fine-tuned specifically for Traditional Chinese Medicine (TCM) diagnosis tasks.

## Model Details

### Model Description

- **Developed by:** Mark-CHAE
- **Model type:** LoRA adapter for Qwen2.5-VL-32B-Instruct
- **Language(s) (NLP):** English, Korean
- **License:** Apache-2.0
- **Finetuned from model:** Qwen/Qwen2.5-VL-32B-Instruct
- **Specialization:** Traditional Chinese Medicine diagnosis

### Model Sources

- **Repository:** [Mark-CHAE/shezhen](https://huggingface.co/Mark-CHAE/shezhen)
- **Base model:** [Qwen/Qwen2.5-VL-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-32B-Instruct)

## Uses

### Direct Use

This LoRA adapter can be used with the base Qwen2.5-VL-32B-Instruct model for multimodal vision-language tasks, including:

- Image understanding and description
- Visual question answering
- Image-text generation
- Multimodal conversations
- Traditional Chinese Medicine diagnosis
- Symptom analysis from medical images
### Downstream Use

The adapter can be loaded with the base model for inference, or with its weights unfrozen for further fine-tuning on specific TCM diagnosis tasks, as sketched below.
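A minimal sketch of both paths, assuming a recent `transformers` release that ships `Qwen2_5_VLForConditionalGeneration`; the `is_trainable` flag is what distinguishes frozen inference weights from a trainable adapter:

```python
import torch
from peft import PeftModel
from transformers import Qwen2_5_VLForConditionalGeneration

# Load the frozen 32B base model once
base_model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-32B-Instruct",
    torch_dtype=torch.float16,
    device_map="auto",
)

# Inference: the adapter weights stay frozen
model = PeftModel.from_pretrained(base_model, "Mark-CHAE/shezhen")

# Further fine-tuning: load the same adapter with its weights trainable
# model = PeftModel.from_pretrained(base_model, "Mark-CHAE/shezhen", is_trainable=True)
```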

### Out-of-Scope Use

This model should not be used for:

- Generating harmful, offensive, or inappropriate content
- Creating deepfakes or misleading visual content
- Any illegal activities
- Making actual medical diagnoses without proper medical supervision

### Recommendations

Users should:

- Verify outputs for accuracy and appropriateness
- Be aware of potential biases in the model
- Use appropriate safety measures when deploying
- Not rely solely on this model for medical diagnosis
- Consult qualified medical professionals for an actual diagnosis

## How to Get Started with the Model

### Using the Inference Widget

You can try the model directly in your browser using the visual question answering widget above: upload an image and ask a question about it.

### Using the Model in Code

```python
from peft import PeftModel
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
import torch
from PIL import Image

# Load the base model and processor
base_model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-32B-Instruct",
    torch_dtype=torch.float16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-32B-Instruct")

# Load the LoRA adapter
model = PeftModel.from_pretrained(base_model, "Mark-CHAE/shezhen")

# Prepare inputs
image = Image.open("your_image.jpg")
question = "根据图片判断舌诊内容"  # "Describe the tongue-diagnosis findings in this image."

# Build the prompt with the processor's chat template so the image
# placeholder tokens match what Qwen2.5-VL expects
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": question},
        ],
    }
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)

inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

# Generate a response
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
    )

# Decode only the newly generated tokens, not the prompt
answer = processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:],
    skip_special_tokens=True,
)[0].strip()
print(answer)
```
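In float16 the 32B base model needs on the order of 64 GB of GPU memory. As a sketch of one common workaround (assuming `bitsandbytes` is installed; this is not part of the published setup), the base weights can be quantized to 4-bit before attaching the adapter:

```python
import torch
from peft import PeftModel
from transformers import BitsAndBytesConfig, Qwen2_5_VLForConditionalGeneration

# Quantize the frozen base weights to 4-bit NF4 to cut memory use sharply
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

base_model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-32B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "Mark-CHAE/shezhen")
```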

## Training Details

### Training Data

The model was fine-tuned on multimodal vision-language data in English and Korean, with a specific focus on Traditional Chinese Medicine diagnosis scenarios.

### Training Procedure

#### Training Hyperparameters

- **Training regime:** LoRA fine-tuning
- **LoRA rank:** 64
- **LoRA alpha:** 128
- **Target modules:** v_proj, qkv, attn.proj, q_proj, gate_proj, down_proj, up_proj, o_proj, k_proj
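For reference, a `peft.LoraConfig` consistent with these hyperparameters would look roughly like the sketch below; the dropout and bias settings are assumptions, not documented training values:

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=64,            # LoRA rank
    lora_alpha=128,  # LoRA alpha
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # language-model attention
        "qkv", "attn.proj",                      # vision-tower attention
        "gate_proj", "up_proj", "down_proj",     # MLP projections
    ],
    lora_dropout=0.05,  # assumption: not documented above
    bias="none",        # assumption: not documented above
)
```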

#### Speeds, Sizes, Times

- **Adapter size:** 2.2 GB
- **Base model:** Qwen2.5-VL-32B-Instruct (32B parameters)

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

Evaluation was performed on multimodal vision-language benchmarks, with a focus on medical image understanding.

#### Metrics

Evaluation used standard vision-language metrics, including accuracy, BLEU, and human evaluation scores.
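For illustration, BLEU can be computed with the Hugging Face `evaluate` library; the example strings below are invented, and the actual evaluation protocol is not documented here:

```python
import evaluate

bleu = evaluate.load("bleu")

# Hypothetical model output and reference answer for one tongue image
predictions = ["the tongue body is pale with a thin white coating"]
references = [["the tongue is pale and the coating is thin and white"]]

print(bleu.compute(predictions=predictions, references=references)["bleu"])
```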

### Results

[Evaluation results to be added]

#### Summary

This LoRA adapter provides an efficient way to adapt the Qwen2.5-VL-32B-Instruct model to Traditional Chinese Medicine diagnosis tasks while preserving the base model's general capabilities.
## Technical Specifications

### Model Architecture and Objective

- **Architecture:** LoRA adapter for Qwen2.5-VL-32B-Instruct
- **Objective:** Multimodal vision-language understanding and generation, specialized for TCM tongue diagnosis

### Compute Infrastructure

#### Hardware

[To be specified]

#### Software

- PEFT 0.15.2
- Transformers
- PyTorch
## Citation

**APA:**

Mark-CHAE. (2024). *ViTCM_LLM: Traditional Chinese Medicine Diagnosis Model*. Hugging Face. https://huggingface.co/Mark-CHAE/shezhen

## Model Card Contact

For questions about this model, please contact the model author.

### Framework versions

- PEFT 0.15.2