Mark-CHAE
/

ViTCM-LLM

Visual Question Answering

traditional-chinese-medicine

tongue-diagnosis

Model card Files Files and versions

ViTCM-LLM / README.md

Mark-CHAE's picture

Update README.md

abe6183 verified 7 months ago

|

history blame contribute delete

3.66 kB

	---
	language:
	- en
	- ko
	- zh
	license: apache-2.0
	library_name: peft
	pipeline_tag: visual-question-answering
	tags:
	- vision
	- visual-question-answering
	- multimodal
	- qwen
	- lora
	- tcm
	- traditional-chinese-medicine
	- tongue-diagnosis
	---

	# ViTCM_LLM - Traditional Chinese Medicine Tongue Diagnosis Model

	This is a LoRA (Low-Rank Adaptation) adapter for the Qwen2.5-VL-32B-Instruct model, fine-tuned specifically for Traditional Chinese Medicine (TCM) tongue diagnosis tasks.

	## Model Details

	### Model Description

	- Developed by: Mark-CHAE
	- Model type: LoRA Adapter for Qwen2.5-VL-32B-Instruct
	- Language(s) (NLP): Chinese
	- License: Apache-2.0
	- Finetuned from model: Qwen/Qwen2.5-VL-32B-Instruct
	- Specialization: Traditional Chinese Medicine Tongue Diagnosis

	### Model Sources

	- Repository: [Mark-CHAE/
	ViTCM-LLM ](https://huggingface.co/Mark-CHAE/ViTCM-LLM)
	- Base Model: [Qwen/Qwen2.5-VL-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-32B-Instruct)

	## Uses

	### Direct Use

	This LoRA adapter can be used with the base Qwen2.5-VL-32B-Instruct model for multimodal vision-language tasks including:

	- Traditional Chinese Medicine tongue diagnosis
	- Tongue image analysis and interpretation
	- Visual question answering for medical images
	- Multimodal medical conversations
	- Symptom analysis from tongue images

	### Downstream Use

	The adapter can be loaded with the base model for inference or further fine-tuning on specific TCM diagnosis tasks.

	## How to Get Started with the Model

	### Using the Inference Widget

	You can try the model directly in the browser using the Visual Question Answering widget above. Simply upload a tongue image and ask a question about it.

	### Using the Model in Code

	```python
	from peft import PeftModel
	from transformers import AutoTokenizer, AutoModelForCausalLM, AutoProcessor
	import torch
	from PIL import Image

	# Load base model and tokenizer
	base_model = AutoModelForCausalLM.from_pretrained(
	"Qwen/Qwen2.5-VL-32B-Instruct",
	torch_dtype=torch.float16,
	device_map="auto"
	)
	tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-VL-32B-Instruct")
	processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-32B-Instruct")

	# Load LoRA adapter
	model = PeftModel.from_pretrained(base_model, "Mark-CHAE/ViTCM-LLM")

	# Prepare inputs
	image = Image.open("tongue_image.jpg")
	question = "根据图片判断舌诊内容"

	prompt = f"<\|im_start\|>user\n<image>\n{question}<\|im_end\|>\n<\|im_start\|>assistant\n"

	inputs = processor(
	text=prompt,
	images=image,
	return_tensors="pt"
	)

	# Generate response
	with torch.no_grad():
	outputs = model.generate(
	**inputs,
	max_length=512,
	temperature=0.7,
	top_p=0.9,
	do_sample=True,
	pad_token_id=tokenizer.eos_token_id
	)

	response = tokenizer.decode(outputs[0], skip_special_tokens=True)
	answer = response.split("<\|im_start\|>assistant")[-1].strip()
	print(answer)
	```


	### Training Procedure

	#### Training Hyperparameters

	- Training regime: LoRA fine-tuning
	- LoRA rank: 64
	- LoRA alpha: 128
	- Target modules: v_proj, qkv, attn.proj, q_proj, gate_proj, down_proj, up_proj, o_proj, k_proj


	#### Speeds, Sizes, Times

	- Adapter size: 2.2GB
	- Base model: Qwen2.5-VL-32B-Instruct (32B parameters)


	#### Software

	- PEFT 0.15.2
	- Transformers library
	- PyTorch



	APA:

	Mark-CHAE. (2024). ViTCM_LLM: Traditional Chinese Medicine Tongue Diagnosis Model. Hugging Face. https://huggingface.co/Mark-CHAE/shezhen

	## Model Card Contact

	For questions about this model, please contact the model author.

	### Framework versions

	- PEFT 0.15.2