---
language:
- en
- ko
- zh
license: apache-2.0
library_name: peft
pipeline_tag: visual-question-answering
tags:
- vision
- visual-question-answering
- multimodal
- qwen
- lora
- tcm
- traditional-chinese-medicine
- tongue-diagnosis
---

# ViTCM_LLM: Traditional Chinese Medicine Tongue Diagnosis Model

This is a LoRA (Low-Rank Adaptation) adapter for the Qwen2.5-VL-32B-Instruct model, fine-tuned specifically for Traditional Chinese Medicine (TCM) tongue diagnosis tasks.

## Model Details

### Model Description

- **Developed by:** Mark-CHAE
- **Model type:** LoRA Adapter for Qwen2.5-VL-32B-Instruct
- **Language(s) (NLP):** Chinese
- **License:** Apache-2.0
- **Finetuned from model:** Qwen/Qwen2.5-VL-32B-Instruct
- **Specialization:** Traditional Chinese Medicine Tongue Diagnosis

### Model Sources

- **Repository:** [Mark-CHAE/ViTCM-LLM](https://huggingface.co/Mark-CHAE/ViTCM-LLM)
- **Base Model:** [Qwen/Qwen2.5-VL-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-32B-Instruct)

## Uses

### Direct Use

This LoRA adapter can be used with the base Qwen2.5-VL-32B-Instruct model for multimodal vision-language tasks, including:

- Traditional Chinese Medicine tongue diagnosis
- Tongue image analysis and interpretation
- Visual question answering for medical images
- Multimodal medical conversations
- Symptom analysis from tongue images

### Downstream Use

The adapter can be loaded with the base model for inference or further fine-tuning on specific TCM diagnosis tasks.

## How to Get Started with the Model

### Using the Inference Widget

You can try the model directly in the browser using the Visual Question Answering widget above. Upload a tongue image and ask a question about it.

### Using the Model in Code

```python
import torch
from PIL import Image
from peft import PeftModel
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

BASE_MODEL = "Qwen/Qwen2.5-VL-32B-Instruct"
ADAPTER_ID = "Mark-CHAE/ViTCM-LLM"

# Load the base model with its dedicated vision-language class.
# bfloat16 matches the dtype the base checkpoint is published in.
base_model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    BASE_MODEL,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
# The processor wraps both the tokenizer and the image preprocessor
processor = AutoProcessor.from_pretrained(BASE_MODEL)

# Load the LoRA adapter on top of the base weights
model = PeftModel.from_pretrained(base_model, ADAPTER_ID)

# Prepare inputs
image = Image.open("tongue_image.jpg").convert("RGB")
question = "根据图片判断舌诊内容"  # "Determine the tongue diagnosis from the image"

# Let the chat template insert Qwen2.5-VL's own image placeholder tokens
# instead of hand-writing a prompt with a generic "<image>" tag.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": question},
        ],
    }
]
prompt = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

inputs = processor(
    text=[prompt],
    images=[image],
    return_tensors="pt",
).to(model.device)

# Generate response
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,  # counts generated tokens only, not the (long) image prompt
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
    )

# Drop the prompt tokens and decode only the generated answer
answer_ids = outputs[:, inputs["input_ids"].shape[1]:]
answer = processor.batch_decode(answer_ids, skip_special_tokens=True)[0].strip()
print(answer)
```
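
Decoder-only models such as Qwen2.5-VL return each prompt followed by its continuation from `generate`, so decoding the full sequence repeats the question before the answer. Slicing each sequence at its prompt length keeps only the answer. The framework-free sketch below shows that step on plain lists of token IDs; `strip_prompt_tokens` is a hypothetical helper for illustration, not a Transformers or PEFT function.

```python
from typing import List, Sequence


def strip_prompt_tokens(
    sequences: Sequence[Sequence[int]], prompt_lengths: Sequence[int]
) -> List[List[int]]:
    """Return only the newly generated token IDs for each sequence.

    sequences: generated sequences, each starting with its (possibly padded) prompt
    prompt_lengths: number of prompt tokens at the start of each sequence
    """
    return [list(seq[n:]) for seq, n in zip(sequences, prompt_lengths)]


if __name__ == "__main__":
    # Toy IDs: a 3-token prompt followed by 2 generated tokens
    print(strip_prompt_tokens([[101, 102, 103, 7, 8]], [3]))  # [[7, 8]]
```

In a padded batch every row shares the padded prompt width, so one length applies to all rows; slicing a tensor as `outputs[:, prompt_len:]` does the same thing.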

## Training Details

### Training Procedure

#### Training Hyperparameters

- **Training regime:** LoRA fine-tuning
- **LoRA rank:** 64
- **LoRA alpha:** 128
- **Target modules:** `v_proj`, `qkv`, `attn.proj`, `q_proj`, `gate_proj`, `down_proj`, `up_proj`, `o_proj`, `k_proj`
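
The rank and alpha above set how strongly the low-rank update is applied to each targeted layer. The sketch below illustrates standard LoRA arithmetic, assuming PEFT's default scaling of `lora_alpha / r` (that is, no rsLoRA), which gives 128 / 64 = 2.0 here. It is a NumPy illustration of the technique, not this adapter's training code; the helper names and toy matrix sizes are hypothetical.

```python
import numpy as np

# Hyperparameters reported for this adapter
RANK = 64
ALPHA = 128


def lora_scaling(alpha: float, rank: int) -> float:
    """Default LoRA scaling factor alpha / r (assumes rsLoRA is not used)."""
    return alpha / rank


def lora_forward(x, W, A, B, alpha=ALPHA, rank=RANK):
    """y = W x + (alpha / r) * B (A x); W is frozen, only A and B are trained."""
    return W @ x + lora_scaling(alpha, rank) * (B @ (A @ x))


def merge_lora(W, A, B, alpha=ALPHA, rank=RANK):
    """Fold the update into one dense weight: W' = W + (alpha / r) * B A."""
    return W + lora_scaling(alpha, rank) * (B @ A)


def lora_param_count(d_in: int, d_out: int, rank: int = RANK) -> int:
    """Trainable parameters LoRA adds to one d_in -> d_out linear layer."""
    # A has shape (rank, d_in) and B has shape (d_out, rank)
    return rank * (d_in + d_out)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_in, d_out = 32, 16  # toy sizes, not the model's real dimensions
    W = rng.standard_normal((d_out, d_in))
    A = rng.standard_normal((RANK, d_in)) * 0.01
    B = np.zeros((d_out, RANK))  # B starts at zero, so a fresh adapter changes nothing
    x = rng.standard_normal(d_in)

    print(lora_scaling(ALPHA, RANK))                     # 2.0
    print(np.allclose(lora_forward(x, W, A, B), W @ x))  # True
    print(lora_param_count(5120, 5120))                  # 655360 for a hypothetical 5120 x 5120 layer
```

Because `merge_lora` produces a single dense matrix that gives the same output as the unmerged path, a LoRA adapter can be folded into the base weights after training without changing inference results.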

#### Speeds, Sizes, Times

- **Adapter size:** 2.2 GB
- **Base model:** Qwen2.5-VL-32B-Instruct (32B parameters)
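
Because the adapter runs on top of the full base model, most of the memory goes to the base weights. As a rough sizing aid, weight memory is the parameter count times the bytes per parameter. This sketch uses the nominal 32B figure above and counts weights only; activations, the generation KV cache, and the adapter itself add to it. The dtype table is an illustrative assumption (the int8/int4 rows apply only if the model is quantized), not a statement about how this adapter was trained or served.

```python
# Bytes per parameter for common weight formats (weights only)
BYTES_PER_PARAM = {
    "float32": 4.0,
    "bfloat16": 2.0,
    "float16": 2.0,
    "int8": 1.0,   # only relevant if the model is quantized
    "int4": 0.5,   # only relevant if the model is quantized
}


def weight_memory_gib(n_params: float, dtype: str) -> float:
    """Approximate memory for model weights alone, in GiB (2**30 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    base_params = 32e9  # nominal size from this card; the exact count differs slightly
    for dtype in ("bfloat16", "int8", "int4"):
        print(f"{dtype:>9}: ~{weight_memory_gib(base_params, dtype):.1f} GiB")
```

At bfloat16 this comes to roughly 60 GiB for the base weights alone, which usually means sharding the model across several GPUs, as `device_map="auto"` does.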

#### Software

- PEFT 0.15.2
- Transformers library
- PyTorch

## Citation

**APA:**

Mark-CHAE. (2024). ViTCM_LLM: Traditional Chinese Medicine Tongue Diagnosis Model. Hugging Face. https://huggingface.co/Mark-CHAE/shezhen

## Model Card Contact

For questions about this model, please contact the model author.

### Framework versions

- PEFT 0.15.2