---
language:
- en
- ko
license: apache-2.0
library_name: peft
pipeline_tag: visual-question-answering
tags:
- vision
- visual-question-answering
- multimodal
- qwen
- lora
- tcm
- traditional-chinese-medicine
---

# ViTCM_LLM - Traditional Chinese Medicine Diagnosis Model

This is a LoRA (Low-Rank Adaptation) adapter for the Qwen2.5-VL-32B-Instruct model, fine-tuned specifically for Traditional Chinese Medicine (TCM) diagnosis tasks.

## Model Details

### Model Description

- **Developed by:** Mark-CHAE
- **Model type:** LoRA adapter for Qwen2.5-VL-32B-Instruct
- **Language(s) (NLP):** English, Korean
- **License:** Apache-2.0
- **Finetuned from model:** Qwen/Qwen2.5-VL-32B-Instruct
- **Specialization:** Traditional Chinese Medicine diagnosis

### Model Sources

- **Repository:** [Mark-CHAE/shezhen](https://huggingface.co/Mark-CHAE/shezhen)
- **Base model:** [Qwen/Qwen2.5-VL-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-32B-Instruct)

## Uses

### Direct Use

This LoRA adapter can be used with the base Qwen2.5-VL-32B-Instruct model for multimodal vision-language tasks, including:

- Image understanding and description
- Visual question answering
- Image-text generation
- Multimodal conversations
- Traditional Chinese Medicine diagnosis
- Symptom analysis from medical images
### Downstream Use

The adapter can be loaded with the base model for inference, or with its weights unfrozen for further fine-tuning on specific TCM diagnosis tasks, as sketched below.
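A minimal sketch of both paths, assuming a recent `transformers` release that ships `Qwen2_5_VLForConditionalGeneration`; the `is_trainable` flag is what distinguishes frozen inference weights from a trainable adapter:

```python
import torch
from peft import PeftModel
from transformers import Qwen2_5_VLForConditionalGeneration

# Load the frozen 32B base model once
base_model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-32B-Instruct",
    torch_dtype=torch.float16,
    device_map="auto",
)

# Inference: the adapter weights stay frozen
model = PeftModel.from_pretrained(base_model, "Mark-CHAE/shezhen")

# Further fine-tuning: load the same adapter with its weights trainable
# model = PeftModel.from_pretrained(base_model, "Mark-CHAE/shezhen", is_trainable=True)
```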

### Out-of-Scope Use

This model should not be used for:

- Generating harmful, offensive, or inappropriate content
- Creating deepfakes or misleading visual content
- Any illegal activities
- Making actual medical diagnoses without proper medical supervision

### Recommendations

Users should:

- Verify outputs for accuracy and appropriateness
- Be aware of potential biases in the model
- Use appropriate safety measures when deploying
- Not rely solely on this model for medical diagnosis
- Consult qualified medical professionals for an actual diagnosis

## How to Get Started with the Model

### Using the Inference Widget

You can try the model directly in your browser using the visual question answering widget above: upload an image and ask a question about it.

### Using the Model in Code

```python
from peft import PeftModel
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
import torch
from PIL import Image

# Load the base model and processor
base_model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-32B-Instruct",
    torch_dtype=torch.float16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-32B-Instruct")

# Load the LoRA adapter
model = PeftModel.from_pretrained(base_model, "Mark-CHAE/shezhen")

# Prepare inputs
image = Image.open("your_image.jpg")
question = "根据图片判断舌诊内容"  # "Describe the tongue-diagnosis findings in this image."

# Build the prompt with the processor's chat template so the image
# placeholder tokens match what Qwen2.5-VL expects
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": question},
        ],
    }
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)

inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

# Generate a response
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
    )

# Decode only the newly generated tokens, not the prompt
answer = processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:],
    skip_special_tokens=True,
)[0].strip()
print(answer)
```
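In float16 the 32B base model needs on the order of 64 GB of GPU memory. As a sketch of one common workaround (assuming `bitsandbytes` is installed; this is not part of the published setup), the base weights can be quantized to 4-bit before attaching the adapter:

```python
import torch
from peft import PeftModel
from transformers import BitsAndBytesConfig, Qwen2_5_VLForConditionalGeneration

# Quantize the frozen base weights to 4-bit NF4 to cut memory use sharply
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

base_model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-32B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "Mark-CHAE/shezhen")
```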

## Training Details

### Training Data

The model was fine-tuned on multimodal vision-language data in English and Korean, with a specific focus on Traditional Chinese Medicine diagnosis scenarios.

### Training Procedure

#### Training Hyperparameters

- **Training regime:** LoRA fine-tuning
- **LoRA rank:** 64
- **LoRA alpha:** 128
- **Target modules:** v_proj, qkv, attn.proj, q_proj, gate_proj, down_proj, up_proj, o_proj, k_proj
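For reference, a `peft.LoraConfig` consistent with these hyperparameters would look roughly like the sketch below; the dropout and bias settings are assumptions, not documented training values:

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=64,            # LoRA rank
    lora_alpha=128,  # LoRA alpha
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # language-model attention
        "qkv", "attn.proj",                      # vision-tower attention
        "gate_proj", "up_proj", "down_proj",     # MLP projections
    ],
    lora_dropout=0.05,  # assumption: not documented above
    bias="none",        # assumption: not documented above
)
```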

#### Speeds, Sizes, Times

- **Adapter size:** 2.2 GB
- **Base model:** Qwen2.5-VL-32B-Instruct (32B parameters)

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

Evaluation was performed on multimodal vision-language benchmarks, with a focus on medical image understanding.

#### Metrics

Evaluation used standard vision-language metrics, including accuracy, BLEU, and human evaluation scores.
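For illustration, BLEU can be computed with the Hugging Face `evaluate` library; the example strings below are invented, and the actual evaluation protocol is not documented here:

```python
import evaluate

bleu = evaluate.load("bleu")

# Hypothetical model output and reference answer for one tongue image
predictions = ["the tongue body is pale with a thin white coating"]
references = [["the tongue is pale and the coating is thin and white"]]

print(bleu.compute(predictions=predictions, references=references)["bleu"])
```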

### Results

[Evaluation results to be added]

#### Summary

This LoRA adapter provides an efficient way to adapt the Qwen2.5-VL-32B-Instruct model to Traditional Chinese Medicine diagnosis tasks while preserving the base model's general capabilities.
## Technical Specifications

### Model Architecture and Objective

- **Architecture:** LoRA adapter for Qwen2.5-VL-32B-Instruct
- **Objective:** Multimodal vision-language understanding and generation, specialized for TCM tongue diagnosis

### Compute Infrastructure

#### Hardware

[To be specified]

#### Software

- PEFT 0.15.2
- Transformers
- PyTorch
## Citation

**APA:**

Mark-CHAE. (2024). *ViTCM_LLM: Traditional Chinese Medicine Diagnosis Model*. Hugging Face. https://huggingface.co/Mark-CHAE/shezhen

## Model Card Contact

For questions about this model, please contact the model author.

### Framework versions

- PEFT 0.15.2