---
license: apache-2.0
base_model: Qwen/Qwen3-VL-8B-Thinking
tags:
- dermatology
- medical
- lora
- peft
- skin-disease
- qwen3-vl
language:
- en
- th
pipeline_tag: image-text-to-text
---
<p align="center">
<img src="HIKARI_logo.png" alt="HIKARI" width="100%"/>
</p>
<h1 align="center">HIKARI-Rigel-8B-SkinCaption-LoRA</h1>
<p align="center">
<img src="https://img.shields.io/badge/Type-LoRA%20Adapter-blueviolet?style=flat-square"/>
<img src="https://img.shields.io/badge/Size-~1.1%20GB-lightblue?style=flat-square"/>
<img src="https://img.shields.io/badge/Base-Qwen3--VL--8B--Thinking-blue?style=flat-square"/>
<img src="https://img.shields.io/badge/License-Apache%202.0-orange?style=flat-square"/>
</p>
---
## Model Type: LoRA Adapter
> This is a **LoRA adapter** (~1.1 GB); it must be loaded **on top of** the base model `Qwen/Qwen3-VL-8B-Thinking`.
>
> ✅ **Advantage:** Lightweight; you download only ~1.1 GB instead of ~17 GB.
>
> ⚠️ **Requirement:** You must separately load the base model `Qwen/Qwen3-VL-8B-Thinking` (~17 GB) first.
>
> 💾 If you prefer a standalone, ready-to-use model, see the merged version:
> **[E27085921/HIKARI-Rigel-8B-SkinCaption](https://huggingface.co/E27085921/HIKARI-Rigel-8B-SkinCaption)** (~17 GB)
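The ~1.1 GB vs ~17 GB gap comes from LoRA's low-rank factorization: instead of storing a full dense weight update, the adapter stores two thin matrices `A` (rank × d_in) and `B` (d_out × rank) per targeted layer. A quick parameter count makes the savings concrete (the dimensions and rank below are illustrative, not this adapter's actual config):

```python
def lora_param_ratio(d_in: int, d_out: int, rank: int) -> float:
    """Fraction of full-matrix parameters a rank-`rank` LoRA update needs."""
    full = d_in * d_out           # dense weight W: d_out x d_in
    lora = rank * (d_in + d_out)  # A: rank x d_in, plus B: d_out x rank
    return lora / full

# Example: a 4096 x 4096 projection with rank-16 adapters
ratio = lora_param_ratio(4096, 4096, 16)
print(f"{ratio:.2%}")  # ~0.78% of the dense parameters
```

This is why shipping only the adapter stays lightweight even though inference still requires the full base weights in memory.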
---
## What is this adapter?
LoRA adapter for **[HIKARI-Rigel-8B-SkinCaption](https://huggingface.co/E27085921/HIKARI-Rigel-8B-SkinCaption)**: clinical skin lesion caption generation (checkpoint-init, ablation baseline). Metric: **BLEU-4: 9.82**.
This is the ablation baseline adapter. For the best caption model, see [HIKARI-Vega-8B-SkinCaption-Fused-LoRA](https://huggingface.co/E27085921/HIKARI-Vega-8B-SkinCaption-Fused-LoRA).
See the full model card at **[E27085921/HIKARI-Rigel-8B-SkinCaption](https://huggingface.co/E27085921/HIKARI-Rigel-8B-SkinCaption)** for complete details, usage examples, and performance comparison.
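For context on the reported metric: BLEU-4 is the geometric mean of modified 1- to 4-gram precisions, scaled by a brevity penalty. A minimal sentence-level sketch with uniform weights and a single reference (the 9.82 above comes from the project's own evaluation pipeline, not this toy function):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu4(reference: str, hypothesis: str) -> float:
    """Sentence-level BLEU-4 (0-100): geometric mean of modified
    1-4-gram precisions times a brevity penalty. No smoothing."""
    ref, hyp = reference.split(), hypothesis.split()
    precisions = []
    for n in range(1, 5):
        hyp_counts = Counter(ngrams(hyp, n))
        ref_counts = Counter(ngrams(ref, n))
        # Clip each hypothesis n-gram count by its count in the reference
        overlap = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = max(len(hyp) - n + 1, 0)
        if total == 0 or overlap == 0:
            return 0.0  # any zero precision collapses the geometric mean
        precisions.append(overlap / total)
    bp = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    return 100 * bp * math.exp(sum(math.log(p) for p in precisions) / 4)
```

A perfect match scores 100; a correct-but-short caption is penalized by the brevity penalty, e.g. `bleu4("a b c d e", "a b c d")` ≈ 77.9.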
---
## Usage
```python
from peft import PeftModel
from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
import torch
from PIL import Image

# Step 1: Load the base model (Qwen/Qwen3-VL-8B-Thinking, ~17 GB)
base = Qwen3VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen3-VL-8B-Thinking",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Step 2: Apply the LoRA adapter (~1.1 GB)
model = PeftModel.from_pretrained(base, "E27085921/HIKARI-Rigel-8B-SkinCaption-LoRA")
processor = AutoProcessor.from_pretrained(
    "E27085921/HIKARI-Rigel-8B-SkinCaption-LoRA", trust_remote_code=True
)

# Step 3: Inference -- minimal sketch; the prompt below is illustrative.
# See E27085921/HIKARI-Rigel-8B-SkinCaption for the full prompt format.
image = Image.open("skin_lesion.jpg").convert("RGB")
messages = [
    {"role": "user", "content": [
        {"type": "image", "image": image},
        {"type": "text", "text": "Describe this skin lesion."},
    ]}
]
inputs = processor.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True,
    return_dict=True, return_tensors="pt",
).to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(output[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```
For complete inference examples including vLLM and SGLang production code, see:
**[E27085921/HIKARI-Rigel-8B-SkinCaption](https://huggingface.co/E27085921/HIKARI-Rigel-8B-SkinCaption)**
---
## Citation
```bibtex
@misc{hikari2026,
title = {HIKARI: RAG-in-Training for Skin Disease Diagnosis
with Cascaded Vision-Language Models},
author = {Watin Promfiy and Pawitra Boonprasart},
year = {2026},
institution = {King Mongkut's Institute of Technology Ladkrabang,
Department of Information Technology, Bangkok, Thailand}
}
```
<p align="center">Made with ❤️ at <b>King Mongkut's Institute of Technology Ladkrabang (KMITL)</b></p>