---
license: apache-2.0
base_model: Qwen/Qwen3-VL-8B-Thinking
tags:
  - dermatology
  - medical
  - lora
  - peft
  - skin-disease
  - qwen3-vl
language:
  - en
  - th
pipeline_tag: image-text-to-text
---

<p align="center">
  <img src="HIKARI_logo.png" alt="HIKARI" width="100%"/>
</p>

<h1 align="center">HIKARI-Rigel-8B-SkinCaption-LoRA</h1>

<p align="center">
  <img src="https://img.shields.io/badge/Type-LoRA%20Adapter-blueviolet?style=flat-square"/>
  <img src="https://img.shields.io/badge/Size-~1.1%20GB-lightblue?style=flat-square"/>
  <img src="https://img.shields.io/badge/Base-Qwen3--VL--8B--Thinking-blue?style=flat-square"/>
  <img src="https://img.shields.io/badge/License-Apache%202.0-orange?style=flat-square"/>
</p>

---

## 🔌 Model Type: LoRA Adapter

> This is a **LoRA adapter** (~1.1 GB); it must be loaded **on top of** the base model `Qwen/Qwen3-VL-8B-Thinking`.
>
> ✅ **Advantage:** Lightweight; you download only ~1.1 GB instead of ~17 GB.
>
> ⚠️ **Requirement:** You must first load the base model `Qwen/Qwen3-VL-8B-Thinking` (~17 GB) separately.
>
> 💾 If you prefer a standalone, ready-to-use model, see the merged version:
> **[E27085921/HIKARI-Rigel-8B-SkinCaption](https://huggingface.co/E27085921/HIKARI-Rigel-8B-SkinCaption)** (~17 GB)

---

## What is this adapter?

LoRA adapter for **[HIKARI-Rigel-8B-SkinCaption](https://huggingface.co/E27085921/HIKARI-Rigel-8B-SkinCaption)**: clinical skin lesion caption generation (checkpoint-init, ablation baseline). Metric: **BLEU-4: 9.82**.

This is the ablation baseline adapter. For the best caption model, see [HIKARI-Vega-8B-SkinCaption-Fused-LoRA](https://huggingface.co/E27085921/HIKARI-Vega-8B-SkinCaption-Fused-LoRA).

See the full model card at **[E27085921/HIKARI-Rigel-8B-SkinCaption](https://huggingface.co/E27085921/HIKARI-Rigel-8B-SkinCaption)** for complete details, usage examples, and performance comparison.

---

## Usage

```python
from peft import PeftModel
from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
import torch
from PIL import Image

# Step 1: Load the base model (Qwen/Qwen3-VL-8B-Thinking, ~17 GB)
base = Qwen3VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen3-VL-8B-Thinking",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Step 2: Apply the LoRA adapter (~1.1 GB)
model = PeftModel.from_pretrained(base, "E27085921/HIKARI-Rigel-8B-SkinCaption-LoRA")
processor = AutoProcessor.from_pretrained(
    "E27085921/HIKARI-Rigel-8B-SkinCaption-LoRA", trust_remote_code=True
)

# Step 3: Inference (minimal sketch; the prompt text below is illustrative)
image = Image.open("skin_lesion.jpg").convert("RGB")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Describe this skin lesion."},
        ],
    }
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens
caption = processor.decode(
    output_ids[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
)
print(caption)
```

For complete inference examples including vLLM and SGLang production code, see:
**[E27085921/HIKARI-Rigel-8B-SkinCaption](https://huggingface.co/E27085921/HIKARI-Rigel-8B-SkinCaption)**

---

## 📄 Citation

```bibtex
@misc{hikari2026,
  title  = {HIKARI: RAG-in-Training for Skin Disease Diagnosis
            with Cascaded Vision-Language Models},
  author = {Watin Promfiy and Pawitra Boonprasart},
  year   = {2026},
  institution = {King Mongkut's Institute of Technology Ladkrabang,
                 Department of Information Technology, Bangkok, Thailand}
}
```

<p align="center">Made with ❤️ at <b>King Mongkut's Institute of Technology Ladkrabang (KMITL)</b></p>