---
license: apache-2.0
base_model: Qwen/Qwen3-VL-8B-Thinking
tags:
  - dermatology
  - medical
  - lora
  - peft
  - skin-disease
  - qwen3-vl
language:
  - en
  - th
pipeline_tag: image-text-to-text
---

<p align="center">
  <img src="HIKARI_logo.png" alt="HIKARI" width="100%"/>
</p>

<h1 align="center">HIKARI-Rigel-8B-SkinCaption-LoRA</h1>

<p align="center">
  <img src="https://img.shields.io/badge/Type-LoRA%20Adapter-blueviolet?style=flat-square"/>
  <img src="https://img.shields.io/badge/Size-~1.1%20GB-lightblue?style=flat-square"/>
  <img src="https://img.shields.io/badge/Base-Qwen3--VL--8B--Thinking-blue?style=flat-square"/>
  <img src="https://img.shields.io/badge/License-Apache%202.0-orange?style=flat-square"/>
</p>

---

## 🔌 Model Type: LoRA Adapter

> This is a **LoRA adapter** (~1.1 GB); it must be loaded **on top of** the base model `Qwen/Qwen3-VL-8B-Thinking`.
>
> ✅ **Advantage:** Lightweight; you download only ~1.1 GB instead of ~17 GB.
>
> ⚠️ **Requirement:** You must first load the base model `Qwen/Qwen3-VL-8B-Thinking` (~17 GB) separately.
>
> 💾 If you prefer a standalone, ready-to-use model, see the merged version:
> **[E27085921/HIKARI-Rigel-8B-SkinCaption](https://huggingface.co/E27085921/HIKARI-Rigel-8B-SkinCaption)** (~17 GB)

---

## What is this adapter?

LoRA adapter for **[HIKARI-Rigel-8B-SkinCaption](https://huggingface.co/E27085921/HIKARI-Rigel-8B-SkinCaption)**: clinical skin lesion caption generation (checkpoint-init, ablation baseline). Metric: **BLEU-4: 9.82**.

This is the ablation baseline adapter. For the best caption model, see [HIKARI-Vega-8B-SkinCaption-Fused-LoRA](https://huggingface.co/E27085921/HIKARI-Vega-8B-SkinCaption-Fused-LoRA).

See the full model card at **[E27085921/HIKARI-Rigel-8B-SkinCaption](https://huggingface.co/E27085921/HIKARI-Rigel-8B-SkinCaption)** for complete details, usage examples, and performance comparison.

---

## Usage

```python
from peft import PeftModel
from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
import torch
from PIL import Image

# Step 1: Load the base model (Qwen/Qwen3-VL-8B-Thinking, ~17 GB)
base = Qwen3VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen3-VL-8B-Thinking",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Step 2: Apply the LoRA adapter (~1.1 GB)
model = PeftModel.from_pretrained(base, "E27085921/HIKARI-Rigel-8B-SkinCaption-LoRA")
processor = AutoProcessor.from_pretrained(
    "E27085921/HIKARI-Rigel-8B-SkinCaption-LoRA", trust_remote_code=True
)

# Step 3: Inference (minimal sketch; the prompt text below is illustrative)
image = Image.open("skin_lesion.jpg").convert("RGB")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Describe this skin lesion."},
        ],
    }
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens
caption = processor.decode(
    output_ids[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
)
print(caption)
```

For complete inference examples including vLLM and SGLang production code, see:
**[E27085921/HIKARI-Rigel-8B-SkinCaption](https://huggingface.co/E27085921/HIKARI-Rigel-8B-SkinCaption)**

---

## 📄 Citation

```bibtex
@misc{hikari2026,
  title  = {HIKARI: RAG-in-Training for Skin Disease Diagnosis
            with Cascaded Vision-Language Models},
  author = {Watin Promfiy and Pawitra Boonprasart},
  year   = {2026},
  institution = {King Mongkut's Institute of Technology Ladkrabang,
                 Department of Information Technology, Bangkok, Thailand}
}
```

<p align="center">Made with ❤️ at <b>King Mongkut's Institute of Technology Ladkrabang (KMITL)</b></p>