metadata
license: apache-2.0
base_model: Qwen/Qwen3-VL-8B-Thinking
tags:
  - dermatology
  - medical
  - vision-language
  - caption-generation
  - clinical-nlp
  - fine-tuned
  - qwen3-vl
language:
  - en
  - th
datasets:
  - SkinCAP
pipeline_tag: image-text-to-text

HIKARI: Healthcare-oriented Intelligent Knowledge Augmented Retrieval and Inference

HIKARI-Rigel-8B-SkinCaption

Named after Rigel, the blue supergiant in Orion: a first step in the caption training path


📦 Model Type: Merged Full Model

This is a fully merged model: the LoRA adapter weights have been merged directly into the base model weights.

✅ No adapter loading needed. Load directly with transformers, vLLM, or SGLang.

💾 Size: ~17 GB (4 safetensors shards)

🔌 Lightweight adapter version: E27085921/HIKARI-Rigel-8B-SkinCaption-LoRA (~1.1 GB)


Overview

HIKARI-Rigel generates clinical skin lesion captions using the checkpoint-init (Way 1) training strategy: Stage 3 caption training continues directly from the Stage 2 LoRA checkpoint, fine-tuning the existing disease adapters further on caption data.

This is an ablation baseline. For best captioning performance, use ⭐ HIKARI-Vega-8B-SkinCaption-Fused (BLEU-4: 29.33, about 3× this model's 9.82).

| Property | Value |
|---|---|
| Task | Clinical skin lesion caption generation (Stage 3) |
| Base model | Qwen/Qwen3-VL-8B-Thinking |
| Init strategy | Checkpoint-Init: continues from the Stage 2 LoRA checkpoint |
| BLEU-4 | 9.82 |
| ROUGE-1 | 38.90 |
| BERTScore-F | 88.12 (roberta-large) |
| Model type | Merged full model |

Why Checkpoint-Init Underperforms

Caption training continues directly on the Stage 2 disease LoRA adapters, so the caption learning signal overwrites the disease knowledge stored in those same LoRA weights. As a result, the model loses its diagnostic ability before it has fully learned to generate captions.
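
The overwriting effect can be illustrated with a toy sketch. This is not the HIKARI training code: the 2×2 matrices, the rank-1 factors, and the "fully drifted" adapter are all made-up values, and the merged-init procedure is assumed from the "Locked in base weights" entry in the comparison table. It only shows why retraining the same low-rank factors leaves no separate place for the earlier update, while merging first keeps it in the frozen weights.

```python
def outer(b, a):
    """Rank-1 LoRA-style update: delta[i][j] = b[i] * a[j]."""
    return [[bi * aj for aj in a] for bi in b]

def add(x, y):
    """Element-wise sum of two equally sized matrices."""
    return [[xv + yv for xv, yv in zip(rx, ry)] for rx, ry in zip(x, y)]

def det2(m):
    """Determinant of a 2x2 matrix; zero means rank <= 1."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

# Toy values (assumptions for illustration only).
disease = outer([1.0, 0.0], [1.0, 0.0])   # Stage 2 adapter update
caption = outer([0.0, 1.0], [0.0, 1.0])   # update that caption training "wants"

# Checkpoint-init: Stage 3 keeps training the SAME rank-1 factors.
# In the extreme case they drift entirely to the caption direction.
checkpoint_delta = caption

# Merged-init: the disease update is folded into the frozen base first,
# then a FRESH adapter learns the caption update on top of it.
merged_delta = add(disease, caption)

print("checkpoint-init det:", det2(checkpoint_delta))  # rank <= 1: one direction only
print("merged-init det:", det2(merged_delta))          # rank 2: both directions kept
print("disease entry kept (checkpoint):", checkpoint_delta[0][0])
print("disease entry kept (merged):", merged_delta[0][0])
```

In practice the drift would be partial rather than total, but the capacity constraint is the same: a single rank-r adapter has to share its parameters between both tasks.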

| Init | BLEU-4 | ROUGE-1 | Disease knowledge |
|---|---|---|---|
| Checkpoint (this model) | 9.82 | 38.90 | ❌ Lost during training |
| Merged (Vega) | 29.33 | 53.55 | ✅ Locked in base weights |
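
As a quick check of the gap between the two strategies, the ratios can be computed directly from the scores reported above. The dictionary keys are just labels chosen for this sketch.

```python
# Scores copied from the comparison in this model card.
scores = {
    "Checkpoint (this model)": {"BLEU-4": 9.82, "ROUGE-1": 38.90},
    "Merged (Vega)": {"BLEU-4": 29.33, "ROUGE-1": 53.55},
}

rigel = scores["Checkpoint (this model)"]
vega = scores["Merged (Vega)"]

# Ratio of Vega's score to Rigel's score for each metric.
ratios = {metric: vega[metric] / rigel[metric] for metric in rigel}

for metric, ratio in ratios.items():
    print(f"{metric}: Vega / Rigel = {ratio:.2f}x")
# BLEU-4: Vega / Rigel = 2.99x
# ROUGE-1: Vega / Rigel = 1.38x
```

The roughly threefold difference applies to BLEU-4; the ROUGE-1 gap is smaller.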

🔧 Quick Inference: transformers

from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
import torch
from PIL import Image

model_id = "E27085921/HIKARI-Rigel-8B-SkinCaption"

processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = Qwen3VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)

image = Image.open("skin_lesion.jpg").convert("RGB")

PROMPT = (
    "Describe this skin lesion image in detail. Include information about its "
    "appearance, possible diagnosis, and recommended examinations."
)

messages = [{"role": "user", "content": [
    {"type": "image", "image": image},
    {"type": "text", "text": PROMPT},
]}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)

with torch.no_grad():
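    # Greedy decoding: do_sample=False is already deterministic, so temperature is not passed
    # (transformers warns about sampling-only flags when sampling is disabled).
    out = model.generate(**inputs, max_new_tokens=256, do_sample=False)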

print(processor.batch_decode(out[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True)[0].strip())
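
Because the base model is a Thinking variant, the decoded text may begin with a reasoning segment that ends in a `</think>` tag. Whether this fine-tuned checkpoint actually emits one is an assumption, not something this card states. A minimal sketch of a post-processing helper (`strip_thinking` is a hypothetical name, and the sample string is made up rather than real model output):

```python
def strip_thinking(decoded: str, end_tag: str = "</think>") -> str:
    """Return the text after the last reasoning end tag, or the whole text if the tag is absent."""
    # rpartition yields ("", "", decoded) when end_tag is not found,
    # so captions without a reasoning segment pass through unchanged.
    _, _, answer = decoded.rpartition(end_tag)
    return answer.strip()

# Made-up example strings for illustration.
sample = "<think>\nBorder looks irregular.\n</think>\n\nA pigmented lesion with an irregular border."
print(strip_thinking(sample))
print(strip_thinking("A scaly erythematous plaque."))
```

The helper can be applied to the decoded string from the snippet above before printing or scoring the caption.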

🔌 LoRA Adapter Version

from peft import PeftModel
from transformers import Qwen3VLForConditionalGeneration
import torch

base = Qwen3VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen3-VL-8B-Thinking", torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, "E27085921/HIKARI-Rigel-8B-SkinCaption-LoRA")

→ E27085921/HIKARI-Rigel-8B-SkinCaption-LoRA


📄 Citation

@misc{hikari2026,
  title  = {HIKARI: RAG-in-Training for Skin Disease Diagnosis
            with Cascaded Vision-Language Models},
  author = {Watin Promfiy and Pawitra Boonprasart},
  year   = {2026},
  institution = {King Mongkut's Institute of Technology Ladkrabang,
                 Department of Information Technology, Bangkok, Thailand}
}

Made with ❤️ at King Mongkut's Institute of Technology Ladkrabang (KMITL)