HIKARI: Healthcare-oriented Intelligent Knowledge Augmented Retrieval and Inference

HIKARI-Rigel-8B-SkinCaption

Named after Rigel, the blue supergiant in Orion: a first step in the caption training path


📦 Model Type: Merged Full Model

This is a fully merged model: the LoRA adapter weights have been merged directly into the base model weights.

✅ No adapter loading needed. Load directly with transformers, vLLM, or SGLang.

💾 Size: ~17 GB (4 safetensor shards)

🔌 Lightweight adapter version: E27085921/HIKARI-Rigel-8B-SkinCaption-LoRA (~1.1 GB)
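
As a rough sanity check on the size quoted above, the sketch below converts the ~17 GB figure into an approximate parameter count. It assumes every weight is stored in BF16 at 2 bytes per parameter (the dtype used in the loading example in this card). The card does not say whether "GB" means 10^9 or 2^30 bytes, so both readings are shown. All names here are illustrative.

```python
BYTES_PER_BF16_PARAM = 2  # assumption: all weights are stored as BF16
CHECKPOINT_SIZE = 17      # the "~17 GB" figure from this card


def approx_params(size_units: float, bytes_per_unit: int) -> float:
    """Estimate the parameter count from a checkpoint size."""
    return size_units * bytes_per_unit / BYTES_PER_BF16_PARAM


params_decimal = approx_params(CHECKPOINT_SIZE, 10**9)  # GB read as 10^9 bytes
params_binary = approx_params(CHECKPOINT_SIZE, 2**30)   # GB read as 2^30 bytes

print(f"{params_decimal / 1e9:.2f}B parameters if 1 GB = 10^9 bytes")
print(f"{params_binary / 1e9:.2f}B parameters if 1 GB = 2^30 bytes")
```

Either reading lands between 8 and 10 billion parameters, which is in line with the 8B in the model name.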


Overview

HIKARI-Rigel generates clinical skin lesion captions using the checkpoint-init (Way 1) training strategy: Stage 3 caption training continues directly from the Stage 2 LoRA checkpoint, fine-tuning the existing disease adapters further on caption data.

This is an ablation baseline. For best captioning performance, use ⭐ HIKARI-Vega-8B-SkinCaption-Fused (BLEU-4: 29.33, roughly 3× this model's 9.82).
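
The size of that gap can be checked from the scores reported in this card. The snippet below is plain arithmetic on those published numbers, not an evaluation script; the dictionary and function names are illustrative.

```python
# Scores as reported in this model card.
rigel = {"BLEU-4": 9.82, "ROUGE-1": 38.90}   # this model (checkpoint-init)
vega = {"BLEU-4": 29.33, "ROUGE-1": 53.55}   # HIKARI-Vega (merged-init)


def relative_gain(baseline: float, improved: float) -> float:
    """Return improved / baseline as a multiplicative factor."""
    return improved / baseline


ratios = {name: relative_gain(rigel[name], vega[name]) for name in rigel}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```

The BLEU-4 ratio comes out just under 3, while the ROUGE-1 gap is much smaller, so the "3×" figure applies to BLEU-4 specifically.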

| Property | Value |
|---|---|
| Task | Clinical skin lesion caption generation (Stage 3) |
| Base model | Qwen/Qwen3-VL-8B-Thinking |
| Init strategy | Checkpoint-Init: continues from Stage 2 LoRA checkpoint |
| BLEU-4 | 9.82 |
| ROUGE-1 | 38.90 |
| BERTScore-F | 88.12 (roberta-large) |
| Model type | Merged full model |

Why Checkpoint-Init Underperforms

The Stage 2 disease LoRA adapters are directly continued into caption training. The caption learning signal overwrites the disease knowledge that was stored in those same LoRA weights. Result: the model loses its diagnostic ability before it fully learns to generate captions.

| Init | BLEU-4 | ROUGE-1 | Disease knowledge |
|---|---|---|---|
| Checkpoint (this model) | 9.82 | 38.90 | ❌ Lost during training |
| Merged (Vega) | 29.33 | 53.55 | ✅ Locked in base weights |
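
The difference between the two strategies can be illustrated with a toy LoRA update on a 2×2 weight matrix. This is a conceptual sketch, not the training code behind these models. All numbers are made up. The checkpoint-init case is shown in its limiting form, where caption training fully replaces the adapter values. The merged-init description (fold the Stage 2 adapter into the base weights, then train a fresh adapter) is an assumption inferred from the "Locked in base weights" entry above.

```python
def matmul(a, b):
    """Multiply two small matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def add(a, b):
    """Element-wise sum of two matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def sub(a, b):
    """Element-wise difference of two matrices."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


base = [[1.0, 0.0], [0.0, 1.0]]  # frozen base weights W (toy values)

# Stage 2 disease adapter as a rank-1 update B @ A (toy values).
disease_delta = matmul([[0.5], [0.0]], [[1.0, 2.0]])

# Hypothetical update learned during caption training (toy values).
caption_delta = matmul([[0.0], [0.25]], [[1.0, -1.0]])

# Checkpoint-init, limiting case: the same adapter is trained further,
# so its disease values end up replaced by caption values.
checkpoint_init = add(base, caption_delta)

# Merged-init (assumed): fold the disease update into the base first,
# then add a fresh caption adapter on top of the merged weights.
merged_base = add(base, disease_delta)
merged_init = add(merged_base, caption_delta)


def retained_disease_delta(weights):
    """Return what remains after removing the base and the caption update."""
    return sub(sub(weights, base), caption_delta)


print("checkpoint-init keeps:", retained_disease_delta(checkpoint_init))
print("merged-init keeps:    ", retained_disease_delta(merged_init))
```

In the merged case the disease update survives unchanged because it lives in weights that caption training never touches. In the checkpoint case the only place it was stored is the adapter being retrained.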

🔧 Quick Inference: transformers

from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
import torch
from PIL import Image

model_id = "E27085921/HIKARI-Rigel-8B-SkinCaption"

processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = Qwen3VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)

image = Image.open("skin_lesion.jpg").convert("RGB")

PROMPT = (
    "Describe this skin lesion image in detail. Include information about its "
    "appearance, possible diagnosis, and recommended examinations."
)

messages = [{"role": "user", "content": [
    {"type": "image", "image": image},
    {"type": "text", "text": PROMPT},
]}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)

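# Greedy decoding: sampling is disabled, so no temperature is passed.
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt.
print(processor.batch_decode(out[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True)[0].strip())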
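
The base model is a Thinking variant, so the decoded text may contain a reasoning segment closed by a `</think>` tag before the caption. The card does not state whether this fine-tuned checkpoint emits one, so this is an assumption to check on real outputs. If it does, a small helper like the hypothetical `split_reasoning` below can separate the two parts.

```python
def split_reasoning(decoded: str, end_tag: str = "</think>") -> tuple[str, str]:
    """Split decoded text into (reasoning, caption) at the last end_tag.

    If end_tag is absent, the whole text is treated as the caption.
    """
    head, sep, tail = decoded.rpartition(end_tag)
    if not sep:
        return "", decoded.strip()
    # Drop an opening tag if the model echoed one, then trim whitespace.
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()


# Usage with a made-up decoded string (not real model output).
example = "<think>\nScaly plaque, well demarcated.\n</think>\n\nAn erythematous plaque."
reasoning, caption = split_reasoning(example)
print(caption)
```

Text without the tag passes through unchanged apart from whitespace trimming, so the helper is safe to apply either way.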

🔌 LoRA Adapter Version

from peft import PeftModel
from transformers import Qwen3VLForConditionalGeneration
import torch

base = Qwen3VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen3-VL-8B-Thinking", torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, "E27085921/HIKARI-Rigel-8B-SkinCaption-LoRA")

→ E27085921/HIKARI-Rigel-8B-SkinCaption-LoRA


📄 Citation

@misc{hikari2026,
  title  = {HIKARI: RAG-in-Training for Skin Disease Diagnosis
            with Cascaded Vision-Language Models},
  author = {Watin Promfiy and Pawitra Boonprasart},
  year   = {2026},
  institution = {King Mongkut's Institute of Technology Ladkrabang,
                 Department of Information Technology, Bangkok, Thailand}
}

Made with ❤️ at King Mongkut's Institute of Technology Ladkrabang (KMITL)
