---
license: cc-by-nc-4.0
task_categories:
  - visual-question-answering
  - image-to-text
language:
  - en
tags:
  - vision-language
  - multimodal
  - visual-question-answering
  - measurement-grounding
  - raw-image
  - camera-raw
  - meas-xyz
  - low-light
  - hdr
  - benchmark
  - prsimvl
pretty_name: kepeng/PRSIMVL-LoRA-V1
---

# Released PRSIMVL Weights

This folder holds the released PRSIMVL LoRA adapters, arranged in the directory layout that inference and evaluation expect.

## Expected Checkpoints

| Size | Base Model | Local LoRA Checkpoint |
| --- | --- | --- |
| 2B | `Qwen/Qwen3-VL-2B-Instruct` | `BANALCED_150K_META_VIT_PROXY/output-Qwen3-VL-2B-Instruct/v8-20260421-133546/checkpoint-95000` |
| 4B | `Qwen/Qwen3-VL-4B-Instruct` | `BANALCED_150K_META_VIT_PROXY/output-Qwen3-VL-4B-Instruct/v12-20260425-113029/checkpoint-85000` |
| 8B | `Qwen/Qwen3-VL-8B-Instruct` | `BANALCED_150K_META_VIT_PROXY/output-Qwen3-VL-8B-Instruct/v2-20260423-205317/checkpoint-95000` |
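
The table above can be turned into a small lookup for scripts that need to pair a base model with its adapter. The following is a minimal sketch, not the project's own loader. The names `CHECKPOINTS`, `resolve_checkpoint`, `load_adapter`, and `weights_root` are hypothetical. It also assumes that the checkpoints are standard PEFT-format LoRA adapters and that `AutoModelForImageTextToText` in your installed `transformers` version resolves the Qwen3-VL base models; neither is confirmed by this README.

```python
from pathlib import Path

# Size -> (base model ID, checkpoint path relative to this folder).
# Values are copied verbatim from the table above.
CHECKPOINTS = {
    "2B": (
        "Qwen/Qwen3-VL-2B-Instruct",
        "BANALCED_150K_META_VIT_PROXY/output-Qwen3-VL-2B-Instruct/"
        "v8-20260421-133546/checkpoint-95000",
    ),
    "4B": (
        "Qwen/Qwen3-VL-4B-Instruct",
        "BANALCED_150K_META_VIT_PROXY/output-Qwen3-VL-4B-Instruct/"
        "v12-20260425-113029/checkpoint-85000",
    ),
    "8B": (
        "Qwen/Qwen3-VL-8B-Instruct",
        "BANALCED_150K_META_VIT_PROXY/output-Qwen3-VL-8B-Instruct/"
        "v2-20260423-205317/checkpoint-95000",
    ),
}


def resolve_checkpoint(size, weights_root="."):
    """Return (base_model_id, adapter_dir) for a size key such as "4B"."""
    if size not in CHECKPOINTS:
        raise ValueError(f"Unknown size {size!r}; expected one of {sorted(CHECKPOINTS)}")
    base_model, relative_dir = CHECKPOINTS[size]
    return base_model, Path(weights_root) / relative_dir


def load_adapter(size, weights_root="."):
    """Load the base model and attach the LoRA adapter.

    Assumption: the checkpoint directory is a PEFT-compatible adapter.
    Requires `transformers` and `peft`; imported lazily so that path
    resolution works without them.
    """
    from peft import PeftModel
    from transformers import AutoModelForImageTextToText

    base_model, adapter_dir = resolve_checkpoint(size, weights_root)
    if not adapter_dir.is_dir():
        raise FileNotFoundError(f"Adapter directory not found: {adapter_dir}")
    model = AutoModelForImageTextToText.from_pretrained(base_model)
    return PeftModel.from_pretrained(model, str(adapter_dir))
```

Calling `resolve_checkpoint("4B", weights_root="path/to/this/folder")` only builds the path, which makes it easy to check the layout before downloading a base model with `load_adapter`.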

@misc{xu2026allegory,
  title         = {Allegory of the Cave: Measurement-Grounded Vision-Language Learning},
  author        = {Xu, Kepeng and Xu, Li and He, Gang and Yu, Wenxin},
  year          = {2026},
  eprint        = {2605.11727},
  archivePrefix = {arXiv},
  url           = {https://arxiv.org/abs/2605.11727}
}