kepeng/PRSIMVL-LoRA-V1



	---
	license: cc-by-nc-4.0
	task_categories:
	- visual-question-answering
	- image-to-text
	language:
	- en
	tags:
	- vision-language
	- multimodal
	- visual-question-answering
	- measurement-grounding
	- raw-image
	- camera-raw
	- meas-xyz
	- low-light
	- hdr
	- benchmark
	- prsimvl
	pretty_name: kepeng/PRSIMVL-LoRA-V1
	---

	# Released PRSIMVL Weights

	This folder stores the released PRSIMVL LoRA adapters in the directory layout that inference and evaluation expect.


	## Expected Checkpoints

	| Size | Base Model | Local LoRA Checkpoint |
	|---|---|---|
	| 2B | `Qwen/Qwen3-VL-2B-Instruct` | `BANALCED_150K_META_VIT_PROXY/output-Qwen3-VL-2B-Instruct/v8-20260421-133546/checkpoint-95000` |
	| 4B | `Qwen/Qwen3-VL-4B-Instruct` | `BANALCED_150K_META_VIT_PROXY/output-Qwen3-VL-4B-Instruct/v12-20260425-113029/checkpoint-85000` |
	| 8B | `Qwen/Qwen3-VL-8B-Instruct` | `BANALCED_150K_META_VIT_PROXY/output-Qwen3-VL-8B-Instruct/v2-20260423-205317/checkpoint-95000` |
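
The checkpoint table can be turned into a small lookup for inference code. The sketch below is illustrative only and is not part of the released code: the names `CHECKPOINTS`, `resolve_adapter`, `load_adapter`, and the `root` parameter are hypothetical. It also assumes that each checkpoint directory is a standard PEFT adapter folder and that `transformers` (a version with Qwen3-VL support) and `peft` are installed. None of this is confirmed by the card.

```python
from pathlib import Path

# Size -> (base model ID, relative LoRA checkpoint path), copied from the table above.
CHECKPOINTS = {
    "2B": (
        "Qwen/Qwen3-VL-2B-Instruct",
        "BANALCED_150K_META_VIT_PROXY/output-Qwen3-VL-2B-Instruct/v8-20260421-133546/checkpoint-95000",
    ),
    "4B": (
        "Qwen/Qwen3-VL-4B-Instruct",
        "BANALCED_150K_META_VIT_PROXY/output-Qwen3-VL-4B-Instruct/v12-20260425-113029/checkpoint-85000",
    ),
    "8B": (
        "Qwen/Qwen3-VL-8B-Instruct",
        "BANALCED_150K_META_VIT_PROXY/output-Qwen3-VL-8B-Instruct/v2-20260423-205317/checkpoint-95000",
    ),
}


def resolve_adapter(size: str, root: str = ".") -> tuple[str, Path]:
    """Return (base_model_id, adapter_dir) for a size key such as "4B".

    `root` is a hypothetical local directory holding the released layout.
    """
    try:
        base_model, rel_path = CHECKPOINTS[size.upper()]
    except KeyError:
        raise ValueError(
            f"unknown size {size!r}; expected one of {sorted(CHECKPOINTS)}"
        ) from None
    return base_model, Path(root) / rel_path


def load_adapter(size: str, root: str = "."):
    """Load the base model and attach the LoRA adapter.

    Assumption: the checkpoint directory is in PEFT format (not verified here).
    Heavy imports are deferred so the lookup works without them installed.
    """
    from peft import PeftModel
    from transformers import AutoModelForImageTextToText, AutoProcessor

    base_model, adapter_dir = resolve_adapter(size, root)
    if not adapter_dir.is_dir():
        raise FileNotFoundError(f"adapter directory not found: {adapter_dir}")

    processor = AutoProcessor.from_pretrained(base_model)
    model = AutoModelForImageTextToText.from_pretrained(base_model)
    model = PeftModel.from_pretrained(model, str(adapter_dir))
    return model, processor
```

Calling `resolve_adapter("4B", root="/path/to/weights")` only builds the path and touches neither the network nor the GPU, so it can be used to check the layout before loading a model.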


	## Citation

	```bibtex
	@misc{xu2026allegory,
	title = {Allegory of the Cave: Measurement-Grounded Vision-Language Learning},
	author = {Xu, Kepeng and Xu, Li and He, Gang and Yu, Wenxin},
	year = {2026},
	eprint = {2605.11727},
	archivePrefix = {arXiv},
	url = {https://arxiv.org/abs/2605.11727}
	}
	```