	---
	base_model: colqwen2.5-base
	library_name: peft
	---

	# RegionRet

	RegionRet is a LoRA adapter for region-level vision-language retrieval. It is fine-tuned from ColQwen2.5-Base with Parameter-Efficient Fine-Tuning (PEFT).

	## Model Details

	- Model Type: LoRA Adapter (PEFT)
	- Base Model: ColQwen2.5-Base
	- Task Type: Feature Extraction
	- Framework: PEFT 0.14.0

	### LoRA Configuration

	- Rank (r): 32
	- LoRA Alpha: 32
	- LoRA Dropout: 0.1
	- Target Modules: MLP projections (down_proj, gate_proj, up_proj) and attention projections (k_proj, q_proj, v_proj, o_proj), plus custom_text_proj
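
To make the rank and alpha values above concrete, here is a minimal sketch of what they imply for one adapted linear layer. It assumes the standard LoRA scaling of `alpha / r` (this card does not say whether a variant such as rank-stabilized scaling is used), and the layer size is hypothetical, chosen only for illustration. `LoraHyperparams` is an illustrative helper, not part of PEFT.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraHyperparams:
    """LoRA values listed in this card (illustrative container, not a PEFT class)."""
    r: int = 32
    alpha: int = 32
    dropout: float = 0.1

    @property
    def scaling(self) -> float:
        # Assumption: standard LoRA, where the update B @ A is scaled by alpha / r.
        return self.alpha / self.r

    def added_params(self, in_features: int, out_features: int) -> int:
        # A has shape (r, in_features) and B has shape (out_features, r).
        return self.r * (in_features + out_features)


cfg = LoraHyperparams()

# Hypothetical layer size, not taken from the base model.
IN_FEATURES, OUT_FEATURES = 2048, 2048
full = IN_FEATURES * OUT_FEATURES
extra = cfg.added_params(IN_FEATURES, OUT_FEATURES)

print(cfg.scaling)  # 1.0
print(extra, full)  # 131072 4194304
```

With `r` equal to `alpha`, the low-rank update is applied at unit scale under this assumption, and the adapter adds about 3% of the parameters of the hypothetical layer.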

	### Model Architecture

	- Processor: ColQwen2_5_Processor
	- Max Visual Tokens: 1536
	- Attention: Flash Attention 2
	- Precision: bfloat16
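
The visual token limit above can be read as a pixel budget. The sketch below assumes the Qwen2.5-VL convention that one visual token covers a 28 x 28 pixel area (14-pixel patches merged 2 x 2); that convention is not stated in this card, and the real processor applies its own resizing and rounding, so treat the numbers as rough estimates. The function names are hypothetical.

```python
import math

# Assumption: one visual token covers 28 x 28 pixels (14 px patch, 2 x 2 merge).
TOKEN_SIDE = 28
MAX_VISUAL_TOKENS = 1536  # value from this card


def max_pixels(max_tokens: int = MAX_VISUAL_TOKENS, side: int = TOKEN_SIDE) -> int:
    """Largest pixel area that fits in the token budget."""
    return max_tokens * side * side


def estimate_tokens(width: int, height: int, side: int = TOKEN_SIDE) -> int:
    """Rough token count for an image that is NOT resized."""
    return math.ceil(width / side) * math.ceil(height / side)


def fits_budget(width: int, height: int, max_tokens: int = MAX_VISUAL_TOKENS) -> bool:
    """True if the unresized image stays within the token budget."""
    return estimate_tokens(width, height) <= max_tokens


print(max_pixels())                 # 1204224
print(estimate_tokens(1024, 768))   # 1036
print(fits_budget(1654, 2339))      # False: a full-page scan would be downscaled
```

Under these assumptions, a 1024 x 768 image fits without resizing, while a full-page scan at 1654 x 2339 pixels (about 5040 tokens) would have to be downscaled.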

	## Uses

	For usage instructions and inference code, please refer to the RegionRAG repository: [https://github.com/Aeryn666/RegionRAG](https://github.com/Aeryn666/RegionRAG).


	## Training Details

	### Training Data

	- VisRAG-Ret-Train-In-domain-data
	- Visual-CoT (DocVQA, TextCap, TextVQA, InfographicsVQA)

	### Training Configuration

	- Loss Function: RegionContraLoss (global_tau=0.02, local_tau=0.25, local_coef=0.01)
	- Epochs: 5
	- Batch Size: 80 per device
	- Learning Rate: 2e-4
	- Precision: bfloat16
	- Gradient Checkpointing: Enabled
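
The two temperatures above differ by more than a factor of ten. The sketch below shows how a temperature affects a generic InfoNCE-style (softmax cross-entropy) contrastive loss. It is not the RegionContraLoss implementation, and it does not show how the global and local terms are combined with `local_coef`; see the RegionRAG repository for that. The similarity scores are hypothetical.

```python
import math


def info_nce(scores, positive_index, tau):
    """Softmax cross-entropy over similarity scores divided by temperature tau."""
    logits = [s / tau for s in scores]
    m = max(logits)  # subtract the max before exponentiating, for stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[positive_index]


# Hypothetical similarities; index 0 is the positive pair.
scores = [0.80, 0.60, 0.55]

loss_sharp = info_nce(scores, 0, tau=0.02)  # value listed as global_tau
loss_soft = info_nce(scores, 0, tau=0.25)   # value listed as local_tau

print(f"tau=0.02 -> {loss_sharp:.6f}")
print(f"tau=0.25 -> {loss_soft:.6f}")
```

For the same scores, the smaller temperature sharpens the softmax, so a positive that already ranks first contributes almost no loss, while the larger temperature keeps penalizing close negatives.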

	## Limitations

	- Requires the ColQwen2.5-Base model; the adapter cannot be used on its own
	- Optimized for region-level vision-language retrieval tasks
	- A GPU with bfloat16 and Flash Attention 2 support is recommended

	## Citation

	If you use this model, please cite:

	```bibtex
	@misc{li2025regionragregionlevelretrievalaugmentedgeneration,
	title={RegionRAG: Region-level Retrieval-Augmented Generation for Visual Document Understanding},
	author={Yinglu Li and Zhiying Lu and Zhihang Liu and Yiwei Sun and Chuanbin Liu and Hongtao Xie},
	year={2025},
	eprint={2510.27261},
	archivePrefix={arXiv},
	primaryClass={cs.CV},
	url={https://arxiv.org/abs/2510.27261},
	}
	```

	## License

	Please refer to the license of the base model, ColQwen2.5-Base.