---
base_model: colqwen2.5-base
library_name: peft
---

# RegionRet

RegionRet is a LoRA adapter model for region-level vision-language retrieval, fine-tuned from ColQwen2.5-Base using Parameter-Efficient Fine-Tuning (PEFT).

## Model Details

- **Model Type:** LoRA Adapter (PEFT)
- **Base Model:** ColQwen2.5-Base
- **Task Type:** Feature Extraction
- **Framework:** PEFT 0.14.0

### LoRA Configuration

- **Rank (r):** 32
- **LoRA Alpha:** 32
- **LoRA Dropout:** 0.1
- **Target Modules:** MLP projections (down_proj, gate_proj, up_proj) and attention projections (k_proj, q_proj, v_proj, o_proj), plus custom_text_proj
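
The configuration above can be written out as a small sketch. The key names follow `peft.LoraConfig`, but the list form of `target_modules`, the helper names (`lora_scaling`, `lora_extra_params`, `build_peft_config`), and the example layer shape are assumptions made for illustration; the adapter's shipped configuration file remains the authoritative source.

```python
# Values copied from the LoRA Configuration list above.
# Key names follow peft.LoraConfig; the list form of target_modules is an assumption.
LORA_KWARGS = {
    "r": 32,
    "lora_alpha": 32,
    "lora_dropout": 0.1,
    "target_modules": [
        "down_proj", "gate_proj", "up_proj",      # MLP projections
        "k_proj", "q_proj", "v_proj", "o_proj",   # attention projections
        "custom_text_proj",
    ],
}


def lora_scaling(r: int, lora_alpha: int) -> float:
    """Standard LoRA scaling factor applied to the low-rank update: alpha / r."""
    return lora_alpha / r


def lora_extra_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_in x d_out linear layer.

    The update is B @ A with A of shape (r, d_in) and B of shape (d_out, r).
    """
    return r * (d_in + d_out)


def build_peft_config():
    """Build a peft.LoraConfig from LORA_KWARGS (needs the `peft` package installed)."""
    from peft import LoraConfig  # imported lazily so the helpers above run without peft
    return LoraConfig(**LORA_KWARGS)


if __name__ == "__main__":
    print(lora_scaling(LORA_KWARGS["r"], LORA_KWARGS["lora_alpha"]))
    # 2048 x 2048 is an illustrative layer shape, not a dimension taken from the model.
    print(lora_extra_params(2048, 2048, LORA_KWARGS["r"]))
```

Under the standard `alpha / r` scaling, a rank and alpha that are both 32 give a factor of 1, so the low-rank update is added to the frozen weights unscaled.
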

### Model Architecture

- **Processor:** ColQwen2_5_Processor
- **Max Visual Tokens:** 1536
- **Attention:** Flash Attention 2
- **Precision:** bfloat16

## Uses

For usage instructions, please refer to the RegionRAG repository: [https://github.com/Aeryn666/RegionRAG](https://github.com/Aeryn666/RegionRAG).

## Training Details

### Training Data

- VisRAG-Ret-Train-In-domain-data
- Visual-CoT (DocVQA, TextCap, TextVQA, InfographicsVQA)

### Training Configuration

- **Loss Function:** RegionContraLoss (global_tau=0.02, local_tau=0.25, local_coef=0.01)
- **Epochs:** 5
- **Batch Size:** 80 per device
- **Learning Rate:** 2e-4
- **Precision:** bfloat16
- **Gradient Checkpointing:** Enabled
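
The temperatures and the coefficient in the loss settings are easier to interpret with a concrete objective in view. The following is a generic temperature-scaled InfoNCE sketch, not the RegionContraLoss implementation (see the RegionRAG repository for that). The function names `info_nce` and `combined_loss`, the additive combination of a global and a local term, and the example scores are all assumptions made for illustration; only the default values are taken from the list above.

```python
import math


def info_nce(scores, positive_index, tau):
    """Cross-entropy of the positive candidate over temperature-scaled scores.

    scores: similarity of one query to each candidate.
    tau: temperature; smaller values sharpen the softmax over candidates.
    """
    scaled = [s / tau for s in scores]
    # Log-sum-exp with the max subtracted for numerical stability.
    m = max(scaled)
    log_denominator = m + math.log(sum(math.exp(s - m) for s in scaled))
    return log_denominator - scaled[positive_index]


def combined_loss(global_scores, local_scores, positive_index,
                  global_tau=0.02, local_tau=0.25, local_coef=0.01):
    """Hypothetical weighted sum of a global and a local contrastive term."""
    global_term = info_nce(global_scores, positive_index, global_tau)
    local_term = info_nce(local_scores, positive_index, local_tau)
    return global_term + local_coef * local_term


if __name__ == "__main__":
    # Made-up similarity scores; index 0 is the positive candidate.
    scores = [0.9, 0.5, 0.4]
    print(info_nce(scores, 0, tau=0.02))
    print(info_nce(scores, 0, tau=0.25))
    print(combined_loss(scores, scores, 0))
```

In this sketch, a low temperature such as 0.02 makes the loss nearly vanish once the positive outscores the negatives by a modest margin, while a temperature of 0.25 keeps a softer penalty. A small coefficient such as 0.01 means the second term acts as a light auxiliary signal.
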

## Limitations

- Requires the ColQwen2.5-Base base model; the adapter weights cannot be used on their own
- Optimized for region-level vision-language retrieval tasks
- A GPU with bfloat16 and Flash Attention 2 support is recommended

## Citation

If you use this model, please cite:

```bibtex
@misc{li2025regionragregionlevelretrievalaugmentedgeneration,
      title={RegionRAG: Region-level Retrieval-Augmented Generation for Visual Document Understanding},
      author={Yinglu Li and Zhiying Lu and Zhihang Liu and Yiwei Sun and Chuanbin Liu and Hongtao Xie},
      year={2025},
      eprint={2510.27261},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2510.27261},
}
```

## License

Please refer to the license of the base model ColQwen2.5.