File size: 1,499 Bytes
5eacf1f 87cca4d |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 |
---
license: apache-2.0
language:
- en
base_model:
- meta-llama/Llama-3.1-8B-Instruct
- google/siglip-large-patch16-384
pipeline_tag: visual-question-answering
---
# Falcon-8B
## Description
\[[Paper](https://arxiv.org/abs/2501.16297)\] \[[GitHub](https://github.com/JiuTian-VL/FALCON)\] \[[Project Page](https://jiutian-vl.github.io/FALCON.github.io/)\]
This is the official model weights of *FALCON: Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers*. In this work, we propose the FALCON model, which introduces a novel visual register technique to simultaneously address the issues of visual redundancy and fragmentation in the high-resolution visual encoding of MLLMs.

## How to Run?
Please refer to the instructions in the [Githhub repository](https://github.com/JiuTian-VL/FALCON).
## Citation
If you find this work useful for your research, please kindly cite our paper:
```BibTeX
@InProceedings{zhang2025falcon,
author={Zhang, Renshan and Shao, Rui and Chen, Gongwei and Zhang, Miao and Zhou, Kaiwen and Guan, Weili and Nie, Liqiang},
title={FALCON: Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month= {October},
year={2025},
}
``` |