---
license: apache-2.0
language:
- en
base_model:
- meta-llama/Llama-3.1-8B-Instruct
- google/siglip-large-patch16-384
pipeline_tag: visual-question-answering
---

# Falcon-8B

## Description

\[[Paper](https://arxiv.org/abs/2501.16297)\]  \[[GitHub](https://github.com/JiuTian-VL/FALCON)\]   \[[Project Page](https://jiutian-vl.github.io/FALCON.github.io/)\]

This repository hosts the official model weights for *FALCON: Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers*. In this work, we propose FALCON, which introduces a novel visual register technique to simultaneously address visual redundancy and fragmentation in the high-resolution visual encoding of multimodal large language models (MLLMs).

![FALCON architecture](https://jiutian-vl.github.io/FALCON.github.io/assets/images/FALCON_arch.png)

## How to Run?

Please refer to the instructions in the [GitHub repository](https://github.com/JiuTian-VL/FALCON).

## Citation

If you find this work useful for your research, please cite our paper:

```BibTeX
@InProceedings{zhang2025falcon,
    author={Zhang, Renshan and Shao, Rui and Chen, Gongwei and Zhang, Miao and Zhou, Kaiwen and Guan, Weili and Nie, Liqiang},
    title={FALCON: Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers},
    booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month={October},
    year={2025},
}
```