---
license: apache-2.0
language:
- en
base_model:
- meta-llama/Llama-3.1-8B-Instruct
- google/siglip-large-patch16-384
pipeline_tag: visual-question-answering
---

# Falcon-8B

## Description

\[[Paper](https://arxiv.org/abs/2501.16297)\] \[[GitHub](https://github.com/JiuTian-VL/FALCON)\] \[[Project Page](https://jiutian-vl.github.io/FALCON.github.io/)\]

This repository contains the official model weights of *FALCON: Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers*.

In this work, we propose the FALCON model, which introduces a novel visual register technique to simultaneously address visual redundancy and fragmentation in the high-resolution visual encoding of multimodal large language models (MLLMs).

![image/png](https://jiutian-vl.github.io/FALCON.github.io/assets/images/FALCON_arch.png)

## How to Run?

Please refer to the instructions in the [GitHub repository](https://github.com/JiuTian-VL/FALCON).

## Citation

If you find this work useful for your research, please cite our paper:

```BibTeX
@InProceedings{zhang2025falcon,
    author    = {Zhang, Renshan and Shao, Rui and Chen, Gongwei and Zhang, Miao and Zhou, Kaiwen and Guan, Weili and Nie, Liqiang},
    title     = {FALCON: Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2025}
}
```