metadata
license: apache-2.0
language:
  - en
base_model:
  - meta-llama/Llama-3.1-8B-Instruct
  - google/siglip-large-patch16-384
pipeline_tag: visual-question-answering

Falcon-8B

Description

[Paper] [GitHub] [Project Page]

These are the official model weights for FALCON: Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers. In this work, we propose the FALCON model, which introduces a novel visual register technique to simultaneously address visual redundancy and fragmentation in the high-resolution visual encoding of multimodal large language models (MLLMs).

How to Run?

Please refer to the instructions in the GitHub repository.

Citation

If you find this work useful for your research, please cite our paper:

@InProceedings{zhang2025falcon,
    author={Zhang, Renshan and Shao, Rui and Chen, Gongwei and Zhang, Miao and Zhou, Kaiwen and Guan, Weili and Nie, Liqiang},
    title={FALCON: Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers},
    booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month={October},
    year={2025},
}