|
|
--- |
|
|
license: apache-2.0 |
|
|
language: |
|
|
- en |
|
|
base_model: |
|
|
- meta-llama/Llama-3.1-8B-Instruct |
|
|
- google/siglip-large-patch16-384 |
|
|
pipeline_tag: visual-question-answering |
|
|
--- |
|
|
|
|
|
# Falcon-8B |
|
|
|
|
|
## Description |
|
|
|
|
|
\[[Paper](https://arxiv.org/abs/2501.16297)\] \[[GitHub](https://github.com/JiuTian-VL/FALCON)\] \[[Project Page](https://jiutian-vl.github.io/FALCON.github.io/)\] |
|
|
|
|
|
This is the official model weights of *FALCON: Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers*. In this work, we propose the FALCON model, which introduces a novel visual register technique to simultaneously address the issues of visual redundancy and fragmentation in the high-resolution visual encoding of MLLMs. |
|
|
|
|
|
 |
|
|
|
|
|
## How to Run? |
|
|
|
|
|
Please refer to the instructions in the [Githhub repository](https://github.com/JiuTian-VL/FALCON). |
|
|
|
|
|
## Citation |
|
|
|
|
|
If you find this work useful for your research, please kindly cite our paper: |
|
|
|
|
|
```BibTeX |
|
|
@InProceedings{zhang2025falcon, |
|
|
author={Zhang, Renshan and Shao, Rui and Chen, Gongwei and Zhang, Miao and Zhou, Kaiwen and Guan, Weili and Nie, Liqiang}, |
|
|
title={FALCON: Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers}, |
|
|
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)}, |
|
|
month= {October}, |
|
|
year={2025}, |
|
|
} |
|
|
``` |