---
license: mit
pipeline_tag: image-text-to-text
library_name: transformers
base_model:
- internlm/internlm2-chat-1_8b
base_model_relation: merge
language:
- multilingual
tags:
- internvl
- vision
- ocr
- custom_code
- moe
---

# Mono-InternVL-2B-S1-1

This repository contains the Mono-InternVL-2B model after **S1.1 concept learning**.

Please refer to our [**paper**](https://huggingface.co/papers/2410.08202), [**project page**](https://internvl.github.io/blog/2024-10-10-Mono-InternVL/) and [**GitHub repository**](https://github.com/OpenGVLab/mono-internvl) for an introduction and usage instructions.

## Citation

If you find this project useful in your research, please consider citing:

```BibTeX
@article{luo2024mono,
  title={Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training},
  author={Luo, Gen and Yang, Xue and Dou, Wenhan and Wang, Zhaokai and Liu, Jiawen and Dai, Jifeng and Qiao, Yu and Zhu, Xizhou},
  journal={arXiv preprint arXiv:2410.08202},
  year={2024}
}
```