---
library_name: pytorch
---

![InternVL3.5 logo](resource/InternVL3.5-1B.png)

InternVL3.5 is a family of multimodal vision–language models that tightly integrate visual encoders with large language models for unified image–text understanding. The series focuses on strong multimodal reasoning, scalable model sizes, and efficient deployment across both cloud and edge AI scenarios.

- **Original paper:** https://arxiv.org/abs/2508.18265

# InternVL3.5-1B

InternVL3.5-1B is a compact variant designed for efficient multimodal reasoning with a small parameter footprint. It balances performance and latency, making it suitable for edge deployments such as visual assistants, document/image understanding, and edge vision–language AI systems.

Model Configuration:

- Reference implementation: [InternVL](https://github.com/OpenGVLab/InternVL)
- Original weights: [InternVL3.5-1B](https://huggingface.co/OpenGVLab/InternVL3_5-1B)
- Input resolution: 3x448x448
- Supported Cooper versions:
  - Cooper SDK: 2.5.3
  - Cooper Foundry: 2.2

| Model | Device | Model Link |
| :-----: | :-----: | :-----: |
| InternVL3.5-1B | CV7 | [Model_Link](https://huggingface.co/Ambarella/InternVL3.5/blob/main/cv7_internvl_3.5_1B.tar.gz) |
| InternVL3.5-1B | CV72 | [Model_Link](https://huggingface.co/Ambarella/InternVL3.5/blob/main/cv72_internvl_3.5_1B.tar.gz) |
| InternVL3.5-1B | CV75 | [Model_Link](https://huggingface.co/Ambarella/InternVL3.5/blob/main/cv75_internvl_3.5_1B.tar.gz) |

# InternVL3.5-2B

InternVL3.5-2B increases model capacity to improve multimodal reasoning, contextual understanding, and generation quality while maintaining relatively efficient inference. It is well suited for production multimodal assistants, image-grounded dialogue, and enterprise vision–language applications.
Model Configuration:

- Reference implementation: [InternVL](https://github.com/OpenGVLab/InternVL)
- Original weights: [InternVL3.5-2B](https://huggingface.co/OpenGVLab/InternVL3_5-2B)
- Input resolution: 3x448x448
- Supported Cooper versions:
  - Cooper SDK: 2.5.3
  - Cooper Foundry: 2.2

| Model | Device | Model Link |
| :-----: | :-----: | :-----: |
| InternVL3.5-2B | N1-655 | [Model_Link](https://huggingface.co/Ambarella/InternVL3.5/blob/main/n1-655_internvl_3.5_2B.tar.gz) |
| InternVL3.5-2B | CV7 | [Model_Link](https://huggingface.co/Ambarella/InternVL3.5/blob/main/cv7_internvl_3.5_2B.tar.gz) |
| InternVL3.5-2B | CV72 | [Model_Link](https://huggingface.co/Ambarella/InternVL3.5/blob/main/cv72_internvl_3.5_2B.tar.gz) |

# InternVL3.5-4B

InternVL3.5-4B is a higher-capacity variant focused on stronger reasoning, richer visual understanding, and more robust multimodal generation. It is appropriate for advanced multimodal tasks such as complex visual question answering, document/image intelligence, and high-accuracy vision–language analytics.

Model Configuration:

- Reference implementation: [InternVL](https://github.com/OpenGVLab/InternVL)
- Original weights: [InternVL3.5-4B](https://huggingface.co/OpenGVLab/InternVL3_5-4B)
- Input resolution: 3x448x448
- Supported Cooper versions:
  - Cooper SDK: 2.5.3
  - Cooper Foundry: 2.2

| Model | Device | Model Link |
| :-----: | :-----: | :-----: |
| InternVL3.5-4B | N1-655 | [Model_Link](https://huggingface.co/Ambarella/InternVL3.5/blob/main/n1-655_internvl_3.5_4B.tar.gz) |
| InternVL3.5-4B | CV7 | [Model_Link](https://huggingface.co/Ambarella/InternVL3.5/blob/main/cv7_internvl_3.5_4B.tar.gz) |
| InternVL3.5-4B | CV72 | [Model_Link](https://huggingface.co/Ambarella/InternVL3.5/blob/main/cv72_internvl_3.5_4B.tar.gz) |
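All three variants expect a 3x448x448 input tensor. Below is a minimal host-side preprocessing sketch, assuming the ImageNet mean/std normalization used by the InternVL reference implementation; check the repository's transforms before relying on these exact constants.

```python
import numpy as np
from PIL import Image

# ImageNet normalization constants (assumed to match the reference
# InternVL preprocessing -- verify against the repo's transforms).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess(image: Image.Image, size: int = 448) -> np.ndarray:
    """Resize an RGB image to size x size and return a normalized
    CHW float32 array matching the 3x448x448 input resolution."""
    img = image.convert("RGB").resize((size, size), Image.BICUBIC)
    arr = np.asarray(img, dtype=np.float32) / 255.0  # HWC in [0, 1]
    arr = (arr - IMAGENET_MEAN) / IMAGENET_STD       # per-channel normalize
    return arr.transpose(2, 0, 1)                    # HWC -> CHW

# Example with a dummy image; substitute a real photo in practice.
chw = preprocess(Image.new("RGB", (640, 480), color=(128, 128, 128)))
print(chw.shape)  # (3, 448, 448)
```

The resulting array can then be fed to the compiled model bundle for the target device.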