Update README.md
README.md CHANGED
@@ -19,13 +19,14 @@ pipeline_tag: visual-question-answering
 <img src="asset/R-4B.png" width="100%" alt="R-4B Performance">
 </div>
 
+
 ## ⭐️ Introduction
 
-In this
+In this repo, we present **R-4B**, a multimodal large language model designed for general-purpose auto-thinking, autonomously switching between step-by-step thinking and direct response generation based on task complexity. This capability enables R-4B to deliver high-quality responses while significantly improving inference efficiency and reducing computational costs.
 
 The development of R-4B follows a two-stage training paradigm:
-(1)
-(2)
+(1) Bi-mode Annealing, which establishes both thinking and non-thinking capabilities for VQA; and
+(2) Bi-mode Policy Optimization (BPO), which enables the model to adaptively switch between thinking and non-thinking modes based on input demands.
 
 R-4B achieves state-of-the-art performance among models of its scale. In evaluations across multiple public benchmarks, R-4B outperforms Qwen2.5-VL-7B on nearly all tasks. Notably, it matches or exceeds the performance of the much larger Kimi-VL-Thinking-2506 (3B activated, 16B total parameters).
 
@@ -42,11 +43,10 @@ Below, we provide simple examples to show how to use R-4B with 🤗 Transformers
 import requests
 import torch
 from transformers import AutoModel, AutoProcessor
-
+from PIL import Image
 
 model_path = "YannQi/R-4B"
 
-from PIL import Image
 model = AutoModel.from_pretrained(
     model_path,
     torch_dtype=torch.float16,
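For context while reading this hunk: the diff shows only the opening lines of the README's 🤗 Transformers example. Below is a minimal sketch of how such a snippet plausibly continues; the `trust_remote_code` flag, the device placement, the `AutoProcessor` call, and the `thinking_mode` keyword are all assumptions (the last inferred from the `output_text_auto_thinking` variable visible in the next hunk's context), not lines from this commit.

```python
import torch
from transformers import AutoModel, AutoProcessor

model_path = "YannQi/R-4B"

# Half-precision load; trust_remote_code is an assumption here, commonly
# needed for checkpoints that ship custom modeling code.
model = AutoModel.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    trust_remote_code=True,
).to("cuda")
processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)

messages = [{"role": "user", "content": [{"type": "text", "text": "What is 2 + 2?"}]}]

# thinking_mode="auto" is hypothetical: the name echoes the "Auto Thinking
# Output" print statement in the README but is not confirmed by this diff.
prompt = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, thinking_mode="auto"
)
inputs = processor(text=prompt, return_tensors="pt").to("cuda")
output_ids = model.generate(**inputs, max_new_tokens=512)
print("Auto Thinking Output:", processor.decode(output_ids[0], skip_special_tokens=True))
```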
@@ -109,10 +109,10 @@ print("Auto Thinking Output:", output_text_auto_thinking)
 
 #### Install
 
-The code of R-4B requires
+The code of R-4B currently requires the latest vLLM. Please install it from source:
 
 ```bash
-git clone https://github.com/
+git clone https://github.com/vllm-project/vllm.git
 cd vllm
 VLLM_USE_PRECOMPILED=1 uv pip install --editable .
 ```
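Once the source build succeeds, querying the model through vLLM could look roughly like the sketch below. This is an assumption-laden illustration, not part of the commit: it uses vLLM's standard offline `LLM` API, and `trust_remote_code=True` is assumed so vLLM can load R-4B's custom modeling code.

```python
from vllm import LLM, SamplingParams

# Assumes the editable source install above succeeded and that this vLLM
# build can load R-4B; both are assumptions, not claims from the diff.
llm = LLM(model="YannQi/R-4B", trust_remote_code=True)
sampling = SamplingParams(temperature=0.7, max_tokens=256)

outputs = llm.generate(["Explain auto-thinking in one sentence."], sampling)
print(outputs[0].outputs[0].text)
```

The same checkpoint should also be servable over an OpenAI-compatible endpoint via `vllm serve YannQi/R-4B --trust-remote-code`, again assuming R-4B support has landed in the source build.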
@@ -308,6 +308,6 @@ Coming soon!
 }
 ``` -->
 
-##
+## Acknowledgements
 
 R-4B is developed based on the codebases of the following projects: [LLaVA-Next](https://github.com/LLaVA-VL/LLaVA-NeXT), [SigLIP2](https://huggingface.co/google/siglip2-so400m-patch14-384), [Qwen3](https://github.com/QwenLM/Qwen3), [Qwen2.5-VL](https://github.com/QwenLM/Qwen2.5-VL), [VLMEvalKit](https://github.com/open-compass/VLMEvalKit). We sincerely thank these projects for their outstanding work.
|