YannQi committed
Commit 2eea0de · verified · 1 Parent(s): 675fcee

Update README.md

Files changed (1)
  1. README.md +8 -8
README.md CHANGED
@@ -19,13 +19,14 @@ pipeline_tag: visual-question-answering
 <img src="asset/R-4B.png" width="100%" alt="R-4B Performance">
 </div>
 
+
 ## ⭐️ Introduction
 
-In this report, we present **R-4B**, a multimodal large language model designed to achieve adaptive multimodal reasoning, dynamically choosing between step-by-step thinking and direct response generation based on task complexity. This capability enables R-4B to deliver high-quality responses while significantly improving inference efficiency and reducing computational costs.
+In this repo, we present **R-4B**, a multimodal large language model designed for general-purpose auto-thinking, autonomously switching between step-by-step thinking and direct response generation based on task complexity. This capability enables R-4B to deliver high-quality responses while significantly improving inference efficiency and reducing computational costs.
 
 The development of R-4B follows a two-stage training paradigm:
-(1) Dual-Capability Pretraining, which establishes both thinking and non-thinking capabilities for VQA; and
-(2) Adaptive Thinking Post-Training, which enables the model to adaptively switch between modes based on input demands.
+(1) Bi-mode Annealing, which establishes both thinking and non-thinking capabilities for VQA; and
+(2) Bi-mode Policy Optimization (BPO), which enables the model to adaptively switch between thinking and non-thinking modes based on input demands.
 
 R-4B achieves state-of-the-art performance among models of its scale. In evaluations across multiple public benchmarks, R-4B outperforms Qwen2.5-VL-7B on nearly all tasks. Notably, it matches or exceeds the performance of the much larger Kimi-VL-Thinking-2506 (3B activated, 16B total parameters).
 
@@ -42,11 +43,10 @@ Below, we provide simple examples to show how to use R-4B with 🤗 Transformers
 import requests
 import torch
 from transformers import AutoModel, AutoProcessor
-
+from PIL import Image
 
 model_path = "YannQi/R-4B"
 
-from PIL import Image
 model = AutoModel.from_pretrained(
     model_path,
     torch_dtype=torch.float16,
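
The hunk above only shows the opening of the README's Transformers example (the hunk header of the next change prints `output_text_auto_thinking`, so the full snippet drives the chat template's thinking / non-thinking / auto-thinking modes). For orientation, here is a minimal sketch of the generic shape such a snippet takes; the `trust_remote_code=True` flag, the sample URL, the prompt, and the plain `processor(images=..., text=...)` call are illustrative assumptions, not lines from the README:

```python
import requests
import torch
from transformers import AutoModel, AutoProcessor
from PIL import Image

model_path = "YannQi/R-4B"

# Assumption: R-4B ships custom modeling code on the Hub, so
# trust_remote_code=True is likely required for AutoModel/AutoProcessor.
model = AutoModel.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    trust_remote_code=True,
).to("cuda").eval()
processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)

# Fetch a sample image and ask a question about it (both illustrative).
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = processor(
    images=image, text="Describe this image.", return_tensors="pt"
).to("cuda", torch.float16)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```

Treat this only as the standard Transformers flow; the README's actual example formats prompts through the processor's chat template to select the thinking mode.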
@@ -109,10 +109,10 @@ print("Auto Thinking Output:", output_text_auto_thinking)
 
 #### Install
 
-The code of R-4B requires custom vllm. Please install from local source:
+The code of R-4B now requires the latest vLLM. Please install it from source:
 
 ```bash
-git clone https://github.com/yannqi/vllm.git
+git clone https://github.com/vllm-project/vllm.git
 cd vllm
 VLLM_USE_PRECOMPILED=1 uv pip install --editable .
 ```
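
Once the editable build above is in place, a quick offline-inference smoke test looks roughly like the sketch below, using vLLM's standard `LLM`/`SamplingParams` API. The `trust_remote_code=True` flag and the text-only prompt are assumptions for illustration; R-4B is multimodal, so real use would pass image inputs through vLLM's multimodal interface:

```python
from vllm import LLM, SamplingParams

# Assumption: the custom R-4B code on the Hub needs trust_remote_code=True.
llm = LLM(model="YannQi/R-4B", trust_remote_code=True)
params = SamplingParams(temperature=0.7, max_tokens=128)

# Text-only smoke test: this only verifies that the source-built vLLM
# can load and serve the model, not its vision pathway.
outputs = llm.generate(["Describe the R-4B model in one sentence."], params)
print(outputs[0].outputs[0].text)
```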
@@ -308,6 +308,6 @@ Coming soon!
 }
 ``` -->
 
-## Acknowledgement
+## Acknowledgements
 
 R-4B is developed based on the codebases of the following projects: [LLaVA-Next](https://github.com/LLaVA-VL/LLaVA-NeXT), [SigLIP2](https://huggingface.co/google/siglip2-so400m-patch14-384), [Qwen3](https://github.com/QwenLM/Qwen3), [Qwen2.5-VL](https://github.com/QwenLM/Qwen2.5-VL), [VLMEvalKit](https://github.com/open-compass/VLMEvalKit). We sincerely thank these projects for their outstanding work.
 