Update README.md
README.md CHANGED
@@ -19,13 +19,14 @@ pipeline_tag: visual-question-answering
 <img src="asset/R-4B.png" width="100%" alt="R-4B Performance">
 </div>
 
+
 ## ⭐️ Introduction
 
-In this
+In this repo, we present **R-4B**, a multimodal large language model designed for general-purpose auto-thinking, autonomously switching between step-by-step thinking and direct response generation based on task complexity. This capability enables R-4B to deliver high-quality responses while significantly improving inference efficiency and reducing computational costs.
 
 The development of R-4B follows a two-stage training paradigm:
-(1)
-(2)
+(1) Bi-mode Annealing, which establishes both thinking and non-thinking capabilities for VQA; and
+(2) Bi-mode Policy Optimization (BPO), which enables the model to adaptively switch between thinking and non-thinking modes based on input demands.
 
 R-4B achieves state-of-the-art performance among models of its scale. In evaluations across multiple public benchmarks, R-4B outperforms Qwen2.5-VL-7B on nearly all tasks. Notably, it matches or exceeds the performance of the much larger Kimi-VL-Thinking-2506 (3B activated, 16B total parameters).
 
@@ -42,11 +43,10 @@ Below, we provide simple examples to show how to use R-4B with 🤗 Transformers
 import requests
 import torch
 from transformers import AutoModel, AutoProcessor
-
+from PIL import Image
 
 model_path = "YannQi/R-4B"
 
-from PIL import Image
 model = AutoModel.from_pretrained(
     model_path,
     torch_dtype=torch.float16,
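For context while reading this hunk: the diff shows only the opening lines of the README's 🤗 Transformers example. Below is a minimal sketch of how such a snippet plausibly continues; the `trust_remote_code` flag, the device placement, the `AutoProcessor` call, and the `thinking_mode` keyword are all assumptions (the last inferred from the `output_text_auto_thinking` variable visible in the next hunk's context), not lines from this commit.

```python
import torch
from transformers import AutoModel, AutoProcessor

model_path = "YannQi/R-4B"

# Half-precision load; trust_remote_code is an assumption here, commonly
# needed for checkpoints that ship custom modeling code.
model = AutoModel.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    trust_remote_code=True,
).to("cuda")
processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)

messages = [{"role": "user", "content": [{"type": "text", "text": "What is 2 + 2?"}]}]

# thinking_mode="auto" is hypothetical: the name echoes the "Auto Thinking
# Output" print statement in the README but is not confirmed by this diff.
prompt = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, thinking_mode="auto"
)
inputs = processor(text=prompt, return_tensors="pt").to("cuda")
output_ids = model.generate(**inputs, max_new_tokens=512)
print("Auto Thinking Output:", processor.decode(output_ids[0], skip_special_tokens=True))
```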
@@ -109,10 +109,10 @@ print("Auto Thinking Output:", output_text_auto_thinking)
 
 #### Install
 
-The code of R-4B requires
+The code of R-4B currently requires the latest vLLM. Please install it from source:
 
 ```bash
-git clone https://github.com/
+git clone https://github.com/vllm-project/vllm.git
 cd vllm
 VLLM_USE_PRECOMPILED=1 uv pip install --editable .
 ```
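Once the source build succeeds, querying the model through vLLM could look roughly like the sketch below. This is an assumption-laden illustration, not part of the commit: it uses vLLM's standard offline `LLM` API, and `trust_remote_code=True` is assumed so vLLM can load R-4B's custom modeling code.

```python
from vllm import LLM, SamplingParams

# Assumes the editable source install above succeeded and that this vLLM
# build can load R-4B; both are assumptions, not claims from the diff.
llm = LLM(model="YannQi/R-4B", trust_remote_code=True)
sampling = SamplingParams(temperature=0.7, max_tokens=256)

outputs = llm.generate(["Explain auto-thinking in one sentence."], sampling)
print(outputs[0].outputs[0].text)
```

The same checkpoint should also be servable over an OpenAI-compatible endpoint via `vllm serve YannQi/R-4B --trust-remote-code`, again assuming R-4B support has landed in the source build.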
@@ -308,6 +308,6 @@ Coming soon!
 }
 ``` -->
 
-##
+## Acknowledgements
 
 R-4B is developed based on the codebases of the following projects: [LLaVA-Next](https://github.com/LLaVA-VL/LLaVA-NeXT), [SigLIP2](https://huggingface.co/google/siglip2-so400m-patch14-384), [Qwen3](https://github.com/QwenLM/Qwen3), [Qwen2.5-VL](https://github.com/QwenLM/Qwen2.5-VL), [VLMEvalKit](https://github.com/open-compass/VLMEvalKit). We sincerely thank these projects for their outstanding work.
|