---
language:
- en
license: cc-by-nc-4.0
pipeline_tag: image-text-to-text
tags:
- multimodal
base_model:
- Qwen/Qwen2-VL-7B-Instruct
---
# S1-M-7B-Beta

[Homepage](https://github.com/PKU-Alignment/s1-m) | [Our Official Code Repo](https://github.com/PKU-Alignment/s1-m) | [S1-M Dataset (Beta)](https://huggingface.co/datasets/PKU-Alignment/s1-m_beta)

S1-M-7B-Beta was developed for the algorithm "Simple Test-time Scaling in Multimodal Reasoning". By fine-tuning the base model `Qwen/Qwen2-VL-7B-Instruct` on data annotated with the thinking tags `<think>` and `</think>`, the model acquires a "think first, then respond" paradigm, enabling experiments on test-time scaling.

**Note: The current model is a development version, not the final official version.**
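
Because the model wraps its reasoning in `<think>` and `</think>` before the final answer, downstream code typically needs to separate the two. The helper below is a minimal sketch of that split, assuming the completion format described above (a single think block followed by the response); the function name and example completion are illustrative, not part of the official repo.

```python
import re

def split_think_response(output: str) -> tuple[str, str]:
    """Split a completion into (reasoning, final answer).

    Assumes the model emits its chain of thought between <think> and
    </think>, followed by the response, as described above.
    """
    match = re.search(r"<think>(.*?)</think>", output, flags=re.DOTALL)
    if match is None:
        # No thinking block found: treat the whole output as the answer.
        return "", output.strip()
    thinking = match.group(1).strip()
    response = output[match.end():].strip()
    return thinking, response

# Hypothetical completion in the assumed format:
completion = "<think>The image shows two apples and one pear.</think>There are 3 fruits."
thinking, answer = split_think_response(completion)
print(answer)  # "There are 3 fruits."
```

This keeps the evaluation path (the answer) separate from the scaled test-time reasoning (the think block), which is what test-time scaling experiments vary.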