nielsr (HF Staff) committed · verified
Commit fc2ef66 · 1 parent: f7bf7c1

Add pipeline tag and library name to metadata


This PR updates the model card metadata by adding `pipeline_tag: image-text-to-text` and `library_name: transformers`.

Adding these tags will:
1. Improve the model's discoverability on the Hugging Face Hub under the [Image-Text-to-Text](https://huggingface.co/models?pipeline_tag=image-text-to-text) category.
2. Enable the "Use in Transformers" button on the model page, providing users with a ready-to-use code snippet.

The sample usage provided in the documentation confirms that the model is designed to work with the `transformers` library.
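For context, a minimal sketch of what the "Use in Transformers" flow enabled by this metadata typically looks like. The repo id `stepfun-ai/Step3-VL-10B` and the chat-message layout below are assumptions for illustration, not confirmed by this commit; only the `AutoProcessor`/`AutoModelForCausalLM` imports and the bf16-only constraint come from the README itself.

```python
def load_model(model_id: str = "stepfun-ai/Step3-VL-10B"):
    """Load processor and model in bf16 (the README states only bf16 inference is supported).

    Heavy dependencies are imported lazily so the pure-data helper below
    stays usable without torch/transformers installed.
    """
    import torch
    from transformers import AutoProcessor, AutoModelForCausalLM

    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # bf16-only inference per the README
        device_map="auto",
        trust_remote_code=True,
    )
    return processor, model


def build_messages(image_url: str, question: str) -> list:
    """Chat-style message list in the common transformers VLM format (assumed layout)."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_url},
                {"type": "text", "text": question},
            ],
        }
    ]
```

The message list would then be passed through the processor's chat template and `model.generate`, as shown in the README's own snippet.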

Files changed (1): README.md (+6 −6)

````diff
--- a/README.md
+++ b/README.md
@@ -1,7 +1,9 @@
 ---
-license: apache-2.0
 base_model:
-- stepfun-ai/Step3-VL-10B-Base
+- stepfun-ai/Step3-VL-10B-Base
+license: apache-2.0
+library_name: transformers
+pipeline_tag: image-text-to-text
 ---
 
 <div align="center">
@@ -57,8 +59,6 @@ STEP3-VL-10B delivers best-in-class performance across major multimodal benchmar
 | **HMMT 2025** | 78.18 | **92.14** | 57.29 | 67.71 | 65.68 | 51.30 |
 | **LiveCodeBench** | 75.77 | **76.43** | 48.71 | 69.45 | 72.01 | 57.10 |
 
-<!-- > **Note:** **SeRe** (Sequential Reasoning) uses a max length of 64K tokens; **PaCoRe** (Parallel Coordinated Reasoning) synthesizes 16 SeRe rollouts with a max length of 128K tokens. -->
-
 > **Note on Inference Modes:**
 >
 > **SeRe (Sequential Reasoning):** The standard inference mode using sequential generation (Chain-of-Thought) with a max length of 64K tokens.
@@ -120,7 +120,7 @@ STEP3-VL-10B delivers best-in-class performance across major multimodal benchmar
 
 ### Inference with Hugging Face Transformers
 
-We introduce how to use our model at inference stage using transformers library. It is recommended to use python=3.10, torch>=2.1.0, and transformers=4.57.0 as the development environment.We currently only support bf16 inference, and multi-patch for image preprocessing is supported by default. This behavior is aligned with vllm and sglang.
+We introduce how to use our model at inference stage using transformers library. It is recommended to use python=3.10, torch>=2.1.0, and transformers=4.57.0 as the development environment. We currently only support bf16 inference, and multi-patch for image preprocessing is supported by default. This behavior is aligned with vllm and sglang.
 
 ```python
 from transformers import AutoProcessor, AutoModelForCausalLM
@@ -185,4 +185,4 @@ If you find this project useful in your research, please cite our technical repo
 
 ## 📄 License
 
-This project is open-sourced under the [Apache 2.0 License](https://www.google.com/search?q=LICENSE).
+This project is open-sourced under the [Apache 2.0 License](https://www.google.com/search?q=LICENSE).
````