Update README.md
Browse files
README.md
CHANGED
|
@@ -166,47 +166,6 @@ decoded = processor.decode(generate_ids[0, inputs["input_ids"].shape[-1] :], ski
|
|
| 166 |
print(decoded)
|
| 167 |
```
|
| 168 |
|
| 169 |
-
### Requirements
|
| 170 |
-
|
| 171 |
-
To run STEP3-VL-10B efficiently, we recommend setting up a Python environment (>=3.10) with **vLLM**:
|
| 172 |
-
|
| 173 |
-
```bash
|
| 174 |
-
pip install "vllm>=0.6.3"
|
| 175 |
-
```
|
| 176 |
-
|
| 177 |
-
### vLLM Inference Example
|
| 178 |
-
|
| 179 |
-
Below is a minimal example to load the model and generate a response using vLLM's chat API.
|
| 180 |
-
|
| 181 |
-
```python
|
| 182 |
-
from vllm import LLM, SamplingParams
|
| 183 |
-
|
| 184 |
-
# 1. Load the model
|
| 185 |
-
# Ensure you have ~24GB VRAM for BF16 inference
|
| 186 |
-
llm = LLM(
|
| 187 |
-
model="stepfun-ai/Step3-VL-10B",
|
| 188 |
-
trust_remote_code=True,
|
| 189 |
-
gpu_memory_utilization=0.95
|
| 190 |
-
)
|
| 191 |
-
|
| 192 |
-
# 2. Prepare input (Supports local paths or URLs)
|
| 193 |
-
messages = [
|
| 194 |
-
{
|
| 195 |
-
"role": "user",
|
| 196 |
-
"content": [
|
| 197 |
-
{"type": "image", "image": "https://modelscope.oss-cn-beijing.aliyuncs.com/resource/demo.jpg"},
|
| 198 |
-
{"type": "text", "text": "Describe this image in detail."}
|
| 199 |
-
]
|
| 200 |
-
}
|
| 201 |
-
]
|
| 202 |
-
|
| 203 |
-
# 3. Generate
|
| 204 |
-
sampling_params = SamplingParams(temperature=0.1, max_tokens=1024)
|
| 205 |
-
outputs = llm.chat(messages=messages, sampling_params=sampling_params)
|
| 206 |
-
|
| 207 |
-
print(f"Output: {outputs[0].outputs[0].text}")
|
| 208 |
-
```
|
| 209 |
-
|
| 210 |
## 📜 Citation
|
| 211 |
|
| 212 |
If you find this project useful in your research, please cite our technical report:
|
|
|
|
| 166 |
print(decoded)
|
| 167 |
```
|
| 168 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 169 |
## 📜 Citation
|
| 170 |
|
| 171 |
If you find this project useful in your research, please cite our technical report:
|