Instructions to use Skywork/Skywork-R1V2-38B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Skywork/Skywork-R1V2-38B with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="Skywork/Skywork-R1V2-38B", trust_remote_code=True)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"},
        ],
    },
]
pipe(text=messages)
```

```python
# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("Skywork/Skywork-R1V2-38B", trust_remote_code=True, dtype="auto")
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use Skywork/Skywork-R1V2-38B with vLLM:
Install from pip and serve the model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "Skywork/Skywork-R1V2-38B"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Skywork/Skywork-R1V2-38B",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe this image in one sentence."},
          {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}
        ]
      }
    ]
  }'
```

Use Docker
```shell
docker model run hf.co/Skywork/Skywork-R1V2-38B
```
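The same OpenAI-compatible request shown in the curl example can also be assembled in Python. The sketch below only builds the JSON request body for `POST /v1/chat/completions`; it assumes you send it yourself (e.g. with `requests.post` against the locally running server) and mirrors the message shape used above.

```python
import json


def build_chat_payload(model: str, prompt: str, image_url: str) -> str:
    """Build the OpenAI-compatible chat request body used by the vLLM server,
    as a JSON string ready to POST to /v1/chat/completions."""
    body = {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }
    return json.dumps(body)


payload = build_chat_payload(
    "Skywork/Skywork-R1V2-38B",
    "Describe this image in one sentence.",
    "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
)
```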
- SGLang
How to use Skywork/Skywork-R1V2-38B with SGLang:
Install from pip and serve the model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "Skywork/Skywork-R1V2-38B" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Skywork/Skywork-R1V2-38B",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe this image in one sentence."},
          {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}
        ]
      }
    ]
  }'
```

Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
  --model-path "Skywork/Skywork-R1V2-38B" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Skywork/Skywork-R1V2-38B",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe this image in one sentence."},
          {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}
        ]
      }
    ]
  }'
```

- Docker Model Runner
How to use Skywork/Skywork-R1V2-38B with Docker Model Runner:
```shell
docker model run hf.co/Skywork/Skywork-R1V2-38B
```
no think tokens generated by model (only chat template)
How can I parse the reasoning when no delimiter token is generated by the model?
The model's generated output (the content field value) starts directly with the reasoning ("Okay, so the problem is asking...") without outputting a "think" tag itself. The "think" tag is only part of the prompt constructed by the template (conversation.py / tokenizer_config.json), not part of the text the model generates.
Critically, the model does not generate a "/think" tag, which means there is no delimiter token that can be used to identify where the thinking finishes and the actual response begins.
As a result, the thinking text and the final response are merged together rather than separated. Do you have a suggestion on how to solve this? (Or maybe you are working on R1V3 using Qwen3-32B? :-] )
Hi, thanks for raising this! Actually, the model does generate the "/think" tag in its prediction to indicate the end of the reasoning process. The "think" tag is indeed part of the chat template defined in tokenizer_config.json or conversation.py, so it is prepended during prompt construction. However, the "/think" tag is expected to be generated by the model itself, serving as a clear delimiter between the internal reasoning and the final answer.
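Given a closing delimiter like the one described above, separating the reasoning from the answer is a simple string split. A minimal sketch, assuming the model emits a literal "</think>" closing tag (verify the exact tag string against the chat template in tokenizer_config.json, since it may differ):

```python
# Assumed delimiter; check the model's chat template for the exact string.
THINK_END = "</think>"


def split_reasoning(text: str, delimiter: str = THINK_END) -> tuple[str, str]:
    """Split generated text into (reasoning, answer).

    If the delimiter never appears (e.g. the model stopped early),
    treat the whole output as the answer.
    """
    head, sep, tail = text.partition(delimiter)
    if not sep:
        return "", text.strip()
    return head.strip(), tail.strip()


reasoning, answer = split_reasoning(
    "Okay, so the problem is asking for 2 + 2.</think>The answer is 4."
)
print(answer)  # -> The answer is 4.
```

Handling the missing-delimiter case explicitly matters in practice, since a truncated generation (e.g. hitting max_new_tokens mid-reasoning) would otherwise be misparsed.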