This repository provides a GGUF quantized version of the Skywork-R1V3-38B model, converted using the latest master branch of llama.cpp. This version is optimized for fast and memory-efficient local inference on CPU or GPU.

# Install llama.cpp with winget (Windows):
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Skywork/Skywork-R1V3-38B-GGUF:
# Run inference directly in the terminal:
llama-cli -hf Skywork/Skywork-R1V3-38B-GGUF:

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Skywork/Skywork-R1V3-38B-GGUF:
# Run inference directly in the terminal:
./llama-cli -hf Skywork/Skywork-R1V3-38B-GGUF:

# Build from source:
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Skywork/Skywork-R1V3-38B-GGUF:
# Run inference directly in the terminal:
./build/bin/llama-cli -hf Skywork/Skywork-R1V3-38B-GGUF:

# Or run with Docker:
docker model run hf.co/Skywork/Skywork-R1V3-38B-GGUF:
To run the model from locally downloaded files with llama.cpp, pass both the model weights and the multimodal projector (mmproj) file:
./llama-server -m /path/to/Skywork-R1V3-38B-Q8_0.gguf --mmproj /path/to/mmproj-Skywork-R1V3-38B-f16.gguf --port 8080
Once the server is running, you can query the model with any OpenAI-compatible client, such as curl:
BASE64_IMAGE=$(base64 -w 0 /path/to/image)
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Skywork-R1V3",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Please describe this image."},
          {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,'"${BASE64_IMAGE}"'" }}
        ]
      }
    ],
    "temperature": 0.7,
    "max_tokens": 512
  }'
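The curl request above hardcodes `image/jpeg` in the data URL and passes the whole body as a single command-line argument, which on Linux is capped at 128 KiB and can fail for larger images. A minimal sketch of one way around both points follows. The helper names (`mime_for`, `data_url_for`, `request_body_for`) are hypothetical and not part of llama.cpp. It assumes GNU `base64` (for `-w 0`), the same port and model name as above, and that the prompt contains no double quotes or backslashes. Whether the server inspects the MIME type in the data URL is not stated here, so treating it as significant is also an assumption.

```shell
# Hypothetical helpers (not part of llama.cpp). POSIX sh; GNU base64 assumed.

# Map a file extension to the MIME type used in the data URL.
mime_for() {
  case "${1##*.}" in
    jpg|jpeg|JPG|JPEG) echo "image/jpeg" ;;
    png|PNG)           echo "image/png" ;;
    *) echo "unsupported image extension: $1" >&2; return 1 ;;
  esac
}

# Print a base64 data URL for the image at path $1.
data_url_for() {
  mime=$(mime_for "$1") || return 1
  printf 'data:%s;base64,%s' "$mime" "$(base64 -w 0 "$1")"
}

# Print the chat-completions request body for image $1 and prompt $2.
# The prompt is inserted verbatim, so it must not contain double
# quotes or backslashes (assumption of this sketch).
request_body_for() {
  url=$(data_url_for "$1") || return 1
  printf '{"model":"Skywork-R1V3","messages":[{"role":"user","content":[{"type":"text","text":"%s"},{"type":"image_url","image_url":{"url":"%s"}}]}],"temperature":0.7,"max_tokens":512}' "$2" "$url"
}

# Usage (assumes the server started above is listening on port 8080):
#   request_body_for /path/to/image.png "Please describe this image." |
#     curl -s http://localhost:8080/v1/chat/completions \
#       -H "Content-Type: application/json" --data-binary @-
```

Piping the body to `curl --data-binary @-` sends it on standard input instead of the argument list, so the image size is no longer bound by the argument-length limit.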
If you use this model in your research, please cite:
@misc{shen2025skyworkr1v3technicalreport,
  title={Skywork-R1V3 Technical Report},
  author={Wei Shen and Jiangbo Pei and Yi Peng and Xuchen Song and Yang Liu and Jian Peng and Haofeng Sun and Yunzhuo Hao and Peiyu Wang and Jianhao Zhang and Yahui Zhou},
  year={2025},
  eprint={2507.06167},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2507.06167},
}