This is upstage/Solar-Open-100B compressed to FP8 (dynamic quantization) with llm-compressor 0.13.0 using a data-free recipe.

This model requires a fork of vLLM.

Create and activate a Python virtual environment:

```shell
uv venv --python 3.12 --seed
source .venv/bin/activate
```

Install Solar Open's optimized vLLM fork:

```shell
VLLM_PRECOMPILED_WHEEL_LOCATION="https://github.com/vllm-project/vllm/releases/download/v0.12.0/vllm-0.12.0-cp38-abi3-manylinux_2_31_x86_64.whl" \
VLLM_USE_PRECOMPILED=1 \
uv pip install git+https://github.com/UpstageAI/vllm.git@v0.12.0-solar-open
```

This model implements custom logits processors, which must be registered explicitly via `--logits-processors` when starting the server.

Start the vLLM server (for 4x 48 GB GPUs):

```shell
vllm serve upstage/Solar-Open-100B \
    --trust-remote-code \
    --enable-auto-tool-choice \
    --tool-call-parser solar_open \
    --reasoning-parser solar_open \
    --logits-processors vllm.model_executor.models.parallel_tool_call_logits_processor:ParallelToolCallLogitsProcessor \
    --logits-processors vllm.model_executor.models.solar_open_logits_processor:SolarOpenTemplateLogitsProcessor \
    --tensor-parallel-size 4
```

With 96 GB GPUs you should be able to drop down to `--tensor-parallel-size 2`.
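As a rough sanity check on those tensor-parallel sizes, the weight footprint can be estimated by assuming the weights are predominantly FP8 at about one byte per parameter (an approximation; the checkpoint also contains some F32/BF16 tensors, and this ignores KV cache, activations, and runtime overhead):

```python
# Back-of-the-envelope check for tensor-parallel sizing. Assumes ~1 byte per
# parameter (FP8 weights); real usage is higher once KV cache and activations
# are accounted for, so leave generous headroom.
PARAMS = 103e9          # approximate parameter count of the checkpoint
BYTES_PER_PARAM = 1.0   # F8_E4M3 weights

def weights_per_gpu_gb(tensor_parallel_size: int) -> float:
    """Approximate per-GPU weight memory in GB for a given TP size."""
    return PARAMS * BYTES_PER_PARAM / tensor_parallel_size / 1e9

print(f"tp=4: {weights_per_gpu_gb(4):.0f} GB per GPU")  # ~26 GB, fits 48 GB cards
print(f"tp=2: {weights_per_gpu_gb(2):.0f} GB per GPU")  # ~52 GB, fits 96 GB cards
```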

Reasoning Effort

This is not documented in the upstream model card, but the solar_open_logits_processor implements a `reasoning_effort` sampling parameter with two values: `medium` and `high`.

See solar_open_logits_processor.py for the implementation.

Example sampling configuration:

```json
{
    "temperature": 0.8,
    "top_p": 0.95,
    "top_k": 50,
    "reasoning_effort": "medium"
}
```
Safetensors: 103B params (tensor types F32, BF16, F8_E4M3)