This is upstage/Solar-Open-100B compressed to FP8-Dynamic with llm-compressor 0.13.0 using a data-free recipe.
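The exact recipe isn't published in this card; below is a minimal sketch of a typical data-free FP8-Dynamic recipe with llm-compressor, where the `targets`/`ignore` lists and the output directory are assumptions rather than the recipe actually used:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

MODEL_ID = "upstage/Solar-Open-100B"

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# FP8 weights with dynamic per-token activation scales; no calibration data needed
recipe = QuantizationModifier(targets="Linear", scheme="FP8_DYNAMIC", ignore=["lm_head"])
oneshot(model=model, recipe=recipe)

model.save_pretrained("Solar-Open-100B-FP8-Dynamic", save_compressed=True)
tokenizer.save_pretrained("Solar-Open-100B-FP8-Dynamic")
```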
This model requires a fork of vLLM.
Create and activate a Python virtual environment:

```bash
uv venv --python 3.12 --seed
source .venv/bin/activate
```
Install Solar Open's optimized vLLM fork:

```bash
VLLM_PRECOMPILED_WHEEL_LOCATION="https://github.com/vllm-project/vllm/releases/download/v0.12.0/vllm-0.12.0-cp38-abi3-manylinux_2_31_x86_64.whl" \
VLLM_USE_PRECOMPILED=1 \
uv pip install git+https://github.com/UpstageAI/vllm.git@v0.12.0-solar-open
```
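A quick sanity check that the install picked up the fork and imports cleanly:

```bash
python -c "import vllm; print(vllm.__version__)"
```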
This model ships custom logits processors that must be registered when launching the server.

Start the vLLM server (for 4x 48 GB GPUs):

```bash
vllm serve mike-ravkine/Solar-Open-100B-FP8-Dynamic \
  --trust-remote-code \
  --enable-auto-tool-choice \
  --tool-call-parser solar_open \
  --reasoning-parser solar_open \
  --logits-processors vllm.model_executor.models.parallel_tool_call_logits_processor:ParallelToolCallLogitsProcessor \
  --logits-processors vllm.model_executor.models.solar_open_logits_processor:SolarOpenTemplateLogitsProcessor \
  --tensor-parallel-size 4
```
On 96 GB GPUs you should be able to drop down to `--tensor-parallel-size 2`.
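Once the server is up, a minimal smoke test against the OpenAI-compatible endpoint (default port 8000; the prompt is just a placeholder):

```bash
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mike-ravkine/Solar-Open-100B-FP8-Dynamic",
    "messages": [{"role": "user", "content": "Say hello."}],
    "max_tokens": 64
  }'
```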
## Reasoning Effort
This is not documented in the upstream model card, but `solar_open_logits_processor` implements a `reasoning_effort` sampling parameter with two values: `medium` and `high`. See `solar_open_logits_processor.py` for details.
Example sampling configuration:

```json
{
  "temperature": 0.8,
  "top_p": 0.95,
  "top_k": 50,
  "reasoning_effort": "medium"
}
```
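With the official `openai` Python client, non-standard parameters like `top_k` and `reasoning_effort` have to ride in `extra_body`. A minimal sketch, assuming the server was started as above; whether the processor reads `reasoning_effort` as a top-level request field is an assumption, so adjust the placement if requests are rejected:

```python
from openai import OpenAI

# Point the client at the local vLLM server (api_key is required by the client but unused)
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="mike-ravkine/Solar-Open-100B-FP8-Dynamic",
    messages=[{"role": "user", "content": "Explain FP8 quantization in two sentences."}],
    temperature=0.8,
    top_p=0.95,
    # top_k and reasoning_effort are not part of the OpenAI schema,
    # so they are passed through extra_body
    extra_body={"top_k": 50, "reasoning_effort": "medium"},
)
print(resp.choices[0].message.content)
```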