---
language:
  - en
  - ko
license: other
license_name: solar-apache-2.0
tags:
  - upstage
  - solar
  - moe
  - 100b
  - llm
  - fp8
base_model:
  - upstage/Solar-Open-100B
---

This is [upstage/Solar-Open-100B](https://huggingface.co/upstage/Solar-Open-100B) compressed to FP8 with `llm-compressor` 0.13.0 using a data-free recipe.

This model requires Upstage's fork of vLLM.

## Create and activate a Python virtual environment

```shell
uv venv --python 3.12 --seed
source .venv/bin/activate
```

## Install Solar Open's optimized vLLM

```shell
VLLM_PRECOMPILED_WHEEL_LOCATION="https://github.com/vllm-project/vllm/releases/download/v0.12.0/vllm-0.12.0-cp38-abi3-manylinux_2_31_x86_64.whl" \
VLLM_USE_PRECOMPILED=1 \
uv pip install git+https://github.com/UpstageAI/vllm.git@v0.12.0-solar-open
```

This model implements custom logits processors, which must be registered when starting the server.

## Start the vLLM server (for 4× 48 GB GPUs)

```shell
vllm serve upstage/Solar-Open-100B \
    --trust-remote-code \
    --enable-auto-tool-choice \
    --tool-call-parser solar_open \
    --reasoning-parser solar_open \
    --logits-processors vllm.model_executor.models.parallel_tool_call_logits_processor:ParallelToolCallLogitsProcessor \
    --logits-processors vllm.model_executor.models.solar_open_logits_processor:SolarOpenTemplateLogitsProcessor \
    --tensor-parallel-size 4
```

With 96 GB GPUs you should be able to drop down to `--tensor-parallel-size 2`.

## Reasoning Effort

This is not documented in the upstream model card, but `solar_open_logits_processor` implements a `reasoning_effort` sampling parameter with two values: `medium` and `high`.

See `solar_open_logits_processor.py` for details.

Example sampling configuration:

```json
{
    "temperature": 0.8,
    "top_p": 0.95,
    "top_k": 50,
    "reasoning_effort": "medium"
}
```
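A minimal Python sketch of sending that configuration to the running server. It assumes the server started above is listening on `http://localhost:8000` (vLLM's default) and that the custom `reasoning_effort` field is accepted at the top level of the request body alongside the standard sampling parameters; adjust the host, port, and payload shape to match your deployment.

```python
import json
import urllib.request

# Request payload mirroring the sampling configuration above;
# reasoning_effort is the custom parameter handled by the
# SolarOpenTemplateLogitsProcessor.
payload = {
    "model": "upstage/Solar-Open-100B",
    "messages": [{"role": "user", "content": "Summarize FP8 quantization in one sentence."}],
    "temperature": 0.8,
    "top_p": 0.95,
    "top_k": 50,
    "reasoning_effort": "medium",
}

req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",  # assumed default vLLM endpoint
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)

# Uncomment once the server is up:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])

print(json.dumps(payload, indent=2))
```

The same payload works with any OpenAI-compatible client; with the official `openai` package you would pass `reasoning_effort` via `extra_body`.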