# vLLM Sonic Extension

This repository contains the Sonic compiler integration for vLLM (CPU), built on the Sonic MLIR compiler.
## Prerequisites

- Python 3.8+
- torch-mlir
- LLVM
- Sonic MLIR
- Access to the Sonic frontend repository (`$AGICL_DIR/sonic-frontend`)
## Installation

### 1. Set up a Python Virtual Environment

```bash
cd <your-workspace-directory>
python3 -m venv vllm
source ./vllm/bin/activate
```
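After activation, the commands in the following steps should resolve to the venv's interpreter. A quick, hedged way to confirm this from Python (this uses only standard `sys` attributes and is not specific to this repository):

```python
import sys

def in_virtualenv() -> bool:
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base interpreter.
    return sys.prefix != sys.base_prefix

print(in_virtualenv())
```

Running this inside the activated `vllm` environment should print `True`.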
### 2. Install vLLM (CPU Build)

```bash
cd <parent-directory-for-repositories>
git clone https://github.com/vllm-project/vllm.git
cd vllm
git checkout 134f70b3eddf05f01f55ecee9c2a14ec0732e8b6
pip install setuptools_scm
pip install -r requirements/cpu.txt
```
Set environment variables for a CPU-only build:

```bash
export VLLM_TARGET_DEVICE=cpu
export CUDA_VISIBLE_DEVICES=""
export CMAKE_ARGS="-DVLLM_GPU_LANG=cpu -DWITH_CUDA=OFF -DUSE_CUDA=OFF -DCUDA_TOOLKIT_ROOT_DIR='' -DCUDAToolkit_ROOT=''"
export TORCH_CUDA_ARCH_LIST=""
export FORCE_CUDA=0
export USE_CUDA=0
export CUDACXX=""
```
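If you drive the build from a Python script (for example, invoking pip via `subprocess`) rather than an interactive shell, the same variables can be set in-process before the build is launched. A minimal sketch mirroring the exports above:

```python
import os

# Same CPU-only settings as the shell exports above; they must be set
# before any build subprocess is spawned so that it inherits them.
cpu_env = {
    "VLLM_TARGET_DEVICE": "cpu",
    "CUDA_VISIBLE_DEVICES": "",
    "CMAKE_ARGS": "-DVLLM_GPU_LANG=cpu -DWITH_CUDA=OFF -DUSE_CUDA=OFF "
                  "-DCUDA_TOOLKIT_ROOT_DIR='' -DCUDAToolkit_ROOT=''",
    "TORCH_CUDA_ARCH_LIST": "",
    "FORCE_CUDA": "0",
    "USE_CUDA": "0",
    "CUDACXX": "",
}
os.environ.update(cpu_env)

print(os.environ["VLLM_TARGET_DEVICE"])  # -> cpu
```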
Build and install vLLM:

```bash
VLLM_TARGET_DEVICE=cpu pip install . --no-build-isolation --verbose
```
### 3. Install the vLLM Sonic Extension

```bash
cd ..
git clone https://github.com/artyom-beilis/vllm-sonic.git
cd vllm-sonic
VLLM_TARGET_DEVICE="empty" python -m pip install -v .
```
### 4. Set up the Sonic Frontend

Build the dynamo executor extension:

```bash
cd $AGICL_DIR/sonic-frontend/dynamo_executor
python3 setup.py build_ext --inplace
```
Add the Sonic frontend to your Python path:

```bash
export PYTHONPATH="$PYTHONPATH:$AGICL_DIR/sonic-frontend"
```
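This export works because directories listed in `PYTHONPATH` are prepended to `sys.path` at interpreter startup, making their modules importable. The effect can be simulated in-process; the module name below is a hypothetical placeholder for illustration, not part of the repository:

```python
import importlib
import os
import sys
import tempfile

# Create a throwaway directory containing a module, then make it
# importable the same way a PYTHONPATH entry would.
tmp = tempfile.mkdtemp()
with open(os.path.join(tmp, "path_check_stub.py"), "w") as f:
    f.write("READY = True\n")

sys.path.insert(0, tmp)  # equivalent effect to the PYTHONPATH export
mod = importlib.import_module("path_check_stub")
print(mod.READY)  # -> True
```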
## Usage

### Basic Chat Example

```bash
python3 examples/sonic_chat.py
```

### Inference with Eager Mode Validation

```bash
VLLM_SONIC_EAGER_VALIDATION=1 python3 examples/sonic_basic_inference_comparsion.py
```
## Examples

The repository includes several example scripts:

- `examples/sonic_chat.py` - Interactive chat example
- `examples/sonic_basic_inference.py` - Basic inference example
- `examples/sonic_basic_inference_comparsion.py` - Comparison with eager mode
- `examples/sonic_eager_mode_example.py` - Eager mode demonstration