dougeeai posted an update 23 days ago
## Llama-cpp-python wheels for Windows - update

Pre-compiled wheels for llama-cpp-python on Windows. No Visual Studio, no CUDA Toolkit setup: just `pip install` and run.

### New in this update

- **sm_120 (consumer/workstation Blackwell) support.** A single wheel now covers both sm_100 (datacenter) and sm_120 (RTX 5090 / 5080 / 5070 / 5060 / 5050, RTX PRO 6000 / 5000 / 4500 / 4000 / 2000 Blackwell).
- **llama-cpp-python 0.3.20** across all four architectures (Blackwell, Ada, Ampere, Turing). Brings Gemma 4 support via the updated llama.cpp core.
- **One wheel covers Python 3.10 through 3.13.** The 0.3.20 builds use `py3-none` tagging, so per-interpreter builds are no longer needed.
- **Fixed three mislabeled 0.3.16 sm_86 wheels** that were linked against the wrong cuBLAS/CUDA version. Properly built replacements are available.
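A quick sketch of why a single `py3-none` wheel covers Python 3.10 through 3.13: the interpreter tag `py3` and ABI tag `none` match any CPython 3.x, so pip accepts the same file on every listed interpreter. The wheel filename below is illustrative, not an exact filename from the repo.

```python
def wheel_tags(filename: str):
    # Wheel filenames follow {dist}-{version}-{python}-{abi}-{platform}.whl
    stem = filename.removesuffix(".whl")
    _name, _version, py_tag, abi_tag, plat_tag = stem.split("-")
    return py_tag, abi_tag, plat_tag

# Illustrative filename; check the release page for the real ones.
py, abi, plat = wheel_tags("llama_cpp_python-0.3.20-py3-none-win_amd64.whl")
print(py, abi, plat)  # py3 none win_amd64
```

Because the tags are `py3-none` rather than, say, `cp311-cp311`, pip will install the same wheel on 3.10, 3.11, 3.12, or 3.13 without a per-interpreter rebuild.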

### Coverage

- **GPUs:** RTX 20 / 30 / 40 / 50 series, RTX PRO Blackwell workstation, B100 / B200 / B300 datacenter
- **CUDA:** 11.8 / 12.1 / 13.0
- **Python:** 3.10, 3.11, 3.12, 3.13
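If you are unsure which build matches your card, the coverage above boils down to a small GPU-series-to-architecture table. This is a hypothetical helper, not part of the repo; the sm numbers for Turing/Ampere/Ada/Blackwell are standard CUDA compute capabilities, but always confirm against the release page.

```python
# Map consumer RTX series to the CUDA compute architecture the
# corresponding wheel targets (helper is illustrative, not from the repo).
SM_BY_SERIES = {
    "RTX 20": "sm_75",   # Turing
    "RTX 30": "sm_86",   # Ampere
    "RTX 40": "sm_89",   # Ada
    "RTX 50": "sm_120",  # consumer/workstation Blackwell
}

def arch_for(series: str) -> str:
    try:
        return SM_BY_SERIES[series]
    except KeyError:
        raise ValueError(f"no pre-built wheel listed for {series!r}")

print(arch_for("RTX 50"))  # sm_120
```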

### Download

https://github.com/dougeeai/llama-cpp-python-wheels

Linux wheels are still on the roadmap. File an issue if you need a specific configuration built.

Tags: #llama-cpp #gguf #windows #prebuilt #blackwell #rtx5090 #rtxpro6000 #rtxproblackwell #gemma4