lastmass committed on
Commit 01edaff · 1 Parent(s): 61f49a8

Upgrade llama-cpp-python CPU wheel

Files changed (2)
  1. README.md +6 -5
  2. requirements.txt +1 -1
README.md CHANGED
@@ -52,11 +52,12 @@ launch and cached.
 
 If you deploy on **ZeroGPU**, keep the prebuilt CPU `llama-cpp-python` wheel.
 The `requirements.txt` file uses the CPU wheel index
-(`llama-cpp-python/whl/cpu`) plus `--only-binary=llama-cpp-python`, so the Space
-will fail fast instead of trying to compile llama.cpp from source. Do not use the
-CUDA wheel URL (`llama-cpp-python/whl/cu124`) unless the Space image also
-provides CUDA runtime libraries such as `libcudart.so.12`; otherwise model
-loading can fail when the first button click triggers inference.
+(`llama-cpp-python/whl/cpu`) plus `--only-binary=llama-cpp-python`, and pins to
+the latest available prebuilt wheel in that index. This keeps the Space from
+trying to compile llama.cpp from source. Do not use the CUDA wheel URL
+(`llama-cpp-python/whl/cu124`) unless the Space image also provides CUDA runtime
+libraries such as `libcudart.so.12`; otherwise model loading can fail when the
+first button click triggers inference.
 
 - Set `DEMO_MODE=auto` (default) to allow a graceful scripted fallback if the
 model cannot load.
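
Review note: the README text warns against the CUDA wheel unless the image ships CUDA runtime libraries such as `libcudart.so.12`. A minimal sketch of how one might probe for that library before choosing a wheel is below. The helper name `cuda_runtime_available` is hypothetical and is not part of this commit or of `llama-cpp-python`. It only assumes that the dynamic loader can resolve the soname.

```python
import ctypes


def cuda_runtime_available(soname: str = "libcudart.so.12") -> bool:
    """Return True if the dynamic loader can open the given shared library.

    Hypothetical helper for illustration; the default soname is the one
    named in the README text.
    """
    try:
        ctypes.CDLL(soname)
    except OSError:
        # The library is missing or cannot be loaded on this image.
        return False
    return True


if __name__ == "__main__":
    if cuda_runtime_available():
        print("CUDA runtime found; a CUDA wheel might load here.")
    else:
        print("CUDA runtime not found; keep the prebuilt CPU wheel.")
```

This probe only reports whether the runtime library loads. It says nothing about whether a GPU is attached when inference actually runs.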
requirements.txt CHANGED
@@ -2,4 +2,4 @@
 --only-binary=llama-cpp-python
 
 gradio==6.15.2
-llama-cpp-python==0.3.22
+llama-cpp-python==0.3.25
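
Review note: the setup depends on two lines staying in `requirements.txt`, namely the `--only-binary=llama-cpp-python` flag and an exact `==` pin. The sketch below shows one way to check for both, for example in a pre-deploy script. The function `check_llama_pin` is a hypothetical helper, and the sample text reproduces only the lines visible in this hunk. The first line of the real file is not shown in the diff, so it is left out.

```python
import re


def check_llama_pin(requirements_text: str) -> dict:
    """Report whether the binary-only flag and an exact pin are present.

    Hypothetical helper for illustration; it does not invoke pip.
    """
    lines = [ln.strip() for ln in requirements_text.splitlines()]
    only_binary = "--only-binary=llama-cpp-python" in lines
    pinned_version = None
    for ln in lines:
        # Accept only an exact '==' pin, not a range such as '>='.
        match = re.fullmatch(r"llama-cpp-python==([0-9][0-9A-Za-z.+!-]*)", ln)
        if match:
            pinned_version = match.group(1)
    return {"only_binary": only_binary, "pinned_version": pinned_version}


# Lines 2 to 5 of requirements.txt as they appear after this commit.
sample = """--only-binary=llama-cpp-python

gradio==6.15.2
llama-cpp-python==0.3.25
"""

if __name__ == "__main__":
    print(check_llama_pin(sample))
```

A missing flag or a loosened pin would let pip fall back to a source build, which is what the README change is meant to prevent.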