Upgrade llama-cpp-python CPU wheel
Files changed:
- README.md +6 -5
- requirements.txt +1 -1
README.md
@@ -52,11 +52,12 @@ launch and cached.
 
 If you deploy on **ZeroGPU**, keep the prebuilt CPU `llama-cpp-python` wheel.
 The `requirements.txt` file uses the CPU wheel index
-(`llama-cpp-python/whl/cpu`) plus `--only-binary=llama-cpp-python`,
-
-
-
-
+(`llama-cpp-python/whl/cpu`) plus `--only-binary=llama-cpp-python`, and pins to
+the latest available prebuilt wheel in that index. This keeps the Space from
+trying to compile llama.cpp from source. Do not use the CUDA wheel URL
+(`llama-cpp-python/whl/cu124`) unless the Space image also provides CUDA runtime
+libraries such as `libcudart.so.12`; otherwise model loading can fail when the
+first button click triggers inference.
 
 - Set `DEMO_MODE=auto` (default) to allow a graceful scripted fallback if the
 model cannot load.
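The README change warns against the CUDA wheel unless the image ships CUDA runtime libraries such as `libcudart.so.12`. A minimal sketch of how that condition could be checked at startup follows. The helper names `cuda_runtime_available` and `choose_wheel_index` are hypothetical and not part of this Space; the index suffixes are the ones quoted in the README.

```python
import ctypes


def cuda_runtime_available(soname: str = "libcudart.so.12") -> bool:
    """Return True if the dynamic loader can open the given CUDA runtime library.

    Hypothetical helper: it only tests that the shared object loads, not that
    a usable GPU is present.
    """
    try:
        ctypes.CDLL(soname)
    except OSError:
        return False
    return True


def choose_wheel_index(has_cuda: bool) -> str:
    """Pick the wheel index suffix named in the README for this environment."""
    if has_cuda:
        return "llama-cpp-python/whl/cu124"
    return "llama-cpp-python/whl/cpu"


if __name__ == "__main__":
    # On an image without CUDA runtime libraries this selects the CPU index.
    print(choose_wheel_index(cuda_runtime_available()))
```

Running a check like this before the first inference call would surface a missing runtime library at launch rather than on the first button click.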
requirements.txt
@@ -2,4 +2,4 @@
 --only-binary=llama-cpp-python
 
 gradio==6.15.2
-llama-cpp-python==0.3.
+llama-cpp-python==0.3.25
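Because the fix depends on an exact `==` pin, it can help to confirm that the installed wheel matches the pinned version. The sketch below assumes simple `name==version` lines like the ones in this `requirements.txt`. `parse_pin` and `installed_matches` are hypothetical helpers, not part of the Space.

```python
from importlib import metadata
from typing import Optional, Tuple


def parse_pin(line: str) -> Optional[Tuple[str, str]]:
    """Split a 'name==version' line; return None for options, comments, or blanks."""
    line = line.strip()
    if not line or line.startswith(("-", "#")) or "==" not in line:
        return None
    name, version = line.split("==", 1)
    return name.strip(), version.strip()


def installed_matches(line: str) -> bool:
    """Return True if the package named on the line is installed at the pinned version."""
    pin = parse_pin(line)
    if pin is None:
        return False
    name, wanted = pin
    try:
        return metadata.version(name) == wanted
    except metadata.PackageNotFoundError:
        return False


if __name__ == "__main__":
    print(parse_pin("llama-cpp-python==0.3.25"))
    print(installed_matches("llama-cpp-python==0.3.25"))
```

Option lines such as `--only-binary=llama-cpp-python` are skipped rather than parsed, since they do not pin a version.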