Luigi committed on
Commit 4e1369a · verified · 1 parent: d0a14f5

Add comprehensive README with usage instructions

Files changed (1): README.md (+161 −3)
# llama-cpp-python Prebuilt Wheels for HuggingFace Spaces (Free CPU)

Prebuilt `llama-cpp-python` wheels optimized for the HuggingFace Spaces free tier (16 GB RAM, 2 vCPU, CPU-only).

## Purpose

These wheels include the latest llama.cpp backend with support for newer model architectures:

- **LFM2 MoE** architecture (32 experts) for LFM2-8B-A1B
- Latest IQ4_XS quantization support
- OpenBLAS CPU acceleration

## Available Wheels

| Wheel File | Python | Platform | llama.cpp | Features |
|------------|--------|----------|-----------|----------|
| `llama_cpp_python-0.3.22-cp310-cp310-linux_x86_64.whl` | 3.10 | Linux x86_64 | Latest (Jan 2026) | LFM2 MoE, IQ4_XS, OpenBLAS |
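
The wheel filename encodes the interpreter and platform it targets, so you can check compatibility before installing. A minimal sketch (the helper names are illustrative, not part of any published API):

```python
import platform
import sys

WHEEL = "llama_cpp_python-0.3.22-cp310-cp310-linux_x86_64.whl"

def parse_wheel_tags(filename):
    """Split a wheel filename into (name, version, python, abi, platform) tags."""
    stem = filename[: -len(".whl")]
    # Wheel filenames are {name}-{version}-{python}-{abi}-{platform}.whl,
    # so splitting on the last four hyphens recovers the tags.
    name, version, py_tag, abi_tag, plat_tag = stem.rsplit("-", 4)
    return name, version, py_tag, abi_tag, plat_tag

def interpreter_matches(py_tag, plat_tag):
    """Check the running interpreter against the wheel's python/platform tags."""
    want_py = f"cp{sys.version_info.major}{sys.version_info.minor}"
    want_plat = f"{platform.system().lower()}_{platform.machine()}"
    return py_tag == want_py and plat_tag == want_plat
```

For this wheel, `interpreter_matches` is true only on CPython 3.10 running on Linux x86_64, which is exactly the environment the Dockerfile below sets up.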

## Usage

### Setting Up HuggingFace Spaces with Python 3.10

These wheels are built for **Python 3.10**. To use them in HuggingFace Spaces:

**Step 1: Switch to Docker**

1. Go to your Space settings
2. Change "Space SDK" from **Gradio** to **Docker**
3. This enables custom Dockerfile support

**Step 2: Create a Dockerfile with Python 3.10**

Your Dockerfile should start with `python:3.10-slim` as the base image:

```dockerfile
# Use Python 3.10 explicitly (required for these wheels)
FROM python:3.10-slim

WORKDIR /app

# Install system dependencies
RUN apt-get update && apt-get install -y \
    gcc g++ make cmake git libopenblas-dev \
    && rm -rf /var/lib/apt/lists/*

# Install llama-cpp-python from the prebuilt wheel
RUN pip install --no-cache-dir \
    https://huggingface.co/Luigi/llama-cpp-python-wheels-hf-spaces-free-cpu/resolve/main/llama_cpp_python-0.3.22-cp310-cp310-linux_x86_64.whl

# Install other dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY . .

# Set environment variables
ENV PYTHONUNBUFFERED=1
ENV GRADIO_SERVER_NAME=0.0.0.0

# Expose Gradio port
EXPOSE 7860

# Run the app
CMD ["python", "app.py"]
```

**Complete example:** see the template below for a production-ready setup.
67
+
68
+ ### Why Docker SDK?
69
+
70
+ When you use a custom Dockerfile:
71
+ - ✅ Explicit Python version control (`FROM python:3.10-slim`)
72
+ - ✅ Full control over system dependencies
73
+ - ✅ Can use prebuilt wheels for faster builds
74
+ - ✅ No need for `runtime.txt` (Dockerfile takes precedence)

### Dockerfile (Recommended)

```dockerfile
FROM python:3.10-slim

# Install system dependencies for OpenBLAS
RUN apt-get update && apt-get install -y \
    gcc g++ make cmake git libopenblas-dev \
    && rm -rf /var/lib/apt/lists/*

# Install llama-cpp-python from the prebuilt wheel (fast)
RUN pip install --no-cache-dir \
    https://huggingface.co/Luigi/llama-cpp-python-wheels-hf-spaces-free-cpu/resolve/main/llama_cpp_python-0.3.22-cp310-cp310-linux_x86_64.whl
```

### With Fallback to Source Build

```dockerfile
# Try the prebuilt wheel first; fall back to a source build if it is unavailable
RUN if pip install --no-cache-dir https://huggingface.co/Luigi/llama-cpp-python-wheels-hf-spaces-free-cpu/resolve/main/llama_cpp_python-0.3.22-cp310-cp310-linux_x86_64.whl; then \
        echo "✅ Using prebuilt wheel"; \
    else \
        echo "⚠️ Building from source"; \
        pip install --no-cache-dir git+https://github.com/JamePeng/llama-cpp-python.git@5a0391e8; \
    fi
```

## Why This Fork?

These wheels are built from the **JamePeng/llama-cpp-python** fork (v0.3.22) rather than the official abetlen/llama-cpp-python:

| Repository | Latest Version | llama.cpp | LFM2 MoE Support |
|------------|----------------|-----------|------------------|
| JamePeng fork | v0.3.22 (Jan 2026) | Latest | ✅ Yes |
| Official (abetlen) | v0.3.16 (Aug 2025) | Outdated | ❌ No |

**Key difference:** LFM2-8B-A1B requires a llama.cpp backend with LFM2 MoE architecture support (added Oct 2025). The official llama-cpp-python has not been updated since August 2025.

## Build Configuration

```bash
CMAKE_ARGS="-DGGML_OPENBLAS=ON -DGGML_NATIVE=OFF" \
FORCE_CMAKE=1 \
pip wheel --no-deps git+https://github.com/JamePeng/llama-cpp-python.git@5a0391e8
```

Note that `CMAKE_ARGS` and `FORCE_CMAKE` must be set in the environment of the `pip wheel` command (here as a single command line with continuations); assigning them on separate shell lines without `export` would not reach the build.

## Supported Models

These wheels enable the following IQ4_XS quantized models:

- **LFM2-8B-A1B** (LiquidAI) - 8.3B params, 1.5B active, MoE with 32 experts
- **Granite-4.0-h-micro** (IBM) - ultra-fast inference
- **Granite-4.0-h-tiny** (IBM) - balanced speed/quality
- All standard llama.cpp models (Llama, Gemma, Qwen, etc.)
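
Once the wheel is installed, loading one of these models follows the usual `llama-cpp-python` pattern. A sketch with conservative settings for the free tier (the helper names and the model path are illustrative; only the `Llama` keyword arguments are real API):

```python
def cpu_tier_kwargs(model_path, n_ctx=8192, n_threads=2):
    """Conservative llama_cpp.Llama() kwargs for a 2 vCPU, CPU-only Space."""
    return {
        "model_path": model_path,
        "n_ctx": n_ctx,          # up to 8192 fits in 16 GB RAM per the notes below
        "n_threads": n_threads,  # match the free tier's vCPU count
        "n_gpu_layers": 0,       # CPU-only wheel: keep every layer on the CPU
        "verbose": False,
    }

def load_model(model_path):
    # Import deferred so the kwargs helper stays usable without the wheel installed.
    from llama_cpp import Llama
    return Llama(**cpu_tier_kwargs(model_path))
```

Calling `load_model("models/LFM2-8B-A1B-IQ4_XS.gguf")` (any local GGUF path) then gives a model whose `create_chat_completion` method can back a Gradio app like the one in the Dockerfile above.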

## Performance

- **Build time savings:** ~4 minutes → ~3 seconds (about 98% faster)
- **Memory footprint:** fits in 16 GB RAM with context lengths up to 8192 tokens
- **CPU acceleration:** OpenBLAS, optimized for x86_64

## Limitations

- **CPU-only:** no GPU/CUDA support (optimized for the HF Spaces free tier)
- **Platform:** Linux x86_64 only
- **Python:** 3.10 only (matches the HF Spaces default)
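
Because the wheel is this restrictive, it can be worth failing fast at app startup rather than hitting an opaque import error later. A small guard along these lines (the function name is illustrative):

```python
import platform
import sys

def check_wheel_environment():
    """Return a list of mismatches against the cp310/linux_x86_64 wheel; empty means OK."""
    problems = []
    if sys.version_info[:2] != (3, 10):
        problems.append(f"Python {sys.version_info.major}.{sys.version_info.minor}, need 3.10")
    if platform.system() != "Linux":
        problems.append(f"OS {platform.system()}, need Linux")
    if platform.machine() != "x86_64":
        problems.append(f"arch {platform.machine()}, need x86_64")
    return problems
```

If the returned list is non-empty, the app can print it and exit instead of attempting to import `llama_cpp`.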

## License

These wheels include code from:

- [llama-cpp-python](https://github.com/JamePeng/llama-cpp-python) (MIT license)
- [llama.cpp](https://github.com/ggerganov/llama.cpp) (MIT license)

See the upstream repositories for full license information.

## Maintenance

Built from: https://github.com/JamePeng/llama-cpp-python/tree/5a0391e8

To rebuild, see `build_wheel.sh` in the main project repository.

## Related

- Main project: [gemma-book-summarizer](https://huggingface.co/spaces/Luigi/gemma-book-summarizer)
- JamePeng fork: https://github.com/JamePeng/llama-cpp-python
- Original project: https://github.com/abetlen/llama-cpp-python