MSGEncrypted committed on
Commit b16996a
1 Parent(s): 300911c

model config wip

Files changed (4)
  1. README.md +21 -18
  2. USAGE.md +47 -36
  3. models.yaml +8 -2
  4. uv.lock +19 -17
README.md CHANGED
@@ -1,5 +1,6 @@
1
  ---
2
- title: Small Model Hackathon
3
  emoji: 🦙
4
  colorFrom: blue
5
  colorTo: green
@@ -7,7 +8,6 @@ sdk: docker
7
  app_port: 7860
8
  pinned: false
9
  license: apache-2.0
10
- ---
11
 
12
  # Small Model Hackathon
13
 
@@ -33,21 +33,23 @@ uv run python scripts/download_model.py
33
  uv run --package gradio-space python -m gradio_space.app
34
  ```
35
 
36
- Open http://localhost:7860. The model downloads from Hugging Face Hub on the first chat message (or set `MODEL_PATH` to a local GGUF).
37
 
38
  ## Environment variables
39
 
40
- | Variable | Default | Description |
41
- |----------|---------|-------------|
42
- | `INFERENCE_BACKEND` | `llama_cpp` | `llama_cpp` or `transformers` |
43
- | `MODEL_REPO` | `Qwen/Qwen2.5-3B-Instruct-GGUF` | Hub repo for GGUF |
44
- | `MODEL_FILE` | `qwen2.5-3b-instruct-q4_k_m.gguf` | GGUF filename |
45
- | `MODEL_PATH` | — | Local GGUF path (skips Hub download) |
46
- | `N_CTX` | `4096` | Context window |
47
- | `N_GPU_LAYERS` | `0` | GPU layers for llama.cpp (0 = CPU) |
48
- | `MODEL_ID` | `Qwen/Qwen2.5-3B-Instruct` | Used when `INFERENCE_BACKEND=transformers` |
49
 
50
- See [`.env.example`](.env.example) for a full template.
51
 
52
  ## Monorepo layout
53
 
@@ -81,11 +83,11 @@ docker run --rm -p 7860:7860 -e MODEL_REPO=Qwen/Qwen2.5-3B-Instruct-GGUF hackath
81
 
82
  ## Hackathon checklist
83
 
84
- - [ ] Choose a track (Backyard AI or Thousand Token Wood)
85
- - [ ] Space live under build-small-hackathon
86
- - [ ] Demo video recorded
87
- - [ ] Social post published
88
- - [ ] Submission locked in by **June 15, 2026**
89
 
90
  ### Badge targets
91
 
@@ -101,3 +103,4 @@ uv sync --package inference --extra transformers
101
  INFERENCE_BACKEND=transformers MODEL_ID=Qwen/Qwen2.5-3B-Instruct \
102
  uv run --package gradio-space python -m gradio_space.app
103
  ```
 
 
1
  ---
2
+
3
+ ## title: Small Model Hackathon
4
  emoji: 🦙
5
  colorFrom: blue
6
  colorTo: green
 
8
  app_port: 7860
9
  pinned: false
10
  license: apache-2.0
 
11
 
12
  # Small Model Hackathon
13
 
 
33
  uv run --package gradio-space python -m gradio_space.app
34
  ```
35
 
36
+ Open [http://localhost:7860](http://localhost:7860). The model downloads from Hugging Face Hub on the first chat message (or set `MODEL_PATH` to a local GGUF).
37
 
38
  ## Environment variables
39
 
40
 
41
+ | Variable | Default | Description |
42
+ | ------------------- | --------------------------------- | ------------------------------------------ |
43
+ | `INFERENCE_BACKEND` | `llama_cpp` | `llama_cpp` or `transformers` |
44
+ | `MODEL_REPO` | `Qwen/Qwen2.5-3B-Instruct-GGUF` | Hub repo for GGUF |
45
+ | `MODEL_FILE` | `qwen2.5-3b-instruct-q4_k_m.gguf` | GGUF filename |
46
+ | `MODEL_PATH` | — | Local GGUF path (skips Hub download) |
47
+ | `N_CTX` | `4096` | Context window |
48
+ | `N_GPU_LAYERS` | `0` | GPU layers for llama.cpp (0 = CPU) |
49
+ | `MODEL_ID` | `Qwen/Qwen2.5-3B-Instruct` | Used when `INFERENCE_BACKEND=transformers` |
50
+
51
+
52
+ See `[.env.example](.env.example)` for a full template.
53
 
54
  ## Monorepo layout
55
 
 
83
 
84
  ## Hackathon checklist
85
 
86
+ - Choose a track (Backyard AI or Thousand Token Wood)
87
+ - Space live under build-small-hackathon
88
+ - Demo video recorded
89
+ - Social post published
90
+ - Submission locked in by **June 15, 2026**
91
 
92
  ### Badge targets
93
 
 
103
  INFERENCE_BACKEND=transformers MODEL_ID=Qwen/Qwen2.5-3B-Instruct \
104
  uv run --package gradio-space python -m gradio_space.app
105
  ```
106
+
USAGE.md CHANGED
@@ -45,7 +45,7 @@ MODEL_PATH=./models/qwen2.5-3b-instruct-q4_k_m.gguf
45
  uv run --package gradio-space python -m gradio_space.app
46
  ```
47
 
48
- Open http://localhost:7860.
49
 
50
  The model loads on the **first chat message** unless you set `MODEL_PATH`. After code changes, restart the process to pick up updates.
51
 
@@ -61,16 +61,18 @@ uv run --package gradio-space python -c "from gradio_space.app import build_demo
61
 
62
  ### Local env reference
63
 
64
- | Variable | Default | Description |
65
- |----------|---------|-------------|
66
- | `INFERENCE_BACKEND` | `llama_cpp` | `llama_cpp` or `transformers` |
67
- | `MODEL_REPO` | `Qwen/Qwen2.5-3B-Instruct-GGUF` | Hub repo for GGUF |
68
- | `MODEL_FILE` | `qwen2.5-3b-instruct-q4_k_m.gguf` | GGUF filename |
69
- | `MODEL_PATH` | | Local GGUF path (skips Hub download) |
70
- | `N_CTX` | `4096` | Context window |
71
- | `N_GPU_LAYERS` | `0` | GPU layers for llama.cpp (`0` = CPU only) |
72
- | `PORT` | `7860` | Gradio listen port |
73
- | `MODEL_ID` | `Qwen/Qwen2.5-3B-Instruct` | Used when `INFERENCE_BACKEND=transformers` |
74
 
75
  ### Optional: transformers backend
76
 
@@ -98,7 +100,7 @@ docker run --rm -p 7860:7860 \
98
  hackathon-space
99
  ```
100
 
101
- Open http://localhost:7860. Stop with `Ctrl+C`.
102
 
103
  To use a pre-downloaded local model inside Docker, mount it and set `MODEL_PATH`:
104
 
@@ -142,22 +144,26 @@ hf repo create build-small-hackathon/<your-space-name> \
142
 
143
  ### 3. Configure hardware
144
 
145
- | Setting | Recommendation |
146
- |---------|----------------|
147
- | Hardware | **CPU basic** to start (llama.cpp with `N_GPU_LAYERS=0`) |
148
- | Upgrade | GPU Space if you set `N_GPU_LAYERS > 0` for faster inference |
149
 
150
  ### 4. Set Space environment variables
151
 
152
  In the Space **Settings → Variables and secrets**:
153
 
154
- | Variable | Value |
155
- |----------|-------|
156
- | `INFERENCE_BACKEND` | `llama_cpp` |
157
- | `MODEL_REPO` | `Qwen/Qwen2.5-3B-Instruct-GGUF` |
158
- | `MODEL_FILE` | `qwen2.5-3b-instruct-q4_k_m.gguf` |
159
- | `N_CTX` | `4096` |
160
- | `N_GPU_LAYERS` | `0` (or higher on GPU hardware) |
161
 
162
  ### 5. Build and verify
163
 
@@ -177,14 +183,16 @@ If cold starts are too slow, attach a **Storage Bucket** in Space settings so do
177
 
178
  ## Troubleshooting
179
 
180
- | Symptom | Likely cause | Fix |
181
- |---------|--------------|-----|
182
- | First chat hangs / slow | GGUF downloading from Hub | Pre-download locally; on Space, wait or use Storage Bucket |
183
- | `Failed to load model` in chat | Wrong `MODEL_REPO` / `MODEL_FILE` | Check env vars match a valid GGUF on Hub |
184
- | Docker build fails on `llama-cpp-python` | Missing build tools | Dockerfile already installs `build-essential` and `cmake` |
185
- | Space build fails | Missing `uv.lock` or README YAML | Ensure `sdk: docker` is in root `README.md` frontmatter |
186
- | `transformers` backend error | Optional deps not installed | Run `uv sync --package inference --extra transformers` |
187
- | Port already in use locally | Another process on 7860 | `PORT=7861 uv run --package gradio-space python -m gradio_space.app` |
188
 
189
  ---
190
 
@@ -196,8 +204,11 @@ All three environments use the same command:
196
  uv run --package gradio-space python -m gradio_space.app
197
  ```
198
 
199
- | Environment | How to run |
200
- |-------------|------------|
201
- | Local dev | `uv run --package gradio-space python -m gradio_space.app` |
202
- | Docker | `docker run -p 7860:7860 hackathon-space` |
203
- | HF Space | Built and started automatically from `Dockerfile` `CMD` |
 
45
  uv run --package gradio-space python -m gradio_space.app
46
  ```
47
 
48
+ Open [http://localhost:7860](http://localhost:7860).
49
 
50
  The model loads on the **first chat message** unless you set `MODEL_PATH`. After code changes, restart the process to pick up updates.
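Loading the model on the first chat message is typically done with a load-once wrapper. Below is a minimal sketch in Python. It assumes the loader is any zero-argument callable. `LazyModel` is a hypothetical name, and the app's actual loader is not part of this diff:

```python
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class LazyModel(Generic[T]):
    """Runs `loader` once, on the first call to get(), and caches the result."""

    def __init__(self, loader: Callable[[], T]) -> None:
        self._loader = loader
        self._value: Optional[T] = None
        self._loaded = False
        self._lock = threading.Lock()

    def get(self) -> T:
        if not self._loaded:  # fast path: skip the lock once the model is loaded
            with self._lock:
                if not self._loaded:  # re-check: another thread may have finished loading
                    self._value = self._loader()
                    self._loaded = True
        return self._value  # type: ignore[return-value]
```

If the loader raises, `_loaded` stays false. The next chat message then retries the load instead of caching the failure.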
51
 
 
61
 
62
  ### Local env reference
63
 
64
+
65
+ | Variable | Default | Description |
66
+ | ------------------- | --------------------------------- | ------------------------------------------ |
67
+ | `INFERENCE_BACKEND` | `llama_cpp` | `llama_cpp` or `transformers` |
68
+ | `MODEL_REPO` | `Qwen/Qwen2.5-3B-Instruct-GGUF` | Hub repo for GGUF |
69
+ | `MODEL_FILE` | `qwen2.5-3b-instruct-q4_k_m.gguf` | GGUF filename |
70
+ | `MODEL_PATH` | | Local GGUF path (skips Hub download) |
71
+ | `N_CTX` | `4096` | Context window |
72
+ | `N_GPU_LAYERS` | `0` | GPU layers for llama.cpp (`0` = CPU only) |
73
+ | `PORT` | `7860` | Gradio listen port |
74
+ | `MODEL_ID` | `Qwen/Qwen2.5-3B-Instruct` | Used when `INFERENCE_BACKEND=transformers` |
75
+
76
 
77
  ### Optional: transformers backend
78
 
 
100
  hackathon-space
101
  ```
102
 
103
+ Open [http://localhost:7860](http://localhost:7860). Stop with `Ctrl+C`.
104
 
105
  To use a pre-downloaded local model inside Docker, mount it and set `MODEL_PATH`:
106
 
 
144
 
145
  ### 3. Configure hardware
146
 
147
+
148
+ | Setting | Recommendation |
149
+ | -------- | ------------------------------------------------------------ |
150
+ | Hardware | **CPU basic** to start (llama.cpp with `N_GPU_LAYERS=0`) |
151
+ | Upgrade | GPU Space if you set `N_GPU_LAYERS > 0` for faster inference |
152
+
153
 
154
  ### 4. Set Space environment variables
155
 
156
  In the Space **Settings → Variables and secrets**:
157
 
158
+
159
+ | Variable | Value |
160
+ | ------------------- | --------------------------------- |
161
+ | `INFERENCE_BACKEND` | `llama_cpp` |
162
+ | `MODEL_REPO` | `Qwen/Qwen2.5-3B-Instruct-GGUF` |
163
+ | `MODEL_FILE` | `qwen2.5-3b-instruct-q4_k_m.gguf` |
164
+ | `N_CTX` | `4096` |
165
+ | `N_GPU_LAYERS` | `0` (or higher on GPU hardware) |
166
+
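The Space variables above correspond roughly to llama-cpp-python constructor arguments. The sketch below assumes the defaults from the tables and treats `MODEL_PATH`, when set, as a replacement for the Hub download. `llama_load_plan` and `load_llama` are hypothetical helpers, not code from this repository:

```python
from __future__ import annotations

from typing import Any, Mapping

DEFAULT_REPO = "Qwen/Qwen2.5-3B-Instruct-GGUF"
DEFAULT_FILE = "qwen2.5-3b-instruct-q4_k_m.gguf"


def llama_load_plan(env: Mapping[str, str]) -> tuple[str, dict[str, Any]]:
    """Return ("local" | "hub", kwargs) for constructing a llama_cpp.Llama."""
    common = {
        "n_ctx": int(env.get("N_CTX", "4096")),
        # 0 keeps inference on the CPU; raise it on GPU hardware to offload layers.
        "n_gpu_layers": int(env.get("N_GPU_LAYERS", "0")),
    }
    local_path = env.get("MODEL_PATH")
    if local_path:
        return "local", {"model_path": local_path, **common}
    return "hub", {
        "repo_id": env.get("MODEL_REPO", DEFAULT_REPO),
        "filename": env.get("MODEL_FILE", DEFAULT_FILE),
        **common,
    }


def load_llama(env: Mapping[str, str]):
    from llama_cpp import Llama  # requires llama-cpp-python

    source, kwargs = llama_load_plan(env)
    if source == "local":
        return Llama(**kwargs)
    # Fetches the GGUF through huggingface-hub (cached after the first download).
    return Llama.from_pretrained(**kwargs)
```

Calling `load_llama(os.environ)` would then follow whatever the Space settings specify.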
167
 
168
  ### 5. Build and verify
169
 
 
183
 
184
  ## Troubleshooting
185
 
186
+
187
+ | Symptom | Likely cause | Fix |
188
+ | ---------------------------------------- | --------------------------------- | -------------------------------------------------------------------- |
189
+ | First chat hangs / slow | GGUF downloading from Hub | Pre-download locally; on Space, wait or use Storage Bucket |
190
+ | `Failed to load model` in chat | Wrong `MODEL_REPO` / `MODEL_FILE` | Check env vars match a valid GGUF on Hub |
191
+ | Docker build fails on `llama-cpp-python` | Missing build tools | Dockerfile already installs `build-essential` and `cmake` |
192
+ | Space build fails | Missing `uv.lock` or README YAML | Ensure `sdk: docker` is in root `README.md` frontmatter |
193
+ | `transformers` backend error | Optional deps not installed | Run `uv sync --package inference --extra transformers` |
194
+ | Port already in use locally | Another process on 7860 | `PORT=7861 uv run --package gradio-space python -m gradio_space.app` |
195
+
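You can check the `Space build fails` row locally before pushing. This minimal sketch assumes the Space metadata is a `---`-delimited block at the very top of `README.md` containing flat `key: value` lines. It is a line scanner, not a full YAML parser, and the helper names are hypothetical:

```python
from __future__ import annotations


def frontmatter_fields(text: str) -> dict[str, str] | None:
    """Return top-level `key: value` pairs from a leading --- block, or None if absent or unterminated."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        # Skip blank, indented (nested) and heading/comment-like lines.
        if sep and key and not key.startswith((" ", "\t", "#")):
            fields[key.strip()] = value.strip()
    return None  # the opening --- was never closed


def readme_declares_docker_sdk(text: str) -> bool:
    fields = frontmatter_fields(text)
    return fields is not None and fields.get("sdk") == "docker"
```

For example, `readme_declares_docker_sdk(Path("README.md").read_text(encoding="utf-8"))` returns `False` when the opening `---` is never closed, because this sketch does not treat an unterminated block as frontmatter.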
196
 
197
  ---
198
 
 
204
  uv run --package gradio-space python -m gradio_space.app
205
  ```
206
 
207
+
208
+ | Environment | How to run |
209
+ | ----------- | ---------------------------------------------------------- |
210
+ | Local dev | `uv run --package gradio-space python -m gradio_space.app` |
211
+ | Docker | `docker run -p 7860:7860 hackathon-space` |
212
+ | HF Space | Built and started automatically from `Dockerfile` `CMD` |
213
+
214
+
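For the `Port already in use locally` row, you can find a free port before setting `PORT`. This sketch uses only the standard library. It probes `127.0.0.1` by default; pass `host="0.0.0.0"` to probe the wildcard address instead. `pick_port` is a hypothetical helper:

```python
import os
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if this machine lets us bind host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
        return True


def pick_port(preferred: int, attempts: int = 20, host: str = "127.0.0.1") -> int:
    """Return the first bindable port in [preferred, preferred + attempts), capped at 65535."""
    for port in range(preferred, min(preferred + attempts, 65536)):
        if port_is_free(port, host):
            return port
    raise RuntimeError(f"no free port found starting at {preferred}")


if __name__ == "__main__":
    print(f"PORT={pick_port(int(os.environ.get('PORT', '7860')))}")
```

Run as a script, it prints a `PORT=` value that you can prefix to the `uv run` command. The check is only advisory, because another process could claim the port between the probe and Gradio starting.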
models.yaml CHANGED
@@ -2,14 +2,20 @@
2
  # Select active preset with ACTIVE_MODEL; override any field via .env (see .env.example).
3
 
4
  defaults:
5
- active_model: qwen3b-gguf
6
  # Dev: set ALLOW_MODEL_SWITCH=true in .env to expose a dropdown in Gradio.
7
  # Space: keep false so visitors use one pinned model.
8
  allow_model_switch: false
9
 
10
  models:
11
  qwen3b-gguf:
12
- label: Qwen 2.5 3B Instruct (GGUF, default)
13
  backend: llama_cpp
14
  model_repo: Qwen/Qwen2.5-3B-Instruct-GGUF
15
  model_file: qwen2.5-3b-instruct-q4_k_m.gguf
 
2
  # Select active preset with ACTIVE_MODEL; override any field via .env (see .env.example).
3
 
4
  defaults:
5
+ active_model: minicpm-v-4.6
6
  # Dev: set ALLOW_MODEL_SWITCH=true in .env to expose a dropdown in Gradio.
7
  # Space: keep false so visitors use one pinned model.
8
  allow_model_switch: false
9
 
10
  models:
11
+ minicpm-v-4.6:
12
+ label: MiniCPM-V 4.6 (Transformers, ~0.8B, default)
13
+ backend: transformers
14
+ model_id: openbmb/MiniCPM-V-4.6
15
+ trust_remote_code: true
16
+
17
  qwen3b-gguf:
18
+ label: Qwen 2.5 3B Instruct (GGUF)
19
  backend: llama_cpp
20
  model_repo: Qwen/Qwen2.5-3B-Instruct-GGUF
21
  model_file: qwen2.5-3b-instruct-q4_k_m.gguf
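The `models.yaml` comments describe selecting a preset with `ACTIVE_MODEL`, overriding fields from `.env`, and gating the dropdown with `ALLOW_MODEL_SWITCH`. The sketch below shows one way to resolve that, under several assumptions:

- Each preset field is overridden by its upper-cased name.
- `backend` maps to `INFERENCE_BACKEND`, the name the README documents.
- Only fields already present in the preset can be overridden.

The function names are hypothetical, and PyYAML is imported only by the loader:

```python
from __future__ import annotations

from typing import Any, Mapping

# Assumed field -> env var mapping; all other fields use field.upper().
ENV_NAME_FOR_FIELD = {"backend": "INFERENCE_BACKEND"}
TRUE_VALUES = {"1", "true", "yes", "on"}


def _parse_bool(raw: str) -> bool:
    return raw.strip().lower() in TRUE_VALUES


def load_config(path: str) -> dict[str, Any]:
    import yaml  # PyYAML; this commit adds it to the inference package

    with open(path, encoding="utf-8") as fh:
        return yaml.safe_load(fh)


def resolve_preset(config: Mapping[str, Any], env: Mapping[str, str]) -> dict[str, Any]:
    defaults = config.get("defaults", {})
    models = config["models"]
    name = env.get("ACTIVE_MODEL") or defaults["active_model"]
    if name not in models:
        raise ValueError(f"unknown preset {name!r}; expected one of {sorted(models)}")

    preset = dict(models[name])  # shallow copy so the parsed config is left untouched
    for field, current in list(preset.items()):
        env_name = ENV_NAME_FOR_FIELD.get(field, field.upper())
        if env_name in env:
            raw = env[env_name]
            # Keep booleans (e.g. trust_remote_code) boolean; other values stay strings.
            preset[field] = _parse_bool(raw) if isinstance(current, bool) else raw

    switch = env.get("ALLOW_MODEL_SWITCH")
    preset["allow_model_switch"] = (
        _parse_bool(switch) if switch is not None else bool(defaults.get("allow_model_switch", False))
    )
    preset["name"] = name
    return preset
```

With the file above, `resolve_preset(load_config("models.yaml"), os.environ)` picks `minicpm-v-4.6` unless `ACTIVE_MODEL` says otherwise.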
uv.lock CHANGED
@@ -198,7 +198,7 @@ name = "cuda-bindings"
198
  version = "13.3.1"
199
  source = { registry = "https://pypi.org/simple" }
200
  dependencies = [
201
- { name = "cuda-pathfinder" },
202
  ]
203
  wheels = [
204
  { url = "https://files.pythonhosted.org/packages/ce/67/5e7dba1ba576dd73da5dee894ca076ca5e959450dfff66d6d510a255d1f7/cuda_bindings-13.3.1-cp312-cp312-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:c7855c4868aabc0cfae28abbe83d56734bdfbd08f08fc234ac1912a12858bf49", size = 6025351, upload-time = "2026-05-29T23:11:49.685Z" },
@@ -229,34 +229,34 @@ wheels = [
229
 
230
  [package.optional-dependencies]
231
  cudart = [
232
- { name = "nvidia-cuda-runtime", marker = "sys_platform == 'linux' or sys_platform == 'win32'" },
233
  ]
234
  cufft = [
235
- { name = "nvidia-cufft", marker = "sys_platform == 'linux' or sys_platform == 'win32'" },
236
  ]
237
  cufile = [
238
  { name = "nvidia-cufile", marker = "sys_platform == 'linux'" },
239
  ]
240
  cupti = [
241
- { name = "nvidia-cuda-cupti", marker = "sys_platform == 'linux' or sys_platform == 'win32'" },
242
  ]
243
  curand = [
244
- { name = "nvidia-curand", marker = "sys_platform == 'linux' or sys_platform == 'win32'" },
245
  ]
246
  cusolver = [
247
- { name = "nvidia-cusolver", marker = "sys_platform == 'linux' or sys_platform == 'win32'" },
248
  ]
249
  cusparse = [
250
- { name = "nvidia-cusparse", marker = "sys_platform == 'linux' or sys_platform == 'win32'" },
251
  ]
252
  nvjitlink = [
253
- { name = "nvidia-nvjitlink", marker = "sys_platform == 'linux' or sys_platform == 'win32'" },
254
  ]
255
  nvrtc = [
256
- { name = "nvidia-cuda-nvrtc", marker = "sys_platform == 'linux' or sys_platform == 'win32'" },
257
  ]
258
  nvtx = [
259
- { name = "nvidia-nvtx", marker = "sys_platform == 'linux' or sys_platform == 'win32'" },
260
  ]
261
 
262
  [[package]]
@@ -500,6 +500,7 @@ source = { editable = "libs/inference" }
500
  dependencies = [
501
  { name = "huggingface-hub" },
502
  { name = "llama-cpp-python" },
503
  ]
504
 
505
  [package.optional-dependencies]
@@ -514,6 +515,7 @@ requires-dist = [
514
  { name = "accelerate", marker = "extra == 'transformers'", specifier = ">=1.2.0" },
515
  { name = "huggingface-hub", specifier = ">=0.27.0" },
516
  { name = "llama-cpp-python", specifier = ">=0.3.0" },
517
  { name = "torch", marker = "extra == 'transformers'", specifier = ">=2.5.0" },
518
  { name = "transformers", marker = "extra == 'transformers'", specifier = ">=4.47.0" },
519
  ]
@@ -720,7 +722,7 @@ name = "nvidia-cublas"
720
  version = "13.1.1.3"
721
  source = { registry = "https://pypi.org/simple" }
722
  dependencies = [
723
- { name = "nvidia-cuda-nvrtc" },
724
  ]
725
  wheels = [
726
  { url = "https://files.pythonhosted.org/packages/a7/a1/0bd24ee8c8d03adac032fd2909426a00c88f8c57961b1277ded97f91119f/nvidia_cublas-13.1.1.3-py3-none-manylinux_2_27_aarch64.whl", hash = "sha256:b7a210458267ac818974c53038fbec2e969d5c99f305ab15c72522fa9f001dd5", size = 542848918, upload-time = "2026-04-08T18:46:22.985Z" },
@@ -759,7 +761,7 @@ name = "nvidia-cudnn-cu13"
759
  version = "9.20.0.48"
760
  source = { registry = "https://pypi.org/simple" }
761
  dependencies = [
762
- { name = "nvidia-cublas" },
763
  ]
764
  wheels = [
765
  { url = "https://files.pythonhosted.org/packages/56/c5/83384d846b2fd17c44bd499b36c75a45ed4f095fbbb2252294e89cea5c5c/nvidia_cudnn_cu13-9.20.0.48-py3-none-manylinux_2_27_aarch64.whl", hash = "sha256:e31454ae00094b0c55319d9d15b6fa2fc50a9e1c0f5c8c80fb75258234e731e1", size = 444574296, upload-time = "2026-03-09T19:28:27.751Z" },
@@ -771,7 +773,7 @@ name = "nvidia-cufft"
771
  version = "12.0.0.61"
772
  source = { registry = "https://pypi.org/simple" }
773
  dependencies = [
774
- { name = "nvidia-nvjitlink" },
775
  ]
776
  wheels = [
777
  { url = "https://files.pythonhosted.org/packages/8b/ae/f417a75c0259e85c1d2f83ca4e960289a5f814ed0cea74d18c353d3e989d/nvidia_cufft-12.0.0.61-py3-none-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:2708c852ef8cd89d1d2068bdbece0aa188813a0c934db3779b9b1faa8442e5f5", size = 214053554, upload-time = "2025-09-04T08:31:38.196Z" },
@@ -801,9 +803,9 @@ name = "nvidia-cusolver"
801
  version = "12.0.4.66"
802
  source = { registry = "https://pypi.org/simple" }
803
  dependencies = [
804
- { name = "nvidia-cublas" },
805
- { name = "nvidia-cusparse" },
806
- { name = "nvidia-nvjitlink" },
807
  ]
808
  wheels = [
809
  { url = "https://files.pythonhosted.org/packages/c8/c3/b30c9e935fc01e3da443ec0116ed1b2a009bb867f5324d3f2d7e533e776b/nvidia_cusolver-12.0.4.66-py3-none-manylinux_2_27_aarch64.whl", hash = "sha256:02c2457eaa9e39de20f880f4bd8820e6a1cfb9f9a34f820eb12a155aa5bc92d2", size = 223467760, upload-time = "2025-09-04T08:33:04.222Z" },
@@ -815,7 +817,7 @@ name = "nvidia-cusparse"
815
  version = "12.6.3.3"
816
  source = { registry = "https://pypi.org/simple" }
817
  dependencies = [
818
- { name = "nvidia-nvjitlink" },
819
  ]
820
  wheels = [
821
  { url = "https://files.pythonhosted.org/packages/f8/94/5c26f33738ae35276672f12615a64bd008ed5be6d1ebcb23579285d960a9/nvidia_cusparse-12.6.3.3-py3-none-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:80bcc4662f23f1054ee334a15c72b8940402975e0eab63178fc7e670aa59472c", size = 162155568, upload-time = "2025-09-04T08:33:42.864Z" },
 
198
  version = "13.3.1"
199
  source = { registry = "https://pypi.org/simple" }
200
  dependencies = [
201
+ { name = "cuda-pathfinder", marker = "sys_platform != 'emscripten' and sys_platform != 'win32'" },
202
  ]
203
  wheels = [
204
  { url = "https://files.pythonhosted.org/packages/ce/67/5e7dba1ba576dd73da5dee894ca076ca5e959450dfff66d6d510a255d1f7/cuda_bindings-13.3.1-cp312-cp312-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:c7855c4868aabc0cfae28abbe83d56734bdfbd08f08fc234ac1912a12858bf49", size = 6025351, upload-time = "2026-05-29T23:11:49.685Z" },
 
229
 
230
  [package.optional-dependencies]
231
  cudart = [
232
+ { name = "nvidia-cuda-runtime", marker = "sys_platform == 'linux'" },
233
  ]
234
  cufft = [
235
+ { name = "nvidia-cufft", marker = "sys_platform == 'linux'" },
236
  ]
237
  cufile = [
238
  { name = "nvidia-cufile", marker = "sys_platform == 'linux'" },
239
  ]
240
  cupti = [
241
+ { name = "nvidia-cuda-cupti", marker = "sys_platform == 'linux'" },
242
  ]
243
  curand = [
244
+ { name = "nvidia-curand", marker = "sys_platform == 'linux'" },
245
  ]
246
  cusolver = [
247
+ { name = "nvidia-cusolver", marker = "sys_platform == 'linux'" },
248
  ]
249
  cusparse = [
250
+ { name = "nvidia-cusparse", marker = "sys_platform == 'linux'" },
251
  ]
252
  nvjitlink = [
253
+ { name = "nvidia-nvjitlink", marker = "sys_platform == 'linux'" },
254
  ]
255
  nvrtc = [
256
+ { name = "nvidia-cuda-nvrtc", marker = "sys_platform == 'linux'" },
257
  ]
258
  nvtx = [
259
+ { name = "nvidia-nvtx", marker = "sys_platform == 'linux'" },
260
  ]
261
 
262
  [[package]]
 
500
  dependencies = [
501
  { name = "huggingface-hub" },
502
  { name = "llama-cpp-python" },
503
+ { name = "pyyaml" },
504
  ]
505
 
506
  [package.optional-dependencies]
 
515
  { name = "accelerate", marker = "extra == 'transformers'", specifier = ">=1.2.0" },
516
  { name = "huggingface-hub", specifier = ">=0.27.0" },
517
  { name = "llama-cpp-python", specifier = ">=0.3.0" },
518
+ { name = "pyyaml", specifier = ">=6.0.2" },
519
  { name = "torch", marker = "extra == 'transformers'", specifier = ">=2.5.0" },
520
  { name = "transformers", marker = "extra == 'transformers'", specifier = ">=4.47.0" },
521
  ]
 
722
  version = "13.1.1.3"
723
  source = { registry = "https://pypi.org/simple" }
724
  dependencies = [
725
+ { name = "nvidia-cuda-nvrtc", marker = "sys_platform != 'emscripten' and sys_platform != 'win32'" },
726
  ]
727
  wheels = [
728
  { url = "https://files.pythonhosted.org/packages/a7/a1/0bd24ee8c8d03adac032fd2909426a00c88f8c57961b1277ded97f91119f/nvidia_cublas-13.1.1.3-py3-none-manylinux_2_27_aarch64.whl", hash = "sha256:b7a210458267ac818974c53038fbec2e969d5c99f305ab15c72522fa9f001dd5", size = 542848918, upload-time = "2026-04-08T18:46:22.985Z" },
 
761
  version = "9.20.0.48"
762
  source = { registry = "https://pypi.org/simple" }
763
  dependencies = [
764
+ { name = "nvidia-cublas", marker = "sys_platform != 'emscripten' and sys_platform != 'win32'" },
765
  ]
766
  wheels = [
767
  { url = "https://files.pythonhosted.org/packages/56/c5/83384d846b2fd17c44bd499b36c75a45ed4f095fbbb2252294e89cea5c5c/nvidia_cudnn_cu13-9.20.0.48-py3-none-manylinux_2_27_aarch64.whl", hash = "sha256:e31454ae00094b0c55319d9d15b6fa2fc50a9e1c0f5c8c80fb75258234e731e1", size = 444574296, upload-time = "2026-03-09T19:28:27.751Z" },
 
773
  version = "12.0.0.61"
774
  source = { registry = "https://pypi.org/simple" }
775
  dependencies = [
776
+ { name = "nvidia-nvjitlink", marker = "sys_platform != 'emscripten' and sys_platform != 'win32'" },
777
  ]
778
  wheels = [
779
  { url = "https://files.pythonhosted.org/packages/8b/ae/f417a75c0259e85c1d2f83ca4e960289a5f814ed0cea74d18c353d3e989d/nvidia_cufft-12.0.0.61-py3-none-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:2708c852ef8cd89d1d2068bdbece0aa188813a0c934db3779b9b1faa8442e5f5", size = 214053554, upload-time = "2025-09-04T08:31:38.196Z" },
 
803
  version = "12.0.4.66"
804
  source = { registry = "https://pypi.org/simple" }
805
  dependencies = [
806
+ { name = "nvidia-cublas", marker = "sys_platform != 'emscripten' and sys_platform != 'win32'" },
807
+ { name = "nvidia-cusparse", marker = "sys_platform != 'emscripten' and sys_platform != 'win32'" },
808
+ { name = "nvidia-nvjitlink", marker = "sys_platform != 'emscripten' and sys_platform != 'win32'" },
809
  ]
810
  wheels = [
811
  { url = "https://files.pythonhosted.org/packages/c8/c3/b30c9e935fc01e3da443ec0116ed1b2a009bb867f5324d3f2d7e533e776b/nvidia_cusolver-12.0.4.66-py3-none-manylinux_2_27_aarch64.whl", hash = "sha256:02c2457eaa9e39de20f880f4bd8820e6a1cfb9f9a34f820eb12a155aa5bc92d2", size = 223467760, upload-time = "2025-09-04T08:33:04.222Z" },
 
817
  version = "12.6.3.3"
818
  source = { registry = "https://pypi.org/simple" }
819
  dependencies = [
820
+ { name = "nvidia-nvjitlink", marker = "sys_platform != 'emscripten' and sys_platform != 'win32'" },
821
  ]
822
  wheels = [
823
  { url = "https://files.pythonhosted.org/packages/f8/94/5c26f33738ae35276672f12615a64bd008ed5be6d1ebcb23579285d960a9/nvidia_cusparse-12.6.3.3-py3-none-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:80bcc4662f23f1054ee334a15c72b8940402975e0eab63178fc7e670aa59472c", size = 162155568, upload-time = "2025-09-04T08:33:42.864Z" },