sarthak saxena committed on
Commit
17461d1
·
verified ·
1 Parent(s): f5270e9

Update README.md

  1. README.md +559 -7
README.md CHANGED
@@ -1,10 +1,562 @@
1
  ---
2
- title: README
3
- emoji: 🐨
4
- colorFrom: yellow
5
- colorTo: gray
6
- sdk: static
7
- pinned: false
8
  ---
9
 
10
- Edit this `README.md` markdown file to author your organization card.
1
+ # llmpm — LLM Package Manager
2
+
3
+ > Command-line package manager for open-source large language models. Download and run 10,000+ models, and share LLMs with a single command.
4
+
5
+ `llmpm` is a CLI package manager for large language models, inspired by pip and npm. It acts as a command-line hub for open-source LLMs, handling downloads, environment setup, and backend selection so you can discover, install, and run models from one tool.
6
+
7
+ Models are sourced from [HuggingFace Hub](https://huggingface.co), [Ollama](https://ollama.com/search) & [Mistral AI](https://docs.mistral.ai/getting-started/models).
8
+
9
+ **Explore a Suite of Models at [llmpm.co](https://llmpm.co/models) →**
10
+
11
+ Supports:
12
+
13
+ - Text generation (GGUF via llama.cpp and Transformer checkpoints)
14
+ - Image generation (Diffusion models)
15
+ - Vision models
16
+ - Speech-to-text (ASR)
17
+ - Text-to-speech (TTS)
18
+
19
+ ---
20
+
21
+ ## Installation
22
+
23
+ ### via pip (recommended)
24
+
25
+ ```sh
26
+ pip install llmpm
27
+ ```
28
+
29
+ The pip install is intentionally lightweight — it only installs the CLI tools needed to bootstrap. On first run, `llmpm` automatically creates an isolated environment at `~/.llmpm/venv` and installs all ML backends into it, keeping your system Python untouched.
30
+
31
+ ### via npm
32
+
33
+ ```sh
34
+ npm install -g llmpm
35
+ ```
36
+
37
+ The npm package finds Python on your PATH, creates `~/.llmpm/venv`, and installs all backends into it during `postinstall`.
38
+
39
+ ### Environment isolation
40
+
41
+ By default, every `llmpm` command runs inside `~/.llmpm/venv`.
42
+ Set `LLPM_NO_VENV=1` to bypass this (useful in CI or Docker where isolation is already provided).
43
+
44
+ ---
45
+
46
+ ## Quick start
47
+
48
+ ```sh
49
+ # Install a model
50
+ llmpm install meta-llama/Llama-3.2-3B-Instruct
51
+
52
+ # Run it
53
+ llmpm run meta-llama/Llama-3.2-3B-Instruct
54
+ llmpm serve meta-llama/Llama-3.2-3B-Instruct
55
+ ```
56
+
57
+ ![llmpm demo](https://res.cloudinary.com/dehc0rbua/image/upload/v1772781378/LLMPMDemo_fuckwk.gif)
58
+
59
+ ---
60
+
61
+ ## Commands
62
+
63
+ | Command | Description |
64
+ | ------------------------------- | --------------------------------------------------------------- |
65
+ | `llmpm init` | Initialise a `llmpm.json` in the current directory |
66
+ | `llmpm install` | Install all models listed in `llmpm.json` |
67
+ | `llmpm install <repo>` | Download and install a model from HuggingFace, Ollama & Mistral |
68
+ | `llmpm run <repo>` | Run an installed model (interactive chat) |
69
+ | `llmpm serve [repo] [repo] ...` | Serve one or more models as an OpenAI-compatible API |
70
+ | `llmpm serve` | Serve every installed model on a single HTTP server |
71
+ | `llmpm push <repo>` | Upload a model to HuggingFace Hub |
72
+ | `llmpm list` | Show all installed models |
73
+ | `llmpm info <repo>` | Show details about a model |
74
+ | `llmpm uninstall <repo>` | Uninstall a model |
75
+ | `llmpm clean` | Remove the managed environment (`~/.llmpm/venv`) |
76
+ | `llmpm clean --all` | Remove environment + all downloaded models and registry |
77
+
78
+ ---
79
+
80
+ ## Local vs global mode
81
+
82
+ `llmpm` works in two modes depending on whether a `llmpm.json` file is present.
83
+
84
+ ### Global mode (default)
85
+
86
+ All models are stored in `~/.llmpm/models/` and the registry lives at
87
+ `~/.llmpm/registry.json`. This is the default when no `llmpm.json` is found.
88
+
89
+ ### Local mode
90
+
91
+ When a `llmpm.json` exists in the current directory (or any parent), llmpm
92
+ switches to **local mode**: models are stored in `.llmpm/models/` next to the
93
+ manifest file. This keeps project models isolated from your global environment.
94
+
95
+ ```
96
+ my-project/
97
+ ├── llmpm.json ← manifest
98
+ └── .llmpm/ ← local model store (auto-created)
99
+ ├── registry.json
100
+ └── models/
101
+ ```
102
+
103
+ All commands (`install`, `run`, `serve`, `list`, `info`, `uninstall`) automatically
104
+ detect the mode and operate on the correct store — no flags required.
105
+
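The lookup described above can be sketched in Python. This is only an illustration under stated assumptions, not llmpm's actual implementation: the helper name `resolve_store` is hypothetical, and the only behaviour taken from this README is that a `llmpm.json` in the current directory or any parent selects the `.llmpm/` directory next to it, with the global home as the fallback.

```python
from pathlib import Path


def resolve_store(start: Path, global_home: Path) -> Path:
    """Return the store directory that applies to `start` (hypothetical helper).

    Walks from `start` up through its parents looking for llmpm.json.
    If a manifest is found, the local store is `.llmpm/` beside it;
    otherwise the global home is returned unchanged.
    """
    start = start.resolve()
    for directory in (start, *start.parents):
        if (directory / "llmpm.json").is_file():
            return directory / ".llmpm"
    return global_home
```

For example, `resolve_store(Path.cwd(), Path.home() / ".llmpm")` would return the project-local store when run anywhere inside `my-project/`, and the global home elsewhere.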
106
+ ---
107
+
108
+ ## `llmpm init`
109
+
110
+ Initialise a new project manifest in the current directory.
111
+
112
+ ```sh
113
+ llmpm init # interactive prompts for name & description
114
+ llmpm init --yes # skip prompts, use directory name as package name
115
+ ```
116
+
117
+ This creates a `llmpm.json`:
118
+
119
+ ```json
120
+ {
121
+ "name": "my-project",
122
+ "description": "",
123
+ "dependencies": {}
124
+ }
125
+ ```
126
+
127
+ Models are listed under `dependencies` without version pins — llmpm models
128
+ don't use semver. The value is always `"*"`.
129
+
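As a rough illustration of the manifest format, the Python sketch below records a model under `dependencies` with the `"*"` value described above. The function name `add_dependency` is hypothetical, and llmpm's own `--save` logic may differ.

```python
import json


def add_dependency(manifest: dict, repo: str) -> dict:
    """Return a copy of the manifest with `repo` recorded under dependencies."""
    updated = dict(manifest)
    deps = dict(updated.get("dependencies", {}))
    deps[repo] = "*"  # models are not version-pinned, so the value is always "*"
    updated["dependencies"] = deps
    return updated


manifest = {"name": "my-project", "description": "", "dependencies": {}}
manifest = add_dependency(manifest, "meta-llama/Llama-3.2-3B-Instruct")
print(json.dumps(manifest, indent=2))
```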
130
+ ---
131
+
132
+ ## `llmpm install`
133
+
134
+ ```sh
135
+ # Install a Transformer model
136
+ llmpm install meta-llama/Llama-3.2-3B-Instruct
137
+
138
+ # Install a GGUF model (interactive quantisation picker)
139
+ llmpm install unsloth/Llama-3.2-3B-Instruct-GGUF
140
+
141
+ # Install a specific GGUF quantisation
142
+ llmpm install unsloth/Llama-3.2-3B-Instruct-GGUF --quant Q4_K_M
143
+
144
+ # Install a single specific file
145
+ llmpm install unsloth/Llama-3.2-3B-Instruct-GGUF --file Llama-3.2-3B-Instruct-Q4_K_M.gguf
146
+
147
+ # Skip prompts (pick best default)
148
+ llmpm install meta-llama/Llama-3.2-3B-Instruct --no-interactive
149
+
150
+ # Install and record in llmpm.json (local projects)
151
+ llmpm install meta-llama/Llama-3.2-3B-Instruct --save
152
+
153
+ # Install all models listed in llmpm.json (like npm install)
154
+ llmpm install
155
+ ```
156
+
157
+ In **global mode** models are stored in `~/.llmpm/models/`.
158
+ In **local mode** (when `llmpm.json` is present) they go into `.llmpm/models/`.
159
+
160
+ ### `llmpm install` options
161
+
162
+ | Option | Description |
163
+ | ------------------ | -------------------------------------------------------------- |
164
+ | `--quant` / `-q` | GGUF quantisation to download (e.g. `Q4_K_M`) |
165
+ | `--file` / `-f` | Download a specific file from the repo |
166
+ | `--no-interactive` | Never prompt; pick the best default quantisation automatically |
167
+ | `--save` | Add the model to `llmpm.json` dependencies after installing |
168
+
169
+ ---
170
+
171
+ ## `llmpm run`
172
+
173
+ `llmpm run` auto-detects the model type and launches the appropriate interactive session. It supports text generation, image generation, vision, speech-to-text (ASR), and text-to-speech (TTS) models.
174
+
175
+ ![llmpm run](https://res.cloudinary.com/dehc0rbua/image/upload/v1772781378/LLMPMrunprompt_vc72qd.gif)
176
+
177
+ ### Text generation (GGUF & Transformers)
178
+
179
+ ```sh
180
+ # Interactive chat
181
+ llmpm run meta-llama/Llama-3.2-3B-Instruct
182
+
183
+ # Single-turn inference
184
+ llmpm run meta-llama/Llama-3.2-3B-Instruct --prompt "Explain quantum computing"
185
+
186
+ # With a system prompt
187
+ llmpm run meta-llama/Llama-3.2-3B-Instruct --system "You are a helpful pirate."
188
+
189
+ # Limit response length
190
+ llmpm run meta-llama/Llama-3.2-3B-Instruct --max-tokens 512
191
+
192
+ # GGUF model — tune context window and GPU layers
193
+ llmpm run unsloth/Llama-3.2-3B-Instruct-GGUF --ctx 8192 --gpu-layers 32
194
+ ```
195
+
196
+ ### Image generation (Diffusion)
197
+
198
+ Generates an image from a text prompt and saves it as a PNG on your Desktop.
199
+
200
+ ```sh
201
+ # Single prompt → saves llmpm_<timestamp>.png to ~/Desktop
202
+ llmpm run amused/amused-256 --prompt "a cyberpunk city at sunset"
203
+
204
+ # Interactive session (type a prompt, get an image each time)
205
+ llmpm run amused/amused-256
206
+ ```
207
+
208
+ In interactive mode, type your prompt and press Enter. The output path is printed after each generation. Type `/exit` to quit.
209
+
210
+ > Requires: `pip install diffusers torch accelerate`
211
+
212
+ ### Vision (image-to-text)
213
+
214
+ Describe or answer questions about an image. Pass the image file path via `--prompt`.
215
+
216
+ ```sh
217
+ # Single image description
218
+ llmpm run Salesforce/blip-image-captioning-base --prompt /path/to/photo.jpg
219
+
220
+ # Interactive session: type an image path at each prompt
221
+ llmpm run Salesforce/blip-image-captioning-base
222
+ ```
223
+
224
+ > Requires: `pip install transformers torch Pillow`
225
+
226
+ ### Speech-to-text / ASR
227
+
228
+ Transcribe an audio file. Pass the audio file path via `--prompt`.
229
+
230
+ ```sh
231
+ # Transcribe a single file
232
+ llmpm run openai/whisper-base --prompt recording.wav
233
+
234
+ # Interactive: enter an audio file path at each prompt
235
+ llmpm run openai/whisper-base
236
+ ```
237
+
238
+ Supported formats depend on your installed audio libraries (wav, flac, mp3, …).
239
+
240
+ > Requires: `pip install transformers torch`
241
+
242
+ ### Text-to-speech / TTS
243
+
244
+ Convert text to speech. The output WAV file is saved to your Desktop.
245
+
246
+ ```sh
247
+ # Single utterance → saves llmpm_<timestamp>.wav to ~/Desktop
248
+ llmpm run suno/bark-small --prompt "Hello, how are you today?"
249
+
250
+ # Interactive session
251
+ llmpm run suno/bark-small
252
+ ```
253
+
254
+ > Requires: `pip install transformers torch`
255
+
256
+ ### `llmpm run` options
257
+
258
+ | Option | Default | Description |
259
+ | ----------------- | -------- | ------------------------------------------------------- |
260
+ | `--prompt` / `-p` | — | Single-turn prompt or input file path (non-interactive) |
261
+ | `--system` / `-s` | — | System prompt (text generation only) |
262
+ | `--max-tokens` | `128000` | Maximum tokens to generate per response |
263
+ | `--ctx` | `128000` | Context window size (GGUF only) |
264
+ | `--gpu-layers` | `-1` | GPU layers to offload, `-1` = all (GGUF only) |
265
+ | `--verbose` | off | Show model loading output |
266
+
267
+ ### Interactive session commands
268
+
269
+ These commands work in any interactive session:
270
+
271
+ | Command | Action |
272
+ | ---------------- | ------------------------------------------ |
273
+ | `/exit` | End the session |
274
+ | `/clear` | Clear conversation history (text gen only) |
275
+ | `/system <text>` | Update the system prompt (text gen only) |
276
+
277
+ ### Model type detection
278
+
279
+ `llmpm run` reads `config.json` / `model_index.json` from the installed model to determine the pipeline type before loading any weights. The detected type is printed at startup:
280
+
281
+ ```
282
+ Detected: Image Generation (Diffusion)
283
+ Loading model… ✓
284
+ ```
285
+
286
+ If detection is ambiguous, `llmpm run` falls back to the text-generation backend.
287
+
288
+ ---
289
+
290
+ ## `llmpm serve`
291
+
292
+ Start a **single** local HTTP server exposing one or more models as an OpenAI-compatible REST API.
293
+ A browser-based chat UI is available at `/chat`.
294
+
295
+ ![llmpm serve](https://res.cloudinary.com/dehc0rbua/image/upload/v1772781377/LLMPMservemultimodels_m5ahlv.gif)
296
+
297
+ ```sh
298
+ # Serve a single model on the default port (8080)
299
+ llmpm serve meta-llama/Llama-3.2-3B-Instruct
300
+
301
+ # Serve multiple models on one server
302
+ llmpm serve meta-llama/Llama-3.2-3B-Instruct amused/amused-256
303
+
304
+ # Serve ALL installed models automatically
305
+ llmpm serve
306
+
307
+ # Custom port and host
308
+ llmpm serve meta-llama/Llama-3.2-3B-Instruct --port 9000 --host 0.0.0.0
309
+
310
+ # Set the default max tokens (clients may override per-request)
311
+ llmpm serve meta-llama/Llama-3.2-3B-Instruct --max-tokens 2048
312
+
313
+ # GGUF model — tune context window and GPU layers
314
+ llmpm serve unsloth/Llama-3.2-3B-Instruct-GGUF --ctx 8192 --gpu-layers 32
315
+ ```
316
+
317
+ Fuzzy model-name matching is applied to each argument; if multiple installed models match, you are prompted to pick one.
318
+
319
+ ### `llmpm serve` options
320
+
321
+ | Option | Default | Description |
322
+ | --------------- | ----------- | --------------------------------------------------------- |
323
+ | `--port` / `-p` | `8080` | Port to listen on (auto-increments if busy) |
324
+ | `--host` / `-H` | `localhost` | Host/address to bind to |
325
+ | `--max-tokens` | `128000` | Default max tokens per response (overridable per-request) |
326
+ | `--ctx` | `128000` | Context window size (GGUF only) |
327
+ | `--gpu-layers` | `-1` | GPU layers to offload, `-1` = all (GGUF only) |
328
+
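The "auto-increments if busy" behaviour of `--port` can be pictured with the Python sketch below. It is an assumption about how such a search might work, not llmpm's code: the function name `first_free_port` and the `max_tries` limit are hypothetical.

```python
import socket


def first_free_port(start=8080, host="localhost", max_tries=50):
    """Return the first port at or above `start` that can be bound on `host`."""
    stop = min(start + max_tries, 65536)  # never probe past the valid port range
    for port in range(start, stop):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
            try:
                probe.bind((host, port))
            except OSError:
                continue  # port is busy or not bindable; try the next one
            return port
    raise RuntimeError(f"no free port in {start}..{stop - 1}")
```

A probe like this is inherently racy, since another process can claim the port between the check and the real bind, so a server would normally retry on the actual bind as well.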
329
+ ### Multi-model routing
330
+
331
+ When multiple models are loaded, POST endpoints accept an optional `"model"` field in the JSON body.
332
+ If omitted, the first loaded model is used.
333
+
334
+ ```sh
335
+ # Target a specific model when multiple are loaded
336
+ curl -X POST http://localhost:8080/v1/chat/completions \
337
+ -H "Content-Type: application/json" \
338
+ -d '{"model": "meta-llama/Llama-3.2-3B-Instruct",
339
+ "messages": [{"role": "user", "content": "Hello!"}]}'
340
+ ```
341
+
342
+ The chat UI at `/chat` shows a model dropdown when more than one model is loaded.
343
+ Switching models resets the conversation and adapts the UI to the new model's category.
344
+
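The same routing can be done from Python with only the standard library. The sketch below builds the request without sending it; the helper name `build_chat_request` is hypothetical, and the base URL assumes the default port shown above.

```python
import json
import urllib.request


def build_chat_request(messages, model=None, base_url="http://localhost:8080"):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    body = {"messages": messages}
    if model is not None:
        body["model"] = model  # omit to let the server use the first loaded model
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_chat_request(
    [{"role": "user", "content": "Hello!"}],
    model="meta-llama/Llama-3.2-3B-Instruct",
)
# To send it against a running server:
#   with urllib.request.urlopen(request) as response:
#       reply = json.load(response)
```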
345
+ ### Endpoints
346
+
347
+ | Method | Path | Description |
348
+ | ------ | -------------------------- | -------------------------------------------------------------------- |
349
+ | `GET` | `/chat` | Browser chat / image-gen UI (model dropdown for multi-model serving) |
350
+ | `GET` | `/health` | `{"status":"ok","models":["id1","id2",…]}` |
351
+ | `GET` | `/v1/models` | List all loaded models with id, category, created |
352
+ | `GET` | `/v1/models/<id>` | Info for a specific loaded model |
353
+ | `POST` | `/v1/chat/completions` | OpenAI-compatible chat inference (SSE streaming supported) |
354
+ | `POST` | `/v1/completions` | Legacy text completion |
355
+ | `POST` | `/v1/embeddings` | Text embeddings |
356
+ | `POST` | `/v1/images/generations` | Text-to-image; pass `"image"` (base64) for image-to-image |
357
+ | `POST` | `/v1/audio/transcriptions` | Speech-to-text |
358
+ | `POST` | `/v1/audio/speech` | Text-to-speech |
359
+
360
+ All POST endpoints accept `"model": "<id>"` to target a specific loaded model.
361
+
362
+ ### Example API calls
363
+
364
+ ```sh
365
+ # Text generation (streaming)
366
+ curl -X POST http://localhost:8080/v1/chat/completions \
367
+ -H "Content-Type: application/json" \
368
+ -d '{"messages": [{"role": "user", "content": "Hello!"}],
369
+ "max_tokens": 256, "stream": true}'
370
+
371
+ # Target a specific model when multiple are loaded
372
+ curl -X POST http://localhost:8080/v1/chat/completions \
373
+ -H "Content-Type: application/json" \
374
+ -d '{"model": "meta-llama/Llama-3.2-3B-Instruct",
375
+ "messages": [{"role": "user", "content": "Hello!"}]}'
376
+
377
+ # List all loaded models
378
+ curl http://localhost:8080/v1/models
379
+
380
+ # Text-to-image
381
+ curl -X POST http://localhost:8080/v1/images/generations \
382
+ -H "Content-Type: application/json" \
383
+ -d '{"prompt": "a cat in a forest", "n": 1}'
384
+
385
+ # Image-to-image (include the source image as base64 in the same endpoint)
386
+ IMAGE_B64=$(base64 < input.png | tr -d '\n')
387
+ curl -X POST http://localhost:8080/v1/images/generations \
388
+ -H "Content-Type: application/json" \
389
+ -d "{\"prompt\": \"turn it into a painting\", \"image\": \"$IMAGE_B64\"}"
390
+
391
+ # Speech-to-text
392
+ curl -X POST http://localhost:8080/v1/audio/transcriptions \
393
+ -H "Content-Type: application/octet-stream" \
394
+ --data-binary @recording.wav
395
+
396
+ # Text-to-speech
397
+ curl -X POST http://localhost:8080/v1/audio/speech \
398
+ -H "Content-Type: application/json" \
399
+ -d '{"input": "Hello world"}' \
400
+ --output speech.wav
401
+ ```
402
+
403
+ Response shape for chat completions (non-streaming):
404
+
405
+ ```json
406
+ {
407
+ "object": "chat.completion",
408
+ "model": "<model-id>",
409
+ "choices": [{
410
+ "index": 0,
411
+ "message": { "role": "assistant", "content": "<text>" },
412
+ "finish_reason": "stop"
413
+ }],
414
+ "usage": { "prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0 }
415
+ }
416
+ ```
417
+
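Given that shape, pulling the assistant text out of a response is a single lookup. The sketch below uses a hand-written sample payload (the model id and token counts are made up for illustration) and a hypothetical helper name, `extract_reply`.

```python
import json


def extract_reply(response: dict) -> str:
    """Return the assistant text from a non-streaming chat completion."""
    return response["choices"][0]["message"]["content"]


# Sample payload following the shape above; values are illustrative only.
sample = json.loads("""
{
  "object": "chat.completion",
  "model": "example-model",
  "choices": [{
    "index": 0,
    "message": {"role": "assistant", "content": "Hi there"},
    "finish_reason": "stop"
  }],
  "usage": {"prompt_tokens": 3, "completion_tokens": 2, "total_tokens": 5}
}
""")
print(extract_reply(sample))
```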
418
+ Response shape for chat completions (streaming SSE):
419
+
420
+ Each chunk:
421
+ ```json
422
+ {
423
+ "object": "chat.completion.chunk",
424
+ "model": "<model-id>",
425
+ "choices": [{
426
+ "index": 0,
427
+ "delta": { "content": "<token>" },
428
+ "finish_reason": null
429
+ }]
430
+ }
431
+ ```
432
+
433
+ Followed by a final `data: [DONE]` sentinel.
434
+
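A client can reassemble the streamed text by concatenating each chunk's `delta.content` until the sentinel arrives. The Python sketch below assumes each chunk is delivered as a standard SSE `data: <json>` line, which this README implies but does not spell out; the helper name `collect_stream` and the sample lines are illustrative.

```python
import json


def collect_stream(lines):
    """Join the delta content of SSE `data:` lines until the [DONE] sentinel."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        parts.append(delta.get("content") or "")
    return "".join(parts)


# Hand-written sample stream in the chunk shape shown above.
stream = [
    'data: {"object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": "Hel"}, "finish_reason": null}]}',
    "",
    'data: {"object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": "lo"}, "finish_reason": null}]}',
    "",
    "data: [DONE]",
]
print(collect_stream(stream))
```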
435
+ Response shape for image generation:
436
+
437
+ ```json
438
+ {
439
+ "created": 1234567890,
440
+ "data": [{ "b64_json": "<base64-png>" }]
441
+ }
442
+ ```
443
+
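To turn that response into files, each `b64_json` entry has to be base64-decoded. A minimal Python sketch follows; the helper name `save_images` and the output file naming are assumptions, not part of llmpm.

```python
import base64
from pathlib import Path


def save_images(response: dict, out_dir: Path) -> list:
    """Decode every b64_json entry and write it as a PNG; return the paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, item in enumerate(response["data"]):
        path = out_dir / f"image_{response['created']}_{index}.png"
        path.write_bytes(base64.b64decode(item["b64_json"]))
        paths.append(path)
    return paths
```

Called with the parsed JSON body of a `/v1/images/generations` response, this would write one file per generated image.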
444
  ---
445
+
446
+ ## `llmpm push`
447
+
448
+ ```sh
449
+ # Push an already-installed model
450
+ llmpm push my-org/my-fine-tune
451
+
452
+ # Push a local directory
453
+ llmpm push my-org/my-fine-tune --path ./my-model-dir
454
+
455
+ # Push as private repository
456
+ llmpm push my-org/my-fine-tune --private
457
+
458
+ # Custom commit message
459
+ llmpm push my-org/my-fine-tune -m "Add Q4_K_M quantisation"
460
+ ```
461
+
462
+ Requires a HuggingFace token (run `huggingface-cli login` or set `HF_TOKEN`).
463
+
464
  ---
465
 
466
+ ## Backends
467
+
468
+ By default, all backends (torch, transformers, diffusers, llama-cpp-python, …) are installed into the managed `~/.llmpm/venv` the first time `llmpm` runs; `pip install llmpm` itself only installs the CLI.
469
+
470
+ | Model type | Pipeline | Backend |
471
+ | ----------------------- | ---------------- | ------------------------------ |
472
+ | `.gguf` files | Text generation | llama.cpp via llama-cpp-python |
473
+ | `.safetensors` / `.bin` | Text generation | HuggingFace Transformers |
474
+ | Diffusion models | Image generation | HuggingFace Diffusers |
475
+ | Vision models | Image-to-text | HuggingFace Transformers |
476
+ | Whisper / ASR models | Speech-to-text | HuggingFace Transformers |
477
+ | TTS models | Text-to-speech | HuggingFace Transformers |
478
+
479
+ ### Selective backend install
480
+
481
+ If you only need one backend (e.g. on a headless server), install without defaults and add just what you need:
482
+
483
+ ```sh
484
+ pip install llmpm --no-deps # CLI only (no ML backends)
485
+ pip install "llmpm[gguf]" # + GGUF / llama.cpp
486
+ pip install "llmpm[transformers]" # + text generation
487
+ pip install "llmpm[diffusion]" # + image generation
488
+ pip install "llmpm[vision]" # + vision / image-to-text
489
+ pip install "llmpm[audio]" # + ASR + TTS
490
+ ```
491
+
492
+ ---
493
+
494
+ ## Configuration
495
+
496
+ | Variable | Default | Description |
497
+ | -------------- | ---------- | ------------------------------------------------------------ |
498
+ | `LLMPM_HOME` | `~/.llmpm` | Root directory for models and registry |
499
+ | `HF_TOKEN` | — | HuggingFace API token for gated models |
500
+ | `LLPM_PYTHON` | `python3` | Python binary used by the npm shim (fallback only) |
501
+ | `LLPM_NO_VENV` | — | Set to `1` to skip venv isolation (CI / Docker / containers) |
502
+
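How `LLMPM_HOME` relates to the global-mode paths mentioned earlier can be sketched as follows. This is an assumption based on the table and on the `~/.llmpm/models/` and `~/.llmpm/registry.json` defaults; the helper name `storage_paths` is hypothetical and llmpm may resolve paths differently.

```python
import os
from pathlib import Path


def storage_paths(env=None):
    """Resolve the root, models and registry paths from LLMPM_HOME."""
    env = os.environ if env is None else env
    # Fall back to ~/.llmpm when the variable is unset or empty.
    root = Path(env.get("LLMPM_HOME") or "~/.llmpm").expanduser()
    return {
        "root": root,
        "models": root / "models",
        "registry": root / "registry.json",
    }
```

Passing a plain dict such as `{"LLMPM_HOME": "/mnt/models"}` instead of the real environment makes the resolution easy to check in isolation.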
503
+ ### Configuration examples
504
+
505
+ **Use a HuggingFace token for gated models:**
506
+
507
+ ```sh
508
+ HF_TOKEN=hf_your_token llmpm install meta-llama/Llama-3.2-3B-Instruct
509
+ # or export for the session
510
+ export HF_TOKEN=hf_your_token
511
+ llmpm install meta-llama/Llama-3.2-3B-Instruct
512
+ ```
513
+
514
+ **Skip venv isolation (CI / Docker):**
515
+
516
+ ```sh
517
+ # Inline — single command
518
+ LLPM_NO_VENV=1 llmpm serve meta-llama/Llama-3.2-3B-Instruct
519
+
520
+ # Exported — all subsequent commands skip the venv
521
+ export LLPM_NO_VENV=1
522
+ llmpm install meta-llama/Llama-3.2-3B-Instruct
523
+ llmpm serve meta-llama/Llama-3.2-3B-Instruct
524
+ ```
525
+
526
+ > When using `LLPM_NO_VENV=1`, install all backends first: `pip install "llmpm[all]"`
527
+
528
+ **Custom model storage location:**
529
+
530
+ ```sh
531
+ LLMPM_HOME=/mnt/models llmpm install meta-llama/Llama-3.2-3B-Instruct
532
+ LLMPM_HOME=/mnt/models llmpm serve meta-llama/Llama-3.2-3B-Instruct
533
+ ```
534
+
535
+ **Use a specific Python binary (npm installs):**
536
+
537
+ ```sh
538
+ LLPM_PYTHON=/usr/bin/python3.11 llmpm run meta-llama/Llama-3.2-3B-Instruct
539
+ ```
540
+
541
+ **Combining variables:**
542
+
543
+ ```sh
544
+ HF_TOKEN=hf_your_token LLMPM_HOME=/data/models LLPM_NO_VENV=1 \
545
+ llmpm install meta-llama/Llama-3.2-3B-Instruct
546
+ ```
547
+
548
+ **Docker / CI example:**
549
+
550
+ ```dockerfile
551
+ ENV LLPM_NO_VENV=1
552
+ ENV HF_TOKEN=hf_your_token
553
+ RUN pip install llmpm[all]
554
+ RUN llmpm install meta-llama/Llama-3.2-3B-Instruct
555
+ CMD ["llmpm", "serve", "meta-llama/Llama-3.2-3B-Instruct", "--host", "0.0.0.0"]
556
+ ```
557
+
558
+ ---
559
+
560
+ ## License
561
+
562
+ MIT