denisko Claude Opus 4.6 committed on
Commit 5ef0391 · 1 Parent(s): 07563d8

Add TODO placeholder for per-request preset selection API

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Files changed (1)
  1. README.md +4 -0
README.md CHANGED
@@ -258,6 +258,10 @@ llm.collective_rpc("set_layer_placements", args=(pattern,))
  | Supernet, 1 GPU (`enforce_eager`) | ~46 GiB | ~20 GiB | Runtime switching, lower KV capacity |
  | Supernet, 2 GPU (TP=2) | ~23 GiB/GPU | ~50 GiB/GPU | Full compile + CUDA graphs |
 
+ ### Per-Request Preset Selection
+
+ > **🔴 TODO: Add per-request placement selection via the vLLM serving API (e.g. `placement_id` field in the request body). This is managed separately from the global `collective_rpc` switching shown above.**
+
  ## Chat Template
 
  ```