tiny_vllm: minimal continuous-batching engine
Explain paged attention in two sentences.
max_tokens
temperature
top_p
Send
Send ×2 (prefix demo)
Speed: 0.5×, 1×, 2×, 4×, 8×
Pause
Restart
Block pool
free
cached (evictable)
in use
shared (refcount>1)
hashed (border)
Scheduler
tokens this step: 0
prefill / decode: 0 / 0
step (ms): 0
prefix cache hit-rate: 0%
free blocks: 0
preemptions (total): 0
step log
Sequences