tiny_vllm — minimal continuous-batching engine


max_tokens, temperature, top_p

Block pool

free, cached (evictable), in use, shared (refcount>1), hashed (border)

Scheduler

tokens this step: 0
prefill / decode: 0 / 0
step (ms): 0
prefix cache hit-rate: 0%
free blocks: 0
preemptions (total): 0

step log

Sequences