Hi everyone! 👋
I've been working on a fork of Nano-vLLM called Nano-vLLM-v1, which re-engineers the core architecture to closely reproduce the vLLM v1 scheduler and introduces Chunked Prefill for better performance.
The goal was to build a lightweight, readable, yet highly efficient inference engine that stays true to the original vLLM design while being easy to understand and extend.
🔥 Key Features
- ✅ Fully reproduced vLLM v1 scheduler – implements the same scheduling logic as vLLM v1.
- ✅ Chunked Prefill – improves prefill efficiency for long contexts.
- ✅ Clean codebase – the simplest way to reproduce the vLLM v1 scheduler and implement Chunked Prefill on top of Nano-vLLM.
- ✅ Fast offline & online inference – performance comparable to vLLM v1 in offline throughput and online latency (TTFT and TPOT).
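To make the Chunked Prefill idea concrete, here is a minimal illustrative sketch (not the repo's actual code; the function name and parameters are my own): the scheduler caps the number of prompt tokens processed per step at a token budget, so a long prefill is split into chunks and decode requests can be interleaved between them instead of waiting for the whole prompt.

```python
def chunk_prefill(prompt_len: int, max_num_batched_tokens: int) -> list[int]:
    """Return per-step chunk sizes for one prefill request.

    Illustrative only: each step processes at most
    max_num_batched_tokens prompt tokens, leaving room in later
    steps to interleave decode work.
    """
    chunks = []
    remaining = prompt_len
    while remaining > 0:
        step = min(remaining, max_num_batched_tokens)
        chunks.append(step)
        remaining -= step
    return chunks

# A 3000-token prompt under a 1024-token budget prefills in three steps:
# two full chunks and a final partial one.
print(chunk_prefill(3000, 1024))  # [1024, 1024, 952]
```

The `--max-num-batched-tokens` flag in the benchmark command below plays exactly this budget role.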
📦 Repository
Check it out here: https://github.com/slwang-ustc/nano-vllm-v1/tree/main
I'd love for the community to try it out, give feedback, or contribute! The code is designed to be readable and modular, making it easy to experiment with new features or optimizations.
If you're interested in lightweight, high-performance LLM inference without the complexity, give it a star ⭐ and let me know what you think!
🚀 Quick Start
Offline example:
```python
from nanovllm import LLM, SamplingParams

# Load the model; enforce_eager disables CUDA graph capture.
llm = LLM("/path/to/model", enforce_eager=True, tensor_parallel_size=1)

sampling_params = SamplingParams(temperature=0.6, max_tokens=256)
outputs = llm.generate(["Hello, Nano-vLLM."], sampling_params)
print(outputs[0]["text"])
```
Online benchmarking:
```shell
python serving_bench.py \
    --model /path/to/Qwen3-14B/ \
    --request-rate 10 \
    --num-requests 1024 \
    --tensor-parallel-size 1 \
    --max-num-batched-tokens 1024 \
    --max-num-seqs 1024 \
    --random-input-len 128 \
    --random-output-len 100 \
    --chunked-prefill \
    --enforce-eager
```