---
title: README
emoji:
colorFrom: purple
colorTo: pink
sdk: static
pinned: false
---
# NanoVLM Speedrun
> The most striking thing about the [modded-nanogpt](https://github.com/karpathy/modded-nanogpt) experiments is that they expose how much of deep learning is just bloat.
> To apply this to Vision-Language Models (VLMs), you have to stop acting like a researcher and start acting like a hacker. You aren't trying to follow academic standards; you are trying to maximize the movement of bits through silicon.
We introduce **NanoVLM Speedrun**: a minimalist VLM recipe designed to strip away the bloat. We provide the bare-minimum components required to bridge the training and evaluation pipeline, enabling lightning-fast iteration and reproduction.
## The Recipe (2026H1)
- **LLM**: [`Qwen/Qwen3-0.6B`](https://huggingface.co/Qwen/Qwen3-0.6B)
- **Vision Encoder**: [`google/siglip2-so400m-patch16-naflex`](https://huggingface.co/google/siglip2-so400m-patch16-naflex)
- **Projector**: Classic [LLaVA](https://arxiv.org/abs/2310.03744)-style **2-layer MLP**
- **Training Paradigm**: A streamlined two-stage approach:
- **Stage 1**: Projector-only alignment (tuning the projector between vision and language).
- **Stage 2**: End-to-end instruction tuning (tuning both the projector and the LLM).
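The recipe above can be sketched in a few lines of PyTorch. This is a minimal illustration, not the repo's implementation: the hidden sizes (1152 for the SigLIP2-so400m features, 1024 for Qwen3-0.6B) and the GELU activation follow the LLaVA convention but should be checked against the actual configs, and the `set_stage` helper is a hypothetical name for the freeze/unfreeze logic of the two training stages.

```python
import torch
import torch.nn as nn


class MLPProjector(nn.Module):
    """LLaVA-style 2-layer MLP mapping vision features into the LLM embedding space."""

    def __init__(self, vision_dim: int = 1152, llm_dim: int = 1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, vision_feats: torch.Tensor) -> torch.Tensor:
        # (batch, num_vision_tokens, vision_dim) -> (batch, num_vision_tokens, llm_dim)
        return self.net(vision_feats)


def set_stage(projector: nn.Module, llm: nn.Module, stage: int) -> None:
    """Stage 1: train the projector only. Stage 2: train projector + LLM."""
    for p in projector.parameters():
        p.requires_grad = True
    for p in llm.parameters():
        p.requires_grad = (stage == 2)


# Usage sketch: a tiny stand-in module plays the role of the Qwen3 LLM.
projector = MLPProjector()
llm = nn.Linear(1024, 1024)
set_stage(projector, llm, stage=1)

feats = torch.randn(2, 256, 1152)  # (batch, vision tokens, vision dim)
tokens = projector(feats)
print(tokens.shape)  # torch.Size([2, 256, 1024])
```

In Stage 2 the same `set_stage(projector, llm, stage=2)` call unfreezes the LLM while keeping the projector trainable; the vision encoder stays frozen throughout in the classic LLaVA setup.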
## Data Preparation
We utilize the curated [LMMs-Lab-Speedrun/Data_NanoVLM](https://huggingface.co/datasets/LMMs-Lab-Speedrun/Data_NanoVLM) collection.
- **Stage 1**: From [liuhaotian/LLaVA-Pretrain](https://huggingface.co/datasets/liuhaotian/LLaVA-Pretrain)
- **Stage 2**: From [lmms-lab/LLaVA-NeXT-Data](https://huggingface.co/datasets/lmms-lab/LLaVA-NeXT-Data) (Note: We explicitly filtered out excessively long samples to maintain training efficiency).
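The length filtering mentioned for Stage 2 can be done with a simple pass over the dataset. The sketch below is illustrative only: the `max_chars` threshold and the `conversations`/`value` field names are assumptions based on the common LLaVA data format, not the exact values or schema used in this recipe.

```python
def filter_long_samples(samples, max_chars=8000):
    """Drop samples whose total conversation text exceeds a length budget.

    Each sample is assumed to follow the LLaVA-style schema:
    {"conversations": [{"from": "human", "value": "..."}, ...]}
    `max_chars` is an illustrative cutoff, not the value used upstream.
    """
    kept = []
    for sample in samples:
        total = sum(len(turn["value"]) for turn in sample.get("conversations", []))
        if total <= max_chars:
            kept.append(sample)
    return kept


# Usage: the second sample is excessively long and gets dropped.
data = [
    {"conversations": [{"from": "human", "value": "Describe the image."}]},
    {"conversations": [{"from": "human", "value": "x" * 10_000}]},
]
filtered = filter_long_samples(data)
print(len(filtered))  # 1
```

Filtering by character count (or, more precisely, by tokenized length) keeps batches dense and avoids a few very long conversations dominating the sequence-length budget during instruction tuning.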
For more information about training, please refer to [NanoVLM Speedrun](https://github.com/EvolvingLMMs-Lab/lmms-engine/tree/main/examples/nanovlm).