TinyDoc-VLM-256M

256M-parameter document-specialist Vision-Language Model

Overview

TinyDoc-VLM is a compact vision-language model specialized for document understanding tasks: OCR, form extraction, table parsing, receipt processing, and visual question answering.

256M params: SigLIP-B/16 vision encoder (93M) + PixelShuffle 3× compressor + SmolLM2-135M decoder
<1GB memory: Runs on a MacBook, a Raspberry Pi 5, or any CPU via ONNX
Apache 2.0: Fully open-source, free for commercial use
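To sanity-check the memory figure, note that weight storage is the parameter count times the bytes per parameter. The sketch below uses the nominal 256M count. It covers weights only, not activations, KV cache or runtime overhead. The dtype table is illustrative; it does not list formats the released checkpoint actually ships in.

```python
# Back-of-the-envelope weight memory for a model of a given size.
# Assumption: nominal 256M parameters; only the weights are counted.
BYTES_PER_PARAM = {"f32": 4, "f16/bf16": 2, "int8": 1}


def weight_memory_gib(num_params: int, dtype: str) -> float:
    """Return the memory needed to hold the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


for dtype in BYTES_PER_PARAM:
    print(f"{dtype:>9}: {weight_memory_gib(256_000_000, dtype):.2f} GiB")
```

At F32 this comes to about 0.95 GiB. Half-precision or int8 weights leave considerably more headroom on small devices.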

Architecture

Image (384×384) → SigLIP-B/16 (93M) → PixelShuffle 3× → 64 tokens → SmolLM2-135M → JSON/KV/Table/OCR/QA
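The token count in the diagram follows from the geometry. A 384×384 image split into 16×16 patches gives a 24×24 grid of 576 patch embeddings. A 3× pixel shuffle then merges each 3×3 neighbourhood into one token, leaving 8×8 = 64 tokens with 9× wider features.

Below is a minimal NumPy sketch of that space-to-depth step. It assumes row-major patch order and SigLIP-B's 768-dim hidden size. How TinyDoc-VLM then maps these features into the decoder is not described here.

```python
import numpy as np


def pixel_shuffle_compress(tokens: np.ndarray, grid: int, factor: int) -> np.ndarray:
    """Merge each factor x factor block of patch tokens into one token (space-to-depth).

    tokens: (grid * grid, dim) patch embeddings in row-major order.
    Returns: ((grid // factor) ** 2, dim * factor ** 2).
    """
    n, dim = tokens.shape
    if n != grid * grid or grid % factor != 0:
        raise ValueError("token count must form a square grid divisible by factor")
    g = grid // factor
    x = tokens.reshape(g, factor, g, factor, dim)   # split rows and columns into blocks
    x = x.transpose(0, 2, 1, 3, 4)                  # (block_row, block_col, a, b, dim)
    return x.reshape(g * g, factor * factor * dim)  # concatenate each block's features


# 384 / 16 = 24 patches per side -> 576 tokens of width 768 (SigLIP-B hidden size)
patches = np.random.default_rng(0).normal(size=(576, 768)).astype(np.float32)
compressed = pixel_shuffle_compress(patches, grid=24, factor=3)
print(patches.shape, "->", compressed.shape)
```

The merge only reshapes and concatenates, so it adds no parameters. The 9× shorter sequence is what the 135M decoder actually has to attend over.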

Usage

from tinydoc_vlm import TinyDocVLMForConditionalGeneration, TinyDocVLMProcessor

# Load pretrained weights from the Hugging Face Hub (the repo ID is a string)
model = TinyDocVLMForConditionalGeneration.from_pretrained("eulogik/TinyDoc-VLM-256M")
processor = TinyDocVLMProcessor()
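This card does not document the exact text the decoder emits for key-value or JSON tasks. Assume the output is either a JSON object or one `key: value` pair per line. Under that assumption, a tolerant post-processing helper might look like the one below. `parse_kv_output` is hypothetical and not part of `tinydoc_vlm`.

```python
import json


def parse_kv_output(text: str) -> dict:
    """Parse generated text into a dict (hypothetical helper, not a tinydoc_vlm API).

    Tries strict JSON first, then falls back to one 'key: value' pair per line.
    """
    text = text.strip()
    try:
        parsed = json.loads(text)
        if isinstance(parsed, dict):
            return parsed
    except json.JSONDecodeError:
        pass
    result = {}
    for line in text.splitlines():
        # partition splits on the first ':' only, so values like "12:30" survive
        key, sep, value = line.partition(":")
        if sep and key.strip():
            result[key.strip()] = value.strip()
    return result


print(parse_kv_output("Vendor: ACME Corp\nTotal: 12.50"))
```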

LoRA Fine-tuning

Train on your own documents with LoRA (2.7M trainable params, 0.93% of total):

# Generate synthetic docs
python data/synthetic/generator.py --num-docs 1000 --output-dir data/synthetic/output

# Train with LoRA
python training/fast_train.py --steps 5000 --device mps  # Apple Silicon (e.g. M4 Mac)
python training/fast_train.py --steps 5000 --device cuda # NVIDIA GPU
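The internals of `data/synthetic/generator.py` are not shown here. The general idea of a synthetic sample is to pair text rendered onto a document image with a ground-truth target that is correct by construction. The hypothetical `make_synthetic_receipt` below builds such a pair for a receipt and leaves out the image-rendering step. Its field names and JSON layout are assumptions.

```python
import json
import random


def make_synthetic_receipt(rng: random.Random) -> dict:
    """Build one hypothetical receipt sample: text lines to render plus a JSON target."""
    # Prices are kept in integer cents so the total is exact by construction
    items = [
        (f"Item {i + 1}", rng.randint(1, 5), rng.randint(100, 2000))
        for i in range(rng.randint(1, 4))
    ]
    total_cents = sum(qty * cents for _, qty, cents in items)
    lines = [f"{name} x{qty} {cents / 100:.2f}" for name, qty, cents in items]
    lines.append(f"TOTAL {total_cents / 100:.2f}")
    target = {
        "items": [{"name": n, "qty": q, "unit_price": c / 100} for n, q, c in items],
        "total": total_cents / 100,
    }
    return {"text_lines": lines, "target": json.dumps(target)}


sample = make_synthetic_receipt(random.Random(0))
print("\n".join(sample["text_lines"]))
print(sample["target"])
```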

See training/colab_train.ipynb for a complete Colab notebook.
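LoRA keeps the pretrained weights frozen and trains only a low-rank update per adapted layer, which keeps the trainable share below 1%. Below is a minimal NumPy sketch of one adapted linear layer. It assumes rank `r=8` and `alpha=16`; the rank, scaling and target modules that `training/fast_train.py` actually uses are not stated here.

```python
import numpy as np


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, weight: np.ndarray, r: int = 8, alpha: float = 16.0, seed: int = 0):
        d_out, d_in = weight.shape
        rng = np.random.default_rng(seed)
        self.weight = weight                            # frozen, shape (d_out, d_in)
        self.A = rng.normal(0.0, 0.01, size=(r, d_in))  # trainable down-projection
        self.B = np.zeros((d_out, r))                   # trainable up-projection, zero-init
        self.scale = alpha / r

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # x: (batch, d_in) -> (batch, d_out); base output plus scaled low-rank update
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def trainable_params(self) -> int:
        return self.A.size + self.B.size


W = np.random.default_rng(1).normal(size=(576, 576))
layer = LoRALinear(W, r=8)
print(layer.trainable_params(), "trainable vs", W.size, "frozen")
```

For a 576×576 projection at rank 8, the adapter adds 8 × (576 + 576) = 9,216 trainable parameters against 331,776 frozen ones. Because `B` starts at zero, the adapted layer initially behaves exactly like the base layer.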

ONNX Export

python export/export_onnx.py --model-path eulogik/TinyDoc-VLM-256M --output model.onnx

Benchmarks

Benchmark	Status
OCRBench	In progress (needs instruction tuning)
DocVQA	Pending
FUNSD	Pending

Citation

@software{eulogik_tinydoc_vlm_2026,
  author = {eulogik},
  title = {TinyDoc-VLM: 256M-Param Document-Specialist VLM},
  year = {2026},
  url = {https://github.com/eulogik/TinyDoc-VLM}
}

License

Apache 2.0. See LICENSE.

Model size: 0.3B params (Safetensors, F32 tensors)
