TinyDoc-VLM-256M

256M-parameter document-specialist Vision-Language Model

Overview

TinyDoc-VLM is a compact vision-language model specialized for document understanding tasks: OCR, form extraction, table parsing, receipt processing, and visual question answering.

  • 256M params: SigLIP-B/16 vision encoder (93M) + PixelShuffle 3× compressor + SmolLM2-135M decoder
  • <1GB VRAM: Runs on a MacBook, a Raspberry Pi 5, or any CPU via ONNX
  • Apache 2.0: Fully open-source, free for commercial use

Architecture

Image (384×384) → SigLIP-B/16 (93M) → PixelShuffle 3× → 64 tokens → SmolLM2-135M → JSON/KV/Table/OCR/QA
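
The 64-token figure in the pipeline above follows from simple arithmetic. The sketch below is a minimal illustration, assuming that the "/16" in SigLIP-B/16 denotes 16-pixel patches and that PixelShuffle 3× folds each 3×3 block of patches into one token. Both are readings of the diagram rather than confirmed implementation details, and the function name and parameters are illustrative.

```python
# Sketch of the visual-token arithmetic; names are illustrative, not part of tinydoc_vlm.
def visual_token_count(image_size: int = 384, patch_size: int = 16, shuffle_factor: int = 3) -> int:
    """Return the number of visual tokens handed to the decoder."""
    patches_per_side = image_size // patch_size            # 384 // 16 = 24
    if patches_per_side % shuffle_factor != 0:
        raise ValueError("patch grid must be divisible by the shuffle factor")
    tokens_per_side = patches_per_side // shuffle_factor   # 24 // 3 = 8
    return tokens_per_side * tokens_per_side               # 8 * 8 = 64


print(visual_token_count())
```

Under these assumptions the encoder produces 24 × 24 = 576 patch embeddings, and the 3× shuffle reduces them ninefold to the 64 tokens shown in the diagram.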

Usage

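from tinydoc_vlm import TinyDocVLMForConditionalGeneration, TinyDocVLMProcessor

# Load the pretrained model by its repository ID
model = TinyDocVLMForConditionalGeneration.from_pretrained("eulogik/TinyDoc-VLM-256M")
# Create the processor with its default settings
processor = TinyDocVLMProcessor()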
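
The snippet above only loads the model; this README does not show the generation call, so the sketch below starts from an already generated string. It assumes the decoder emits a JSON object somewhere in its text output for extraction tasks, and shows one way to recover it with the standard library. The helper name `extract_first_json` and the sample string are hypothetical.

```python
import json
from typing import Any, Optional


def extract_first_json(generated_text: str) -> Optional[Any]:
    """Return the first JSON object found in the text, or None if there is none."""
    decoder = json.JSONDecoder()
    start = generated_text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and ignores trailing text
            obj, _end = decoder.raw_decode(generated_text, start)
            return obj
        except json.JSONDecodeError:
            # Not valid JSON from this brace; try the next one
            start = generated_text.find("{", start + 1)
    return None


# Hypothetical decoder output for a receipt; not real model output.
sample = 'Result: {"vendor": "ACME", "total": "12.50"} end'
fields = extract_first_json(sample)
print(fields)
```

Scanning for the first parseable object, rather than calling `json.loads` on the whole string, tolerates extra text before or after the JSON payload.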

LoRA Fine-tuning

Train on your own documents with LoRA (2.7M trainable params, 0.93% of total):

# Generate synthetic docs
python data/synthetic/generator.py --num-docs 1000 --output-dir data/synthetic/output

# Train with LoRA
python training/fast_train.py --steps 5000 --device mps  # M4 Mac
python training/fast_train.py --steps 5000 --device cuda # GPU

See training/colab_train.ipynb for a complete Colab notebook.
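
The LoRA figures quoted above can be sanity-checked with a little arithmetic. The sketch below derives the total parameter count implied by "2.7M trainable, 0.93% of total" and shows the standard per-layer LoRA parameter formula. The LoRA rank and target modules are not stated in this README, so the layer shape and rank in the last line are hypothetical values chosen only for illustration.

```python
# Figures quoted in this README
trainable_params = 2.7e6      # LoRA parameters
trainable_fraction = 0.0093   # 0.93% of total

# Total parameter count implied by those two figures
implied_total = trainable_params / trainable_fraction
print(f"implied total: {implied_total / 1e6:.0f}M parameters")


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by one LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


# Hypothetical layer shape and rank, for illustration only
print(lora_param_count(d_in=512, d_out=512, rank=8))
```

The two quoted figures together imply a total of roughly 290M parameters.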

ONNX Export

python export/export_onnx.py --model-path eulogik/TinyDoc-VLM-256M --output model.onnx

Benchmarks

Benchmark   Status
OCRBench    In progress (needs instruction tuning)
DocVQA      Pending
FUNSD       Pending

Citation

@software{eulogik_tinydoc_vlm_2026,
  author = {eulogik},
  title = {TinyDoc-VLM: 256M-Param Document-Specialist VLM},
  year = {2026},
  url = {https://github.com/eulogik/TinyDoc-VLM}
}

License

Apache 2.0. See LICENSE.
