TinyDoc-VLM LoRA Checkpoint
Fine-tuned document AI. 2.7M trainable params. 15 hours on a Mac. Loss: 43 → 15.
What is this?
A LoRA adapter that fine-tunes TinyDoc-VLM-256M on document-understanding tasks. Only 2.7M parameters (0.93% of the total) are trained. The base weights stay frozen, so the adapter is cheap to train and small to ship.
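LoRA keeps each base weight matrix W frozen and learns a low-rank update from two small matrices, A (r × d_in) and B (d_out × r). Each adapted linear layer therefore adds only r·(d_in + d_out) trainable parameters. The sketch below does that arithmetic for a hypothetical 576-wide attention block at rank 16. The width is an illustrative assumption, not TinyDoc-VLM's real configuration, so the sketch does not reproduce the 2,727,936 figure.

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_in -> d_out linear layer.

    A is r x d_in and B is d_out x r; the frozen base weight W
    (d_out x d_in) is not counted.
    """
    return r * d_in + d_out * r


def trainable_percent(trainable: int, total: int) -> float:
    """Share of parameters that receive gradients, in percent."""
    return 100.0 * trainable / total


# HYPOTHETICAL shapes: a 576-wide attention block, not TinyDoc-VLM's real config.
hidden, r = 576, 16
targets = ["q_proj", "k_proj", "v_proj", "o_proj"]  # same modules as the table

lora_per_layer = sum(lora_params(hidden, hidden, r) for _ in targets)
full_per_layer = len(targets) * hidden * hidden  # full fine-tuning of the same weights

print(lora_per_layer)                                               # 73728
print(f"{trainable_percent(lora_per_layer, full_per_layer):.2f}%")  # 5.56%
```

For square projections the ratio is 2r / hidden, so a small rank keeps the trainable share low. The table's 0.93% is measured against the whole model, and that total also includes weights without an adapter, such as the MLP layers.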
Quick Start
```python
from tinydoc_vlm import TinyDocVLMForConditionalGeneration, TinyDocVLMProcessor
from peft import PeftModel

# Load the frozen base model
model = TinyDocVLMForConditionalGeneration.from_pretrained("eulogik/TinyDoc-VLM-256M")

# Attach the LoRA adapter on top of the base weights
model = PeftModel.from_pretrained(model, "eulogik/TinyDoc-VLM-LoRA")

# Optional: merge the adapter into the base weights for faster inference
model = model.merge_and_unload()

processor = TinyDocVLMProcessor()
```
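`merge_and_unload()` folds the adapter into the base weights. The merged model produces the same outputs without the extra LoRA matrix products. The pure-Python sketch below checks that identity on a toy 3×3 layer. It assumes PEFT's standard LoRA scaling of alpha / r, here alpha = 4 and r = 2, which gives the same factor of 2 as the table's 32 / 16. It illustrates the math only and does not reproduce PEFT's implementation.

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def lora_forward(W, A, B, x, alpha, r):
    """Unmerged adapter: frozen W x plus the scaled low-rank update B (A x)."""
    scale = alpha / r
    delta = matvec(B, matvec(A, x))
    return [base + scale * d for base, d in zip(matvec(W, x), delta)]


def merge(W, A, B, alpha, r):
    """Fold the adapter into W: W' = W + (alpha / r) * (B @ A)."""
    scale = alpha / r
    BA = matmul(B, A)
    return [[w + scale * u for w, u in zip(w_row, ba_row)]
            for w_row, ba_row in zip(W, BA)]


# Toy 3x3 layer with a rank-2 adapter; alpha / r = 4 / 2 = 2, the same
# scaling factor as the table's lora_alpha 32 / rank 16.
W = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # frozen base weight (d_out x d_in)
A = [[1, 2, 0], [0, 1, 1]]             # r x d_in
B = [[1, 0], [0, 1], [1, 1]]           # d_out x r
x = [1, 1, 1]

print(lora_forward(W, A, B, x, alpha=4, r=2))   # [7.0, 5.0, 11.0]
print(matvec(merge(W, A, B, alpha=4, r=2), x))  # [7.0, 5.0, 11.0]
```

Both paths give the same vector. Merging trades the ability to swap adapters at runtime for one plain matrix multiply per layer.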
Training Details
| Parameter | Value |
|---|---|
| Base model | eulogik/TinyDoc-VLM-256M |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| Trainable params | 2,727,936 (0.93% of total) |
| Target modules | q_proj, v_proj, k_proj, o_proj |
| Training data | 3,000 synthetic documents (6,815 QA pairs) |
| Training steps | 17,000 |
| Best step | 14,000 (loss: 15.0) |
| Final loss | 17.2 (from 43.3) |
| Hardware | Apple M4 Mac |
| Training time | 15.1 hours |
Training Curve
| Step | Loss |
|---|---|
| 25 | 43.3 |
| 500 | 25.7 |
| 1,000 | 20.9 |
| 5,000 | 18.6 |
| 10,000 | 16.5 |
| 14,000 | 15.0 (best) |
| 17,000 | 17.2 |
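The loss rises again after step 14,000, from 15.0 to 17.2, so the step-14,000 checkpoint is the one to keep. The sketch below runs that selection over the logged points and computes the overall reduction from the first log. It assumes training loss is the selection criterion, because no validation split is described here.

```python
# (step, loss) pairs copied from the training curve.
curve = [
    (25, 43.3),
    (500, 25.7),
    (1000, 20.9),
    (5000, 18.6),
    (10000, 16.5),
    (14000, 15.0),
    (17000, 17.2),
]


def best_checkpoint(points):
    """Return the (step, loss) pair with the lowest loss."""
    return min(points, key=lambda p: p[1])


def relative_reduction(start, end):
    """Fraction by which the loss fell between two logged values."""
    return (start - end) / start


best_step, best_loss = best_checkpoint(curve)
print(best_step, best_loss)                                  # 14000 15.0
print(f"{relative_reduction(curve[0][1], best_loss):.1%}")   # 65.4%
```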
What This Trains
- Document OCR: printed-text recognition
- Form field extraction: key-value pair extraction
- Receipt/invoice parsing: amount, date, and vendor extraction
- Table structure understanding: cell extraction
- Visual question answering: answering questions about documents
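Extraction tasks like form fields and receipt amounts, dates, and vendors are usually scored field by field. Below is a minimal, generic sketch of field-level F1 over key-value pairs. The field names and the lowercase/whitespace normalization are assumptions for illustration. This is not the project's evaluation code, and the model's real output schema may differ.

```python
def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences don't count."""
    return " ".join(value.lower().split())


def field_f1(predicted: dict, gold: dict) -> float:
    """F1 over exact (key, normalized value) matches between two field dicts."""
    pred = {(k, normalize(v)) for k, v in predicted.items()}
    ref = {(k, normalize(v)) for k, v in gold.items()}
    tp = len(pred & ref)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(ref)
    return 2 * precision * recall / (precision + recall)


# HYPOTHETICAL receipt fields for illustration only.
gold = {"vendor": "Acme Corp", "date": "2024-03-01", "total": "41.50"}
pred = {"vendor": "ACME  corp", "date": "2024-03-01", "total": "14.50"}
print(round(field_f1(pred, gold), 3))  # 0.667
```

Exact matching after light normalization is strict. A transposed digit in `total` costs the whole field, which is usually the desired behavior for amounts.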
How to Reproduce
```bash
# Clone the repo
git clone https://github.com/eulogik/TinyDoc-VLM.git
cd TinyDoc-VLM
pip install -e .

# Generate synthetic docs
python data/synthetic/generator.py --num-docs 3000 --output-dir data/synthetic/output

# Train LoRA (17K steps, ~15 hours on M4)
python training/fast_train.py \
  --manifest data/synthetic/output/manifest.jsonl \
  --data-root data/synthetic \
  --steps 17000 --batch-size 1 --grad-accum 4 --device mps

# Or use the overnight script
bash training/overnight_train.sh
```
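The training command fixes the budget. The back-of-envelope sketch below works out what it implies, using the table's 15.1 hours and 6,815 QA pairs. Two things are not stated here: whether `fast_train.py` counts a step as an optimizer update or as a micro-batch, and whether each QA pair is one training example. Both epoch estimates are therefore labeled assumptions.

```python
HOURS = 15.1       # training time from the table
STEPS = 17_000     # --steps
BATCH_SIZE = 1     # --batch-size
GRAD_ACCUM = 4     # --grad-accum
QA_PAIRS = 6_815   # training examples from the table (ASSUMED: one example per QA pair)

seconds_per_step = HOURS * 3600 / STEPS
effective_batch = BATCH_SIZE * GRAD_ACCUM

# ASSUMPTION A: one "step" = one optimizer update over effective_batch examples.
epochs_if_optimizer_steps = STEPS * effective_batch / QA_PAIRS
# ASSUMPTION B: one "step" = one micro-batch of BATCH_SIZE examples.
epochs_if_micro_batches = STEPS * BATCH_SIZE / QA_PAIRS

print(f"{seconds_per_step:.2f} s/step")            # 3.20 s/step
print(effective_batch)                             # 4
print(f"{epochs_if_optimizer_steps:.1f} epochs")   # 10.0 epochs
print(f"{epochs_if_micro_batches:.1f} epochs")     # 2.5 epochs
```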
Next Steps
- Scale to 10K+ documents
- Add public benchmarks (DocVQA, FUNSD, CORD)
- Train for 50K+ steps
- Export to ONNX
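DocVQA, one of the benchmarks listed above, is conventionally scored with ANLS (Average Normalized Levenshtein Similarity). Each prediction scores 1 − NL against its closest gold answer, where NL is the normalized edit distance, and the score drops to zero once NL reaches a threshold τ = 0.5. Below is a self-contained sketch of that metric, assuming lowercase/whitespace normalization. It is not the official evaluation script and is not part of this repo.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via a rolling DP row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def anls(predictions, gold_answers, tau=0.5):
    """Average Normalized Levenshtein Similarity.

    predictions: one answer string per question.
    gold_answers: a list of acceptable answer strings per question.
    """
    total = 0.0
    for pred, golds in zip(predictions, gold_answers):
        p = " ".join(pred.lower().split())
        best = 0.0
        for g in golds:
            g = " ".join(g.lower().split())
            denom = max(len(p), len(g))
            nl = levenshtein(p, g) / denom if denom else 0.0
            best = max(best, 1.0 - nl if nl < tau else 0.0)
        total += best
    return total / len(predictions) if predictions else 0.0


# "acme" vs "acme corp": distance 5 / length 9 >= 0.5, so that answer scores 0.
print(anls(["$41.50", "acme"], [["$41.50"], ["Acme Corp"]]))  # 0.5
```

The threshold lets ANLS give partial credit for near-miss OCR, such as a dropped period, without rewarding answers that are mostly wrong.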
Links
| Resource | URL |
|---|---|
| Base Model | eulogik/TinyDoc-VLM-256M |
| GitHub | github.com/eulogik/TinyDoc-VLM |
| Live Demo | huggingface.co/spaces/eulogik/TinyDoc-VLM |
| Training Script | training/fast_train.py |
License
Apache 2.0, same as the base model.
Part of the TinyDoc-VLM project.