Instructions for using solvrays/mdf-form-reader-phi35-vision with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

## Libraries

### Transformers

How to use solvrays/mdf-form-reader-phi35-vision with Transformers:

```python
# Use a pipeline as a high-level helper
# Warning: pipeline type "image-to-text" is no longer supported in transformers v5.
# Load the model directly (see below) or downgrade to v4.x with:
#   pip install "transformers<5.0.0"
from transformers import pipeline

pipe = pipeline("image-to-text", model="solvrays/mdf-form-reader-phi35-vision")

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("solvrays/mdf-form-reader-phi35-vision")
model = AutoModelForImageTextToText.from_pretrained("solvrays/mdf-form-reader-phi35-vision")
```

## Notebooks

- Google Colab
- Kaggle

## Local Apps

### Unsloth Studio

How to use solvrays/mdf-form-reader-phi35-vision with Unsloth Studio:

**Install Unsloth Studio (macOS, Linux, WSL)**

```shell
curl -fsSL https://unsloth.ai/install.sh | sh

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# and search for solvrays/mdf-form-reader-phi35-vision to start chatting
```

**Install Unsloth Studio (Windows)**

```powershell
irm https://unsloth.ai/install.ps1 | iex

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# and search for solvrays/mdf-form-reader-phi35-vision to start chatting
```

**Using HuggingFace Spaces for Unsloth**

```text
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# and search for solvrays/mdf-form-reader-phi35-vision to start chatting
```

**Load the model with FastModel**

```python
# pip install unsloth
from unsloth import FastModel

model, tokenizer = FastModel.from_pretrained(
    model_name="solvrays/mdf-form-reader-phi35-vision",
    max_seq_length=2048,
)
```
# MDF Form Reader: Phi-3.5-Vision Fine-tuned
Vision-native handwritten insurance form understanding, fine-tuned from microsoft/Phi-3.5-vision-instruct using QLoRA.
No OCR needed. This model reads handwriting, checks checkbox states, and extracts structured data directly from scanned MDF (Monthly Disability Verification) form images.
## Model Summary
| Property | Value |
|---|---|
| Base Model | microsoft/Phi-3.5-vision-instruct (4.2B) |
| Task | Visual Question Answering on MDF forms |
| Fine-tuning Method | QLoRA (r=16, alpha=32) via Unsloth |
| Quantization | 4-bit NF4 (training) → 16-bit merged |
| Annotator | Vertex AI Gemini 2.5 Flash |
| Exact Match | 0% |
| OOD Refusal Rate | 0% |
| License | Apache 2.0 |
## Quick Start
```python
from transformers import AutoModelForCausalLM, AutoProcessor
from PIL import Image
import torch

model_id = "solvrays/mdf-form-reader-phi35-vision"

processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="cuda",
    trust_remote_code=True,
)

# Load your scanned MDF form image
image = Image.open("mdf_form.png").convert("RGB")

# Ask a question about the form
question = "What is the name of the physician who signed this form?"
messages = [{"role": "user", "content": f"<|image_1|>\n{question}"}]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to("cuda")

with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=200, temperature=0.1)

answer = processor.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(answer)
```
## What is an MDF Form?
A Monthly Disability Verification Form (Form 441.O.MDF.O) is issued by TriPlus Services, acting as Third-Party Administrator of Penn Treaty Network America and American Network policies. It requires a licensed physician to certify a patient's ongoing disability status monthly.
### Key Fields Extracted
- Physician name, address, phone, fax
- Submission date range (from / to)
- Patient disability status (YES checked / NO checked)
- Disability end date (if applicable)
- Form completion date
- Physician signature presence
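The fields above can be pulled in a single pass by issuing one VQA query per field and collecting the answers into a dict. A minimal sketch: `ask` is a hypothetical helper wrapping the Quick Start generate call, stubbed here so the plumbing runs without the model, and the question wordings are illustrative.

```python
# One question per MDF field; wordings are illustrative, not the training prompts.
FIELD_QUESTIONS = {
    "physician_name": "What is the name of the physician who signed this form?",
    "physician_phone": "What is the physician's phone number?",
    "date_from": "What is the start of the submission date range?",
    "date_to": "What is the end of the submission date range?",
    "disability_status": "Is the YES or the NO checkbox checked for disability status?",
    "signature_present": "Is the physician signature field signed?",
}

def extract_fields(image, ask):
    """Run one VQA query per field; map the model's 'null' answers to None."""
    out = {}
    for field, question in FIELD_QUESTIONS.items():
        answer = ask(image, question).strip()
        out[field] = None if answer.lower() == "null" else answer
    return out

# Stub standing in for the fine-tuned model, just to exercise the plumbing:
demo = extract_fields(None, lambda img, q: "null" if "phone" in q else "Dr. Example")
```

In production `ask` would encode the image and question exactly as in the Quick Start snippet and decode the generated tokens.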
## Why Vision-Native vs. OCR?
| Challenge | OCR Approach | This Model |
|---|---|---|
| Cursive physician names | Fails ("Carnazzo", "Kruszka") | Reads directly from image |
| Checkbox state (YES/NO) | Misses (no text to extract) | Sees the ✓/✗ mark in context |
| Date grid cells (MM/DD/YYYY) | Digit confusion in small boxes | Layout-aware reading |
| Signature field | Garbage output | Correctly ignored |
| Handwritten addresses | High error rate | Contextual correction |
## Training Pipeline

```text
Scanned MDF Form (PDF)
  → Image pre-processing (deskew at 300 DPI, bilateral denoise, CLAHE)
  → Vertex AI Gemini 2.5 Flash → structured JSON annotation
  → VQA triplet dataset (field extraction + OOD refusal pairs)
  → Phi-3.5-Vision + QLoRA (Unsloth, 2-5× faster, 80% less VRAM)
  → Merge adapters → full 16-bit model
  → HuggingFace Hub (safetensors)
```
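The pre-processing stage can be sketched with OpenCV. This is a sketch only: the filter parameters below are common defaults, not the values used in training, and the deskew step is omitted for brevity.

```python
def scale_for_dpi(width, height, current_dpi, target_dpi=300):
    """Pixel dimensions after rescaling a scan to the target DPI."""
    factor = target_dpi / current_dpi
    return round(width * factor), round(height * factor)

def preprocess(gray_image, current_dpi):
    """Rescale to 300 DPI, edge-preserving denoise, then local contrast (CLAHE)."""
    import cv2  # imported lazily so scale_for_dpi stays dependency-free

    w, h = scale_for_dpi(gray_image.shape[1], gray_image.shape[0], current_dpi)
    img = cv2.resize(gray_image, (w, h), interpolation=cv2.INTER_CUBIC)
    # Bilateral filter smooths scanner noise while keeping pen strokes sharp
    img = cv2.bilateralFilter(img, d=9, sigmaColor=75, sigmaSpace=75)
    # CLAHE evens out uneven illumination across the scanned page
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    return clahe.apply(img)
```

A 150 DPI letter-size scan (850×1100 px) would be upsampled to 1700×2200 px before filtering.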
### Training Configuration

```yaml
base_model: microsoft/Phi-3.5-vision-instruct
fine_tuning_method: QLoRA (NF4, double quantization)
lora_rank: 16
lora_alpha: 32
lora_dropout: 0.05
use_rslora: true
vision_layers: frozen
language_layers: adapted
optimizer: AdamW 8-bit (paged)
lr_scheduler: cosine
neftune_noise_alpha: 5
annotator: Vertex AI Gemini 2.5 Flash
framework: Unsloth + HuggingFace TRL
```
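For reference, this configuration maps roughly onto standard `peft`/`bitsandbytes` objects. The `target_modules` names are an assumption about Phi-3.5's language-layer naming, not taken from the actual training run:

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 with double quantization, as listed in the config above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# r=16, alpha=32, dropout=0.05, rank-stabilized LoRA
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    use_rslora=True,
    target_modules=["qkv_proj", "o_proj", "gate_up_proj", "down_proj"],  # assumed names
    task_type="CAUSAL_LM",
)
```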
## Evaluation Results
| Metric | Value |
|---|---|
| Exact Match (field extraction) | 0% |
| OOD Refusal Rate | 0% |
| Evaluation Set | Held-out MDF form pages |
OOD Refusal Rate measures how reliably the model declines to answer questions not answerable from the form (e.g. "What is the diagnosis?", "Has this claim been approved?").
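Both metrics can be computed straightforwardly. A sketch, under the assumption that the model signals refusal by answering `null` (function names here are illustrative):

```python
def normalize(text):
    """Lowercase and collapse whitespace before comparison."""
    return " ".join(text.lower().split())

def exact_match(preds, golds):
    """Fraction of field answers matching the annotation after normalization."""
    hits = sum(normalize(p) == normalize(g) for p, g in zip(preds, golds))
    return hits / len(golds)

def ood_refusal_rate(ood_answers, refusal_token="null"):
    """Fraction of out-of-domain questions the model declines to answer."""
    refusals = sum(normalize(a) == refusal_token for a in ood_answers)
    return refusals / len(ood_answers)

print(exact_match(["Dr. Carnazzo"], ["dr. carnazzo"]))  # 1.0
print(ood_refusal_rate(["null", "Approved"]))           # 0.5
```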
## Limitations
- Domain-specific: Trained exclusively on TriPlus Services MDF forms. Performance on other form types is not guaranteed.
- Image quality: Works best on scans β₯ 300 DPI. Very low-resolution or heavily degraded scans may reduce accuracy.
- Language: English only.
- Redacted fields: Returns `null` for blacked-out fields (insured name / policy number).
- Not for medical diagnosis: This model extracts administrative form data only.
## License
This model is released under the Apache 2.0 License. The base model (microsoft/Phi-3.5-vision-instruct) is also Apache 2.0.
## Acknowledgements
- Unsloth for 2-5× faster fine-tuning
- Microsoft Phi-3.5-Vision for the base vision-language model
- Vertex AI Gemini 2.5 Flash for dataset annotation
- HuggingFace TRL for SFTTrainer