How to use
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for solvrays/mdf-form-reader-phi35-vision to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for solvrays/mdf-form-reader-phi35-vision to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for solvrays/mdf-form-reader-phi35-vision to start chatting
Load model with FastModel
pip install unsloth
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name="solvrays/mdf-form-reader-phi35-vision",
    max_seq_length=2048,
)

MDF Form Reader – Phi-3.5-Vision Fine-tuned

Vision-native handwritten insurance form understanding, fine-tuned from microsoft/Phi-3.5-vision-instruct using QLoRA.

No OCR needed. This model reads handwriting, determines checkbox states, and extracts structured data directly from scanned MDF (Monthly Disability Verification) form images.


📋 Model Summary

Property            Value
Base Model          microsoft/Phi-3.5-vision-instruct (4.2B)
Task                Visual Question Answering on MDF forms
Fine-tuning Method  QLoRA (r=16, alpha=32) via Unsloth
Quantization        4-bit NF4 (training) → 16-bit merged
Annotator           Vertex AI Gemini 2.5 Flash
Exact Match         0%
OOD Refusal Rate    0%
License             Apache 2.0

🚀 Quick Start

from transformers import AutoModelForCausalLM, AutoProcessor
from PIL import Image
import torch

model_id = "solvrays/mdf-form-reader-phi35-vision"

processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="cuda",
    trust_remote_code=True,
)

# Load your scanned MDF form image
image = Image.open("mdf_form.png").convert("RGB")

# Ask a question about the form
question = "What is the name of the physician who signed this form?"

messages = [{"role": "user", "content": f"<|image_1|>\n{question}"}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = processor(text=[text], images=[image], return_tensors="pt").to("cuda")

with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.1)

answer = processor.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(answer)
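
Because the model, processor, and image are already loaded above, repeated field questions can share one small helper. A minimal sketch; the ask helper and the example questions are illustrative and not part of the released code:

def ask(image, question, max_new_tokens=200):
    # Build the Phi-3.5-vision chat prompt with a single image placeholder
    messages = [{"role": "user", "content": f"<|image_1|>\n{question}"}]
    text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = processor(text=[text], images=[image], return_tensors="pt").to("cuda")
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Decode only the newly generated tokens
    return processor.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True).strip()

for q in [
    "What is the name of the physician who signed this form?",
    "What date range does this form cover?",
    "Is the YES or the NO checkbox marked for the patient's disability status?",
]:
    print(q, "->", ask(image, q))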

πŸ₯ What is an MDF Form?

A Monthly Disability Verification Form (Form 441.O.MDF.O) is issued by TriPlus Services, acting as Third-Party Administrator of Penn Treaty Network America and American Network policies. It requires a licensed physician to certify a patient's ongoing disability status monthly.

Key Fields Extracted

  • Physician name, address, phone, fax
  • Submission date range (from / to)
  • Patient disability status (YES checked / NO checked)
  • Disability end date (if applicable)
  • Form completion date
  • Physician signature presence
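
These fields can also be requested in a single pass. A hedged sketch that reuses the Quick Start setup and asks for the key fields as JSON; the prompt wording and JSON key names are assumptions for illustration, not the prompts used in training:

import json

extraction_prompt = (
    "Read this MDF form and return a JSON object with the keys "
    "physician_name, physician_phone, date_from, date_to, disability_status, "
    "disability_end_date, completion_date, signature_present. "
    "Use null for any field that is blank or redacted."
)

messages = [{"role": "user", "content": f"<|image_1|>\n{extraction_prompt}"}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to("cuda")

with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=300, do_sample=False)

raw = processor.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
try:
    record = json.loads(raw)           # parse if the model emitted valid JSON
except json.JSONDecodeError:
    record = {"raw_answer": raw}       # otherwise keep the raw text
print(record)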

🔬 Why Vision-Native vs. OCR?

Challenge                      OCR Approach                         This Model
Cursive physician names        Fails ("Carnazzo", "Kruszka")        Reads directly from the image
Checkbox state (YES/NO)        Misses (no text to extract)          Sees the ✓/✗ mark in context
Date grid cells (MM/DD/YYYY)   Digit confusion in small boxes       Layout-aware reading
Signature field                Garbage output                       Correctly ignored
Handwritten addresses          High error rate                      Contextual correction

πŸ› οΈ Training Pipeline

Scanned MDF Form (PDF)
    ↓ Image pre-processing (deskew at 300 DPI, bilateral denoise, CLAHE; sketched below)
    ↓ Vertex AI Gemini 2.5 Flash → structured JSON annotation
    ↓ VQA triplet dataset (field extraction + OOD refusal pairs)
    ↓ Phi-3.5-Vision + QLoRA (Unsloth, 2-5× faster, 80% less VRAM)
    ↓ Merge adapters → full 16-bit model
    ↓ HuggingFace Hub (safetensors)
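
The image pre-processing step can be approximated with OpenCV. A minimal sketch assuming cv2 and numpy; the deskew-by-minAreaRect heuristic and the filter parameters are assumptions, not the exact pipeline used for training:

import cv2
import numpy as np

def preprocess_scan(path):
    # Load the scanned page (already rasterized from PDF) in grayscale
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)

    # Estimate skew from the ink pixels and rotate the page upright
    coords = np.column_stack(np.where(img < 200)).astype(np.float32)
    angle = cv2.minAreaRect(coords)[-1]
    angle = -(90 + angle) if angle < -45 else -angle
    h, w = img.shape
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    img = cv2.warpAffine(img, M, (w, h), flags=cv2.INTER_CUBIC,
                         borderMode=cv2.BORDER_REPLICATE)

    # Edge-preserving bilateral denoise, then CLAHE for local contrast
    img = cv2.bilateralFilter(img, 9, 75, 75)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    return clahe.apply(img)

cv2.imwrite("mdf_form.png", preprocess_scan("mdf_form_scan.png"))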

Training Configuration

base_model: microsoft/Phi-3.5-vision-instruct
fine_tuning_method: QLoRA (NF4, double quantization)
lora_rank: 16
lora_alpha: 32
lora_dropout: 0.05
use_rslora: true
vision_layers: frozen
language_layers: adapted
optimizer: AdamW 8-bit (paged)
lr_scheduler: cosine
neftune_noise_alpha: 5
annotator: Vertex AI Gemini 2.5 Flash
framework: Unsloth + HuggingFace TRL
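
For reference, these settings map onto a standard QLoRA setup roughly as follows. A sketch using bitsandbytes, PEFT, and TRL; the target modules, dataset, and trainer wiring are assumptions, and the original run used Unsloth's wrappers rather than this exact code:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model
from trl import SFTConfig

# 4-bit NF4 quantization with double quantization (QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3.5-vision-instruct",
    quantization_config=bnb_config,
    trust_remote_code=True,
)

# LoRA adapters on the language layers only; the vision tower stays frozen
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    use_rslora=True,
    target_modules=["qkv_proj", "o_proj"],  # assumed Phi-3-style attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Optimizer, scheduler, and NEFTune settings from the configuration above
training_args = SFTConfig(
    output_dir="mdf-form-reader-phi35-vision",
    optim="paged_adamw_8bit",
    lr_scheduler_type="cosine",
    neftune_noise_alpha=5,
    bf16=True,
)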

📊 Evaluation Results

Metric                          Value
Exact Match (field extraction)  0%
OOD Refusal Rate                0%
Evaluation Set                  Held-out MDF form pages

OOD Refusal Rate measures how reliably the model declines to answer questions not answerable from the form (e.g. "What is the diagnosis?", "Has this claim been approved?").
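
A minimal sketch of how such a refusal rate could be computed; the question list, the refusal markers, and the reuse of the ask helper sketched after the Quick Start are illustrative assumptions rather than the actual evaluation harness:

# Questions the form itself cannot answer
ood_questions = [
    "What is the patient's diagnosis?",
    "Has this claim been approved?",
    "What is the monthly benefit amount?",
]

REFUSAL_MARKERS = ("cannot be determined", "not stated on", "not answerable")

def is_refusal(answer: str) -> bool:
    # Count an answer as a refusal if it contains any refusal phrase
    a = answer.lower()
    return any(marker in a for marker in REFUSAL_MARKERS)

# ask() is the helper sketched after the Quick Start above
refusals = sum(is_refusal(ask(image, q)) for q in ood_questions)
print(f"OOD refusal rate: {refusals / len(ood_questions):.0%}")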


⚠️ Limitations

  • Domain-specific: Trained exclusively on TriPlus Services MDF forms. Performance on other form types is not guaranteed.
  • Image quality: Works best on scans ≥ 300 DPI. Very low-resolution or heavily degraded scans may reduce accuracy.
  • Language: English only.
  • Redacted fields: Returns null for blacked-out fields (insured name/policy number).
  • Not for medical diagnosis: This model extracts administrative form data only.

📄 License

This model is released under the Apache 2.0 License. The base model (microsoft/Phi-3.5-vision-instruct) is also Apache 2.0.


πŸ™ Acknowledgements
