Instructions for using solvrays/mdf-form-reader-phi35-vision with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

## Libraries

### Transformers

How to use solvrays/mdf-form-reader-phi35-vision with Transformers:

```python
# Use a pipeline as a high-level helper
# Warning: pipeline type "image-to-text" is no longer supported in transformers v5.
# Load the model directly (see below) or downgrade to v4.x with:
#   pip install "transformers<5.0.0"
from transformers import pipeline

pipe = pipeline("image-to-text", model="solvrays/mdf-form-reader-phi35-vision")

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("solvrays/mdf-form-reader-phi35-vision")
model = AutoModelForImageTextToText.from_pretrained("solvrays/mdf-form-reader-phi35-vision")
```

## Notebooks

- Google Colab
- Kaggle

## Local Apps

### Unsloth Studio

How to use solvrays/mdf-form-reader-phi35-vision with Unsloth Studio:

**Install Unsloth Studio (macOS, Linux, WSL)**

```shell
curl -fsSL https://unsloth.ai/install.sh | sh

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# and search for solvrays/mdf-form-reader-phi35-vision to start chatting
```

**Install Unsloth Studio (Windows)**

```powershell
irm https://unsloth.ai/install.ps1 | iex

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# and search for solvrays/mdf-form-reader-phi35-vision to start chatting
```

**Using HuggingFace Spaces for Unsloth**

```text
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# and search for solvrays/mdf-form-reader-phi35-vision to start chatting
```

**Load the model with FastModel**

```python
# pip install unsloth
from unsloth import FastModel

model, tokenizer = FastModel.from_pretrained(
    model_name="solvrays/mdf-form-reader-phi35-vision",
    max_seq_length=2048,
)
```
# MDF Form Reader: Phi-3.5-Vision Fine-tuned
Vision-native handwritten insurance form understanding, fine-tuned from microsoft/Phi-3.5-vision-instruct using QLoRA.
No OCR needed. This model reads handwriting, checks checkbox states, and extracts structured data directly from scanned MDF (Monthly Disability Verification) form images.
## Model Summary
| Property | Value |
|---|---|
| Base Model | microsoft/Phi-3.5-vision-instruct (4.2B) |
| Task | Visual Question Answering on MDF forms |
| Fine-tuning Method | QLoRA (r=16, alpha=32) via Unsloth |
| Quantization | 4-bit NF4 (training) → 16-bit merged |
| Annotator | Vertex AI Gemini 2.5 Flash |
| Exact Match | 0% |
| OOD Refusal Rate | 0% |
| License | Apache 2.0 |
## Quick Start
```python
from transformers import AutoModelForCausalLM, AutoProcessor
from PIL import Image
import torch

model_id = "solvrays/mdf-form-reader-phi35-vision"

processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="cuda",
    trust_remote_code=True,
)

# Load your scanned MDF form image
image = Image.open("mdf_form.png").convert("RGB")

# Ask a question about the form
question = "What is the name of the physician who signed this form?"
messages = [{"role": "user", "content": f"<|image_1|>\n{question}"}]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to("cuda")

with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=200, temperature=0.1)

answer = processor.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(answer)
```
## What is an MDF Form?
A Monthly Disability Verification Form (Form 441.O.MDF.O) is issued by TriPlus Services, acting as Third-Party Administrator of Penn Treaty Network America and American Network policies. It requires a licensed physician to certify a patient's ongoing disability status monthly.
### Key Fields Extracted
- Physician name, address, phone, fax
- Submission date range (from / to)
- Patient disability status (YES checked / NO checked)
- Disability end date (if applicable)
- Form completion date
- Physician signature presence
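The fields above can be pulled in a single pass by issuing one VQA query per field and collecting the answers into a dict. A minimal sketch: `ask` is a hypothetical helper wrapping the Quick Start generate call, stubbed here so the plumbing runs without the model, and the question wordings are illustrative.

```python
# One question per MDF field; wordings are illustrative, not the training prompts.
FIELD_QUESTIONS = {
    "physician_name": "What is the name of the physician who signed this form?",
    "physician_phone": "What is the physician's phone number?",
    "date_from": "What is the start of the submission date range?",
    "date_to": "What is the end of the submission date range?",
    "disability_status": "Is the YES or the NO checkbox checked for disability status?",
    "signature_present": "Is the physician signature field signed?",
}

def extract_fields(image, ask):
    """Run one VQA query per field; map the model's 'null' answers to None."""
    out = {}
    for field, question in FIELD_QUESTIONS.items():
        answer = ask(image, question).strip()
        out[field] = None if answer.lower() == "null" else answer
    return out

# Stub standing in for the fine-tuned model, just to exercise the plumbing:
demo = extract_fields(None, lambda img, q: "null" if "phone" in q else "Dr. Example")
```

In production `ask` would encode the image and question exactly as in the Quick Start snippet and decode the generated tokens.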
## Why Vision-Native vs. OCR?
| Challenge | OCR Approach | This Model |
|---|---|---|
| Cursive physician names | Fails ("Carnazzo", "Kruszka") | Reads directly from image |
| Checkbox state (YES/NO) | Misses (no text to extract) | Sees the ✓/✗ mark in context |
| Date grid cells (MM/DD/YYYY) | Digit confusion in small boxes | Layout-aware reading |
| Signature field | Garbage output | Correctly ignored |
| Handwritten addresses | High error rate | Contextual correction |
## Training Pipeline

```text
Scanned MDF Form (PDF)
  → Image pre-processing (deskew at 300 DPI, bilateral denoise, CLAHE)
  → Vertex AI Gemini 2.5 Flash → structured JSON annotation
  → VQA triplet dataset (field extraction + OOD refusal pairs)
  → Phi-3.5-Vision + QLoRA (Unsloth, 2-5× faster, 80% less VRAM)
  → Merge adapters → full 16-bit model
  → HuggingFace Hub (safetensors)
```
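The pre-processing stage can be sketched with OpenCV. This is a sketch only: the filter parameters below are common defaults, not the values used in training, and the deskew step is omitted for brevity.

```python
def scale_for_dpi(width, height, current_dpi, target_dpi=300):
    """Pixel dimensions after rescaling a scan to the target DPI."""
    factor = target_dpi / current_dpi
    return round(width * factor), round(height * factor)

def preprocess(gray_image, current_dpi):
    """Rescale to 300 DPI, edge-preserving denoise, then local contrast (CLAHE)."""
    import cv2  # imported lazily so scale_for_dpi stays dependency-free

    w, h = scale_for_dpi(gray_image.shape[1], gray_image.shape[0], current_dpi)
    img = cv2.resize(gray_image, (w, h), interpolation=cv2.INTER_CUBIC)
    # Bilateral filter smooths scanner noise while keeping pen strokes sharp
    img = cv2.bilateralFilter(img, d=9, sigmaColor=75, sigmaSpace=75)
    # CLAHE evens out uneven illumination across the scanned page
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    return clahe.apply(img)
```

A 150 DPI letter-size scan (850×1100 px) would be upsampled to 1700×2200 px before filtering.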
### Training Configuration

```yaml
base_model: microsoft/Phi-3.5-vision-instruct
fine_tuning_method: QLoRA (NF4, double quantization)
lora_rank: 16
lora_alpha: 32
lora_dropout: 0.05
use_rslora: true
vision_layers: frozen
language_layers: adapted
optimizer: AdamW 8-bit (paged)
lr_scheduler: cosine
neftune_noise_alpha: 5
annotator: Vertex AI Gemini 2.5 Flash
framework: Unsloth + HuggingFace TRL
```
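For reference, this configuration maps roughly onto standard `peft`/`bitsandbytes` objects. The `target_modules` names are an assumption about Phi-3.5's language-layer naming, not taken from the actual training run:

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 with double quantization, as listed in the config above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# r=16, alpha=32, dropout=0.05, rank-stabilized LoRA
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    use_rslora=True,
    target_modules=["qkv_proj", "o_proj", "gate_up_proj", "down_proj"],  # assumed names
    task_type="CAUSAL_LM",
)
```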
## Evaluation Results
| Metric | Value |
|---|---|
| Exact Match (field extraction) | 0% |
| OOD Refusal Rate | 0% |
| Evaluation Set | Held-out MDF form pages |
OOD Refusal Rate measures how reliably the model declines to answer questions not answerable from the form (e.g. "What is the diagnosis?", "Has this claim been approved?").
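Both metrics can be computed straightforwardly. A sketch, under the assumption that the model signals refusal by answering `null` (function names here are illustrative):

```python
def normalize(text):
    """Lowercase and collapse whitespace before comparison."""
    return " ".join(text.lower().split())

def exact_match(preds, golds):
    """Fraction of field answers matching the annotation after normalization."""
    hits = sum(normalize(p) == normalize(g) for p, g in zip(preds, golds))
    return hits / len(golds)

def ood_refusal_rate(ood_answers, refusal_token="null"):
    """Fraction of out-of-domain questions the model declines to answer."""
    refusals = sum(normalize(a) == refusal_token for a in ood_answers)
    return refusals / len(ood_answers)

print(exact_match(["Dr. Carnazzo"], ["dr. carnazzo"]))  # 1.0
print(ood_refusal_rate(["null", "Approved"]))           # 0.5
```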
## Limitations
- Domain-specific: Trained exclusively on TriPlus Services MDF forms. Performance on other form types is not guaranteed.
- Image quality: Works best on scans β₯ 300 DPI. Very low-resolution or heavily degraded scans may reduce accuracy.
- Language: English only.
- Redacted fields: Returns `null` for blacked-out fields (insured name / policy number).
- Not for medical diagnosis: This model extracts administrative form data only.
## License
This model is released under the Apache 2.0 License. The base model (microsoft/Phi-3.5-vision-instruct) is also Apache 2.0.
## Acknowledgements
- Unsloth for 2-5× faster fine-tuning
- Microsoft Phi-3.5-Vision for the base vision-language model
- Vertex AI Gemini 2.5 Flash for dataset annotation
- HuggingFace TRL for SFTTrainer