Extracting Structured Data from Low-Quality Scanned Hospital Bills
#1 by biswajitggiiygg - opened
Hi everyone,
I’m working on a document AI project to extract structured data from scanned hospital bills collected from multiple hospitals. Most files are low-quality PDFs/images with inconsistent layouts.
The goal is to extract:
- Hospital details: name, address, contact
- Patient details: name, ID, admission/discharge dates, doctor
- Bill summary: bill number/date, subtotal, taxes, grand total
- Line items: service name, quantity, unit price, total price
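As a rough illustration, the target fields above could be captured in a schema like the one below, together with a simple arithmetic check for catching OCR misreads in the amounts. This is only a sketch: the class and field names (`Bill`, `LineItem`, `consistency_issues`) are hypothetical, and the rules it checks (quantity × unit price = total price, line items sum to the subtotal, subtotal + taxes = grand total) are assumptions that may not hold for bills with discounts, deposits, or rounding.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import List, Optional


@dataclass
class LineItem:
    service_name: str
    quantity: Decimal
    unit_price: Decimal
    total_price: Decimal


@dataclass
class Bill:
    # Every header field is optional because any of them may be
    # missing or unreadable on a noisy scan.
    hospital_name: Optional[str] = None
    hospital_address: Optional[str] = None
    hospital_contact: Optional[str] = None
    patient_name: Optional[str] = None
    patient_id: Optional[str] = None
    admission_date: Optional[str] = None
    discharge_date: Optional[str] = None
    doctor: Optional[str] = None
    bill_number: Optional[str] = None
    bill_date: Optional[str] = None
    subtotal: Optional[Decimal] = None
    taxes: Optional[Decimal] = None
    grand_total: Optional[Decimal] = None
    line_items: List[LineItem] = field(default_factory=list)


def consistency_issues(bill: Bill, tolerance: Decimal = Decimal("0.01")) -> List[str]:
    """Return human-readable descriptions of amounts that do not add up."""
    issues = []
    # Assumed rule: quantity * unit_price should equal the line total.
    for i, item in enumerate(bill.line_items):
        if abs(item.quantity * item.unit_price - item.total_price) > tolerance:
            issues.append(f"line item {i}: quantity * unit_price != total_price")
    # Assumed rule: line totals should sum to the subtotal.
    if bill.subtotal is not None and bill.line_items:
        items_sum = sum((it.total_price for it in bill.line_items), Decimal("0"))
        if abs(items_sum - bill.subtotal) > tolerance:
            issues.append("sum of line items != subtotal")
    # Assumed rule: subtotal plus taxes should equal the grand total.
    amounts = (bill.subtotal, bill.taxes, bill.grand_total)
    if all(value is not None for value in amounts):
        if abs(bill.subtotal + bill.taxes - bill.grand_total) > tolerance:
            issues.append("subtotal + taxes != grand_total")
    return issues
```

A bill that returns a non-empty list from `consistency_issues` could be routed to manual review instead of being accepted automatically.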
Given the variation in formats and OCR challenges, I’m exploring approaches for:
- Robust OCR on noisy scans
- Table detection and structure extraction
- Key-value pair extraction
- End-to-end document understanding pipelines
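For the key-value part, one possible baseline to compare learned models against is plain pattern matching over the OCR text. The sketch below uses only the Python standard library; the label variants in `FIELD_PATTERNS` and the sample text are made up for illustration and would need tuning against real bills from each hospital.

```python
import re
from typing import Dict, Optional

# Hypothetical label patterns; real bills will use many more variants.
FIELD_PATTERNS = {
    "bill_number": re.compile(
        r"\b(?:bill|invoice)[ \t]*(?:no|number|#)\.?[ \t]*[:\-]?[ \t]*([A-Z0-9][A-Z0-9/\-]*)",
        re.IGNORECASE,
    ),
    "patient_name": re.compile(
        r"\bpatient[ \t]*name[ \t]*[:\-]?[ \t]*([^\n]+)",
        re.IGNORECASE,
    ),
    "grand_total": re.compile(
        r"\b(?:grand[ \t]*total|net[ \t]*amount)[ \t]*[:\-]?[ \t]*(?:rs\.?|inr|\$)?[ \t]*([\d,]+(?:\.\d{1,2})?)",
        re.IGNORECASE,
    ),
}


def extract_fields(ocr_text: str) -> Dict[str, Optional[str]]:
    """Return the first match for each known field, or None if not found."""
    result: Dict[str, Optional[str]] = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(ocr_text)
        result[name] = match.group(1).strip() if match else None
    return result


if __name__ == "__main__":
    # Made-up OCR output, for illustration only.
    sample = (
        "CITY HOSPITAL\n"
        "Bill No: HB/2024/0017\n"
        "Patient Name : Asha Rao\n"
        "Grand Total: Rs. 12,450.50\n"
    )
    print(extract_fields(sample))
```

A baseline like this breaks down quickly when OCR garbles the labels themselves, which is part of why layout-aware models are of interest here.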
I’m especially interested in Hugging Face models or pipelines that perform well on noisy medical billing documents.
I’m open to suggestions on models, architectures, or practical strategies that have worked in similar use cases.
Thank you!