D360-VLM V2

Release v4: unified D360-VLM model card, runtime wrappers, and artifacts

7e8caed about 2 months ago

2.97 kB

license: apache-2.0
tags:
  - document-intelligence
  - vision-language
  - ocr
  - information-extraction
  - multimodal
pipeline_tag: image-text-to-text
base_model: HuggingFaceTB/SmolVLM-500M-Instruct

D360-VLM v4.0.0

D360-VLM v4.0.0 is the V4 unified enterprise document intelligence model for extraction, classification, and structured JSON generation from business documents, including noisy and partially degraded scans.

What's New In V4

Migrates to one canonical d360_vlm runtime with legacy V3 API/inference wrappers for backward compatibility.
Adds release gates for field extraction F1, document classification accuracy, confidence calibration, and JSON schema validity.
Introduces a normalized business document schema (business-doc-v1) and a LLaVA-style image+instruction+JSON supervision manifest.
Ships budget-guarded Modal training + packaging workflow for repeatable enterprise releases.

Models Used

Primary V4 base model: HuggingFaceTB/SmolVLM-500M-Instruct
Training objective: Vision-to-JSON instruction tuning (image + prompt -> structured response)
Runtime surface: Unified d360_vlm engine + V3 compatibility endpoints

V4 Architecture

Backbone: AutoModelForVision2Seq initialized from HuggingFaceTB/SmolVLM-500M-Instruct
Processor: AutoProcessor multimodal tokenization for image + text sequences
Supervision format: LLaVA-style examples (<image>, business prompt, target JSON response)
Serving layer: Canonical FastAPI app in d360_vlm.api.app with V3-compatible routes and wrappers
Quality gates: Extraction/classification metrics, confidence buckets (ECE), schema checks, and release readiness flags

Training Summary (Current V4 Run)

Run name: d360_vlm_v4
Train rows: unknown
Eval rows: unknown
Elapsed minutes: unknown
Projected spend (USD): unknown (budget: unknown)
Hardware profile: Modal L4 GPU with runtime budget guard

Evaluation Signals

Release ready: None
Field F1: None
Document type accuracy: None
Schema valid rate: None

Included In This HF Repo

model/ trained model artifacts
model.py runtime wrapper
d360_vlm_v3_inference.py compatibility inference API
d360_vlm_v3_api.py compatibility FastAPI entrypoint
eval_gate_results.json gate report
model_card.json structured release metadata

Intended Use

Enterprise document workflows: forms, invoices, receipts, IDs, and mixed-layout business documents.
Structured extraction and classification into downstream JSON contracts.

Limitations

Performance is dataset-dependent; always validate on your own documents.
Degraded handwriting and extreme low-resolution scans may reduce quality.
Human review is recommended for high-risk compliance and financial workflows.

Repository

abrarali113/d360_vlm_unified