d360_vlm_unified / README.md
D360-VLM V2
Release v4: unified D360-VLM model card, runtime wrappers, and artifacts
7e8caed
metadata
license: apache-2.0
tags:
  - document-intelligence
  - vision-language
  - ocr
  - information-extraction
  - multimodal
pipeline_tag: image-text-to-text
base_model: HuggingFaceTB/SmolVLM-500M-Instruct

D360-VLM v4.0.0

D360-VLM v4.0.0 is the V4 unified enterprise document intelligence model for extraction, classification, and structured JSON generation from business documents, including noisy and partially degraded scans.

What's New In V4

  • Migrates to one canonical d360_vlm runtime with legacy V3 API/inference wrappers for backward compatibility.
  • Adds release gates for field extraction F1, document classification accuracy, confidence calibration, and JSON schema validity.
  • Introduces a normalized business document schema (business-doc-v1) and a LLaVA-style image+instruction+JSON supervision manifest.
  • Ships budget-guarded Modal training + packaging workflow for repeatable enterprise releases.

Models Used

  • Primary V4 base model: HuggingFaceTB/SmolVLM-500M-Instruct
  • Training objective: Vision-to-JSON instruction tuning (image + prompt -> structured response)
  • Runtime surface: Unified d360_vlm engine + V3 compatibility endpoints

V4 Architecture

  • Backbone: AutoModelForVision2Seq initialized from HuggingFaceTB/SmolVLM-500M-Instruct
  • Processor: AutoProcessor multimodal tokenization for image + text sequences
  • Supervision format: LLaVA-style examples (<image>, business prompt, target JSON response)
  • Serving layer: Canonical FastAPI app in d360_vlm.api.app with V3-compatible routes and wrappers
  • Quality gates: Extraction/classification metrics, confidence buckets (ECE), schema checks, and release readiness flags

Training Summary (Current V4 Run)

  • Run name: d360_vlm_v4
  • Train rows: unknown
  • Eval rows: unknown
  • Elapsed minutes: unknown
  • Projected spend (USD): unknown (budget: unknown)
  • Hardware profile: Modal L4 GPU with runtime budget guard

Evaluation Signals

  • Release ready: None
  • Field F1: None
  • Document type accuracy: None
  • Schema valid rate: None

Included In This HF Repo

  • model/ trained model artifacts
  • model.py runtime wrapper
  • d360_vlm_v3_inference.py compatibility inference API
  • d360_vlm_v3_api.py compatibility FastAPI entrypoint
  • eval_gate_results.json gate report
  • model_card.json structured release metadata

Intended Use

  • Enterprise document workflows: forms, invoices, receipts, IDs, and mixed-layout business documents.
  • Structured extraction and classification into downstream JSON contracts.

Limitations

  • Performance is dataset-dependent; always validate on your own documents.
  • Degraded handwriting and extreme low-resolution scans may reduce quality.
  • Human review is recommended for high-risk compliance and financial workflows.

Repository

abrarali113/d360_vlm_unified