Chhagan005/CSM-DocExtract-VL-Q4KM

Image-Text-to-Text · document-extraction · multilingual-ocr · vision-language-model · 4-bit precision


Chhagan005 committed 3 days ago

Commit fe54b52 · verified · 1 Parent(s): 086cfde

Upload README.md with huggingface_hub

Files changed (1)

README.md +3 -29

README.md CHANGED

@@ -72,35 +72,9 @@ This is the **4-bit NF4 quantized version** of our fine-tuned 8-Billion paramete
 Below is the workflow of how the model processes a document image, attends to specific fields, and resolves conflicts (e.g., MRZ vs. Printed Text):
-```mermaid
-graph TD
-    subgraph Input Layer
-        A[Document Image] -->|Resize/Normalize| C(Vision Encoder - ViT)
-        B[System Prompt + User Query] -->|Tokenize| D(Text Tokenizer)
-    end
-    subgraph Core VLM Processing (8B INT4)
-        C --> E{Multimodal Fusion}
-        D --> E
-        E --> F[Transformer Blocks]
-        subgraph LoRA Adapters
-            F -.->|Target: q_proj, v_proj, o_proj| G[Trained LoRA Weights]
-        end
-    end
-    subgraph Output & Resolution Layer
-        G --> H{Conflict Resolution Logic}
-        H -->|MRZ > Printed Latin > Transliterated| I[JSON Generation]
-    end
-    I --> J[Structured KYC JSON]
-    style A fill:#e1f5fe,stroke:#01579b
-    style B fill:#e1f5fe,stroke:#01579b
-    style G fill:#fff9c4,stroke:#fbc02d,stroke-width:2px
-    style J fill:#e8f5e9,stroke:#2e7d32
-```
+![Architecture LLD](https://huggingface.co/Chhagan005/CSM-DocExtract-VL-Q4KM/resolve/main/architecture.png)
+*(High-resolution architecture flow for KYC document processing)*
 ### 📊 Performance Comparison: FP16 vs INT4
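
Note on the removed diagram: its "Conflict Resolution Logic" node carried the precedence rule `MRZ > Printed Latin > Transliterated`, which no longer appears as text in the README once the diagram becomes an image. The rule can be read as a simple priority lookup. The following is a minimal Python sketch of that reading only. The source labels (`mrz`, `printed_latin`, `transliterated`), the function names, and the sample values are hypothetical and do not come from the repository, and the model itself may apply the rule implicitly during generation rather than as a separate post-processing step.

```python
from __future__ import annotations

from typing import Optional

# Precedence as stated in the removed diagram: MRZ > Printed Latin > Transliterated.
# The string labels below are hypothetical names chosen for this sketch.
PRIORITY = ("mrz", "printed_latin", "transliterated")


def resolve_field(candidates: dict[str, Optional[str]]) -> Optional[str]:
    """Return the value from the highest-priority source that is non-empty."""
    for source in PRIORITY:
        value = candidates.get(source)
        if value and value.strip():
            return value.strip()
    return None  # no source supplied a usable value


def resolve_record(fields: dict[str, dict[str, Optional[str]]]) -> dict[str, Optional[str]]:
    """Apply the precedence rule independently to every field of a document."""
    return {name: resolve_field(cands) for name, cands in fields.items()}


if __name__ == "__main__":
    record = resolve_record({
        "surname": {"mrz": "KHAN", "printed_latin": "Khan"},
        "given_names": {"mrz": "", "printed_latin": "Amir", "transliterated": "Aamir"},
    })
    print(record)
```

In this sketch an empty or whitespace-only MRZ value falls through to the next source, so `given_names` resolves to the printed Latin text while `surname` keeps the MRZ value.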