Chhagan005 committed on
Commit fe54b52 · verified · 1 Parent(s): 086cfde

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +3 -29
README.md CHANGED
@@ -72,35 +72,9 @@ This is the **4-bit NF4 quantized version** of our fine-tuned 8-Billion paramete
 
  Below is the workflow of how the model processes a document image, attends to specific fields, and resolves conflicts (e.g., MRZ vs. Printed Text):
 
- ```mermaid
- graph TD
- subgraph Input Layer
- A[Document Image] -->|Resize/Normalize| C(Vision Encoder - ViT)
- B[System Prompt + User Query] -->|Tokenize| D(Text Tokenizer)
- end
-
- subgraph Core VLM Processing (8B INT4)
- C --> E{Multimodal Fusion}
- D --> E
- E --> F[Transformer Blocks]
-
- subgraph LoRA Adapters
- F -.->|Target: q_proj, v_proj, o_proj| G[Trained LoRA Weights]
- end
- end
-
- subgraph Output & Resolution Layer
- G --> H{Conflict Resolution Logic}
- H -->|MRZ > Printed Latin > Transliterated| I[JSON Generation]
- end
-
- I --> J[Structured KYC JSON]
-
- style A fill:#e1f5fe,stroke:#01579b
- style B fill:#e1f5fe,stroke:#01579b
- style G fill:#fff9c4,stroke:#fbc02d,stroke-width:2px
- style J fill:#e8f5e9,stroke:#2e7d32
- ```
+ ![Architecture LLD](https://huggingface.co/Chhagan005/CSM-DocExtract-VL-Q4KM/resolve/main/architecture.png)
+
+ *(High-resolution architecture flow for KYC document processing)*
 
  ### 📊 Performance Comparison: FP16 vs INT4
 
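
The removed diagram labels its conflict-resolution edge `MRZ > Printed Latin > Transliterated`. As a rough illustration of what such a precedence rule could look like in post-processing, here is a minimal Python sketch. The function names `resolve_field` and `resolve_record`, the source labels, and the sample values are all hypothetical; nothing here is taken from the model's actual code, which may resolve conflicts inside the generation step instead.

```python
from typing import Dict, Optional

# Precedence taken from the removed diagram: MRZ > Printed Latin > Transliterated.
# The string labels below are hypothetical names chosen for this sketch.
SOURCE_PRIORITY = ("mrz", "printed_latin", "transliterated")


def resolve_field(candidates: Dict[str, Optional[str]]) -> Optional[str]:
    """Return the reading from the highest-priority source that is non-empty."""
    for source in SOURCE_PRIORITY:
        value = candidates.get(source)
        if value and value.strip():
            return value.strip()
    return None  # no source produced a usable reading


def resolve_record(
    fields: Dict[str, Dict[str, Optional[str]]]
) -> Dict[str, Optional[str]]:
    """Apply the precedence rule to every field of a document record."""
    return {name: resolve_field(cands) for name, cands in fields.items()}


if __name__ == "__main__":
    # Hypothetical sample: the MRZ reading wins for the surname, while the
    # given name falls back to printed Latin text because the MRZ is blank.
    record = {
        "surname": {"mrz": "DOE", "printed_latin": "Doe"},
        "given_name": {"mrz": "", "printed_latin": "Jane", "transliterated": "Jayn"},
    }
    print(resolve_record(record))
```

Keeping the precedence in a single ordered tuple means the rule can be changed in one place, for example if a deployment trusts printed text over a damaged MRZ.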