d3p4rt committed · Commit f5d6d42 · verified · 1 Parent(s): 15f2a3d

Update README.md

Files changed (1): README.md (+27 -0)
@@ -72,6 +72,33 @@ Most of the 10% exact match comes from samples where the expected answer is the
  visible text on the document (e.g., the company title is the answer to "What is the
  company name?"). It is **not** evidence of question understanding.
 
+ ## Use Cases
+
+ This model is best suited for tasks that do **not** require understanding a specific question about the document. Given its question-blind behavior, it works well as a document-aware OCR captioner in the following scenarios:
+
+ **Document indexing and search**
+ Extracting the dominant visible text from large archives of scanned documents (invoices, contracts, forms) to make them keyword-searchable without any question-answering step.
+
+ **Alt-text and thumbnail description generation**
+ Automatically generating descriptions of document images for accessibility purposes or content management system previews.
+
+ **Visual salience detection**
+ Identifying the most visually prominent text in a document (title, total amount, masthead). The model appears to have learned a form of salience awareness, which can be useful for extracting the "headline" information from structured documents.
+
+ **Hybrid OCR pipelines**
+ Using the model as a first stage to extract text regions, then passing those regions to a separate reasoning model downstream.
+
+ **Fine-tuning checkpoint**
+ Starting a domain-specific fine-tune from this checkpoint rather than from vanilla `microsoft/Florence-2-large`, particularly for document-heavy domains.
+
+
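The document indexing scenario above could be prototyped along the following lines. This sketch is not part of the committed README: it assumes the model's output for each scan has already been collected into a plain dictionary, and the names `tokenize`, `build_index`, `search`, and the file names are hypothetical. It only shows how the extracted dominant text can be made keyword-searchable with no question-answering step.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(extracted: dict[str, str]) -> dict[str, set[str]]:
    """Map each token to the set of document ids whose extracted text contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in extracted.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the ids of documents that contain every token of the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical model outputs: one string of dominant visible text per scanned page.
extracted = {
    "scan_001.png": "ACME Corporation Invoice",
    "scan_002.png": "Lease Agreement",
    "scan_003.png": "ACME Corporation Purchase Order",
}

index = build_index(extracted)
print(sorted(search(index, "acme corporation")))
```

An in-memory inverted index is used here only to keep the example self-contained; a real archive would more likely feed the same strings into an existing search engine.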
+ ## When Not to Use This Model
+
+ - **Document Question Answering (DocQA):** The model is question-blind and will ignore any natural language question you provide. Do not use it in any pipeline where the output must depend on what the user asks.
+ - **Conversational document assistants:** Chatbots, legal assistants, medical record reviewers, or any interactive system where a user expects answers grounded in a specific question.
+ - **Multi-document reasoning:** The model processes a single image and has no cross-document or contextual reasoning capability.
+ - **Production-critical extraction:** With 10% exact match on DocumentVQA, accuracy is not sufficient for any use case where extraction errors have significant consequences.
+
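For readers unfamiliar with the exact match figure cited in the list above, the sketch below shows one way such a score can be computed. It is not part of the committed README and is not the evaluation script behind the reported 10%: the normalization (lowercasing and whitespace collapsing) is an assumption, and the toy predictions and references are invented to mimic a question-blind model that returns the same salient text regardless of the question.

```python
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace (an assumed normalization)."""
    return " ".join(text.lower().split())


def exact_match(predictions: list[str], references: list[list[str]]) -> float:
    """Fraction of predictions equal to any accepted answer after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    hits = 0
    for pred, accepted in zip(predictions, references):
        if normalize(pred) in {normalize(a) for a in accepted}:
            hits += 1
    return hits / len(predictions)


# Toy data (invented): the model mostly returns the document title, so it only
# scores when the title happens to be the expected answer.
predictions = ["ACME Corporation", "ACME Corporation", "Invoice", "ACME Corporation"]
references = [["acme corporation"], ["12 March 2021"], ["$1,250.00"], ["John Smith"]]

print(exact_match(predictions, references))  # 1 hit out of 4
```

In this toy set the only hit is the sample whose expected answer is the title itself, which is the pattern the README describes for the real benchmark.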
  ## Recommended use
 
  ```python