kunato committed on
Commit 5e40569 · verified · 1 Parent(s): b8b808b

Update README.md

Files changed (1):
  1. README.md +26 -0
README.md CHANGED
@@ -22,6 +22,8 @@ tags:
 
 **Release Blog available on [OpenTyphoon Blog](https://opentyphoon.ai/blog/en/typhoon-ocr-release)**
 
+*Remark: This model is intended to be used with a specific prompt only; it will not work with any other prompts.*
+
 
 ## **Real-World Document Support**
 
@@ -59,6 +61,7 @@ However, in the Thai books benchmark, performance slightly declined due to the h
 For this version, our primary focus has been on achieving high-quality OCR for both English and Thai text. Future releases may extend support to more advanced image analysis and figure interpretation.
 
 ## Usage Example
+
 **(Recommended): Full inference code available on [Colab](https://colab.research.google.com/drive/1z4Fm2BZnKcFIoWuyxzzIIIn8oI2GKl3r?usp=sharing)**
 
 
@@ -183,6 +186,29 @@ text_output = processor.tokenizer.batch_decode(
 print(text_output[0])
 ```
 
+## Prompting
+
+This model only works with the specific prompts defined below, where `{base_text}` refers to information extracted from the PDF metadata using the `get_anchor_text` function from the `typhoon-ocr` package. It will not function correctly with any other prompts.
+
+```
+PROMPTS_SYS = {
+    "default": lambda base_text: (f"Below is an image of a document page along with its dimensions. "
+        f"Simply return the markdown representation of this document, presenting tables in markdown format as they naturally appear.\n"
+        f"If the document contains images, use a placeholder like dummy.png for each image.\n"
+        f"Your final output must be in JSON format with a single key `natural_text` containing the response.\n"
+        f"RAW_TEXT_START\n{base_text}\nRAW_TEXT_END"),
+    "structure": lambda base_text: (
+        f"Below is an image of a document page, along with its dimensions and possibly some raw textual content previously extracted from it. "
+        f"Note that the text extraction may be incomplete or partially missing. Carefully consider both the layout and any available text to reconstruct the document accurately.\n"
+        f"Your task is to return the markdown representation of this document, presenting tables in HTML format as they naturally appear.\n"
+        f"If the document contains images or figures, analyze them and include the tag <figure>IMAGE_ANALYSIS</figure> in the appropriate location.\n"
+        f"Your final output must be in JSON format with a single key `natural_text` containing the response.\n"
+        f"RAW_TEXT_START\n{base_text}\nRAW_TEXT_END"
+    ),
+}
+```
+
+
 ## **Intended Uses & Limitations**
 
 This is a task-specific model intended to be used only with the provided prompts. It does not include any guardrails or VQA capability. Due to the nature of large language models (LLMs), a certain level of hallucination may occur. We recommend that developers carefully assess these risks in the context of their specific use case.
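
The committed `PROMPTS_SYS` templates can be exercised end to end. Below is a minimal sketch: the `"default"` template is copied from the README so the snippet is self-contained, a placeholder string stands in for the output of `get_anchor_text` (whose exact signature is not shown in this diff), and the model response being parsed is hypothetical — the only grounded assumption is that, per the prompt, the reply is JSON with a single `natural_text` key.

```python
import json

# The "default" template as committed in this README, copied here
# so the sketch runs without the typhoon-ocr package installed.
PROMPTS_SYS = {
    "default": lambda base_text: (
        f"Below is an image of a document page along with its dimensions. "
        f"Simply return the markdown representation of this document, presenting tables in markdown format as they naturally appear.\n"
        f"If the document contains images, use a placeholder like dummy.png for each image.\n"
        f"Your final output must be in JSON format with a single key `natural_text` containing the response.\n"
        f"RAW_TEXT_START\n{base_text}\nRAW_TEXT_END"
    ),
}

def build_prompt(task: str, base_text: str) -> str:
    """Fill the task-specific template with the anchor text.

    In real use, base_text would come from typhoon-ocr's get_anchor_text();
    a placeholder string stands in for it here.
    """
    return PROMPTS_SYS[task](base_text)

prompt = build_prompt("default", "Page 1: quarterly revenue table ...")

# The prompt instructs the model to answer with JSON holding a single
# `natural_text` key; a hypothetical response is parsed accordingly.
response = '{"natural_text": "# Quarterly Report\\n\\n| Q | Revenue |"}'
markdown = json.loads(response)["natural_text"]
```

Because the templates are plain lambdas over `base_text`, swapping `"default"` for `"structure"` only changes the instructions sent to the model; the JSON-parsing step stays the same.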