typhoon-ai/typhoon-ocr-7b

@@ -22,6 +22,8 @@ tags:
 **Release Blog available on [OpenTyphoon Blog](https://opentyphoon.ai/blog/en/typhoon-ocr-release)**
+*Remark: This model is intended to be used with a specific prompt only; it will not work with any other prompts.
 ## **Real-World Document Support**
@@ -59,6 +61,7 @@ However, in the Thai books benchmark, performance slightly declined due to the h
 For this version, our primary focus has been on achieving high-quality OCR for both English and Thai text. Future releases may extend support to more advanced image analysis and figure interpretation.
 ## Usage Example
 **(Recommended): Full inference code available on [Colab](https://colab.research.google.com/drive/1z4Fm2BZnKcFIoWuyxzzIIIn8oI2GKl3r?usp=sharing)**
@@ -183,6 +186,29 @@ text_output = processor.tokenizer.batch_decode(
 print(text_output[0])
 ```
+## Prompting
+This model only works with the specific prompts defined below, where `{base_text}` refers to information extracted from the PDF metadata using the `get_anchor_text` function from the `typhoon-ocr` package. It will not function correctly with any other prompts.
+```
+PROMPTS_SYS = {
+    "default": lambda base_text: (f"Below is an image of a document page along with its dimensions. "
+        f"Simply return the markdown representation of this document, presenting tables in markdown format as they naturally appear.\n"
+        f"If the document contains images, use a placeholder like dummy.png for each image.\n"
+        f"Your final output must be in JSON format with a single key `natural_text` containing the response.\n"
+        f"RAW_TEXT_START\n{base_text}\nRAW_TEXT_END"),
+    "structure": lambda base_text: (
+        f"Below is an image of a document page, along with its dimensions and possibly some raw textual content previously extracted from it. "
+        f"Note that the text extraction may be incomplete or partially missing. Carefully consider both the layout and any available text to reconstruct the document accurately.\n"
+        f"Your task is to return the markdown representation of this document, presenting tables in HTML format as they naturally appear.\n"
+        f"If the document contains images or figures, analyze them and include the tag <figure>IMAGE_ANALYSIS</figure> in the appropriate location.\n"
+        f"Your final output must be in JSON format with a single key `natural_text` containing the response.\n"
+        f"RAW_TEXT_START\n{base_text}\nRAW_TEXT_END"
+    ),
+}
+```
 ## **Intended Uses & Limitations**
 This is a task-specific model intended to be used only with the provided prompts. It does not include any guardrails or VQA capability. Due to the nature of large language models (LLMs), a certain level of hallucination may occur. We recommend that developers carefully assess these risks in the context of their specific use case.
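
Because the Prompting section requires one of the fixed templates and asks the model to reply in JSON under a single `natural_text` key, a caller might wrap both steps as in the sketch below. This is an illustration under stated assumptions, not code from the `typhoon-ocr` package: `build_prompt` and `extract_natural_text` are hypothetical helper names, the anchor text is a placeholder string rather than real `get_anchor_text` output, and only the "default" template is reproduced here. The fallback to the raw reply is a design choice for this sketch, motivated by the hallucination caveat above.

```python
import json

# "default" template copied verbatim from the Prompting section; the
# "structure" template would be added to this dict in the same way.
PROMPTS_SYS = {
    "default": lambda base_text: (
        f"Below is an image of a document page along with its dimensions. "
        f"Simply return the markdown representation of this document, presenting tables in markdown format as they naturally appear.\n"
        f"If the document contains images, use a placeholder like dummy.png for each image.\n"
        f"Your final output must be in JSON format with a single key `natural_text` containing the response.\n"
        f"RAW_TEXT_START\n{base_text}\nRAW_TEXT_END"
    ),
}


def build_prompt(mode: str, base_text: str) -> str:
    """Hypothetical helper: fill the chosen template with the anchor text."""
    if mode not in PROMPTS_SYS:
        raise ValueError(f"unknown prompt mode: {mode!r}")
    return PROMPTS_SYS[mode](base_text)


def extract_natural_text(raw_output: str) -> str:
    """Hypothetical helper: pull `natural_text` out of the model's JSON reply.

    Returns the raw string unchanged when the reply is not valid JSON or
    does not contain a string under the `natural_text` key.
    """
    try:
        parsed = json.loads(raw_output)
    except json.JSONDecodeError:
        return raw_output
    if isinstance(parsed, dict) and isinstance(parsed.get("natural_text"), str):
        return parsed["natural_text"]
    return raw_output


# Placeholder anchor text; in practice this would come from get_anchor_text.
prompt = build_prompt("default", "Invoice No. 123")
markdown = extract_natural_text('{"natural_text": "# Invoice"}')
```

Rejecting unknown modes early keeps callers from sending a prompt the model was not trained on, which the model card says will not work.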