Update README.md
README.md
CHANGED
@@ -22,6 +22,8 @@ tags:

 **Release Blog available on [OpenTyphoon Blog](https://opentyphoon.ai/blog/en/typhoon-ocr-release)**

+*Remark: This model is intended to be used with a specific prompt only; it will not work with any other prompts.
+

 ## **Real-World Document Support**

@@ -59,6 +61,7 @@ However, in the Thai books benchmark, performance slightly declined due to the h
 For this version, our primary focus has been on achieving high-quality OCR for both English and Thai text. Future releases may extend support to more advanced image analysis and figure interpretation.

 ## Usage Example
+
 **(Recommended): Full inference code available on [Colab](https://colab.research.google.com/drive/1z4Fm2BZnKcFIoWuyxzzIIIn8oI2GKl3r?usp=sharing)**


@@ -183,6 +186,29 @@ text_output = processor.tokenizer.batch_decode(
 print(text_output[0])
 ```

+## Prompting
+
+This model only works with the specific prompts defined below, where `{base_text}` refers to information extracted from the PDF metadata using the `get_anchor_text` function from the `typhoon-ocr` package. It will not function correctly with any other prompts.
+
+```
+PROMPTS_SYS = {
+    "default": lambda base_text: (f"Below is an image of a document page along with its dimensions. "
+        f"Simply return the markdown representation of this document, presenting tables in markdown format as they naturally appear.\n"
+        f"If the document contains images, use a placeholder like dummy.png for each image.\n"
+        f"Your final output must be in JSON format with a single key `natural_text` containing the response.\n"
+        f"RAW_TEXT_START\n{base_text}\nRAW_TEXT_END"),
+    "structure": lambda base_text: (
+        f"Below is an image of a document page, along with its dimensions and possibly some raw textual content previously extracted from it. "
+        f"Note that the text extraction may be incomplete or partially missing. Carefully consider both the layout and any available text to reconstruct the document accurately.\n"
+        f"Your task is to return the markdown representation of this document, presenting tables in HTML format as they naturally appear.\n"
+        f"If the document contains images or figures, analyze them and include the tag <figure>IMAGE_ANALYSIS</figure> in the appropriate location.\n"
+        f"Your final output must be in JSON format with a single key `natural_text` containing the response.\n"
+        f"RAW_TEXT_START\n{base_text}\nRAW_TEXT_END"
+    ),
+}
+```
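
As a rough illustration of how the prompts above might be wired into a pipeline, the sketch below selects a template by task name, fills in `base_text`, and unpacks the `natural_text` key from a JSON reply. The helper names `build_prompt` and `extract_natural_text` are hypothetical and not part of the `typhoon-ocr` package. The literal string stands in for the output of `get_anchor_text`, whose call signature is not shown here, and the fallback to the raw string on malformed JSON is an assumption about how one might handle imperfect model output.

```python
import json

# Copied verbatim from the prompt definitions above.
PROMPTS_SYS = {
    "default": lambda base_text: (f"Below is an image of a document page along with its dimensions. "
        f"Simply return the markdown representation of this document, presenting tables in markdown format as they naturally appear.\n"
        f"If the document contains images, use a placeholder like dummy.png for each image.\n"
        f"Your final output must be in JSON format with a single key `natural_text` containing the response.\n"
        f"RAW_TEXT_START\n{base_text}\nRAW_TEXT_END"),
    "structure": lambda base_text: (
        f"Below is an image of a document page, along with its dimensions and possibly some raw textual content previously extracted from it. "
        f"Note that the text extraction may be incomplete or partially missing. Carefully consider both the layout and any available text to reconstruct the document accurately.\n"
        f"Your task is to return the markdown representation of this document, presenting tables in HTML format as they naturally appear.\n"
        f"If the document contains images or figures, analyze them and include the tag <figure>IMAGE_ANALYSIS</figure> in the appropriate location.\n"
        f"Your final output must be in JSON format with a single key `natural_text` containing the response.\n"
        f"RAW_TEXT_START\n{base_text}\nRAW_TEXT_END"
    ),
}


def build_prompt(task: str, base_text: str) -> str:
    """Hypothetical helper: return the exact prompt string for `task`."""
    if task not in PROMPTS_SYS:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(PROMPTS_SYS)}")
    return PROMPTS_SYS[task](base_text)


def extract_natural_text(model_output: str) -> str:
    """Hypothetical helper: pull `natural_text` out of the model's JSON reply.

    Returns the raw string unchanged when the reply is not valid JSON or
    lacks a string `natural_text` field (an assumed fallback policy).
    """
    try:
        parsed = json.loads(model_output)
    except json.JSONDecodeError:
        return model_output
    if isinstance(parsed, dict) and isinstance(parsed.get("natural_text"), str):
        return parsed["natural_text"]
    return model_output


# In real use, base_text would come from get_anchor_text in the typhoon-ocr
# package; a literal placeholder string stands in for it here.
prompt = build_prompt("default", "Invoice No. 001")
print(prompt)
```

Selecting the template through a lookup rather than editing the string by hand keeps the wording identical to what the model expects, which matters because the model is documented to work only with these prompts.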
+
+
 ## **Intended Uses & Limitations**

 This is a task-specific model intended to be used only with the provided prompts. It does not include any guardrails or VQA capability. Due to the nature of large language models (LLMs), a certain level of hallucination may occur. We recommend that developers carefully assess these risks in the context of their specific use case.