```python
from typhoon_ocr import ocr_document

markdown = ocr_document("test.png")
print(markdown)
```

**(Recommended): Local Model via vllm (GPU Required)**:

```bash
pip install vllm
vllm serve scb10x/typhoon-ocr-7b --max-model-len 32000 --served-model-name typhoon-ocr-preview  # OpenAI-compatible server at http://localhost:8000 (or another port)
# You can then pass base_url to ocr_document
```
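Because the server may take a while to load the model, it can help to poll the endpoint before sending OCR requests. The OpenAI-compatible API exposes a `/v1/models` route; below is a minimal stdlib-only sketch (the helper `server_ready` is our own, not part of typhoon-ocr or vllm):

```python
import json
from urllib.request import urlopen
from urllib.error import URLError

def server_ready(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if an OpenAI-compatible server answers at {base_url}/models."""
    try:
        with urlopen(f"{base_url}/models", timeout=timeout) as resp:
            data = json.load(resp)
        # A ready server lists at least one served model
        return bool(data.get("data"))
    except (URLError, OSError, ValueError):
        return False

print(server_ready("http://localhost:8000/v1"))
```

Once this returns `True`, the served model is loaded and ready to accept requests.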

```python
from typhoon_ocr import ocr_document

markdown = ocr_document('image.png', base_url='http://localhost:8000/v1', api_key='anything-is-ok')
print(markdown)
```

To read more, see the [vllm quickstart](https://docs.vllm.ai/en/latest/getting_started/quickstart.html).
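Since `ocr_document` calls a remote server, transient network failures are possible. A small, library-agnostic retry wrapper like the sketch below (our own helper, not part of typhoon-ocr) can make batch jobs more robust:

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def with_retries(fn: Callable[[], T], attempts: int = 3, delay: float = 1.0) -> T:
    """Call fn(), retrying on any exception with a fixed delay between attempts."""
    last_exc = None
    for i in range(attempts):
        try:
            return fn()
        except Exception as exc:
            last_exc = exc
            if i < attempts - 1:
                time.sleep(delay)
    raise last_exc

# Usage sketch (assumes a running server):
# markdown = with_retries(lambda: ocr_document('image.png', base_url='http://localhost:8000/v1'))
```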

**Run Manually**

Below is a partial snippet. You can run inference using either the API or a local model.

```python
response = openai.chat.completions.create(...)  # arguments elided in this excerpt
text_output = response.choices[0].message.content
print(text_output)
```

*(Not Recommended): Local Model - Transformers (GPU Required)*:

```python
# Initialize the model
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "scb10x/typhoon-ocr-7b", torch_dtype=torch.bfloat16
).eval()
```

This model only works with the specific prompts defined below, where `{base_text}` refers to information extracted from the PDF metadata using the `get_anchor_text` function from the `typhoon-ocr` package. It will not function correctly with any other prompts.

```python
PROMPTS_SYS = {
    "default": lambda base_text: (
        f"Below is an image of a document page along with its dimensions. "
        f"Simply return the markdown representation of this document, presenting tables in markdown format as they naturally appear.\n"
        ...
    ),
}
```
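To make the template pattern concrete, here is a self-contained sketch of how such a lambda-based prompt table is rendered. The `sample_anchor` string is invented for illustration only; it is not real `get_anchor_text` output:

```python
# Hypothetical stand-in for text returned by get_anchor_text (illustration only)
sample_anchor = "Page dimensions: 612.0x792.0\n[70x730] INVOICE\n[70x700] Item  Qty  Price"

prompts = {
    "default": lambda base_text: (
        "Below is an image of a document page along with its dimensions. "
        "Simply return the markdown representation of this document, "
        "presenting tables in markdown format as they naturally appear.\n"
        f"{base_text}"
    ),
}

# Render the system prompt for one page
rendered = prompts["default"](sample_anchor)
print(rendered)
```

The anchor text is appended after the instruction, so the model sees both the task description and the page's extracted layout hints in a single prompt.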

### Generation Parameters

We suggest using the following generation parameters. Since this is an OCR model, we do not recommend a high temperature; set it to 0 or 0.1, not higher.

```python
temperature=0.1,
top_p=0.6,
repetition_penalty=1.2,
```
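When calling the vllm server through an OpenAI-compatible client, `temperature` and `top_p` are standard request fields, while `repetition_penalty` is a vllm-specific sampling parameter that is typically passed through the client's `extra_body` escape hatch. The helper below is our own sketch of assembling such a request payload (the model name matches the `--served-model-name` used above):

```python
def build_ocr_request(messages, model="typhoon-ocr-preview"):
    """Assemble kwargs for an OpenAI-compatible chat.completions call
    using the suggested typhoon-ocr generation parameters."""
    return {
        "model": model,
        "messages": messages,
        "temperature": 0.1,
        "top_p": 0.6,
        # repetition_penalty is a vllm extension, not a standard OpenAI
        # field, so it goes in extra_body
        "extra_body": {"repetition_penalty": 1.2},
    }

req = build_ocr_request([{"role": "user", "content": "..."}])
print(sorted(req.keys()))
```

You would then pass these kwargs to `openai.chat.completions.create(**req)` against the local server's base URL.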

## Hosting

We recommend running inference for typhoon-ocr with [vllm](https://github.com/vllm-project/vllm) rather than Hugging Face transformers, and using the `typhoon-ocr` library to OCR documents. To read more, see the [vllm quickstart](https://docs.vllm.ai/en/latest/getting_started/quickstart.html).

```bash
vllm serve scb10x/typhoon-ocr-7b --max-model-len 32000  # OpenAI-compatible server at http://localhost:8000
# You can then pass base_url to ocr_document
```

```python
from typhoon_ocr import ocr_document

markdown = ocr_document('image.jpg', base_url='http://localhost:8000/v1')
print(markdown)
```

## **Intended Uses & Limitations**