No text output when running inference with ollama?

#8
by frankslin - opened

I'm using the example image from https://cdn.bigmodel.cn/static/logo/introduction.png.

Apple M2, macOS 15.7.3.

$ ollama -v
ollama version is 0.15.5-rc1
$ sha256sum introduction.png
b01139689cb0682b3a1d6f3f3eda3f481101571654e3a23a1de5bf5520f3c6f5  introduction.png
$ ollama run glm-ocr Text Recognition: ./introduction.png
Added image './introduction.png'
```markdown

```markdown

```markdown

```text

When I tried with another file, I got this error:

Error: an error was encountered while running the model: GGML_ASSERT(a->ne[2] * 4 == b->ne[0]) failed
WARNING: Using native backtrace. Set GGML_BACKTRACE_LLDB for more info.
WARNING: GGML_BACKTRACE_LLDB may cause native MacOS Terminal.app to crash.
See: https://github.com/ggml-org/llama.cpp/pull/17869
0 ollama 0x0000000100e6d620 ggml_print_backtrace + 276


What am I missing?

I ran into the same problem. I hope the researchers take this seriously and get the model working properly.

I am having the same problem serving glm-ocr from ollama.

  • Macbook Air M2, Tahoe 26.2
  • Ollama pre-release 0.15.5

Output for first three images of a non-complex PDF (https://www.fiw.uni-bonn.de/de/forschung/demokratieforschung/team/prof-dr-rudolf-stichweh/papers/pdfs/81_stw_niklas-luhmann-blackwell-companion-to-major-social-theorists.pdf):

Page 1

"<table bordered, no table, but no table, but no table, but no table, no table, no table, no layout, but no layout, or any other content is present. No layout, or any other content is present. No text is present. No layout. No layout. No layout. No layout. No layout."


Page 2

"<table bordered, no table, but no table, but no table, but no table, no table, no layout, rows, columns, rows, columns, rows, or any other content within the image content is present. No text, or any other content is present. No text is present. No layout, just a single row, just a single row, just a single row, just a single row, just a single row, no layout. No layout. No layout. No layout. No layout. No layout. No layout. No layout."


Page 3

Error extracting page: an error was encountered while running the model: GGML_ASSERT(a->ne[2] * 4 == b->ne[0]) failed
WARNING: Using native backtrace. Set GGML_BACKTRACE_LLDB for more info.
WARNING: GGML_BACKTRACE_LLDB may cause native MacOS Terminal.app to crash.
See: https://github.com/ggml-org/llama.cpp/pull/17869
0 ollama 0x0000000101b050c0 ggml_print_backtrace + 276
1 ollama 0x0000000101b052ac ggml_abort + 156
2 ollama 0x0000000101b0d8e8 ggml_rope + 300
3 ollama 0x0000000101b0db44 ggml_rope_multi + 20
4 ollama 0x0000000101aaf2b0 _cgo_7ebcd35a9797_Cfunc_ggml_rope_multi + 64
5 ollama 0x0000000100e6549c ollama + 513180 (status code: 500)

Either increase the context size for the model or reduce the image size. Depending on your available VRAM, Ollama defaults to a context of only 4096 tokens, which is not enough for a large image like introduction.png. You can check the active context after loading the model with ollama ps.

I can confirm that increasing the context size works. I am using a context size of 10240 and have successfully extracted text from images of up to 12 megapixels. Here is the Modelfile I am using:

# Modelfile generated by "ollama show"
FROM glm-ocr:latest

# Increase context size
PARAMETER num_ctx 10240

TEMPLATE {{ .Prompt }}
RENDERER glm-ocr
PARSER glm-ocr
PARAMETER temperature 0
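If you'd rather not create a new model from a Modelfile, the same override can be passed per request through Ollama's REST API (`/api/generate` accepts an `options` object with `num_ctx`). A minimal sketch, assuming a local Ollama server on the default port and the `glm-ocr` model pulled as above; the image path is a placeholder:

```python
import base64
import json
import urllib.request


def build_payload(image_path: str, num_ctx: int = 10240) -> dict:
    """Build an /api/generate request body that overrides the context size."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("ascii")
    return {
        "model": "glm-ocr",
        "prompt": "Text Recognition:",
        "images": [image_b64],
        "stream": False,
        # Per-request context override; avoids the 4096-token default.
        "options": {"num_ctx": num_ctx, "temperature": 0},
    }


def run_ocr(image_path: str, host: str = "http://localhost:11434") -> str:
    """Send the OCR request to a running Ollama server and return the text."""
    data = json.dumps(build_payload(image_path)).encode("utf-8")
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

With this, `run_ocr("./introduction.png")` should return the recognized text without touching the Modelfile, since per-request options take precedence over the model's defaults.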

Yeah, you're right, mate.

Thank you all for providing the "increase context window" solution.
I did the same and went from gibberish output to the clean, exact text I expected.


It's working for me, but I'd like to have the text as Markdown. Is that possible with only this model?

@marcofal This is the community for the original model, not the Ollama version, so questions here should probably relate to the original model.

Anyway, as far as I know, the model adds Markdown tags automatically when there are multiple sections or code; otherwise the MD output is effectively plaintext. If you need further help, I'd suggest joining their Discord server (link on the model's main page) or an Ollama-related community (GitHub issues is usually not a good place for general questions unrelated to the code).

Regards

Hi NeoHuggingF,
I tried it with both Ollama and vllm, and the output is the same. I wonder why it doesn't mark headers that have a larger font, for example.
Anyway, it is a very good model.

@marcofal
You may want to take a look into these:
https://github.com/zai-org/GLM-OCR?tab=readme-ov-file#configuration
https://github.com/zai-org/GLM-OCR/blob/main/examples/ollama-deploy/README.md

Disclaimer: I am not related to GLM-OCR, just a regular user.
