Vik Paruchuri committed on
Commit 5471d0c · 1 Parent(s): 68eee70

Factor out llm services, enable local models

README.md CHANGED
@@ -22,7 +22,7 @@ See [below](#benchmarks) for detailed speed and accuracy benchmarks, and instruc
 
 ## Hybrid Mode
 
-For the highest accuracy, pass the `--use_llm` flag to use an LLM alongside marker. This will do things like merge tables across pages, format tables properly, and extract values from forms. It uses `gemini-flash-2.0`, which is cheap and fast.
+For the highest accuracy, pass the `--use_llm` flag to use an LLM alongside marker. This will do things like merge tables across pages, handle inline math, format tables properly, and extract values from forms. It can use any Google model (`gemini-2.0-flash` by default) or any Ollama model. See [below](#llm-services) for details.
 
 Here is a table benchmark comparing marker, gemini flash alone, and marker with use_llm:
 
@@ -42,7 +42,7 @@ As you can see, the use_llm mode offers higher accuracy than marker or gemini al
 
 I want marker to be as widely accessible as possible, while still funding my development/training costs. Research and personal usage is always okay, but there are some restrictions on commercial usage.
 
-The weights for the models are licensed `cc-by-nc-sa-4.0`, but I will waive that for any organization under $5M USD in gross revenue in the most recent 12-month period AND under $5M in lifetime VC/angel funding raised. You also must not be competitive with the [Datalab API](https://www.datalab.to/). If you want to remove the GPL license requirements (dual-license) and/or use the weights commercially over the revenue limit, check out the options [here](https://www.datalab.to).
+The weights for the models are licensed `cc-by-nc-sa-4.0`, but I will waive that for any organization under \$5M USD in gross revenue in the most recent 12-month period AND under \$5M in lifetime VC/angel funding raised. You also must not be competitive with the [Datalab API](https://www.datalab.to/). If you want to remove the GPL license requirements (dual-license) and/or use the weights commercially over the revenue limit, check out the options [here](https://www.datalab.to).
 
 # Hosted API
 
@@ -105,6 +105,8 @@ Options:
 - `--languages TEXT`: Optionally specify which languages to use for OCR processing. Accepts a comma-separated list. Example: `--languages "en,fr,de"` for English, French, and German.
 - `config --help`: List all available builders, processors, and converters, and their associated configuration. These values can be used to build a JSON configuration file for additional tweaking of marker defaults.
 - `--converter_cls`: One of `marker.converters.pdf.PdfConverter` (default) or `marker.converters.table.TableConverter`. The `PdfConverter` will convert the whole PDF, the `TableConverter` will only extract and convert tables.
+- `--llm_service`: Which LLM service to use if `--use_llm` is passed. Defaults to `marker.services.gemini.GoogleGeminiService`.
+- `--help`: See all of the flags that can be passed into marker. (It supports many more options than are listed above.)
 
 The list of supported languages for surya OCR is [here](https://github.com/VikParuchuri/surya/blob/master/surya/recognition/languages.py). If you don't need OCR, marker can work with any language.
 
@@ -146,7 +148,7 @@ text, _, images = text_from_rendered(rendered)
 
 ### Custom configuration
 
-You can pass configuration using the `ConfigParser`:
+You can pass configuration using the `ConfigParser`. To see all available options, do `marker_single --help`.
 
 ```python
 from marker.converters.pdf import PdfConverter
@@ -310,6 +312,14 @@ All output formats will return a metadata dictionary, with the following fields:
 }
 ```
 
+# LLM Services
+
+When running with the `--use_llm` flag, you have a choice of services you can use:
+
+- `Gemini` - this will use the Gemini developer API by default. You'll need to pass the `--gemini_api_key` configuration option.
+- `Google Vertex` - this will use Vertex AI, which can be more reliable. You'll need to pass `--vertex_project_id` and `--vertex_location`. To use it, set `--llm_service=marker.services.vertex.GoogleVertexService`.
+- `Ollama` - this will use local models. You can configure `--ollama_base_url` and `--ollama_model`. To use it, set `--llm_service=marker.services.ollama.OllamaService`.
+
 # Internals
 
 Marker is easy to extend. The core units of marker are:
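To make the new options concrete, here is a minimal Python sketch of selecting an LLM service through the API described above. It assumes `create_model_dict` from `marker.models`, as in the project's other README examples; the file path and the Ollama settings are placeholders.

```python
from marker.config.parser import ConfigParser
from marker.converters.pdf import PdfConverter
from marker.models import create_model_dict

# CLI-style options; "llm_service" takes a full import path, exactly like the
# new --llm_service flag. The Ollama values mirror the documented defaults.
config = {
    "use_llm": True,
    "llm_service": "marker.services.ollama.OllamaService",
    "ollama_base_url": "http://localhost:11434",
    "ollama_model": "llama3.2-vision",
}
config_parser = ConfigParser(config)

converter = PdfConverter(
    config=config_parser.generate_config_dict(),
    artifact_dict=create_model_dict(),
    processor_list=config_parser.get_processors(),
    renderer=config_parser.get_renderer(),
    llm_service=config_parser.get_llm_service(),
)
rendered = converter("document.pdf")  # placeholder path
```

The same selection is available on the command line with `--use_llm --llm_service=marker.services.ollama.OllamaService`.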
marker/builders/llm_layout.py CHANGED
@@ -1,12 +1,12 @@
 from concurrent.futures import ThreadPoolExecutor, as_completed
-from typing import Annotated
+from typing import Annotated, Type
 
 from surya.layout import LayoutPredictor
 from tqdm import tqdm
 from pydantic import BaseModel
 
 from marker.builders.layout import LayoutBuilder
-from marker.services.google import GoogleModel
+from marker.services import BaseService
 from marker.providers.pdf import PdfProvider
 from marker.schema import BlockTypes
 from marker.schema.blocks import Block
@@ -97,10 +97,10 @@ Potential labels:
 Respond only with one of `Figure`, `Picture`, `ComplexRegion`, `Table`, or `Form`.
 """
 
-    def __init__(self, layout_model: LayoutPredictor, config=None):
+    def __init__(self, layout_model: LayoutPredictor, llm_service: BaseService, config=None):
         super().__init__(layout_model, config)
 
-        self.model = GoogleModel(self.google_api_key, self.model_name)
+        self.llm_service = llm_service
 
     def __call__(self, document: Document, provider: PdfProvider):
         super().__call__(document, provider)
@@ -158,7 +158,7 @@ Respond only with one of `Figure`, `Picture`, `ComplexRegion`, `Table`, or `Form
     def process_block_relabeling(self, document: Document, page: PageGroup, block: Block, prompt: str):
         image = self.extract_image(document, block)
 
-        response = self.model.generate_response(
+        response = self.llm_service(
            prompt,
            image,
            block,
marker/config/crawler.py CHANGED
@@ -9,10 +9,11 @@ from marker.converters import BaseConverter
 from marker.processors import BaseProcessor
 from marker.providers import BaseProvider
 from marker.renderers import BaseRenderer
+from marker.services import BaseService
 
 
 class ConfigCrawler:
-    def __init__(self, base_classes=(BaseBuilder, BaseProcessor, BaseConverter, BaseProvider, BaseRenderer)):
+    def __init__(self, base_classes=(BaseBuilder, BaseProcessor, BaseConverter, BaseProvider, BaseRenderer, BaseService)):
         self.base_classes = base_classes
         self.class_config_map = {}
 
marker/config/parser.py CHANGED
@@ -39,9 +39,9 @@ class ConfigParser:
         fn = click.option("--languages", type=str, default=None, help="Comma separated list of languages to use for OCR.")(fn)
 
         # we put common options here
-        fn = click.option("--google_api_key", type=str, default=None, help="Google API key for using LLMs.")(fn)
         fn = click.option("--use_llm", is_flag=True, default=False, help="Enable higher quality processing with LLMs.")(fn)
         fn = click.option("--converter_cls", type=str, default=None, help="Converter class to use.  Defaults to PDF converter.")(fn)
+        fn = click.option("--llm_service", type=str, default=None, help="LLM service to use - should be full import path, like marker.services.gemini.GoogleGeminiService")(fn)
 
         # enum options
         fn = click.option("--force_layout_block", type=click.Choice(choices=[t.name for t in BlockTypes]), default=None,)(fn)
@@ -74,8 +74,23 @@ class ConfigParser:
             case _:
                 if k in crawler.attr_set:
                     config[k] = v
+
+        # Backward compatibility for google_api_key
+        if settings.GOOGLE_API_KEY:
+            config["gemini_api_key"] = settings.GOOGLE_API_KEY
+
         return config
 
+    def get_llm_service(self):
+        # Only return an LLM service when use_llm is enabled
+        if not self.cli_options["use_llm"]:
+            return None
+
+        service_cls = self.cli_options["llm_service"]
+        if service_cls is None:
+            service_cls = "marker.services.gemini.GoogleGeminiService"
+        return service_cls
+
     def get_renderer(self):
         match self.cli_options["output_format"]:
             case "json":
@@ -122,3 +137,4 @@ class ConfigParser:
     def get_base_filename(self, filepath: str):
         basename = os.path.basename(filepath)
         return os.path.splitext(basename)[0]
+
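As a quick illustration of the new helper, the sketch below shows what `get_llm_service` returns for the two cases handled in this hunk. It assumes `ConfigParser` keeps the passed dict as `cli_options` (the way the methods above read it); the keys are supplied explicitly so no defaults are relied on.

```python
from marker.config.parser import ConfigParser

# use_llm off: no service is returned at all.
assert ConfigParser({"use_llm": False, "llm_service": None}).get_llm_service() is None

# use_llm on with no explicit service: the Gemini import path is the default.
assert (
    ConfigParser({"use_llm": True, "llm_service": None}).get_llm_service()
    == "marker.services.gemini.GoogleGeminiService"
)
```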
marker/converters/__init__.py CHANGED
@@ -13,6 +13,7 @@ class BaseConverter:
     def __init__(self, config: Optional[BaseModel | dict] = None):
         assign_config(self, config)
         self.config = config
+        self.llm_service = None
 
     def __call__(self, *args, **kwargs):
         raise NotImplementedError
@@ -52,7 +53,8 @@ class BaseConverter:
 
         meta_processor = LLMSimpleBlockMetaProcessor(
             processor_lst=simple_llm_processors,
-            config=self.config
+            llm_service=self.llm_service,
+            config=self.config,
         )
         other_processors.insert(insert_position, meta_processor)
         return other_processors
marker/converters/pdf.py CHANGED
@@ -1,4 +1,7 @@
 import os
+
+from marker.services.gemini import GoogleGeminiService
+
 os.environ["TOKENIZERS_PARALLELISM"] = "false" # disables a tokenizers warning
 
 import inspect
@@ -86,7 +89,14 @@ class PdfConverter(BaseConverter):
         DebugProcessor,
     )
 
-    def __init__(self, artifact_dict: Dict[str, Any], processor_list: Optional[List[str]] = None, renderer: str | None = None, config=None):
+    def __init__(
+        self,
+        artifact_dict: Dict[str, Any],
+        processor_list: Optional[List[str]] = None,
+        renderer: str | None = None,
+        llm_service: str | None = None,
+        config=None
+    ):
         super().__init__(config)
 
         for block_type, override_block_type in self.override_map.items():
@@ -102,6 +112,14 @@ class PdfConverter(BaseConverter):
         else:
             renderer = MarkdownRenderer
 
+        if llm_service:
+            llm_service_cls = strings_to_classes([llm_service])[0]
+            llm_service = self.resolve_dependencies(llm_service_cls)
+
+        # Inject llm service into artifact_dict so it can be picked up by processors, etc.
+        artifact_dict["llm_service"] = llm_service
+        self.llm_service = llm_service
+
         self.artifact_dict = artifact_dict
         self.renderer = renderer
 
marker/processors/llm/__init__.py CHANGED
@@ -8,11 +8,12 @@ from PIL import Image
 
 from marker.processors import BaseProcessor
 from marker.schema import BlockTypes
-from marker.services.google import GoogleModel
 from marker.schema.blocks import Block
 from marker.schema.document import Document
 from marker.schema.groups import PageGroup
+from marker.services import BaseService
 from marker.settings import settings
+from marker.util import assign_config
 
 
 class PromptData(TypedDict):
@@ -67,14 +68,14 @@ class BaseLLMProcessor(BaseProcessor):
     ] = False
     block_types = None
 
-    def __init__(self, config=None):
+    def __init__(self, llm_service: BaseService, config=None):
         super().__init__(config)
 
-        self.model = None
+        self.llm_service = None
         if not self.use_llm:
             return
 
-        self.model = GoogleModel(self.google_api_key, self.model_name)
+        self.llm_service = llm_service
 
     def extract_image(self, document: Document, image_block: Block, remove_blocks: Sequence[BlockTypes] | None = None) -> Image.Image:
         return image_block.get_image(
@@ -90,7 +91,7 @@ class BaseLLMComplexBlockProcessor(BaseLLMProcessor):
     A processor for using LLMs to convert blocks with more complex logic.
     """
     def __call__(self, document: Document):
-        if not self.use_llm or self.model is None:
+        if not self.use_llm or self.llm_service is None:
             return
 
         try:
@@ -125,6 +126,10 @@ class BaseLLMSimpleBlockProcessor(BaseLLMProcessor):
     A processor for using LLMs to convert single blocks.
     """
 
+    # Override init since we don't need an llm service here
+    def __init__(self, config=None):
+        assign_config(self, config)
+
     def __call__(self, result: dict, prompt_data: PromptData, document: Document):
         try:
             self.rewrite_block(result, prompt_data, document)
marker/processors/llm/llm_meta.py CHANGED
@@ -5,18 +5,19 @@ from tqdm import tqdm
 
 from marker.processors.llm import BaseLLMSimpleBlockProcessor, BaseLLMProcessor
 from marker.schema.document import Document
+from marker.services import BaseService
 
 
 class LLMSimpleBlockMetaProcessor(BaseLLMProcessor):
     """
     A wrapper for simple LLM processors, so they can all run in parallel.
     """
-    def __init__(self, processor_lst: List[BaseLLMSimpleBlockProcessor], config=None):
-        super().__init__(config)
+    def __init__(self, processor_lst: List[BaseLLMSimpleBlockProcessor], llm_service: BaseService, config=None):
+        super().__init__(llm_service, config)
         self.processors = processor_lst
 
     def __call__(self, document: Document):
-        if not self.use_llm or self.model is None:
+        if not self.use_llm or self.llm_service is None:
             return
 
         total = sum([len(processor.inference_blocks(document)) for processor in self.processors])
@@ -50,4 +51,4 @@ class LLMSimpleBlockMetaProcessor(BaseLLMProcessor):
         pbar.close()
 
     def get_response(self, prompt_data: Dict[str, Any]):
-        return self.model.generate_response(prompt_data["prompt"], prompt_data["image"], prompt_data["block"], prompt_data["schema"])
+        return self.llm_service(prompt_data["prompt"], prompt_data["image"], prompt_data["block"], prompt_data["schema"])
marker/processors/llm/llm_table.py CHANGED
@@ -134,7 +134,7 @@ No corrections needed.
     def rewrite_single_chunk(self, page: PageGroup, block: Block, block_html: str, children: List[TableCell], image: Image.Image):
         prompt = self.table_rewriting_prompt.replace("{block_html}", block_html)
 
-        response = self.model.generate_response(prompt, image, block, TableSchema)
+        response = self.llm_service(prompt, image, block, TableSchema)
 
         if not response or "corrected_html" not in response:
             block.update_metadata(llm_error_count=1)
marker/processors/llm/llm_table_merge.py CHANGED
@@ -240,7 +240,7 @@ Table 2
 
         prompt = self.table_merge_prompt.replace("{{table1}}", start_html).replace("{{table2}}", curr_html)
 
-        response = self.model.generate_response(
+        response = self.llm_service(
            prompt,
            [start_image, curr_image],
            curr_block,
marker/scripts/convert.py CHANGED
@@ -51,7 +51,8 @@ def process_single_pdf(args):
         config=config_parser.generate_config_dict(),
         artifact_dict=model_refs,
         processor_list=config_parser.get_processors(),
-        renderer=config_parser.get_renderer()
+        renderer=config_parser.get_renderer(),
+        llm_service=config_parser.get_llm_service()
     )
     rendered = converter(fpath)
     out_folder = config_parser.get_output_folder(fpath)
marker/scripts/convert_single.py CHANGED
@@ -29,7 +29,8 @@ def convert_single_cli(fpath: str, **kwargs):
         config=config_parser.generate_config_dict(),
         artifact_dict=models,
         processor_list=config_parser.get_processors(),
-        renderer=config_parser.get_renderer()
+        renderer=config_parser.get_renderer(),
+        llm_service=config_parser.get_llm_service()
     )
     rendered = converter(fpath)
     out_folder = config_parser.get_output_folder(fpath)
marker/scripts/server.py CHANGED
@@ -95,7 +95,8 @@ async def _convert_pdf(params: CommonParams):
         config=config_dict,
         artifact_dict=app_data["models"],
         processor_list=config_parser.get_processors(),
-        renderer=config_parser.get_renderer()
+        renderer=config_parser.get_renderer(),
+        llm_service=config_parser.get_llm_service()
     )
     rendered = converter(params.filepath)
     text, _, images = text_from_rendered(rendered)
marker/scripts/streamlit_app.py CHANGED
@@ -56,7 +56,8 @@ def convert_pdf(fname: str, config_parser: ConfigParser) -> (str, Dict[str, Any]
         config=config_dict,
         artifact_dict=model_dict,
         processor_list=config_parser.get_processors(),
-        renderer=config_parser.get_renderer()
+        renderer=config_parser.get_renderer(),
+        llm_service=config_parser.get_llm_service()
     )
     return converter(fname)
 
marker/services/__init__.py CHANGED
@@ -0,0 +1,26 @@
+from typing import Optional, List
+
+import PIL
+from pydantic import BaseModel
+
+from marker.schema.blocks import Block
+from marker.util import assign_config, verify_config_keys
+
+
+class BaseService:
+    def __init__(self, config: Optional[BaseModel | dict] = None):
+        assign_config(self, config)
+
+        # Ensure we have all necessary fields filled out (API keys, etc.)
+        verify_config_keys(self)
+
+    def __call__(
+        self,
+        prompt: str,
+        image: PIL.Image.Image | List[PIL.Image.Image],
+        block: Block,
+        response_schema: type[BaseModel],
+        max_retries: int = 1,
+        timeout: int = 15
+    ):
+        raise NotImplementedError
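Because every backend just subclasses `BaseService` and implements `__call__`, adding one is mostly a matter of filling in this interface. The sketch below is hypothetical (`EchoService` and `echo_text` are not part of the codebase) and assumes only what this file defines.

```python
from typing import Annotated, List

import PIL
from pydantic import BaseModel

from marker.schema.blocks import Block
from marker.services import BaseService


class EchoService(BaseService):
    # Hypothetical config attribute; Annotated attributes left as None are
    # rejected by verify_config_keys() when the service is constructed.
    echo_text: Annotated[str, "Canned text to return for every request."] = "placeholder"

    def __call__(
        self,
        prompt: str,
        image: PIL.Image.Image | List[PIL.Image.Image],
        block: Block,
        response_schema: type[BaseModel],
        max_retries: int = 1,
        timeout: int = 15
    ):
        # Return a dict shaped like the response schema, as the real services do.
        fields = response_schema.model_json_schema()["properties"]
        return {name: self.echo_text for name in fields}
```

A class like this can then be selected by its full import path, e.g. via the new `--llm_service` flag.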
marker/services/{google.py → gemini.py} RENAMED
@@ -1,7 +1,7 @@
 import json
 import time
 from io import BytesIO
-from typing import List
+from typing import List, Annotated
 
 import PIL
 from google import genai
@@ -10,29 +10,23 @@ from google.genai.errors import APIError
 from pydantic import BaseModel
 
 from marker.schema.blocks import Block
-from marker.settings import settings
+from marker.services import BaseService
 
-
-class GoogleModel:
-    def __init__(self, api_key: str, model_name: str):
-        if api_key is None:
-            raise ValueError("Google API key is not set")
-
-        self.api_key = api_key
-        self.model_name = model_name
-
-    def get_google_client(self, timeout: int = 60):
-        return genai.Client(
-            api_key=settings.GOOGLE_API_KEY,
-            http_options={"timeout": timeout * 1000} # Convert to milliseconds
-        )
+class BaseGeminiService(BaseService):
+    gemini_model_name: Annotated[
+        str,
+        "The name of the Google model to use for the service."
+    ] = "gemini-2.0-flash"
 
     def img_to_bytes(self, img: PIL.Image.Image):
         image_bytes = BytesIO()
         img.save(image_bytes, format="PNG")
         return image_bytes.getvalue()
 
-    def generate_response(
+    def get_google_client(self, timeout: int = 60):
+        raise NotImplementedError
+
+    def __call__(
         self,
         prompt: str,
         image: PIL.Image.Image | List[PIL.Image.Image],
@@ -51,7 +45,7 @@ class GoogleModel:
         while tries < max_retries:
             try:
                 responses = client.models.generate_content(
-                    model="gemini-2.0-flash",
+                    model=self.gemini_model_name,
                     contents=image_parts + [prompt], # According to gemini docs, it performs better if the image is the first element
                     config={
                         "temperature": 0,
@@ -78,3 +72,16 @@ class GoogleModel:
                 break
 
         return {}
+
+
+class GoogleGeminiService(BaseGeminiService):
+    gemini_api_key: Annotated[
+        str,
+        "The Google API key to use for the service."
+    ] = None
+
+    def get_google_client(self, timeout: int = 60):
+        return genai.Client(
+            api_key=self.gemini_api_key,
+            http_options={"timeout": timeout * 1000} # Convert to milliseconds
+        )
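A minimal construction sketch for the renamed service; the key is a placeholder and the model name is just the default this file already sets.

```python
from marker.services.gemini import GoogleGeminiService

# gemini_api_key is Annotated with a None default, so verify_config_keys()
# raises at construction time if it is not provided.
service = GoogleGeminiService(config={
    "gemini_api_key": "YOUR_GEMINI_KEY",       # placeholder
    "gemini_model_name": "gemini-2.0-flash",   # default shown above
})
# The service is then invoked as service(prompt, image, block, response_schema).
```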
marker/services/ollama.py ADDED
@@ -0,0 +1,71 @@
+import base64
+import json
+from io import BytesIO
+from typing import Annotated, List
+
+import PIL
+import requests
+from pydantic import BaseModel
+
+from marker.schema.blocks import Block
+from marker.services import BaseService
+
+
+class OllamaService(BaseService):
+    ollama_base_url: Annotated[
+        str,
+        "The base url to use for ollama.  No trailing slash."
+    ] = "http://localhost:11434"
+    ollama_model: Annotated[
+        str,
+        "The model name to use for ollama."
+    ] = "llama3.2-vision"
+
+    def image_to_base64(self, image: PIL.Image.Image):
+        image_bytes = BytesIO()
+        image.save(image_bytes, format="PNG")
+        return base64.b64encode(image_bytes.getvalue()).decode("utf-8")
+
+    def __call__(
+        self,
+        prompt: str,
+        image: PIL.Image.Image | List[PIL.Image.Image],
+        block: Block,
+        response_schema: type[BaseModel],
+        max_retries: int = 1,
+        timeout: int = 15
+    ):
+        url = f"{self.ollama_base_url}/api/generate"
+        headers = {"Content-Type": "application/json"}
+
+        schema = response_schema.model_json_schema()
+        format_schema = {
+            "type": "object",
+            "properties": schema["properties"],
+            "required": schema["required"]
+        }
+
+        if not isinstance(image, list):
+            image = [image]
+
+        image_bytes = [self.image_to_base64(img) for img in image]
+
+        payload = {
+            "model": self.ollama_model,
+            "prompt": prompt,
+            "stream": False,
+            "format": format_schema,
+            "images": image_bytes
+        }
+
+        try:
+            response = requests.post(url, json=payload, headers=headers)
+            response.raise_for_status()
+            response_data = response.json()
+            data = response_data["response"]
+            print(data)
+            return json.loads(data)
+        except Exception as e:
+            print(f"Ollama inference failed: {e}")
+
+        return {}
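A construction sketch for the new local-model service; both options have defaults, so the values below are only illustrative. The service POSTs to `{ollama_base_url}/api/generate` with a JSON-schema `format` derived from the pydantic `response_schema`.

```python
from marker.services.ollama import OllamaService

service = OllamaService(config={
    "ollama_base_url": "http://localhost:11434",  # default
    "ollama_model": "llama3.2-vision",            # default; a vision-capable model is assumed
})
# Invoked the same way as the other services: service(prompt, image, block, response_schema).
```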
marker/services/vertex.py ADDED
@@ -0,0 +1,23 @@
+from typing import Annotated
+
+from google import genai
+
+from marker.services.gemini import BaseGeminiService
+
+class GoogleVertexService(BaseGeminiService):
+    vertex_project_id: Annotated[
+        str,
+        "Google Cloud Project ID for Vertex AI.",
+    ] = None
+    vertex_location: Annotated[
+        str,
+        "Google Cloud Location for Vertex AI.",
+    ] = None
+
+    def get_google_client(self, timeout: int = 60):
+        return genai.Client(
+            vertexai=True,
+            project=self.vertex_project_id,
+            location=self.vertex_location,
+            http_options={"timeout": timeout * 1000} # Convert to milliseconds
+        )
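A construction sketch for the Vertex variant; both fields are `Annotated` with `None` defaults, so `verify_config_keys()` requires them, and the values below are placeholders.

```python
from marker.services.vertex import GoogleVertexService

service = GoogleVertexService(config={
    "vertex_project_id": "my-gcp-project",  # placeholder
    "vertex_location": "us-central1",       # placeholder
})
```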
marker/util.py CHANGED
@@ -1,7 +1,6 @@
 import inspect
-import re
 from importlib import import_module
-from typing import List
+from typing import List, Annotated
 
 import numpy as np
 from pydantic import BaseModel
@@ -24,6 +23,19 @@ def classes_to_strings(items: List[type]) -> List[str]:
     return [f"{item.__module__}.{item.__name__}" for item in items]
 
 
+def verify_config_keys(obj):
+    annotations = inspect.get_annotations(obj.__class__)
+
+    none_vals = ""
+    for attr_name, annotation in annotations.items():
+        if isinstance(annotation, type(Annotated[str, ""])):
+            value = getattr(obj, attr_name)
+            if value is None:
+                none_vals += f"{attr_name}, "
+
+    assert len(none_vals) == 0, f"Missing values for {none_vals} are not allowed in {obj.__class__.__name__}."
+
+
 def assign_config(cls, config: BaseModel | dict | None):
     cls_name = cls.__class__.__name__
     if config is None:
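A small sketch of how `verify_config_keys` behaves, assuming `assign_config` copies matching keys from the config dict onto the instance (which is how the services above use the pair); `DemoConfig` is hypothetical.

```python
from typing import Annotated

from marker.util import assign_config, verify_config_keys


class DemoConfig:
    # Only Annotated class attributes are checked; plain attributes are ignored.
    api_key: Annotated[str, "A required key."] = None
    note = "not checked"

    def __init__(self, config=None):
        assign_config(self, config)
        verify_config_keys(self)


DemoConfig(config={"api_key": "test"})  # passes
# DemoConfig()  # would raise an AssertionError because api_key is still None
```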
tests/conftest.py CHANGED
@@ -18,6 +18,7 @@ from marker.schema.blocks import Block
 from marker.renderers.markdown import MarkdownRenderer
 from marker.renderers.json import JSONRenderer
 from marker.schema.registry import register_block_class
+from marker.services.gemini import GoogleGeminiService
 from marker.util import classes_to_strings
 
 @pytest.fixture(scope="session")
@@ -126,6 +127,17 @@ def renderer(request, config):
     else:
         return MarkdownRenderer
 
+
+@pytest.fixture(scope="function")
+def llm_service(request):
+    llm_service = GoogleGeminiService(
+        config={
+            "gemini_api_key": "test"
+        }
+    )
+    yield llm_service
+
+
 @pytest.fixture(scope="function")
 def temp_image():
     img = Image.new("RGB", (512, 512), color="white")
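A hypothetical test sketch using the new `llm_service` fixture, following the same injection pattern as the updated tests below; the empty processor list only exercises the wiring, and the fixture names come from this conftest.

```python
from marker.processors.llm.llm_meta import LLMSimpleBlockMetaProcessor


def test_llm_service_fixture_sketch(pdf_document, llm_service):
    config = {"use_llm": True, "gemini_api_key": "test"}
    # The service is passed positionally, just as the real converters wire it in.
    processor = LLMSimpleBlockMetaProcessor([], llm_service, config)
    processor(pdf_document)
```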
tests/processors/test_inline_math.py CHANGED
@@ -17,12 +17,11 @@ def test_llm_text_processor(pdf_document, mocker):
     corrected_lines = ["<math>Text</math>"] * len(text_lines)
 
     mock_cls = Mock()
-    mock_cls.return_value.generate_response.return_value = {"corrected_lines": corrected_lines}
-    mocker.patch("marker.processors.llm.GoogleModel", mock_cls)
+    mock_cls.return_value = {"corrected_lines": corrected_lines}
 
-    config = {"use_llm": True, "google_api_key": "test"}
+    config = {"use_llm": True, "gemini_api_key": "test"}
     processor_lst = [LLMTextProcessor(config)]
-    processor = LLMSimpleBlockMetaProcessor(processor_lst, config)
+    processor = LLMSimpleBlockMetaProcessor(processor_lst, mock_cls, config)
     processor(pdf_document)
 
     contained_spans = text_lines[0].contained_blocks(pdf_document, (BlockTypes.Span,))
tests/processors/test_llm_processors.py CHANGED
@@ -14,11 +14,12 @@ from marker.renderers.markdown import MarkdownRenderer
 from marker.schema import BlockTypes
 from marker.schema.blocks import ComplexRegion
 
+
 @pytest.mark.filename("form_1040.pdf")
 @pytest.mark.config({"page_range": [0]})
-def test_llm_form_processor_no_config(pdf_document):
+def test_llm_form_processor_no_config(pdf_document, llm_service):
     processor_lst = [LLMFormProcessor()]
-    processor = LLMSimpleBlockMetaProcessor(processor_lst)
+    processor = LLMSimpleBlockMetaProcessor(processor_lst, llm_service)
     processor(pdf_document)
 
     forms = pdf_document.contained_blocks((BlockTypes.Form,))
@@ -27,9 +28,10 @@ def test_llm_form_processor_no_config(pdf_document):
 
 @pytest.mark.filename("form_1040.pdf")
 @pytest.mark.config({"page_range": [0]})
-def test_llm_form_processor_no_cells(pdf_document):
-    processor_lst = [LLMFormProcessor({"use_llm": True, "google_api_key": "test"})]
-    processor = LLMSimpleBlockMetaProcessor(processor_lst)
+def test_llm_form_processor_no_cells(pdf_document, llm_service):
+    config = {"use_llm": True, "google_api_key": "test"}
+    processor_lst = [LLMFormProcessor(config)]
+    processor = LLMSimpleBlockMetaProcessor(processor_lst, llm_service, config)
     processor(pdf_document)
 
     forms = pdf_document.contained_blocks((BlockTypes.Form,))
@@ -38,20 +40,19 @@ def test_llm_form_processor_no_cells(pdf_document):
 
 @pytest.mark.filename("form_1040.pdf")
 @pytest.mark.config({"page_range": [0]})
-def test_llm_form_processor(pdf_document, detection_model, table_rec_model, recognition_model, mocker):
+def test_llm_form_processor(pdf_document, detection_model, table_rec_model, recognition_model):
     corrected_html = "<em>This is corrected markdown.</em>\n" * 100
     corrected_html = "<p>" + corrected_html.strip() + "</p>\n"
 
     mock_cls = Mock()
-    mock_cls.return_value.generate_response.return_value = {"corrected_html": corrected_html}
-    mocker.patch("marker.processors.llm.GoogleModel", mock_cls)
+    mock_cls.return_value = {"corrected_html": corrected_html}
 
     cell_processor = TableProcessor(detection_model, recognition_model, table_rec_model)
     cell_processor(pdf_document)
 
     config = {"use_llm": True, "google_api_key": "test"}
     processor_lst = [LLMFormProcessor(config)]
-    processor = LLMSimpleBlockMetaProcessor(processor_lst, config)
+    processor = LLMSimpleBlockMetaProcessor(processor_lst, mock_cls, config)
     processor(pdf_document)
 
     forms = pdf_document.contained_blocks((BlockTypes.Form,))
@@ -61,7 +62,7 @@ def test_llm_form_processor(pdf_document, detection_model, table_rec_model, reco
 
 @pytest.mark.filename("table_ex2.pdf")
 @pytest.mark.config({"page_range": [0]})
-def test_llm_table_processor(pdf_document, detection_model, table_rec_model, recognition_model, mocker):
+def test_llm_table_processor(pdf_document, detection_model, table_rec_model, recognition_model):
     corrected_html = """
     <table>
         <tr>
@@ -86,13 +87,12 @@ def test_llm_table_processor(pdf_document, detection_model, table_rec_model, rec
     """.strip()
 
     mock_cls = Mock()
-    mock_cls.return_value.generate_response.return_value = {"corrected_html": corrected_html}
-    mocker.patch("marker.processors.llm.GoogleModel", mock_cls)
+    mock_cls.return_value = {"corrected_html": corrected_html}
 
     cell_processor = TableProcessor(detection_model, recognition_model, table_rec_model)
     cell_processor(pdf_document)
 
-    processor = LLMTableProcessor({"use_llm": True, "google_api_key": "test"})
+    processor = LLMTableProcessor(mock_cls, {"use_llm": True, "google_api_key": "test"})
     processor(pdf_document)
 
     tables = pdf_document.contained_blocks((BlockTypes.Table,))
@@ -107,8 +107,9 @@
 @pytest.mark.config({"page_range": [0]})
 def test_llm_caption_processor_disabled(pdf_document):
     config = {"use_llm": True, "google_api_key": "test"}
+    mock_cls = MagicMock()
     processor_lst = [LLMImageDescriptionProcessor(config)]
-    processor = LLMSimpleBlockMetaProcessor(processor_lst, config)
+    processor = LLMSimpleBlockMetaProcessor(processor_lst, mock_cls, config)
     processor(pdf_document)
 
     contained_pictures = pdf_document.contained_blocks((BlockTypes.Picture, BlockTypes.Figure))
@@ -116,15 +117,14 @@ def test_llm_caption_processor_disabled(pdf_document):
 
 @pytest.mark.filename("A17_FlightPlan.pdf")
 @pytest.mark.config({"page_range": [0]})
-def test_llm_caption_processor(pdf_document, mocker):
+def test_llm_caption_processor(pdf_document):
     description = "This is an image description."
     mock_cls = Mock()
-    mock_cls.return_value.generate_response.return_value = {"image_description": description}
-    mocker.patch("marker.processors.llm.GoogleModel", mock_cls)
+    mock_cls.return_value = {"image_description": description}
 
     config = {"use_llm": True, "google_api_key": "test", "extract_images": False}
     processor_lst = [LLMImageDescriptionProcessor(config)]
-    processor = LLMSimpleBlockMetaProcessor(processor_lst, config)
+    processor = LLMSimpleBlockMetaProcessor(processor_lst, mock_cls, config)
     processor(pdf_document)
 
     contained_pictures = pdf_document.contained_blocks((BlockTypes.Picture, BlockTypes.Figure))
@@ -139,11 +139,10 @@ def test_llm_caption_processor(pdf_document, mocker):
 
 @pytest.mark.filename("A17_FlightPlan.pdf")
 @pytest.mark.config({"page_range": [0]})
-def test_llm_complex_region_processor(pdf_document, mocker):
+def test_llm_complex_region_processor(pdf_document):
     md = "This is some *markdown* for a complex region."
     mock_cls = Mock()
-    mock_cls.return_value.generate_response.return_value = {"corrected_markdown": md * 25}
-    mocker.patch("marker.processors.llm.GoogleModel", mock_cls)
+    mock_cls.return_value = {"corrected_markdown": md * 25}
 
     # Replace the block with a complex region
     old_block = pdf_document.pages[0].children[0]
@@ -155,7 +154,7 @@ def test_llm_complex_region_processor(pdf_document, mocker):
     # Test processor
     config = {"use_llm": True, "google_api_key": "test"}
     processor_lst = [LLMComplexRegionProcessor(config)]
-    processor = LLMSimpleBlockMetaProcessor(processor_lst, config)
+    processor = LLMSimpleBlockMetaProcessor(processor_lst, mock_cls, config)
     processor(pdf_document)
 
     # Ensure the rendering includes the description
@@ -166,15 +165,14 @@ def test_llm_complex_region_processor(pdf_document, mocker):
 
 @pytest.mark.filename("adversarial.pdf")
 @pytest.mark.config({"page_range": [0]})
-def test_multi_llm_processors(pdf_document, mocker):
+def test_multi_llm_processors(pdf_document):
     description = "<math>This is an image description. And here is a lot of writing about it.</math>" * 10
     mock_cls = Mock()
-    mock_cls.return_value.generate_response.return_value = {"image_description": description, "html_equation": description}
-    mocker.patch("marker.processors.llm.GoogleModel", mock_cls)
+    mock_cls.return_value = {"image_description": description, "html_equation": description}
 
     config = {"use_llm": True, "google_api_key": "test", "extract_images": False, "min_equation_height": .001}
     processor_lst = [LLMImageDescriptionProcessor(config), LLMEquationProcessor(config)]
-    processor = LLMSimpleBlockMetaProcessor(processor_lst, config)
+    processor = LLMSimpleBlockMetaProcessor(processor_lst, mock_cls, config)
    processor(pdf_document)
 
    contained_pictures = pdf_document.contained_blocks((BlockTypes.Picture, BlockTypes.Figure))
tests/processors/test_table_merge.py CHANGED
@@ -10,11 +10,10 @@ from marker.schema import BlockTypes
 @pytest.mark.filename("table_ex2.pdf")
 def test_llm_table_processor_nomerge(pdf_document, detection_model, table_rec_model, recognition_model, mocker):
     mock_cls = Mock()
-    mock_cls.return_value.generate_response.return_value = {
+    mock_cls.return_value = {
         "merge": "true",
         "direction": "right"
     }
-    mocker.patch("marker.processors.llm.GoogleModel", mock_cls)
 
     cell_processor = TableProcessor(detection_model, recognition_model, table_rec_model)
     cell_processor(pdf_document)
@@ -22,7 +21,7 @@ def test_llm_table_processor_nomerge(pdf_document, detection_model, table_rec_mo
     tables = pdf_document.contained_blocks((BlockTypes.Table,))
     assert len(tables) == 3
 
-    processor = LLMTableMergeProcessor({"use_llm": True, "google_api_key": "test"})
+    processor = LLMTableMergeProcessor(mock_cls, {"use_llm": True, "google_api_key": "test"})
     processor(pdf_document)
 
     tables = pdf_document.contained_blocks((BlockTypes.Table,))