Vik Paruchuri committed on
Commit 18e797e · 1 Parent(s): 6fa9fe6

Initial table integration
.github/workflows/tests.yml CHANGED
@@ -29,10 +29,6 @@ jobs:
       run: |
         poetry run python benchmarks/overall.py benchmark_data/pdfs benchmark_data/references report.json
         poetry run python scripts/verify_benchmark_scores.py report.json --type marker
-      - name: Run table benchmark
-        run: |
-          poetry run python benchmarks/table.py tables.json
-          poetry run python scripts/verify_benchmark_scores.py tables.json --type table
CLA.md CHANGED
@@ -1,6 +1,6 @@
 Marker Contributor Agreement

-This Marker Contributor Agreement ("MCA") applies to any contribution that you make to any product or project managed by us (the "project"), and sets out the intellectual property rights you grant to us in the contributed materials. The term "us" shall mean Vikas Paruchuri. The term "you" shall mean the person or entity identified below.
+This Marker Contributor Agreement ("MCA") applies to any contribution that you make to any product or project managed by us (the "project"), and sets out the intellectual property rights you grant to us in the contributed materials. The term "us" shall mean Endless Labs, Inc. The term "you" shall mean the person or entity identified below.

 If you agree to be bound by these terms, sign by writing "I have read the CLA document and I hereby sign the CLA" in response to the CLA bot Github comment. Read this agreement carefully before signing. These terms and conditions constitute a binding legal agreement.

@@ -20,5 +20,5 @@ If you or your affiliates institute patent litigation against any entity (includ
 - each contribution that you submit is and shall be an original work of authorship and you can legally grant the rights set out in this MCA;
 - to the best of your knowledge, each contribution will not violate any third party's copyrights, trademarks, patents, or other intellectual property rights; and
 - each contribution shall be in compliance with U.S. export control laws and other applicable export and import laws.
-You agree to notify us if you become aware of any circumstance which would make any of the foregoing representations inaccurate in any respect. Vikas Paruchuri may publicly disclose your participation in the project, including the fact that you have signed the MCA.
+You agree to notify us if you become aware of any circumstance which would make any of the foregoing representations inaccurate in any respect. Endless Labs, Inc. may publicly disclose your participation in the project, including the fact that you have signed the MCA.
 6. This MCA is governed by the laws of the State of California and applicable U.S. Federal law. Any choice of law rules will not apply.
README.md CHANGED
@@ -42,7 +42,7 @@ See [below](#benchmarks) for detailed speed and accuracy benchmarks, and instruc

 I want marker to be as widely accessible as possible, while still funding my development/training costs. Research and personal usage is always okay, but there are some restrictions on commercial usage.

-The weights for the models are licensed `cc-by-nc-sa-4.0`, but I will waive that for any organization under $5M USD in gross revenue in the most recent 12-month period AND under $5M in lifetime VC/angel funding raised. If you want to remove the GPL license requirements (dual-license) and/or use the weights commercially over the revenue limit, check out the options [here](https://www.datalab.to).
+The weights for the models are licensed `cc-by-nc-sa-4.0`, but I will waive that for any organization under $5M USD in gross revenue in the most recent 12-month period AND under $5M in lifetime VC/angel funding raised. You also must not be competitive with the [Datalab API](https://www.datalab.to/). If you want to remove the GPL license requirements (dual-license) and/or use the weights commercially over the revenue limit, check out the options [here](https://www.datalab.to).

 # Hosted API

@@ -217,14 +217,6 @@ This will benchmark marker against other text extraction methods. It sets up ba

 Omit `--nougat` to exclude nougat from the benchmark. I don't recommend running nougat on CPU, since it is very slow.

-### Table benchmark
-
-There is a benchmark for table parsing, which you can run with:
-
-```shell
-python benchmarks/table.py test_data/tables.json
-```
-
 # Thanks

 This work would not have been possible without amazing open source models and datasets, including (but not limited to):
@@ -233,6 +225,5 @@ This work would not have been possible without amazing open source models and da
 - Texify
 - Pypdfium2/pdfium
 - DocLayNet from IBM
-- ByT5 from Google

 Thank you to the authors of these models and datasets for making them available to the community!
benchmarks/table.py DELETED
@@ -1,77 +0,0 @@
-import argparse
-import json
-
-import datasets
-from surya.schema import LayoutResult, LayoutBox
-from tqdm import tqdm
-
-from marker.benchmark.table import score_table
-from marker.schema.bbox import rescale_bbox
-from marker.schema.page import Page
-from marker.tables.table import format_tables
-
-
-def main():
-    parser = argparse.ArgumentParser(description="Benchmark table conversion.")
-    parser.add_argument("out_file", help="Output filename for results")
-    parser.add_argument("--dataset", type=str, help="Dataset to use", default="vikp/table_bench")
-    args = parser.parse_args()
-
-    ds = datasets.load_dataset(args.dataset, split="train")
-
-    results = []
-    for i in tqdm(range(len(ds)), desc="Evaluating tables"):
-        row = ds[i]
-        marker_page = Page(**json.loads(row["marker_page"]))
-        table_bbox = row["table_bbox"]
-        gpt4_table = json.loads(row["gpt_4_table"])["markdown_table"]
-
-        # Counterclockwise polygon from top left
-        table_poly = [
-            [table_bbox[0], table_bbox[1]],
-            [table_bbox[2], table_bbox[1]],
-            [table_bbox[2], table_bbox[3]],
-            [table_bbox[0], table_bbox[3]],
-        ]
-
-        # Remove all other tables from the layout results
-        layout_result = LayoutResult(
-            bboxes=[
-                LayoutBox(
-                    label="Table",
-                    polygon=table_poly
-                )
-            ],
-            segmentation_map="",
-            image_bbox=marker_page.text_lines.image_bbox
-        )
-
-        marker_page.layout = layout_result
-        format_tables([marker_page])
-
-        table_blocks = [block for block in marker_page.blocks if block.block_type == "Table"]
-        if len(table_blocks) != 1:
-            continue
-
-        table_block = table_blocks[0]
-        table_md = table_block.lines[0].spans[0].text
-
-        results.append({
-            "score": score_table(table_md, gpt4_table),
-            "arxiv_id": row["arxiv_id"],
-            "page_idx": row["page_idx"],
-            "marker_table": table_md,
-            "gpt4_table": gpt4_table,
-            "table_bbox": table_bbox
-        })
-
-    avg_score = sum([r["score"] for r in results]) / len(results)
-    print(f"Evaluated {len(results)} tables, average score is {avg_score}.")
-
-    with open(args.out_file, "w+") as f:
-        json.dump(results, f, indent=2)
-
-
-if __name__ == "__main__":
-    main()
marker/convert.py CHANGED
@@ -20,7 +20,6 @@ from marker.pdf.extract_text import get_text_blocks
 from marker.cleaners.headers import filter_header_footer, filter_common_titles
 from marker.equations.equations import replace_equations
 from marker.pdf.utils import find_filetype
-from marker.postprocessors.editor import edit_full_text
 from marker.cleaners.code import identify_code_blocks, indent_blocks
 from marker.cleaners.bullets import replace_bullets
 from marker.cleaners.headings import split_heading_blocks
@@ -83,7 +82,7 @@ def convert_single_pdf(
     doc.del_page(0)

     # Unpack models from list
-    texify_model, layout_model, order_model, edit_model, detection_model, ocr_model = model_lst
+    texify_model, layout_model, order_model, detection_model, ocr_model, table_rec_model = model_lst

     # Identify text lines on pages
     surya_detection(doc, pages, detection_model, batch_multiplier=batch_multiplier)
@@ -123,7 +122,7 @@ def convert_single_pdf(
     indent_blocks(pages)

     # Fix table blocks
-    table_count = format_tables(pages)
+    table_count = format_tables(pages, doc, fname, detection_model, table_rec_model, ocr_model)
     out_meta["block_stats"]["table"] = table_count

     for page in pages:
@@ -160,14 +159,6 @@ def convert_single_pdf(
     # Replace bullet characters with a -
     full_text = replace_bullets(full_text)

-    # Postprocess text with editor model
-    full_text, edit_stats = edit_full_text(
-        full_text,
-        edit_model,
-        batch_multiplier=batch_multiplier
-    )
-    flush_cuda_memory()
-    out_meta["postprocess_stats"] = {"edit": edit_stats}
     doc_images = images_to_dict(pages)

     return full_text, doc_images, out_meta
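The unpacking change above means any caller holding an old-style `model_lst` breaks silently if it assumes the previous order. A minimal sketch of the new ordering, using placeholder strings instead of actual loaded models (the names mirror the diff; the string values are illustrative only):

```python
# After this commit, load_all_models returns six models in this order:
# edit_model is gone and the table recognition model is appended at the end.
model_lst = ["texify", "layout", "order", "detection", "ocr", "table_rec"]  # placeholders

# Consumers must unpack in the new order used by convert_single_pdf
texify_model, layout_model, order_model, detection_model, ocr_model, table_rec_model = model_lst
```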
marker/models.py CHANGED
@@ -2,7 +2,6 @@ import os
 os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1" # For some reason, transformers decided to use .isin for a simple op, which is not supported on MPS


-from marker.postprocessors.editor import load_editing_model
 from surya.model.detection.model import load_model as load_detection_model, load_processor as load_detection_processor
 from texify.model.model import load_model as load_texify_model
 from texify.model.processor import load_processor as load_texify_processor
@@ -11,6 +10,17 @@ from surya.model.recognition.model import load_model as load_recognition_model
 from surya.model.recognition.processor import load_processor as load_recognition_processor
 from surya.model.ordering.model import load_model as load_order_model
 from surya.model.ordering.processor import load_processor as load_order_processor
+from surya.model.table_rec.model import load_model as load_table_model
+from surya.model.table_rec.processor import load_processor as load_table_processor
+
+
+def setup_table_rec_model(device=None, dtype=None):
+    if device:
+        table_model = load_table_model(device=device, dtype=dtype)
+    else:
+        table_model = load_table_model()
+    table_model.processor = load_table_processor()
+    return table_model


 def setup_recognition_model(device=None, dtype=None):
@@ -18,8 +28,7 @@ def setup_recognition_model(device=None, dtype=None):
         rec_model = load_recognition_model(device=device, dtype=dtype)
     else:
         rec_model = load_recognition_model()
-    rec_processor = load_recognition_processor()
-    rec_model.processor = rec_processor
+    rec_model.processor = load_recognition_processor()
     return rec_model


@@ -28,9 +37,7 @@ def setup_detection_model(device=None, dtype=None):
         model = load_detection_model(device=device, dtype=dtype)
     else:
         model = load_detection_model()
-
-    processor = load_detection_processor()
-    model.processor = processor
+    model.processor = load_detection_processor()
     return model


@@ -39,8 +46,7 @@ def setup_texify_model(device=None, dtype=None):
         texify_model = load_texify_model(checkpoint=settings.TEXIFY_MODEL_NAME, device=device, dtype=dtype)
     else:
         texify_model = load_texify_model(checkpoint=settings.TEXIFY_MODEL_NAME, device=settings.TORCH_DEVICE_MODEL, dtype=settings.TEXIFY_DTYPE)
-    texify_processor = load_texify_processor()
-    texify_model.processor = texify_processor
+    texify_model.processor = load_texify_processor()
    return texify_model


@@ -49,8 +55,7 @@ def setup_layout_model(device=None, dtype=None):
         model = load_detection_model(checkpoint=settings.LAYOUT_MODEL_CHECKPOINT, device=device, dtype=dtype)
     else:
         model = load_detection_model(checkpoint=settings.LAYOUT_MODEL_CHECKPOINT)
-    processor = load_detection_processor(checkpoint=settings.LAYOUT_MODEL_CHECKPOINT)
-    model.processor = processor
+    model.processor = load_detection_processor(checkpoint=settings.LAYOUT_MODEL_CHECKPOINT)
     return model


@@ -59,12 +64,11 @@ def setup_order_model(device=None, dtype=None):
         model = load_order_model(device=device, dtype=dtype)
     else:
         model = load_order_model()
-    processor = load_order_processor()
-    model.processor = processor
+    model.processor = load_order_processor()
     return model


-def load_all_models(device=None, dtype=None, force_load_ocr=False):
+def load_all_models(device=None, dtype=None):
     if device is not None:
         assert dtype is not None, "Must provide dtype if device is provided"

@@ -72,10 +76,10 @@ def load_all_models(device=None, dtype=None):
     detection = setup_detection_model(device, dtype)
     layout = setup_layout_model(device, dtype)
     order = setup_order_model(device, dtype)
-    edit = load_editing_model(device, dtype)

     # Only load recognition model if we'll need it for all pdfs
     ocr = setup_recognition_model(device, dtype)
     texify = setup_texify_model(device, dtype)
-    model_lst = [texify, layout, order, edit, detection, ocr]
+    table_model = setup_table_rec_model(device, dtype)
+    model_lst = [texify, layout, order, detection, ocr, table_model]
     return model_lst
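Across these hunks the commit standardizes one loader shape: load the model (onto an explicit device/dtype when given), then attach the processor directly in a single assignment. A hedged, self-contained sketch of that pattern, where `load_model` and `load_processor` are hypothetical stand-ins for the surya/texify loaders rather than real imports:

```python
# Generic version of the setup_* pattern this commit converges on.
# load_model / load_processor are injected stand-ins, NOT the real surya APIs.
def setup_model(load_model, load_processor, device=None, dtype=None):
    if device:
        # Explicit placement when the caller provides device/dtype
        model = load_model(device=device, dtype=dtype)
    else:
        # Otherwise let the loader pick its defaults
        model = load_model()
    # Attach the processor directly, as the diff now does in one line
    model.processor = load_processor()
    return model
```

The one-line `model.processor = load_processor()` form removes the throwaway intermediate variable each setup function previously used.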
marker/ocr/recognition.py CHANGED
@@ -65,7 +65,10 @@ def run_ocr(doc, pages: List[Page], langs: List[str], rec_model, batch_multiplie


 def surya_recognition(doc, page_idxs, langs: List[str], rec_model, pages: List[Page], batch_multiplier=1) -> List[Optional[Page]]:
+    # Slice images in higher resolution than detection happened in
     images = [render_image(doc[pnum], dpi=settings.SURYA_OCR_DPI) for pnum in page_idxs]
+    box_scale = settings.SURYA_OCR_DPI / settings.SURYA_DETECTOR_DPI
+
     processor = rec_model.processor
     selected_pages = [p for i, p in enumerate(pages) if i in page_idxs]

@@ -73,6 +76,12 @@ def surya_recognition(doc, page_idxs, langs: List[str], rec_model, pages: List[P
     detection_results = [p.text_lines.bboxes for p in selected_pages]
     polygons = [[b.polygon for b in bboxes] for bboxes in detection_results]

+    # Scale polygons to get correct image slices
+    for poly in polygons:
+        for p in poly:
+            for i in range(len(p)):
+                p[i] = [int(p[i][0] * box_scale), int(p[i][1] * box_scale)]
+
     results = run_recognition(images, surya_langs, rec_model, processor, polygons=polygons, batch_size=int(get_batch_size() * batch_multiplier))

     new_pages = []
@@ -81,14 +90,15 @@ def surya_recognition(doc, page_idxs, langs: List[str], rec_model, pages: List[P
         ocr_results = result.text_lines
         blocks = []
         for i, line in enumerate(ocr_results):
+            scaled_bbox = [b / box_scale for b in line.bbox]
             block = Block(
-                bbox=line.bbox,
+                bbox=scaled_bbox,
                 pnum=page_idx,
                 lines=[Line(
-                    bbox=line.bbox,
+                    bbox=scaled_bbox,
                     spans=[Span(
                         text=line.text,
-                        bbox=line.bbox,
+                        bbox=scaled_bbox,
                         span_id=f"{page_idx}_{i}",
                         font="",
                         font_weight=0,
@@ -98,10 +108,11 @@ def surya_recognition(doc, page_idxs, langs: List[str], rec_model, pages: List[P
                 )]
             )
             blocks.append(block)
+        scaled_image_bbox = [b / box_scale for b in result.image_bbox]
         page = Page(
             blocks=blocks,
             pnum=page_idx,
-            bbox=result.image_bbox,
+            bbox=scaled_image_bbox,
             rotation=0,
             text_lines=text_lines,
             ocr_method="surya"
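The coordinate handling added here is symmetric: detection-space polygons are scaled up by `box_scale` before slicing the higher-DPI OCR images, and the resulting OCR-space boxes are divided back down so downstream code keeps working in detection/page coordinates. A standalone sketch of that round trip, with DPI values hardcoded as illustrative stand-ins for `settings.SURYA_OCR_DPI` and `settings.SURYA_DETECTOR_DPI`:

```python
# Illustrative DPI values; in marker these come from settings.
SURYA_OCR_DPI = 192
SURYA_DETECTOR_DPI = 96
box_scale = SURYA_OCR_DPI / SURYA_DETECTOR_DPI  # ratio between OCR and detection renders

def scale_polygon(polygon, scale):
    # Detection-space polygon points -> pixel coordinates in the OCR-DPI image
    return [[int(x * scale), int(y * scale)] for x, y in polygon]

def unscale_bbox(bbox, scale):
    # OCR-space bbox -> back into detection/page coordinates
    return [b / scale for b in bbox]
```

Note the asymmetry the diff preserves: scaling up truncates to `int` (pixel indices for image slicing), while scaling down keeps floats, so a round trip is only exact when the original coordinates were integers.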
marker/pdf/extract_text.py CHANGED
@@ -90,7 +90,7 @@ def get_text_blocks(doc, fname, max_pages: Optional[int] = None, start_page: Opt

     page_range = range(start_page, start_page + max_pages)

-    char_blocks = dictionary_output(fname, page_range=page_range, keep_chars=True, workers=settings.PDFTEXT_CPU_WORKERS)
+    char_blocks = dictionary_output(fname, page_range=page_range, keep_chars=False, workers=settings.PDFTEXT_CPU_WORKERS)
     marker_blocks = [pdftext_format_to_blocks(page, pnum) for pnum, page in enumerate(char_blocks)]

     return marker_blocks, toc
marker/postprocessors/editor.py DELETED
@@ -1,123 +0,0 @@
-from collections import defaultdict
-from itertools import chain
-from typing import Optional
-
-from marker.settings import settings
-import torch
-import torch.nn.functional as F
-from marker.postprocessors.t5 import T5ForTokenClassification, byt5_tokenize
-
-
-def get_batch_size():
-    if settings.EDITOR_BATCH_SIZE is not None:
-        return settings.EDITOR_BATCH_SIZE
-    elif settings.TORCH_DEVICE_MODEL == "cuda":
-        return 12
-    return 6
-
-
-def load_editing_model(device=None, dtype=None):
-    if not settings.ENABLE_EDITOR_MODEL:
-        return None
-
-    if device:
-        model = T5ForTokenClassification.from_pretrained(
-            settings.EDITOR_MODEL_NAME,
-            torch_dtype=dtype,
-            device=device,
-        )
-    else:
-        model = T5ForTokenClassification.from_pretrained(
-            settings.EDITOR_MODEL_NAME,
-            torch_dtype=settings.MODEL_DTYPE,
-        ).to(settings.TORCH_DEVICE_MODEL)
-    model.eval()
-
-    model.config.label2id = {
-        "equal": 0,
-        "delete": 1,
-        "newline-1": 2,
-        "space-1": 3,
-    }
-    model.config.id2label = {v: k for k, v in model.config.label2id.items()}
-    return model
-
-
-def edit_full_text(text: str, model: Optional[T5ForTokenClassification], batch_multiplier=1) -> (str, dict):
-    if not model:
-        return text, {}
-
-    batch_size = get_batch_size() * batch_multiplier
-    tokenized = byt5_tokenize(text, settings.EDITOR_MAX_LENGTH)
-    input_ids = tokenized["input_ids"]
-    char_token_lengths = tokenized["char_token_lengths"]
-
-    # Run model
-    token_masks = []
-    for i in range(0, len(input_ids), batch_size):
-        batch_input_ids = tokenized["input_ids"][i: i + batch_size]
-        batch_input_ids = torch.tensor(batch_input_ids, device=model.device)
-        batch_attention_mask = tokenized["attention_mask"][i: i + batch_size]
-        batch_attention_mask = torch.tensor(batch_attention_mask, device=model.device)
-        with torch.inference_mode():
-            predictions = model(batch_input_ids, attention_mask=batch_attention_mask)
-
-        logits = predictions.logits.cpu()
-
-        # If the max probability is less than a threshold, we assume it's a bad prediction
-        # We want to be conservative to not edit the text too much
-        probs = F.softmax(logits, dim=-1)
-        max_prob = torch.max(probs, dim=-1)
-        cutoff_prob = max_prob.values < settings.EDITOR_CUTOFF_THRESH
-        labels = logits.argmax(-1)
-        labels[cutoff_prob] = model.config.label2id["equal"]
-        labels = labels.squeeze().tolist()
-        if len(labels) == settings.EDITOR_MAX_LENGTH:
-            labels = [labels]
-        labels = list(chain.from_iterable(labels))
-        token_masks.extend(labels)
-
-    # List of characters in the text
-    flat_input_ids = list(chain.from_iterable(input_ids))
-
-    # Strip special tokens 0,1. Keep unknown token, although it should never be used
-    assert len(token_masks) == len(flat_input_ids)
-    token_masks = [mask for mask, token in zip(token_masks, flat_input_ids) if token >= 2]
-
-    assert len(token_masks) == len(list(text.encode("utf-8")))
-
-    edit_stats = defaultdict(int)
-    out_text = []
-    start = 0
-    for i, char in enumerate(text):
-        char_token_length = char_token_lengths[i]
-        masks = token_masks[start: start + char_token_length]
-        labels = [model.config.id2label[mask] for mask in masks]
-        if all(l == "delete" for l in labels):
-            # If we delete whitespace, roll with it, otherwise ignore
-            if char.strip():
-                out_text.append(char)
-            else:
-                edit_stats["delete"] += 1
-        elif labels[0] == "newline-1":
-            out_text.append("\n")
-            out_text.append(char)
-            edit_stats["newline-1"] += 1
-        elif labels[0] == "space-1":
-            out_text.append(" ")
-            out_text.append(char)
-            edit_stats["space-1"] += 1
-        else:
-            out_text.append(char)
-            edit_stats["equal"] += 1
-
-        start += char_token_length
-
-    out_text = "".join(out_text)
-    return out_text, edit_stats
marker/postprocessors/t5.py DELETED
@@ -1,141 +0,0 @@
1
- from transformers import T5Config, T5PreTrainedModel
2
- import torch
3
- from torch import nn
4
- from copy import deepcopy
5
- from typing import Optional, Tuple, Union
6
- from itertools import chain
7
-
8
- from transformers.modeling_outputs import TokenClassifierOutput
9
- from transformers.models.t5.modeling_t5 import T5Stack
10
- from transformers.utils.model_parallel_utils import get_device_map, assert_device_map
11
-
12
-
13
- def byt5_tokenize(text: str, max_length: int, pad_token_id: int = 0):
14
- byte_codes = []
15
- for char in text:
16
- # Add 3 to account for special tokens
17
- byte_codes.append([byte + 3 for byte in char.encode('utf-8')])
18
-
19
- tokens = list(chain.from_iterable(byte_codes))
20
- # Map each token to the character it represents
21
- char_token_lengths = [len(b) for b in byte_codes]
22
-
23
- batched_tokens = []
24
- attention_mask = []
25
- for i in range(0, len(tokens), max_length):
26
- batched_tokens.append(tokens[i:i + max_length])
27
- attention_mask.append([1] * len(batched_tokens[-1]))
28
-
29
- # Pad last item
30
- if len(batched_tokens[-1]) < max_length:
31
- batched_tokens[-1] += [pad_token_id] * (max_length - len(batched_tokens[-1]))
32
- attention_mask[-1] += [0] * (max_length - len(attention_mask[-1]))
33
-
34
- return {"input_ids": batched_tokens, "attention_mask": attention_mask, "char_token_lengths": char_token_lengths}
35
-
36
-
37
-
38
-
39
- # From https://github.com/osainz59/t5-encoder
40
- class T5ForTokenClassification(T5PreTrainedModel):
41
- _keys_to_ignore_on_load_missing = [r"encoder.embed_tokens.weight"]
42
-
43
- def __init__(self, config: T5Config):
44
- super().__init__(config)
45
- self.model_dim = config.d_model
46
-
47
- self.shared = nn.Embedding(config.vocab_size, config.d_model)
48
-
49
- encoder_config = deepcopy(config)
50
- encoder_config.is_decoder = False
51
- encoder_config.is_encoder_decoder = False
52
- encoder_config.use_cache = False
53
- self.encoder = T5Stack(encoder_config, self.shared)
54
-
55
- classifier_dropout = (
56
- config.classifier_dropout if hasattr(config, 'classifier_dropout') else config.dropout_rate
57
- )
58
- self.dropout = nn.Dropout(classifier_dropout)
59
- self.classifier = nn.Linear(config.d_model, config.num_labels)
60
-
61
- # Initialize weights and apply final processing
62
- self.post_init()
63
-
64
- # Model parallel
65
- self.model_parallel = False
66
- self.device_map = None
67
-
68
-
69
- def parallelize(self, device_map=None):
70
- self.device_map = (
71
- get_device_map(len(self.encoder.block), range(torch.cuda.device_count()))
72
- if device_map is None
73
- else device_map
74
- )
75
- assert_device_map(self.device_map, len(self.encoder.block))
76
- self.encoder.parallelize(self.device_map)
77
- self.classifier.to(self.encoder.first_device)
78
- self.model_parallel = True
79
-
80
-    def deparallelize(self):
-        self.encoder.deparallelize()
-        self.encoder = self.encoder.to("cpu")
-        self.classifier = self.classifier.to("cpu")
-        self.model_parallel = False
-        self.device_map = None
-        torch.cuda.empty_cache()
-
-    def get_input_embeddings(self):
-        return self.shared
-
-    def set_input_embeddings(self, new_embeddings):
-        self.shared = new_embeddings
-        self.encoder.set_input_embeddings(new_embeddings)
-
-    def get_encoder(self):
-        return self.encoder
-
-    def _prune_heads(self, heads_to_prune):
-        for layer, heads in heads_to_prune.items():
-            self.encoder.block[layer].layer[0].SelfAttention.prune_heads(heads)
-
-    def forward(
-        self,
-        input_ids: Optional[torch.LongTensor] = None,
-        attention_mask: Optional[torch.FloatTensor] = None,
-        head_mask: Optional[torch.FloatTensor] = None,
-        inputs_embeds: Optional[torch.FloatTensor] = None,
-        labels: Optional[torch.LongTensor] = None,
-        output_attentions: Optional[bool] = None,
-        output_hidden_states: Optional[bool] = None,
-        return_dict: Optional[bool] = None,
-    ) -> Union[Tuple[torch.FloatTensor], TokenClassifierOutput]:
-        return_dict = return_dict if return_dict is not None else self.config.use_return_dict
-
-        outputs = self.encoder(
-            input_ids=input_ids,
-            attention_mask=attention_mask,
-            inputs_embeds=inputs_embeds,
-            head_mask=head_mask,
-            output_attentions=output_attentions,
-            output_hidden_states=output_hidden_states,
-            return_dict=return_dict,
-        )
-
-        sequence_output = outputs[0]
-
-        sequence_output = self.dropout(sequence_output)
-        logits = self.classifier(sequence_output)
-
-        loss = None
-
-        if not return_dict:
-            output = (logits,) + outputs[2:]
-            return ((loss,) + output) if loss is not None else output
-
-        return TokenClassifierOutput(
-            loss=loss,
-            logits=logits,
-            hidden_states=outputs.hidden_states,
-            attentions=outputs.attentions
-        )
marker/settings.py CHANGED
@@ -47,7 +47,7 @@ class Settings(BaseSettings):
     OCR_ALL_PAGES: bool = False # Run OCR on every page even if text can be extracted
 
     ## Surya
-    SURYA_OCR_DPI: int = 96
+    SURYA_OCR_DPI: int = 192
     RECOGNITION_BATCH_SIZE: Optional[int] = None # Batch size for surya OCR defaults to 64 for cuda, 32 otherwise
 
     ## Tesseract
@@ -75,12 +75,8 @@ class Settings(BaseSettings):
     ORDER_BATCH_SIZE: Optional[int] = None # Defaults to 12 for cuda, 6 otherwise
     ORDER_MAX_BBOXES: int = 255
 
-    # Final editing model
-    EDITOR_BATCH_SIZE: Optional[int] = None # Defaults to 6 for cuda, 12 otherwise
-    EDITOR_MAX_LENGTH: int = 1024
-    EDITOR_MODEL_NAME: str = "vikp/pdf_postprocessor_t5"
-    ENABLE_EDITOR_MODEL: bool = False # The editor model can create false positives
-    EDITOR_CUTOFF_THRESH: float = 0.9 # Ignore predictions below this probability
+    # Table models
+    SURYA_TABLE_DPI: int = 192
 
     # Debug
     DEBUG: bool = False # Enable debug logging
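
The DPI change above (SURYA_OCR_DPI 96 → 192, plus the new SURYA_TABLE_DPI of 192) determines how large the rendered page images are. As a rough sanity check of what that means for a US Letter page (612×792 PDF points, 72 points per inch) — the helper below is a hypothetical sketch, not marker's actual `render_image`:

```python
def render_size(page_w_pts, page_h_pts, dpi):
    """Pixel dimensions of a page rendered at `dpi` (PDF points are 1/72 inch)."""
    scale = dpi / 72
    return round(page_w_pts * scale), round(page_h_pts * scale)

# At 96 DPI a US Letter page renders at 816x1056 px; at 192 DPI each
# dimension doubles to 1632x2112 px, so table cells get 4x the pixels.
print(render_size(612, 792, 96))
print(render_size(612, 792, 192))
```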
marker/tables/cells.py DELETED
@@ -1,112 +0,0 @@
-from marker.schema.bbox import rescale_bbox, box_intersection_pct
-from marker.schema.page import Page
-import numpy as np
-from sklearn.cluster import DBSCAN
-from marker.settings import settings
-
-
-def cluster_coords(coords, row_count):
-    if len(coords) == 0:
-        return []
-    coords = np.array(sorted(set(coords))).reshape(-1, 1)
-
-    clustering = DBSCAN(eps=.01, min_samples=max(2, row_count // 4)).fit(coords)
-    clusters = clustering.labels_
-
-    separators = []
-    for label in set(clusters):
-        clustered_points = coords[clusters == label]
-        separators.append(np.mean(clustered_points))
-
-    separators = sorted(separators)
-    return separators
-
-
-def find_column_separators(page: Page, table_box, rows, round_factor=.002, min_count=1):
-    left_edges = []
-    right_edges = []
-    centers = []
-
-    line_boxes = [p.bbox for p in page.text_lines.bboxes]
-    line_boxes = [rescale_bbox(page.text_lines.image_bbox, page.bbox, l) for l in line_boxes]
-    line_boxes = [l for l in line_boxes if box_intersection_pct(l, table_box) > settings.BBOX_INTERSECTION_THRESH]
-
-    pwidth = page.bbox[2] - page.bbox[0]
-    pheight = page.bbox[3] - page.bbox[1]
-    for cell in line_boxes:
-        ncell = [cell[0] / pwidth, cell[1] / pheight, cell[2] / pwidth, cell[3] / pheight]
-        left_edges.append(ncell[0] / round_factor * round_factor)
-        right_edges.append(ncell[2] / round_factor * round_factor)
-        centers.append((ncell[0] + ncell[2]) / 2 * round_factor / round_factor)
-
-    left_edges = [l for l in left_edges if left_edges.count(l) > min_count]
-    right_edges = [r for r in right_edges if right_edges.count(r) > min_count]
-    centers = [c for c in centers if centers.count(c) > min_count]
-
-    sorted_left = cluster_coords(left_edges, len(rows))
-    sorted_right = cluster_coords(right_edges, len(rows))
-    sorted_center = cluster_coords(centers, len(rows))
-
-    # Find list with minimum length
-    separators = max([sorted_left, sorted_right, sorted_center], key=len)
-    separators.append(1)
-    separators.insert(0, 0)
-    return separators
-
-
-def assign_cells_to_columns(page, table_box, rows, round_factor=.002, tolerance=.01):
-    separators = find_column_separators(page, table_box, rows, round_factor=round_factor)
-    additional_column_index = 0
-    pwidth = page.bbox[2] - page.bbox[0]
-    row_dicts = []
-
-    for row in rows:
-        new_row = {}
-        last_col_index = -1
-        for cell in row:
-            left_edge = cell[0][0] / pwidth
-            column_index = -1
-            for i, separator in enumerate(separators):
-                if left_edge - tolerance < separator and last_col_index < i:
-                    column_index = i
-                    break
-            if column_index == -1:
-                column_index = len(separators) + additional_column_index
-                additional_column_index += 1
-            new_row[column_index] = cell[1]
-            last_col_index = column_index
-        additional_column_index = 0
-        row_dicts.append(new_row)
-
-    max_row_idx = 0
-    for row in row_dicts:
-        max_row_idx = max(max_row_idx, max(row.keys()))
-
-    # Assign sorted cells to columns, account for blanks
-    new_rows = []
-    for row in row_dicts:
-        flat_row = []
-        for row_idx in range(1, max_row_idx + 1):
-            if row_idx in row:
-                flat_row.append(row[row_idx])
-            else:
-                flat_row.append("")
-        new_rows.append(flat_row)
-
-    # Pad rows to have the same length
-    max_row_len = max([len(r) for r in new_rows])
-    for row in new_rows:
-        while len(row) < max_row_len:
-            row.append("")
-
-    cols_to_remove = set()
-    for idx, col in enumerate(zip(*new_rows)):
-        col_total = sum([len(cell.strip()) > 0 for cell in col])
-        if col_total == 0:
-            cols_to_remove.add(idx)
-
-    rows = []
-    for row in new_rows:
-        rows.append([col for idx, col in enumerate(row) if idx not in cols_to_remove])
-
-    return rows
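
The deleted heuristic above clustered normalized x-coordinates of text-line edges with DBSCAN to find column separators. A stdlib-only sketch of the same idea, for intuition — `cluster_1d` is a hypothetical stand-in that groups points whose gap is within `eps`, approximating what the deleted `cluster_coords` got from DBSCAN:

```python
def cluster_1d(coords, eps=0.01, min_samples=2):
    """Group sorted 1-D coordinates into runs whenever the gap to the previous
    point is <= eps, drop sparse runs (DBSCAN's noise filtering), and return
    each run's mean as a candidate column separator."""
    coords = sorted(set(coords))
    if not coords:
        return []
    clusters = [[coords[0]]]
    for c in coords[1:]:
        if c - clusters[-1][-1] <= eps:
            clusters[-1].append(c)
        else:
            clusters.append([c])
    clusters = [cl for cl in clusters if len(cl) >= min_samples]
    return [sum(cl) / len(cl) for cl in clusters]

# Left edges of text lines from a two-column table, normalized to page width:
# the edges bunch up near 0.105 and 0.555, so two separators come out.
print(cluster_1d([0.10, 0.11, 0.105, 0.55, 0.56, 0.555]))
```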
marker/tables/edges.py DELETED
@@ -1,122 +0,0 @@
-import math
-
-import cv2
-import numpy as np
-
-
-def get_detected_lines_sobel(image):
-    sobelx = cv2.Sobel(image, cv2.CV_32F, 1, 0, ksize=3)
-
-    scaled_sobel = np.uint8(255 * sobelx / np.max(sobelx))
-
-    kernel = np.ones((4, 1), np.uint8)
-    eroded = cv2.erode(scaled_sobel, kernel, iterations=1)
-    scaled_sobel = cv2.dilate(eroded, kernel, iterations=3)
-
-    return scaled_sobel
-
-
-def get_line_angle(x1, y1, x2, y2):
-    slope = (y2 - y1) / (x2 - x1)
-
-    angle_radians = math.atan(slope)
-    angle_degrees = math.degrees(angle_radians)
-
-    return angle_degrees
-
-
-def get_detected_lines(image, slope_tol_deg=10):
-    new_image = image.astype(np.float32) * 255  # Convert to 0-255 range
-    new_image = get_detected_lines_sobel(new_image)
-    new_image = new_image.astype(np.uint8)
-
-    edges = cv2.Canny(new_image, 50, 200, apertureSize=3)
-
-    lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=50, minLineLength=2, maxLineGap=100)
-
-    line_info = []
-    if lines is not None:
-        for line in lines:
-            x1, y1, x2, y2 = line[0]
-            bbox = [x1, y1, x2, y2]
-
-            vertical = False
-            if x2 == x1:
-                vertical = True
-            else:
-                line_angle = get_line_angle(x1, y1, x2, y2)
-                if 90 - slope_tol_deg < line_angle < 90 + slope_tol_deg:
-                    vertical = True
-                elif -90 - slope_tol_deg < line_angle < -90 + slope_tol_deg:
-                    vertical = True
-            if not vertical:
-                continue
-
-            if bbox[3] < bbox[1]:
-                bbox[1], bbox[3] = bbox[3], bbox[1]
-            if bbox[2] < bbox[0]:
-                bbox[0], bbox[2] = bbox[2], bbox[0]
-            if vertical:
-                line_info.append(bbox)
-    return line_info
-
-
-def get_vertical_lines(image, divisor=2, x_tolerance=10, y_tolerance=1):
-    vertical_lines = get_detected_lines(image)
-
-    vertical_lines = sorted(vertical_lines, key=lambda x: x[0])
-    for line in vertical_lines:
-        for i in range(0, len(line)):
-            line[i] = (line[i] // divisor) * divisor
-
-    # Merge adjacent line segments together
-    to_remove = []
-    for i, line in enumerate(vertical_lines):
-        for j, line2 in enumerate(vertical_lines):
-            if j <= i:
-                continue
-            if line[0] != line2[0]:
-                continue
-
-            expanded_line1 = [line[0], line[1] - y_tolerance, line[2],
-                              line[3] + y_tolerance]
-
-            line1_points = set(range(int(expanded_line1[1]), int(expanded_line1[3])))
-            line2_points = set(range(int(line2[1]), int(line2[3])))
-            intersect_y = len(line1_points.intersection(line2_points)) > 0
-
-            if intersect_y:
-                vertical_lines[j][1] = min(line[1], line2[1])
-                vertical_lines[j][3] = max(line[3], line2[3])
-                to_remove.append(i)
-
-    vertical_lines = [line for i, line in enumerate(vertical_lines) if i not in to_remove]
-
-    # Remove redundant segments
-    to_remove = []
-    for i, line in enumerate(vertical_lines):
-        if i in to_remove:
-            continue
-        for j, line2 in enumerate(vertical_lines):
-            if j <= i or j in to_remove:
-                continue
-            close_in_x = abs(line[0] - line2[0]) < x_tolerance
-            line1_points = set(range(int(line[1]), int(line[3])))
-            line2_points = set(range(int(line2[1]), int(line2[3])))
-
-            intersect_y = len(line1_points.intersection(line2_points)) > 0
-
-            if close_in_x and intersect_y:
-                # Keep the longer line and extend it
-                if len(line2_points) > len(line1_points):
-                    vertical_lines[j][1] = min(line[1], line2[1])
-                    vertical_lines[j][3] = max(line[3], line2[3])
-                    to_remove.append(i)
-                else:
-                    vertical_lines[i][1] = min(line[1], line2[1])
-                    vertical_lines[i][3] = max(line[3], line2[3])
-                    to_remove.append(j)
-
-    vertical_lines = [line for i, line in enumerate(vertical_lines) if i not in to_remove]
-
-    return vertical_lines
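
The deleted detector kept only near-vertical Hough segments by checking the slope angle against a tolerance. A minimal sketch of that filter (same math as the deleted `get_line_angle`; `is_vertical` is a hypothetical name for the inline check):

```python
import math

def line_angle_deg(x1, y1, x2, y2):
    """Angle of the segment from horizontal, in degrees."""
    return math.degrees(math.atan((y2 - y1) / (x2 - x1)))

def is_vertical(x1, y1, x2, y2, slope_tol_deg=10):
    """True for exactly-vertical segments and those within slope_tol_deg of +/-90."""
    if x2 == x1:
        return True
    angle = line_angle_deg(x1, y1, x2, y2)
    return (90 - slope_tol_deg < angle < 90 + slope_tol_deg) or \
           (-90 - slope_tol_deg < angle < -90 + slope_tol_deg)

print(is_vertical(5, 0, 5, 100))   # exactly vertical
print(is_vertical(5, 0, 8, 100))   # steep (~88 degrees), within tolerance
print(is_vertical(0, 0, 100, 5))   # near-horizontal, rejected
```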
marker/tables/table.py CHANGED
@@ -1,155 +1,107 @@
-from marker.schema.bbox import merge_boxes, box_intersection_pct, rescale_bbox
+from tqdm import tqdm
+from pypdfium2 import PdfDocument
+from tabled.assignment import assign_rows_columns
+from tabled.formats import formatter
+from tabled.inference.detection import merge_tables
+
+from surya.input.pdflines import get_page_text_lines
+from tabled.inference.recognition import get_cells, recognize_tables
+
+from marker.pdf.images import render_image
+from marker.schema.bbox import rescale_bbox
 from marker.schema.block import Line, Span, Block
 from marker.schema.page import Page
-from tabulate import tabulate
 from typing import List
 
 from marker.settings import settings
-from marker.tables.cells import assign_cells_to_columns
-from marker.tables.utils import sort_table_blocks, replace_dots, replace_newlines
-
-
-def get_table_surya(page, table_box, space_tol=.01) -> List[List[str]]:
-    table_rows = []
-    table_row = []
-    x_position = None
-    sorted_blocks = sort_table_blocks(page.blocks)
-    for block_idx, block in enumerate(sorted_blocks):
-        sorted_lines = sort_table_blocks(block.lines)
-        for line_idx, line in enumerate(sorted_lines):
-            line_bbox = line.bbox
-            intersect_pct = box_intersection_pct(line_bbox, table_box)
-            if intersect_pct < settings.TABLE_INTERSECTION_THRESH or len(line.spans) == 0:
-                continue
-            normed_x_start = line_bbox[0] / page.width
-            normed_x_end = line_bbox[2] / page.width
-
-            cells = [[s.bbox, s.text] for s in line.spans]
-            if x_position is None or normed_x_start > x_position - space_tol:
-                # Same row
-                table_row.extend(cells)
-            else:
-                # New row
-                if len(table_row) > 0:
-                    table_rows.append(table_row)
-                table_row = cells
-            x_position = normed_x_end
-    if len(table_row) > 0:
-        table_rows.append(table_row)
-    table_rows = assign_cells_to_columns(page, table_box, table_rows)
-    return table_rows
-
-
-def get_table_pdftext(page: Page, table_box, space_tol=.01, round_factor=4) -> List[List[str]]:
-    page_width = page.width
-    table_rows = []
-    table_cell = ""
-    cell_bbox = None
-    table_row = []
-    sorted_char_blocks = sort_table_blocks(page.char_blocks)
-
-    table_width = table_box[2] - table_box[0]
-    new_line_start_x = table_box[0] + table_width * .3
-    table_width_pct = (table_width / page_width) * .95
-
-    for block_idx, block in enumerate(sorted_char_blocks):
-        sorted_lines = sort_table_blocks(block["lines"])
-        for line_idx, line in enumerate(sorted_lines):
-            line_bbox = line["bbox"]
-            intersect_pct = box_intersection_pct(line_bbox, table_box)
-            if intersect_pct < settings.TABLE_INTERSECTION_THRESH:
-                continue
-            for span in line["spans"]:
-                for char in span["chars"]:
-                    x_start, y_start, x_end, y_end = char["bbox"]
-                    x_start /= page_width
-                    x_end /= page_width
-                    fullwidth_cell = False
-
-                    if cell_bbox is not None:
-                        # Find boundaries of cell bbox before merging
-                        cell_x_start, cell_y_start, cell_x_end, cell_y_end = cell_bbox
-                        cell_x_start /= page_width
-                        cell_x_end /= page_width
-
-                        fullwidth_cell = cell_x_end - cell_x_start >= table_width_pct
-
-                    cell_content = replace_dots(replace_newlines(table_cell))
-                    if cell_bbox is None:  # First char
-                        table_cell += char["char"]
-                        cell_bbox = char["bbox"]
-                    # Check if we are in the same cell, ensure cell is not full table width (like if stray text gets included in the table)
-                    elif (cell_x_start - space_tol < x_start < cell_x_end + space_tol) and not fullwidth_cell:
-                        table_cell += char["char"]
-                        cell_bbox = merge_boxes(cell_bbox, char["bbox"])
-                    # New line and cell
-                    # Use x_start < new_line_start_x to account for out-of-order cells in the pdf
-                    elif x_start < cell_x_end - space_tol and x_start < new_line_start_x:
-                        if len(table_cell) > 0:
-                            table_row.append((cell_bbox, cell_content))
-                        table_cell = char["char"]
-                        cell_bbox = char["bbox"]
-                        if len(table_row) > 0:
-                            table_row = sorted(table_row, key=lambda x: round(x[0][0] / round_factor))
-                            table_rows.append(table_row)
-                            table_row = []
-                    else:  # Same line, new cell, check against cell bbox
-                        if len(table_cell) > 0:
-                            table_row.append((cell_bbox, cell_content))
-                        table_cell = char["char"]
-                        cell_bbox = char["bbox"]
-
-    if len(table_cell) > 0:
-        table_row.append((cell_bbox, replace_dots(replace_newlines(table_cell))))
-    if len(table_row) > 0:
-        table_row = sorted(table_row, key=lambda x: round(x[0][0] / round_factor))
-        table_rows.append(table_row)
-
-    total_cells = sum([len(row) for row in table_rows])
-    if total_cells > 0:
-        table_rows = assign_cells_to_columns(page, table_box, table_rows)
-        return table_rows
-    else:
-        return []
-
-
-def merge_tables(page_table_boxes):
-    # Merge tables that are next to each other
-    expansion_factor = 1.02
-    shrink_factor = .98
-    ignore_boxes = set()
-    for i in range(len(page_table_boxes)):
-        if i in ignore_boxes:
-            continue
-        for j in range(i + 1, len(page_table_boxes)):
-            if j in ignore_boxes:
-                continue
-            expanded_box1 = [page_table_boxes[i][0] * shrink_factor, page_table_boxes[i][1],
-                             page_table_boxes[i][2] * expansion_factor, page_table_boxes[i][3]]
-            expanded_box2 = [page_table_boxes[j][0] * shrink_factor, page_table_boxes[j][1],
-                             page_table_boxes[j][2] * expansion_factor, page_table_boxes[j][3]]
-            if box_intersection_pct(expanded_box1, expanded_box2) > 0:
-                page_table_boxes[i] = merge_boxes(page_table_boxes[i], page_table_boxes[j])
-                ignore_boxes.add(j)
-
-    return [b for i, b in enumerate(page_table_boxes) if i not in ignore_boxes]
-
-
-def format_tables(pages: List[Page]):
-    # Formats tables nicely into github flavored markdown
+
+
+def get_table_boxes(pages: List[Page], doc: PdfDocument, fname):
+    table_imgs = []
+    table_counts = []
+    table_bboxes = []
+    img_sizes = []
+
+    for page in pages:
+        pnum = page.pnum
+        # The bbox for the entire table
+        bbox = [b.bbox for b in page.layout.bboxes if b.label == "Table"]
+
+        if len(bbox) == 0:
+            table_counts.append(0)
+            img_sizes.append(None)
+            continue
+
+        highres_img = render_image(doc[pnum], dpi=settings.SURYA_TABLE_DPI)
+
+        page_table_imgs = []
+        lowres_bbox = []
+
+        # Merge tables that are next to each other
+        bbox = merge_tables(bbox)
+
+        # Number of tables per page
+        table_counts.append(len(bbox))
+        img_sizes.append(highres_img.size)
+
+        for bb in bbox:
+            highres_bb = rescale_bbox(page.layout.image_bbox, [0, 0, highres_img.size[0], highres_img.size[1]], bb)
+            page_table_imgs.append(highres_img.crop(highres_bb))
+            lowres_bbox.append(highres_bb)
+
+        table_imgs.extend(page_table_imgs)
+        table_bboxes.extend(lowres_bbox)
+
+    table_idxs = [i for i, c in enumerate(table_counts) if c > 0]
+    sel_text_lines = get_page_text_lines(
+        fname,
+        table_idxs,
+        [hr for i, hr in enumerate(img_sizes) if i in table_idxs],
+    )
+    text_lines = []
+    out_img_sizes = []
+    for i in range(len(table_counts)):
+        if i in table_idxs:
+            text_lines.extend([sel_text_lines.pop(0)] * table_counts[i])
+            out_img_sizes.extend([img_sizes[i]] * table_counts[i])
+
+    assert len(table_imgs) == len(table_bboxes) == len(text_lines) == len(out_img_sizes)
+    assert sum(table_counts) == len(table_imgs)
+
+    return table_imgs, table_bboxes, table_counts, text_lines, out_img_sizes
+
+
+def format_tables(pages: List[Page], doc: PdfDocument, fname: str, detection_model, table_rec_model, ocr_model):
+    det_models = [detection_model, detection_model.processor]
+    rec_models = [table_rec_model, table_rec_model.processor, ocr_model, ocr_model.processor]
+
+    # Don't look at table cell detection tqdm output
+    tqdm.disable = True
+    table_imgs, table_boxes, table_counts, table_text_lines, img_sizes = get_table_boxes(pages, doc, fname)
+    cells, needs_ocr = get_cells(table_imgs, table_boxes, img_sizes, table_text_lines, det_models, detect_boxes=settings.OCR_ALL_PAGES)
+    tqdm.disable = False
+
+    table_rec = recognize_tables(table_imgs, cells, needs_ocr, rec_models)
+    cells = [assign_rows_columns(tr, im_size) for tr, im_size in zip(table_rec, img_sizes)]
+    table_md = [formatter("markdown", cell)[0] for cell in cells]
+
     table_count = 0
-    for page in pages:
+    for page_idx, page in enumerate(pages):
+        page_table_count = table_counts[page_idx]
+        if page_table_count == 0:
+            continue
+
         table_insert_points = {}
         blocks_to_remove = set()
         pnum = page.pnum
-
-        page_table_boxes = [b for b in page.layout.bboxes if b.label == "Table"]
-        page_table_boxes = [rescale_bbox(page.layout.image_bbox, page.bbox, b.bbox) for b in page_table_boxes]
-        page_table_boxes = merge_tables(page_table_boxes)
+        highres_size = img_sizes[table_count]
+        page_table_boxes = table_boxes[table_count:table_count + page_table_count]
 
         for table_idx, table_box in enumerate(page_table_boxes):
+            lowres_table_box = rescale_bbox([0, 0, highres_size[0], highres_size[1]], page.bbox, table_box)
+
             for block_idx, block in enumerate(page.blocks):
-                intersect_pct = block.intersection_pct(table_box)
+                intersect_pct = block.intersection_pct(lowres_table_box)
                 if intersect_pct > settings.TABLE_INTERSECTION_THRESH and block.block_type == "Table":
                     if table_idx not in table_insert_points:
                         table_insert_points[table_idx] = max(0, block_idx - len(blocks_to_remove))  # Where to insert the new table
@@ -163,17 +115,10 @@ def format_tables(pages: List[Page]):
 
         for table_idx, table_box in enumerate(page_table_boxes):
             if table_idx not in table_insert_points:
+                table_count += 1
                 continue
 
-            if page.ocr_method == "surya":
-                table_rows = get_table_surya(page, table_box)
-            else:
-                table_rows = get_table_pdftext(page, table_box)
-            # Skip empty tables
-            if len(table_rows) == 0:
-                continue
-
-            table_text = tabulate(table_rows, headers="firstrow", tablefmt="github", disable_numparse=True)
+            markdown = table_md[table_count]
             table_block = Block(
                 bbox=table_box,
                 block_type="Table",
@@ -187,7 +132,7 @@ def format_tables(pages: List[Page]):
                         font_size=0,
                         font_weight=0,
                         block_type="Table",
-                        text=table_text
+                        text=markdown
                     )]
                 )]
            )
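
The new code repeatedly maps boxes between the layout image, the high-res table render, and page coordinates with `rescale_bbox`. A sketch of that transform, assuming it scales each coordinate by the ratio of the destination box's dimensions to the source box's (zero-origin boxes; the real `marker.schema.bbox.rescale_bbox` may also handle offsets):

```python
def rescale_bbox(src_bbox, dst_bbox, bbox):
    """Map `bbox` from src_bbox's coordinate space into dst_bbox's space by
    scaling x and y independently. A sketch of marker's bbox helper, assuming
    both reference boxes start at the origin."""
    sx = (dst_bbox[2] - dst_bbox[0]) / (src_bbox[2] - src_bbox[0])
    sy = (dst_bbox[3] - dst_bbox[1]) / (src_bbox[3] - src_bbox[1])
    return [bbox[0] * sx, bbox[1] * sy, bbox[2] * sx, bbox[3] * sy]

# A table at [100, 200, 300, 400] in a 1224x1584 layout image, mapped onto
# a 1632x2112 page render (the 192 DPI size of a US Letter page).
print(rescale_bbox([0, 0, 1224, 1584], [0, 0, 1632, 2112], [100, 200, 300, 400]))
```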
poetry.lock CHANGED
@@ -601,6 +601,23 @@ files = [
     {file = "colorama-0.4.6.tar.gz", hash = "sha256:08695f5cb7ed6e0531a20572697297273c47b8cae5a63ffc6d6ed5c201be6e44"},
 ]
 
 [[package]]
 name = "comm"
 version = "0.2.2"
@@ -799,6 +816,17 @@ files = [
     {file = "filetype-1.2.0.tar.gz", hash = "sha256:66b56cd6474bf41d8c54660347d37afcc3f7d1970648de365c102ef77548aadb"},
 ]
 
 [[package]]
 name = "fqdn"
 version = "1.5.1"
@@ -1132,6 +1160,20 @@ testing = ["InquirerPy (==0.3.4)", "Jinja2", "Pillow", "aiohttp", "fastapi", "gr
 torch = ["safetensors", "torch"]
 typing = ["types-PyYAML", "types-requests", "types-simplejson", "types-toml", "types-tqdm", "types-urllib3", "typing-extensions (>=4.8.0)"]
 
 [[package]]
 name = "idna"
 version = "3.7"
@@ -1143,25 +1185,6 @@ files = [
     {file = "idna-3.7.tar.gz", hash = "sha256:028ff3aadf0609c1fd278d8ea3089299412a7a8b9bd005dd08b9f8285bcb5cfc"},
 ]
 
-[[package]]
-name = "importlib-metadata"
-version = "8.0.0"
-description = "Read metadata from Python packages"
-optional = false
-python-versions = ">=3.8"
-files = [
-    {file = "importlib_metadata-8.0.0-py3-none-any.whl", hash = "sha256:15584cf2b1bf449d98ff8a6ff1abef57bf20f3ac6454f431736cd3e660921b2f"},
-    {file = "importlib_metadata-8.0.0.tar.gz", hash = "sha256:188bd24e4c346d3f0a933f275c2fec67050326a856b9a359881d7c2a697e8812"},
-]
-
-[package.dependencies]
-zipp = ">=0.5"
-
-[package.extras]
-doc = ["furo", "jaraco.packaging (>=9.3)", "jaraco.tidelift (>=1.4)", "rst.linker (>=1.9)", "sphinx (>=3.5)", "sphinx-lint"]
-perf = ["ipython"]
-test = ["flufl.flake8", "importlib-resources (>=1.3)", "jaraco.test (>=5.4)", "packaging", "pyfakefs", "pytest (>=6,!=8.1.*)", "pytest-checkdocs (>=2.4)", "pytest-cov", "pytest-enabler (>=2.2)", "pytest-mypy", "pytest-perf (>=0.9.2)", "pytest-ruff (>=0.2.1)"]
-
 [[package]]
 name = "intel-openmp"
 version = "2021.4.0"
@@ -1231,7 +1254,6 @@ prompt-toolkit = ">=3.0.41,<3.1.0"
 pygments = ">=2.4.0"
 stack-data = "*"
 traitlets = ">=5"
-typing-extensions = {version = "*", markers = "python_version < \"3.10\""}
 
 [package.extras]
 all = ["black", "curio", "docrepr", "exceptiongroup", "ipykernel", "ipyparallel", "ipywidgets", "matplotlib", "matplotlib (!=3.2.0)", "nbconvert", "nbformat", "notebook", "numpy (>=1.22)", "pandas", "pickleshare", "pytest (<7)", "pytest (<7.1)", "pytest-asyncio (<0.22)", "qtconsole", "setuptools (>=18.5)", "sphinx (>=1.3)", "sphinx-rtd-theme", "stack-data", "testpath", "trio", "typing-extensions"]
@@ -1425,7 +1447,6 @@ files = [
 ]
 
 [package.dependencies]
-importlib-metadata = {version = ">=4.8.3", markers = "python_version < \"3.10\""}
 jupyter-core = ">=4.12,<5.0.dev0 || >=5.1.dev0"
 python-dateutil = ">=2.8.2"
 pyzmq = ">=23.0"
@@ -1517,7 +1538,6 @@ files = [
 ]
 
 [package.dependencies]
-importlib-metadata = {version = ">=4.8.3", markers = "python_version < \"3.10\""}
 jupyter-server = ">=1.1.2"
 
 [[package]]
@@ -1589,7 +1609,6 @@ files = [
 [package.dependencies]
 async-lru = ">=1.0.0"
 httpx = ">=0.25.0"
-importlib-metadata = {version = ">=4.8.3", markers = "python_version < \"3.10\""}
 ipykernel = ">=6.5.0"
 jinja2 = ">=3.0.3"
 jupyter-core = "*"
@@ -1634,7 +1653,6 @@ files = [
 
 [package.dependencies]
 babel = ">=2.10"
-importlib-metadata = {version = ">=4.8.3", markers = "python_version < \"3.10\""}
 jinja2 = ">=3.0.3"
 json5 = ">=0.9.0"
 jsonschema = ">=4.18.0"
@@ -1999,7 +2017,6 @@ files = [
 beautifulsoup4 = "*"
 bleach = "!=5.0.0"
 defusedxml = "*"
-importlib-metadata = {version = ">=3.6", markers = "python_version < \"3.10\""}
 jinja2 = ">=3.0"
 jupyter-core = ">=4.7"
 jupyterlab-pygments = "*"
@@ -2284,6 +2301,7 @@ description = "Nvidia JIT LTO Library"
 optional = false
 python-versions = ">=3"
 files = [
     {file = "nvidia_nvjitlink_cu12-12.5.82-py3-none-manylinux2014_x86_64.whl", hash = "sha256:f9b37bc5c8cf7509665cb6ada5aaa0ce65618f2332b7d3e78e9790511f111212"},
     {file = "nvidia_nvjitlink_cu12-12.5.82-py3-none-win_amd64.whl", hash = "sha256:e782564d705ff0bf61ac3e1bf730166da66dd2fe9012f111ede5fc49b64ae697"},
 ]
@@ -2299,6 +2317,48 @@ files = [
     {file = "nvidia_nvtx_cu12-12.1.105-py3-none-win_amd64.whl", hash = "sha256:65f4d98982b31b60026e0e6de73fbdfc09d08a96f4656dd3665ca616a11e1e82"},
 ]
 
 [[package]]
 name = "opencv-python"
 version = "4.10.0.84"
@@ -2317,12 +2377,10 @@ files = [
 
 [package.dependencies]
 numpy = [
-    {version = ">=1.21.0", markers = "python_version == \"3.9\" and platform_system == \"Darwin\" and platform_machine == \"arm64\""},
     {version = ">=1.21.4", markers = "python_version >= \"3.10\" and platform_system == \"Darwin\" and python_version < \"3.11\""},
     {version = ">=1.21.2", markers = "platform_system != \"Darwin\" and python_version >= \"3.10\" and python_version < \"3.11\""},
-    {version = ">=1.19.3", markers = "platform_system == \"Linux\" and platform_machine == \"aarch64\" and python_version >= \"3.8\" and python_version < \"3.10\" or python_version > \"3.9\" and python_version < \"3.10\" or python_version >= \"3.9\" and platform_system != \"Darwin\" and python_version < \"3.10\" or python_version >= \"3.9\" and platform_machine != \"arm64\" and python_version < \"3.10\""},
     {version = ">=1.23.5", markers = "python_version >= \"3.11\" and python_version < \"3.12\""},
-    {version = ">=1.26.0", markers = "python_version >= \"3.12\""},
 ]
 
 [[package]]
@@ -2387,9 +2445,9 @@ files = [
 
 [package.dependencies]
 numpy = [
     {version = ">=1.22.4", markers = "python_version < \"3.11\""},
     {version = ">=1.23.2", markers = "python_version == \"3.11\""},
-    {version = ">=1.26.0", markers = "python_version >= \"3.12\""},
 ]
 python-dateutil = ">=2.8.2"
 pytz = ">=2020.1"
@@ -2448,20 +2506,20 @@ testing = ["docopt", "pytest"]
 
 [[package]]
 name = "pdftext"
-version = "0.3.10"
 description = "Extract structured text from pdfs quickly"
 optional = false
 python-versions = "!=2.7.*,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*,!=3.4.*,!=3.5.*,!=3.6.*,!=3.7.*,!=3.8.*,>=3.9"
 files = [
-    {file = "pdftext-0.3.10-py3-none-any.whl", hash = "sha256:99bd900d0d0692df06719c07ce10a859750ade3eb7f10c543f637118417497f9"},
-    {file = "pdftext-0.3.10.tar.gz", hash = "sha256:90de726e818fb5683a0616cabb1a75a32a7224e873c3058006c93da6e440c66c"},
 ]
 
 [package.dependencies]
 pydantic = ">=2.7.1,<3.0.0"
 pydantic-settings = ">=2.2.1,<3.0.0"
 pypdfium2 = ">=4.29.0,<5.0.0"
-scikit-learn = ">=1.4.2,<2.0.0"
 
 [[package]]
 name = "pexpect"
@@ -2756,119 +2814,123 @@ files = [
 
 [[package]]
 name = "pydantic"
-version = "2.8.2"
 description = "Data validation using Python type hints"
 optional = false
 python-versions = ">=3.8"
 files = [
-    {file = "pydantic-2.8.2-py3-none-any.whl", hash = "sha256:73ee9fddd406dc318b885c7a2eab8a6472b68b8fb5ba8150949fc3db939f23c8"},
-    {file = "pydantic-2.8.2.tar.gz", hash = "sha256:6f62c13d067b0755ad1c21a34bdd06c0c12625a22b0fc09c6b149816604f7c2a"},
 ]
 
 [package.dependencies]
-annotated-types = ">=0.4.0"
-pydantic-core = "2.20.1"
-typing-extensions = {version = ">=4.6.1", markers = "python_version < \"3.13\""}
 
 [package.extras]
 email = ["email-validator (>=2.0.0)"]
 
 [[package]]
 name = "pydantic-core"
-version = "2.20.1"
 description = "Core functionality for Pydantic validation and serialization"
 optional = false
 python-versions = ">=3.8"
 files = [
-    {file = "pydantic_core-2.20.1-cp310-cp310-macosx_10_12_x86_64.whl", hash = "sha256:3acae97ffd19bf091c72df4d726d552c473f3576409b2a7ca36b2f535ffff4a3"},
-    {file = "pydantic_core-2.20.1-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:41f4c96227a67a013e7de5ff8f20fb496ce573893b7f4f2707d065907bffdbd6"},
-    {file = "pydantic_core-2.20.1-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:5f239eb799a2081495ea659d8d4a43a8f42cd1fe9ff2e7e436295c38a10c286a"},
-    {file = "pydantic_core-2.20.1-cp310-cp310-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:53e431da3fc53360db73eedf6f7124d1076e1b4ee4276b36fb25514544ceb4a3"},
-    {file = "pydantic_core-2.20.1-cp310-cp310-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:f1f62b2413c3a0e846c3b838b2ecd6c7a19ec6793b2a522745b0869e37ab5bc1"},
-    {file = "pydantic_core-2.20.1-cp310-cp310-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:5d41e6daee2813ecceea8eda38062d69e280b39df793f5a942fa515b8ed67953"},
-    {file = "pydantic_core-2.20.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:3d482efec8b7dc6bfaedc0f166b2ce349df0011f5d2f1f25537ced4cfc34fd98"},
-    {file = "pydantic_core-2.20.1-cp310-cp310-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:e93e1a4b4b33daed65d781a57a522ff153dcf748dee70b40c7258c5861e1768a"},
-    {file = "pydantic_core-2.20.1-cp310-cp310-musllinux_1_1_aarch64.whl", hash = "sha256:e7c4ea22b6739b162c9ecaaa41d718dfad48a244909fe7ef4b54c0b530effc5a"},
-    {file = "pydantic_core-2.20.1-cp310-cp310-musllinux_1_1_x86_64.whl", hash = "sha256:4f2790949cf385d985a31984907fecb3896999329103df4e4983a4a41e13e840"},
-    {file = "pydantic_core-2.20.1-cp310-none-win32.whl", hash = "sha256:5e999ba8dd90e93d57410c5e67ebb67ffcaadcea0ad973240fdfd3a135506250"},
-    {file = "pydantic_core-2.20.1-cp310-none-win_amd64.whl", hash = "sha256:512ecfbefef6dac7bc5eaaf46177b2de58cdf7acac8793fe033b24ece0b9566c"},
-    {file = "pydantic_core-2.20.1-cp311-cp311-macosx_10_12_x86_64.whl", hash = "sha256:d2a8fa9d6d6f891f3deec72f5cc668e6f66b188ab14bb1ab52422fe8e644f312"},
-    {file = "pydantic_core-2.20.1-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:175873691124f3d0da55aeea1d90660a6ea7a3cfea137c38afa0a5ffabe37b88"},
-    {file = "pydantic_core-2.20.1-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:37eee5b638f0e0dcd18d21f59b679686bbd18917b87db0193ae36f9c23c355fc"},
-    {file = "pydantic_core-2.20.1-cp311-cp311-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:25e9185e2d06c16ee438ed39bf62935ec436474a6ac4f9358524220f1b236e43"},
-    {file = "pydantic_core-2.20.1-cp311-cp311-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:150906b40ff188a3260cbee25380e7494ee85048584998c1e66df0c7a11c17a6"},
-    {file = "pydantic_core-2.20.1-cp311-cp311-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:8ad4aeb3e9a97286573c03df758fc7627aecdd02f1da04516a86dc159bf70121"},
-    {file = "pydantic_core-2.20.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:d3f3ed29cd9f978c604708511a1f9c2fdcb6c38b9aae36a51905b8811ee5cbf1"},
-    {file = "pydantic_core-2.20.1-cp311-cp311-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:b0dae11d8f5ded51699c74d9548dcc5938e0804cc8298ec0aa0da95c21fff57b"},
2803
- {file = "pydantic_core-2.20.1-cp311-cp311-musllinux_1_1_aarch64.whl", hash = "sha256:faa6b09ee09433b87992fb5a2859efd1c264ddc37280d2dd5db502126d0e7f27"},
2804
- {file = "pydantic_core-2.20.1-cp311-cp311-musllinux_1_1_x86_64.whl", hash = "sha256:9dc1b507c12eb0481d071f3c1808f0529ad41dc415d0ca11f7ebfc666e66a18b"},
2805
- {file = "pydantic_core-2.20.1-cp311-none-win32.whl", hash = "sha256:fa2fddcb7107e0d1808086ca306dcade7df60a13a6c347a7acf1ec139aa6789a"},
2806
- {file = "pydantic_core-2.20.1-cp311-none-win_amd64.whl", hash = "sha256:40a783fb7ee353c50bd3853e626f15677ea527ae556429453685ae32280c19c2"},
2807
- {file = "pydantic_core-2.20.1-cp312-cp312-macosx_10_12_x86_64.whl", hash = "sha256:595ba5be69b35777474fa07f80fc260ea71255656191adb22a8c53aba4479231"},
2808
- {file = "pydantic_core-2.20.1-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:a4f55095ad087474999ee28d3398bae183a66be4823f753cd7d67dd0153427c9"},
2809
- {file = "pydantic_core-2.20.1-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:f9aa05d09ecf4c75157197f27cdc9cfaeb7c5f15021c6373932bf3e124af029f"},
2810
- {file = "pydantic_core-2.20.1-cp312-cp312-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:e97fdf088d4b31ff4ba35db26d9cc472ac7ef4a2ff2badeabf8d727b3377fc52"},
2811
- {file = "pydantic_core-2.20.1-cp312-cp312-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:bc633a9fe1eb87e250b5c57d389cf28998e4292336926b0b6cdaee353f89a237"},
2812
- {file = "pydantic_core-2.20.1-cp312-cp312-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:d573faf8eb7e6b1cbbcb4f5b247c60ca8be39fe2c674495df0eb4318303137fe"},
2813
- {file = "pydantic_core-2.20.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:26dc97754b57d2fd00ac2b24dfa341abffc380b823211994c4efac7f13b9e90e"},
2814
- {file = "pydantic_core-2.20.1-cp312-cp312-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:33499e85e739a4b60c9dac710c20a08dc73cb3240c9a0e22325e671b27b70d24"},
2815
- {file = "pydantic_core-2.20.1-cp312-cp312-musllinux_1_1_aarch64.whl", hash = "sha256:bebb4d6715c814597f85297c332297c6ce81e29436125ca59d1159b07f423eb1"},
2816
- {file = "pydantic_core-2.20.1-cp312-cp312-musllinux_1_1_x86_64.whl", hash = "sha256:516d9227919612425c8ef1c9b869bbbee249bc91912c8aaffb66116c0b447ebd"},
2817
- {file = "pydantic_core-2.20.1-cp312-none-win32.whl", hash = "sha256:469f29f9093c9d834432034d33f5fe45699e664f12a13bf38c04967ce233d688"},
2818
- {file = "pydantic_core-2.20.1-cp312-none-win_amd64.whl", hash = "sha256:035ede2e16da7281041f0e626459bcae33ed998cca6a0a007a5ebb73414ac72d"},
2819
- {file = "pydantic_core-2.20.1-cp313-cp313-macosx_10_12_x86_64.whl", hash = "sha256:0827505a5c87e8aa285dc31e9ec7f4a17c81a813d45f70b1d9164e03a813a686"},
2820
- {file = "pydantic_core-2.20.1-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:19c0fa39fa154e7e0b7f82f88ef85faa2a4c23cc65aae2f5aea625e3c13c735a"},
2821
- {file = "pydantic_core-2.20.1-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:4aa223cd1e36b642092c326d694d8bf59b71ddddc94cdb752bbbb1c5c91d833b"},
2822
- {file = "pydantic_core-2.20.1-cp313-cp313-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:c336a6d235522a62fef872c6295a42ecb0c4e1d0f1a3e500fe949415761b8a19"},
2823
- {file = "pydantic_core-2.20.1-cp313-cp313-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:7eb6a0587eded33aeefea9f916899d42b1799b7b14b8f8ff2753c0ac1741edac"},
2824
- {file = "pydantic_core-2.20.1-cp313-cp313-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:70c8daf4faca8da5a6d655f9af86faf6ec2e1768f4b8b9d0226c02f3d6209703"},
2825
- {file = "pydantic_core-2.20.1-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:e9fa4c9bf273ca41f940bceb86922a7667cd5bf90e95dbb157cbb8441008482c"},
2826
- {file = "pydantic_core-2.20.1-cp313-cp313-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:11b71d67b4725e7e2a9f6e9c0ac1239bbc0c48cce3dc59f98635efc57d6dac83"},
2827
- {file = "pydantic_core-2.20.1-cp313-cp313-musllinux_1_1_aarch64.whl", hash = "sha256:270755f15174fb983890c49881e93f8f1b80f0b5e3a3cc1394a255706cabd203"},
2828
- {file = "pydantic_core-2.20.1-cp313-cp313-musllinux_1_1_x86_64.whl", hash = "sha256:c81131869240e3e568916ef4c307f8b99583efaa60a8112ef27a366eefba8ef0"},
2829
- {file = "pydantic_core-2.20.1-cp313-none-win32.whl", hash = "sha256:b91ced227c41aa29c672814f50dbb05ec93536abf8f43cd14ec9521ea09afe4e"},
2830
- {file = "pydantic_core-2.20.1-cp313-none-win_amd64.whl", hash = "sha256:65db0f2eefcaad1a3950f498aabb4875c8890438bc80b19362cf633b87a8ab20"},
2831
- {file = "pydantic_core-2.20.1-cp38-cp38-macosx_10_12_x86_64.whl", hash = "sha256:4745f4ac52cc6686390c40eaa01d48b18997cb130833154801a442323cc78f91"},
2832
- {file = "pydantic_core-2.20.1-cp38-cp38-macosx_11_0_arm64.whl", hash = "sha256:a8ad4c766d3f33ba8fd692f9aa297c9058970530a32c728a2c4bfd2616d3358b"},
2833
- {file = "pydantic_core-2.20.1-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:41e81317dd6a0127cabce83c0c9c3fbecceae981c8391e6f1dec88a77c8a569a"},
2834
- {file = "pydantic_core-2.20.1-cp38-cp38-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:04024d270cf63f586ad41fff13fde4311c4fc13ea74676962c876d9577bcc78f"},
2835
- {file = "pydantic_core-2.20.1-cp38-cp38-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:eaad4ff2de1c3823fddf82f41121bdf453d922e9a238642b1dedb33c4e4f98ad"},
2836
- {file = "pydantic_core-2.20.1-cp38-cp38-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:26ab812fa0c845df815e506be30337e2df27e88399b985d0bb4e3ecfe72df31c"},
2837
- {file = "pydantic_core-2.20.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:3c5ebac750d9d5f2706654c638c041635c385596caf68f81342011ddfa1e5598"},
2838
- {file = "pydantic_core-2.20.1-cp38-cp38-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:2aafc5a503855ea5885559eae883978c9b6d8c8993d67766ee73d82e841300dd"},
2839
- {file = "pydantic_core-2.20.1-cp38-cp38-musllinux_1_1_aarch64.whl", hash = "sha256:4868f6bd7c9d98904b748a2653031fc9c2f85b6237009d475b1008bfaeb0a5aa"},
2840
- {file = "pydantic_core-2.20.1-cp38-cp38-musllinux_1_1_x86_64.whl", hash = "sha256:aa2f457b4af386254372dfa78a2eda2563680d982422641a85f271c859df1987"},
2841
- {file = "pydantic_core-2.20.1-cp38-none-win32.whl", hash = "sha256:225b67a1f6d602de0ce7f6c1c3ae89a4aa25d3de9be857999e9124f15dab486a"},
2842
- {file = "pydantic_core-2.20.1-cp38-none-win_amd64.whl", hash = "sha256:6b507132dcfc0dea440cce23ee2182c0ce7aba7054576efc65634f080dbe9434"},
2843
- {file = "pydantic_core-2.20.1-cp39-cp39-macosx_10_12_x86_64.whl", hash = "sha256:b03f7941783b4c4a26051846dea594628b38f6940a2fdc0df00b221aed39314c"},
2844
- {file = "pydantic_core-2.20.1-cp39-cp39-macosx_11_0_arm64.whl", hash = "sha256:1eedfeb6089ed3fad42e81a67755846ad4dcc14d73698c120a82e4ccf0f1f9f6"},
2845
- {file = "pydantic_core-2.20.1-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:635fee4e041ab9c479e31edda27fcf966ea9614fff1317e280d99eb3e5ab6fe2"},
2846
- {file = "pydantic_core-2.20.1-cp39-cp39-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:77bf3ac639c1ff567ae3b47f8d4cc3dc20f9966a2a6dd2311dcc055d3d04fb8a"},
2847
- {file = "pydantic_core-2.20.1-cp39-cp39-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:7ed1b0132f24beeec5a78b67d9388656d03e6a7c837394f99257e2d55b461611"},
2848
- {file = "pydantic_core-2.20.1-cp39-cp39-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:c6514f963b023aeee506678a1cf821fe31159b925c4b76fe2afa94cc70b3222b"},
2849
- {file = "pydantic_core-2.20.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:10d4204d8ca33146e761c79f83cc861df20e7ae9f6487ca290a97702daf56006"},
2850
- {file = "pydantic_core-2.20.1-cp39-cp39-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:2d036c7187b9422ae5b262badb87a20a49eb6c5238b2004e96d4da1231badef1"},
2851
- {file = "pydantic_core-2.20.1-cp39-cp39-musllinux_1_1_aarch64.whl", hash = "sha256:9ebfef07dbe1d93efb94b4700f2d278494e9162565a54f124c404a5656d7ff09"},
2852
- {file = "pydantic_core-2.20.1-cp39-cp39-musllinux_1_1_x86_64.whl", hash = "sha256:6b9d9bb600328a1ce523ab4f454859e9d439150abb0906c5a1983c146580ebab"},
2853
- {file = "pydantic_core-2.20.1-cp39-none-win32.whl", hash = "sha256:784c1214cb6dd1e3b15dd8b91b9a53852aed16671cc3fbe4786f4f1db07089e2"},
2854
- {file = "pydantic_core-2.20.1-cp39-none-win_amd64.whl", hash = "sha256:d2fe69c5434391727efa54b47a1e7986bb0186e72a41b203df8f5b0a19a4f669"},
2855
- {file = "pydantic_core-2.20.1-pp310-pypy310_pp73-macosx_10_12_x86_64.whl", hash = "sha256:a45f84b09ac9c3d35dfcf6a27fd0634d30d183205230a0ebe8373a0e8cfa0906"},
2856
- {file = "pydantic_core-2.20.1-pp310-pypy310_pp73-macosx_11_0_arm64.whl", hash = "sha256:d02a72df14dfdbaf228424573a07af10637bd490f0901cee872c4f434a735b94"},
2857
- {file = "pydantic_core-2.20.1-pp310-pypy310_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:d2b27e6af28f07e2f195552b37d7d66b150adbaa39a6d327766ffd695799780f"},
2858
- {file = "pydantic_core-2.20.1-pp310-pypy310_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:084659fac3c83fd674596612aeff6041a18402f1e1bc19ca39e417d554468482"},
2859
- {file = "pydantic_core-2.20.1-pp310-pypy310_pp73-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:242b8feb3c493ab78be289c034a1f659e8826e2233786e36f2893a950a719bb6"},
2860
- {file = "pydantic_core-2.20.1-pp310-pypy310_pp73-musllinux_1_1_aarch64.whl", hash = "sha256:38cf1c40a921d05c5edc61a785c0ddb4bed67827069f535d794ce6bcded919fc"},
2861
- {file = "pydantic_core-2.20.1-pp310-pypy310_pp73-musllinux_1_1_x86_64.whl", hash = "sha256:e0bbdd76ce9aa5d4209d65f2b27fc6e5ef1312ae6c5333c26db3f5ade53a1e99"},
2862
- {file = "pydantic_core-2.20.1-pp310-pypy310_pp73-win_amd64.whl", hash = "sha256:254ec27fdb5b1ee60684f91683be95e5133c994cc54e86a0b0963afa25c8f8a6"},
2863
- {file = "pydantic_core-2.20.1-pp39-pypy39_pp73-macosx_10_12_x86_64.whl", hash = "sha256:407653af5617f0757261ae249d3fba09504d7a71ab36ac057c938572d1bc9331"},
2864
- {file = "pydantic_core-2.20.1-pp39-pypy39_pp73-macosx_11_0_arm64.whl", hash = "sha256:c693e916709c2465b02ca0ad7b387c4f8423d1db7b4649c551f27a529181c5ad"},
2865
- {file = "pydantic_core-2.20.1-pp39-pypy39_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:5b5ff4911aea936a47d9376fd3ab17e970cc543d1b68921886e7f64bd28308d1"},
2866
- {file = "pydantic_core-2.20.1-pp39-pypy39_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:177f55a886d74f1808763976ac4efd29b7ed15c69f4d838bbd74d9d09cf6fa86"},
2867
- {file = "pydantic_core-2.20.1-pp39-pypy39_pp73-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:964faa8a861d2664f0c7ab0c181af0bea66098b1919439815ca8803ef136fc4e"},
2868
- {file = "pydantic_core-2.20.1-pp39-pypy39_pp73-musllinux_1_1_aarch64.whl", hash = "sha256:4dd484681c15e6b9a977c785a345d3e378d72678fd5f1f3c0509608da24f2ac0"},
2869
- {file = "pydantic_core-2.20.1-pp39-pypy39_pp73-musllinux_1_1_x86_64.whl", hash = "sha256:f6d6cff3538391e8486a431569b77921adfcdef14eb18fbf19b7c0a5294d4e6a"},
2870
- {file = "pydantic_core-2.20.1-pp39-pypy39_pp73-win_amd64.whl", hash = "sha256:a6d511cc297ff0883bc3708b465ff82d7560193169a8b93260f74ecb0a5e08a7"},
2871
- {file = "pydantic_core-2.20.1.tar.gz", hash = "sha256:26ca695eeee5f9f1aeeb211ffc12f10bcb6f71e2989988fda61dabd65db878d4"},
2872
  ]
2873
 
2874
  [package.dependencies]
@@ -2876,13 +2938,13 @@ typing-extensions = ">=4.6.0,<4.7.0 || >4.7.0"

  [[package]]
  name = "pydantic-settings"
- version = "2.4.0"
  description = "Settings management using Pydantic"
  optional = false
  python-versions = ">=3.8"
  files = [
- {file = "pydantic_settings-2.4.0-py3-none-any.whl", hash = "sha256:bb6849dc067f1687574c12a639e231f3a6feeed0a12d710c1382045c5db1c315"},
- {file = "pydantic_settings-2.4.0.tar.gz", hash = "sha256:ed81c3a0f46392b4d7c0a565c05884e6e54b3456e6f0fe4d8814981172dc9a88"},
  ]

  [package.dependencies]
@@ -2949,6 +3011,20 @@ files = [
  {file = "pypdfium2-4.30.0.tar.gz", hash = "sha256:48b5b7e5566665bc1015b9d69c1ebabe21f6aee468b509531c3c8318eeee2e16"},
  ]

  [[package]]
  name = "python-dateutil"
  version = "2.9.0.post0"
@@ -3743,87 +3819,103 @@ torch = ["safetensors[numpy]", "torch (>=1.10)"]

  [[package]]
  name = "scikit-learn"
- version = "1.4.2"
  description = "A set of python modules for machine learning and data mining"
  optional = false
  python-versions = ">=3.9"
  files = [
- {file = "scikit-learn-1.4.2.tar.gz", hash = "sha256:daa1c471d95bad080c6e44b4946c9390a4842adc3082572c20e4f8884e39e959"},
- {file = "scikit_learn-1.4.2-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:8539a41b3d6d1af82eb629f9c57f37428ff1481c1e34dddb3b9d7af8ede67ac5"},
- {file = "scikit_learn-1.4.2-cp310-cp310-macosx_12_0_arm64.whl", hash = "sha256:68b8404841f944a4a1459b07198fa2edd41a82f189b44f3e1d55c104dbc2e40c"},
- {file = "scikit_learn-1.4.2-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:81bf5d8bbe87643103334032dd82f7419bc8c8d02a763643a6b9a5c7288c5054"},
- {file = "scikit_learn-1.4.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:36f0ea5d0f693cb247a073d21a4123bdf4172e470e6d163c12b74cbb1536cf38"},
- {file = "scikit_learn-1.4.2-cp310-cp310-win_amd64.whl", hash = "sha256:87440e2e188c87db80ea4023440923dccbd56fbc2d557b18ced00fef79da0727"},
- {file = "scikit_learn-1.4.2-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:45dee87ac5309bb82e3ea633955030df9bbcb8d2cdb30383c6cd483691c546cc"},
- {file = "scikit_learn-1.4.2-cp311-cp311-macosx_12_0_arm64.whl", hash = "sha256:1d0b25d9c651fd050555aadd57431b53d4cf664e749069da77f3d52c5ad14b3b"},
- {file = "scikit_learn-1.4.2-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:b0203c368058ab92efc6168a1507d388d41469c873e96ec220ca8e74079bf62e"},
- {file = "scikit_learn-1.4.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:44c62f2b124848a28fd695db5bc4da019287abf390bfce602ddc8aa1ec186aae"},
- {file = "scikit_learn-1.4.2-cp311-cp311-win_amd64.whl", hash = "sha256:5cd7b524115499b18b63f0c96f4224eb885564937a0b3477531b2b63ce331904"},
- {file = "scikit_learn-1.4.2-cp312-cp312-macosx_10_9_x86_64.whl", hash = "sha256:90378e1747949f90c8f385898fff35d73193dfcaec3dd75d6b542f90c4e89755"},
- {file = "scikit_learn-1.4.2-cp312-cp312-macosx_12_0_arm64.whl", hash = "sha256:ff4effe5a1d4e8fed260a83a163f7dbf4f6087b54528d8880bab1d1377bd78be"},
- {file = "scikit_learn-1.4.2-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:671e2f0c3f2c15409dae4f282a3a619601fa824d2c820e5b608d9d775f91780c"},
- {file = "scikit_learn-1.4.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:d36d0bc983336bbc1be22f9b686b50c964f593c8a9a913a792442af9bf4f5e68"},
- {file = "scikit_learn-1.4.2-cp312-cp312-win_amd64.whl", hash = "sha256:d762070980c17ba3e9a4a1e043ba0518ce4c55152032f1af0ca6f39b376b5928"},
- {file = "scikit_learn-1.4.2-cp39-cp39-macosx_10_9_x86_64.whl", hash = "sha256:d9993d5e78a8148b1d0fdf5b15ed92452af5581734129998c26f481c46586d68"},
- {file = "scikit_learn-1.4.2-cp39-cp39-macosx_12_0_arm64.whl", hash = "sha256:426d258fddac674fdf33f3cb2d54d26f49406e2599dbf9a32b4d1696091d4256"},
- {file = "scikit_learn-1.4.2-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:5460a1a5b043ae5ae4596b3126a4ec33ccba1b51e7ca2c5d36dac2169f62ab1d"},
- {file = "scikit_learn-1.4.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:49d64ef6cb8c093d883e5a36c4766548d974898d378e395ba41a806d0e824db8"},
- {file = "scikit_learn-1.4.2-cp39-cp39-win_amd64.whl", hash = "sha256:c97a50b05c194be9146d61fe87dbf8eac62b203d9e87a3ccc6ae9aed2dfaf361"},
  ]

  [package.dependencies]
  joblib = ">=1.2.0"
  numpy = ">=1.19.5"
  scipy = ">=1.6.0"
- threadpoolctl = ">=2.0.0"

  [package.extras]
- benchmark = ["matplotlib (>=3.3.4)", "memory-profiler (>=0.57.0)", "pandas (>=1.1.5)"]
- docs = ["Pillow (>=7.1.2)", "matplotlib (>=3.3.4)", "memory-profiler (>=0.57.0)", "numpydoc (>=1.2.0)", "pandas (>=1.1.5)", "plotly (>=5.14.0)", "pooch (>=1.6.0)", "scikit-image (>=0.17.2)", "seaborn (>=0.9.0)", "sphinx (>=6.0.0)", "sphinx-copybutton (>=0.5.2)", "sphinx-gallery (>=0.15.0)", "sphinx-prompt (>=1.3.0)", "sphinxext-opengraph (>=0.4.2)"]
  examples = ["matplotlib (>=3.3.4)", "pandas (>=1.1.5)", "plotly (>=5.14.0)", "pooch (>=1.6.0)", "scikit-image (>=0.17.2)", "seaborn (>=0.9.0)"]
- tests = ["black (>=23.3.0)", "matplotlib (>=3.3.4)", "mypy (>=1.3)", "numpydoc (>=1.2.0)", "pandas (>=1.1.5)", "polars (>=0.19.12)", "pooch (>=1.6.0)", "pyamg (>=4.0.0)", "pyarrow (>=12.0.0)", "pytest (>=7.1.2)", "pytest-cov (>=2.9.0)", "ruff (>=0.0.272)", "scikit-image (>=0.17.2)"]

  [[package]]
  name = "scipy"
- version = "1.13.1"
  description = "Fundamental algorithms for scientific computing in Python"
  optional = false
- python-versions = ">=3.9"
- files = [
- {file = "scipy-1.13.1-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:20335853b85e9a49ff7572ab453794298bcf0354d8068c5f6775a0eabf350aca"},
- {file = "scipy-1.13.1-cp310-cp310-macosx_12_0_arm64.whl", hash = "sha256:d605e9c23906d1994f55ace80e0125c587f96c020037ea6aa98d01b4bd2e222f"},
- {file = "scipy-1.13.1-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:cfa31f1def5c819b19ecc3a8b52d28ffdcc7ed52bb20c9a7589669dd3c250989"},
- {file = "scipy-1.13.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:f26264b282b9da0952a024ae34710c2aff7d27480ee91a2e82b7b7073c24722f"},
- {file = "scipy-1.13.1-cp310-cp310-musllinux_1_1_x86_64.whl", hash = "sha256:eccfa1906eacc02de42d70ef4aecea45415f5be17e72b61bafcfd329bdc52e94"},
- {file = "scipy-1.13.1-cp310-cp310-win_amd64.whl", hash = "sha256:2831f0dc9c5ea9edd6e51e6e769b655f08ec6db6e2e10f86ef39bd32eb11da54"},
- {file = "scipy-1.13.1-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:27e52b09c0d3a1d5b63e1105f24177e544a222b43611aaf5bc44d4a0979e32f9"},
- {file = "scipy-1.13.1-cp311-cp311-macosx_12_0_arm64.whl", hash = "sha256:54f430b00f0133e2224c3ba42b805bfd0086fe488835effa33fa291561932326"},
- {file = "scipy-1.13.1-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:e89369d27f9e7b0884ae559a3a956e77c02114cc60a6058b4e5011572eea9299"},
- {file = "scipy-1.13.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:a78b4b3345f1b6f68a763c6e25c0c9a23a9fd0f39f5f3d200efe8feda560a5fa"},
- {file = "scipy-1.13.1-cp311-cp311-musllinux_1_1_x86_64.whl", hash = "sha256:45484bee6d65633752c490404513b9ef02475b4284c4cfab0ef946def50b3f59"},
- {file = "scipy-1.13.1-cp311-cp311-win_amd64.whl", hash = "sha256:5713f62f781eebd8d597eb3f88b8bf9274e79eeabf63afb4a737abc6c84ad37b"},
- {file = "scipy-1.13.1-cp312-cp312-macosx_10_9_x86_64.whl", hash = "sha256:5d72782f39716b2b3509cd7c33cdc08c96f2f4d2b06d51e52fb45a19ca0c86a1"},
- {file = "scipy-1.13.1-cp312-cp312-macosx_12_0_arm64.whl", hash = "sha256:017367484ce5498445aade74b1d5ab377acdc65e27095155e448c88497755a5d"},
- {file = "scipy-1.13.1-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:949ae67db5fa78a86e8fa644b9a6b07252f449dcf74247108c50e1d20d2b4627"},
- {file = "scipy-1.13.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:de3ade0e53bc1f21358aa74ff4830235d716211d7d077e340c7349bc3542e884"},
- {file = "scipy-1.13.1-cp312-cp312-musllinux_1_1_x86_64.whl", hash = "sha256:2ac65fb503dad64218c228e2dc2d0a0193f7904747db43014645ae139c8fad16"},
- {file = "scipy-1.13.1-cp312-cp312-win_amd64.whl", hash = "sha256:cdd7dacfb95fea358916410ec61bbc20440f7860333aee6d882bb8046264e949"},
- {file = "scipy-1.13.1-cp39-cp39-macosx_10_9_x86_64.whl", hash = "sha256:436bbb42a94a8aeef855d755ce5a465479c721e9d684de76bf61a62e7c2b81d5"},
- {file = "scipy-1.13.1-cp39-cp39-macosx_12_0_arm64.whl", hash = "sha256:8335549ebbca860c52bf3d02f80784e91a004b71b059e3eea9678ba994796a24"},
- {file = "scipy-1.13.1-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:d533654b7d221a6a97304ab63c41c96473ff04459e404b83275b60aa8f4b7004"},
- {file = "scipy-1.13.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:637e98dcf185ba7f8e663e122ebf908c4702420477ae52a04f9908707456ba4d"},
- {file = "scipy-1.13.1-cp39-cp39-musllinux_1_1_x86_64.whl", hash = "sha256:a014c2b3697bde71724244f63de2476925596c24285c7a637364761f8710891c"},
- {file = "scipy-1.13.1-cp39-cp39-win_amd64.whl", hash = "sha256:392e4ec766654852c25ebad4f64e4e584cf19820b980bc04960bca0b0cd6eaa2"},
- {file = "scipy-1.13.1.tar.gz", hash = "sha256:095a87a0312b08dfd6a6155cbbd310a8c51800fc931b8c0b84003014b874ed3c"},
  ]

  [package.dependencies]
- numpy = ">=1.22.4,<2.3"

  [package.extras]
- dev = ["cython-lint (>=0.12.2)", "doit (>=0.36.0)", "mypy", "pycodestyle", "pydevtool", "rich-click", "ruff", "types-psutil", "typing_extensions"]
- doc = ["jupyterlite-pyodide-kernel", "jupyterlite-sphinx (>=0.12.0)", "jupytext", "matplotlib (>=3.5)", "myst-nb", "numpydoc", "pooch", "pydata-sphinx-theme (>=0.15.2)", "sphinx (>=5.0.0)", "sphinx-design (>=0.4.0)"]
- test = ["array-api-strict", "asv", "gmpy2", "hypothesis (>=6.30)", "mpmath", "pooch", "pytest", "pytest-cov", "pytest-timeout", "pytest-xdist", "scikit-umfpack", "threadpoolctl"]

  [[package]]
  name = "send2trash"
@@ -3956,19 +4048,20 @@ snowflake = ["snowflake-connector-python (>=2.8.0)", "snowflake-snowpark-python

  [[package]]
  name = "surya-ocr"
- version = "0.5.0"
- description = "OCR, layout, reading order, and line detection in 90+ languages"
  optional = false
- python-versions = "!=2.7.*,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*,!=3.4.*,!=3.5.*,!=3.6.*,!=3.7.*,!=3.8.*,>=3.9"
  files = [
- {file = "surya_ocr-0.5.0-py3-none-any.whl", hash = "sha256:e70516d74f3816c5b2a61bdf8f7eeb5fbd5670514bc5ae2eb0947d33c60c22d3"},
- {file = "surya_ocr-0.5.0.tar.gz", hash = "sha256:a80740c2b000d9630cf3d5525043c95096efaeb6b0892254ff32339a171e789a"},
  ]

  [package.dependencies]
  filetype = ">=1.2.0,<2.0.0"
  ftfy = ">=6.1.3,<7.0.0"
  opencv-python = ">=4.9.0.80,<5.0.0.0"
  pillow = ">=10.2.0,<11.0.0"
  pydantic = ">=2.5.3,<3.0.0"
  pydantic-settings = ">=2.1.0,<3.0.0"
@@ -3995,6 +4088,27 @@ mpmath = ">=1.1.0,<1.4"
  [package.extras]
  dev = ["hypothesis (>=6.70.0)", "pytest (>=7.1.0)"]

  [[package]]
  name = "tabulate"
  version = "0.9.0"
@@ -4858,22 +4972,7 @@ files = [
  idna = ">=2.0"
  multidict = ">=4.0"

- [[package]]
- name = "zipp"
- version = "3.19.2"
- description = "Backport of pathlib-compatible object wrapper for zip files"
- optional = false
- python-versions = ">=3.8"
- files = [
- {file = "zipp-3.19.2-py3-none-any.whl", hash = "sha256:f091755f667055f2d02b32c53771a7a6c8b47e1fdbc4b72a8b9072b3eef8015c"},
- {file = "zipp-3.19.2.tar.gz", hash = "sha256:bf1dcf6450f873a13e952a29504887c89e6de7506209e5b1bcc3460135d4de19"},
- ]
-
- [package.extras]
- doc = ["furo", "jaraco.packaging (>=9.3)", "jaraco.tidelift (>=1.4)", "rst.linker (>=1.9)", "sphinx (>=3.5)", "sphinx-lint"]
- test = ["big-O", "importlib-resources", "jaraco.functools", "jaraco.itertools", "jaraco.test", "more-itertools", "pytest (>=6,!=8.1.*)", "pytest-checkdocs (>=2.4)", "pytest-cov", "pytest-enabler (>=2.2)", "pytest-ignore-flaky", "pytest-mypy", "pytest-ruff (>=0.2.1)"]
-
  [metadata]
  lock-version = "2.0"
- python-versions = ">=3.9,<3.13,!=3.9.7"
- content-hash = "3f4bb2a0bfc8c717d377368f6e3fafcf7ef7d68030c6c16e0b3719dbdd9fca1f"
  {file = "colorama-0.4.6.tar.gz", hash = "sha256:08695f5cb7ed6e0531a20572697297273c47b8cae5a63ffc6d6ed5c201be6e44"},
  ]

+ [[package]]
+ name = "coloredlogs"
+ version = "15.0.1"
+ description = "Colored terminal output for Python's logging module"
+ optional = false
+ python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*"
+ files = [
+ {file = "coloredlogs-15.0.1-py2.py3-none-any.whl", hash = "sha256:612ee75c546f53e92e70049c9dbfcc18c935a2b9a53b66085ce9ef6a6e5c0934"},
+ {file = "coloredlogs-15.0.1.tar.gz", hash = "sha256:7c991aa71a4577af2f82600d8f8f3a89f936baeaf9b50a9c197da014e5bf16b0"},
+ ]
+
+ [package.dependencies]
+ humanfriendly = ">=9.1"
+
+ [package.extras]
+ cron = ["capturer (>=2.4)"]
+
  [[package]]
  name = "comm"
  version = "0.2.2"
 
  {file = "filetype-1.2.0.tar.gz", hash = "sha256:66b56cd6474bf41d8c54660347d37afcc3f7d1970648de365c102ef77548aadb"},
  ]

+ [[package]]
+ name = "flatbuffers"
+ version = "24.3.25"
+ description = "The FlatBuffers serialization format for Python"
+ optional = false
+ python-versions = "*"
+ files = [
+ {file = "flatbuffers-24.3.25-py2.py3-none-any.whl", hash = "sha256:8dbdec58f935f3765e4f7f3cf635ac3a77f83568138d6a2311f524ec96364812"},
+ {file = "flatbuffers-24.3.25.tar.gz", hash = "sha256:de2ec5b203f21441716617f38443e0a8ebf3d25bf0d9c0bb0ce68fa00ad546a4"},
+ ]
+
  [[package]]
  name = "fqdn"
  version = "1.5.1"
 
  torch = ["safetensors", "torch"]
  typing = ["types-PyYAML", "types-requests", "types-simplejson", "types-toml", "types-tqdm", "types-urllib3", "typing-extensions (>=4.8.0)"]

+ [[package]]
+ name = "humanfriendly"
+ version = "10.0"
+ description = "Human friendly output for text interfaces using Python"
+ optional = false
+ python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*"
+ files = [
+ {file = "humanfriendly-10.0-py2.py3-none-any.whl", hash = "sha256:1697e1a8a8f550fd43c2865cd84542fc175a61dcb779b6fee18cf6b6ccba1477"},
+ {file = "humanfriendly-10.0.tar.gz", hash = "sha256:6b0b831ce8f15f7300721aa49829fc4e83921a9a301cc7f606be6686a2288ddc"},
+ ]
+
+ [package.dependencies]
+ pyreadline3 = {version = "*", markers = "sys_platform == \"win32\" and python_version >= \"3.8\""}
+
  [[package]]
  name = "idna"
  version = "3.7"
 
  {file = "idna-3.7.tar.gz", hash = "sha256:028ff3aadf0609c1fd278d8ea3089299412a7a8b9bd005dd08b9f8285bcb5cfc"},
  ]

  [[package]]
  name = "intel-openmp"
  version = "2021.4.0"
 
  pygments = ">=2.4.0"
  stack-data = "*"
  traitlets = ">=5"

  [package.extras]
  all = ["black", "curio", "docrepr", "exceptiongroup", "ipykernel", "ipyparallel", "ipywidgets", "matplotlib", "matplotlib (!=3.2.0)", "nbconvert", "nbformat", "notebook", "numpy (>=1.22)", "pandas", "pickleshare", "pytest (<7)", "pytest (<7.1)", "pytest-asyncio (<0.22)", "qtconsole", "setuptools (>=18.5)", "sphinx (>=1.3)", "sphinx-rtd-theme", "stack-data", "testpath", "trio", "typing-extensions"]
 
  ]

  [package.dependencies]
  jupyter-core = ">=4.12,<5.0.dev0 || >=5.1.dev0"
  python-dateutil = ">=2.8.2"
  pyzmq = ">=23.0"
 
  ]

  [package.dependencies]
  jupyter-server = ">=1.1.2"

  [[package]]
 
  [package.dependencies]
  async-lru = ">=1.0.0"
  httpx = ">=0.25.0"
  ipykernel = ">=6.5.0"
  jinja2 = ">=3.0.3"
  jupyter-core = "*"
 

  [package.dependencies]
  babel = ">=2.10"
  jinja2 = ">=3.0.3"
  json5 = ">=0.9.0"
  jsonschema = ">=4.18.0"
 
  beautifulsoup4 = "*"
  bleach = "!=5.0.0"
  defusedxml = "*"
  jinja2 = ">=3.0"
  jupyter-core = ">=4.7"
  jupyterlab-pygments = "*"
 
  optional = false
  python-versions = ">=3"
  files = [
+ {file = "nvidia_nvjitlink_cu12-12.5.82-py3-none-manylinux2014_aarch64.whl", hash = "sha256:98103729cc5226e13ca319a10bbf9433bbbd44ef64fe72f45f067cacc14b8d27"},
  {file = "nvidia_nvjitlink_cu12-12.5.82-py3-none-manylinux2014_x86_64.whl", hash = "sha256:f9b37bc5c8cf7509665cb6ada5aaa0ce65618f2332b7d3e78e9790511f111212"},
  {file = "nvidia_nvjitlink_cu12-12.5.82-py3-none-win_amd64.whl", hash = "sha256:e782564d705ff0bf61ac3e1bf730166da66dd2fe9012f111ede5fc49b64ae697"},
  ]
 
  {file = "nvidia_nvtx_cu12-12.1.105-py3-none-win_amd64.whl", hash = "sha256:65f4d98982b31b60026e0e6de73fbdfc09d08a96f4656dd3665ca616a11e1e82"},
  ]

+ [[package]]
+ name = "onnxruntime"
+ version = "1.19.2"
+ description = "ONNX Runtime is a runtime accelerator for Machine Learning models"
+ optional = false
+ python-versions = "*"
+ files = [
+ {file = "onnxruntime-1.19.2-cp310-cp310-macosx_11_0_universal2.whl", hash = "sha256:84fa57369c06cadd3c2a538ae2a26d76d583e7c34bdecd5769d71ca5c0fc750e"},
+ {file = "onnxruntime-1.19.2-cp310-cp310-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:bdc471a66df0c1cdef774accef69e9f2ca168c851ab5e4f2f3341512c7ef4666"},
+ {file = "onnxruntime-1.19.2-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:e3a4ce906105d99ebbe817f536d50a91ed8a4d1592553f49b3c23c4be2560ae6"},
+ {file = "onnxruntime-1.19.2-cp310-cp310-win32.whl", hash = "sha256:4b3d723cc154c8ddeb9f6d0a8c0d6243774c6b5930847cc83170bfe4678fafb3"},
+ {file = "onnxruntime-1.19.2-cp310-cp310-win_amd64.whl", hash = "sha256:17ed7382d2c58d4b7354fb2b301ff30b9bf308a1c7eac9546449cd122d21cae5"},
+ {file = "onnxruntime-1.19.2-cp311-cp311-macosx_11_0_universal2.whl", hash = "sha256:d863e8acdc7232d705d49e41087e10b274c42f09e259016a46f32c34e06dc4fd"},
+ {file = "onnxruntime-1.19.2-cp311-cp311-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:c1dfe4f660a71b31caa81fc298a25f9612815215a47b286236e61d540350d7b6"},
+ {file = "onnxruntime-1.19.2-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:a36511dc07c5c964b916697e42e366fa43c48cdb3d3503578d78cef30417cb84"},
+ {file = "onnxruntime-1.19.2-cp311-cp311-win32.whl", hash = "sha256:50cbb8dc69d6befad4746a69760e5b00cc3ff0a59c6c3fb27f8afa20e2cab7e7"},
+ {file = "onnxruntime-1.19.2-cp311-cp311-win_amd64.whl", hash = "sha256:1c3e5d415b78337fa0b1b75291e9ea9fb2a4c1f148eb5811e7212fed02cfffa8"},
+ {file = "onnxruntime-1.19.2-cp312-cp312-macosx_11_0_universal2.whl", hash = "sha256:68e7051bef9cfefcbb858d2d2646536829894d72a4130c24019219442b1dd2ed"},
+ {file = "onnxruntime-1.19.2-cp312-cp312-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:d2d366fbcc205ce68a8a3bde2185fd15c604d9645888703785b61ef174265168"},
+ {file = "onnxruntime-1.19.2-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:477b93df4db467e9cbf34051662a4b27c18e131fa1836e05974eae0d6e4cf29b"},
+ {file = "onnxruntime-1.19.2-cp312-cp312-win32.whl", hash = "sha256:9a174073dc5608fad05f7cf7f320b52e8035e73d80b0a23c80f840e5a97c0147"},
+ {file = "onnxruntime-1.19.2-cp312-cp312-win_amd64.whl", hash = "sha256:190103273ea4507638ffc31d66a980594b237874b65379e273125150eb044857"},
+ {file = "onnxruntime-1.19.2-cp38-cp38-macosx_11_0_universal2.whl", hash = "sha256:636bc1d4cc051d40bc52e1f9da87fbb9c57d9d47164695dfb1c41646ea51ea66"},
+ {file = "onnxruntime-1.19.2-cp38-cp38-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:5bd8b875757ea941cbcfe01582970cc299893d1b65bd56731e326a8333f638a3"},
+ {file = "onnxruntime-1.19.2-cp38-cp38-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:b2046fc9560f97947bbc1acbe4c6d48585ef0f12742744307d3364b131ac5778"},
+ {file = "onnxruntime-1.19.2-cp38-cp38-win32.whl", hash = "sha256:31c12840b1cde4ac1f7d27d540c44e13e34f2345cf3642762d2a3333621abb6a"},
+ {file = "onnxruntime-1.19.2-cp38-cp38-win_amd64.whl", hash = "sha256:016229660adea180e9a32ce218b95f8f84860a200f0f13b50070d7d90e92956c"},
+ {file = "onnxruntime-1.19.2-cp39-cp39-macosx_11_0_universal2.whl", hash = "sha256:006c8d326835c017a9e9f74c9c77ebb570a71174a1e89fe078b29a557d9c3848"},
+ {file = "onnxruntime-1.19.2-cp39-cp39-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:df2a94179a42d530b936f154615b54748239c2908ee44f0d722cb4df10670f68"},
+ {file = "onnxruntime-1.19.2-cp39-cp39-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:fae4b4de45894b9ce7ae418c5484cbf0341db6813effec01bb2216091c52f7fb"},
+ {file = "onnxruntime-1.19.2-cp39-cp39-win32.whl", hash = "sha256:dc5430f473e8706fff837ae01323be9dcfddd3ea471c900a91fa7c9b807ec5d3"},
+ {file = "onnxruntime-1.19.2-cp39-cp39-win_amd64.whl", hash = "sha256:38475e29a95c5f6c62c2c603d69fc7d4c6ccbf4df602bd567b86ae1138881c49"},
+ ]
+
+ [package.dependencies]
+ coloredlogs = "*"
+ flatbuffers = "*"
+ numpy = ">=1.21.6"
+ packaging = "*"
+ protobuf = "*"
+ sympy = "*"
+
  [[package]]
  name = "opencv-python"
  version = "4.10.0.84"

  [package.dependencies]
  numpy = [
+ {version = ">=1.26.0", markers = "python_version >= \"3.12\""},
  {version = ">=1.21.4", markers = "python_version >= \"3.10\" and platform_system == \"Darwin\" and python_version < \"3.11\""},
  {version = ">=1.21.2", markers = "platform_system != \"Darwin\" and python_version >= \"3.10\" and python_version < \"3.11\""},
  {version = ">=1.23.5", markers = "python_version >= \"3.11\" and python_version < \"3.12\""},
  ]

  [[package]]

  [package.dependencies]
  numpy = [
+ {version = ">=1.26.0", markers = "python_version >= \"3.12\""},
  {version = ">=1.22.4", markers = "python_version < \"3.11\""},
  {version = ">=1.23.2", markers = "python_version == \"3.11\""},
  ]
  python-dateutil = ">=2.8.2"
  pytz = ">=2020.1"

  [[package]]
  name = "pdftext"
+ version = "0.3.13"
  description = "Extract structured text from pdfs quickly"
  optional = false
  python-versions = "!=2.7.*,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*,!=3.4.*,!=3.5.*,!=3.6.*,!=3.7.*,!=3.8.*,>=3.9"
  files = [
+ {file = "pdftext-0.3.13-py3-none-any.whl", hash = "sha256:ae8f6876cdbbc1fe611527bb362cd3d584b4c8ec9370215560f2a01be4343bbc"},
+ {file = "pdftext-0.3.13.tar.gz", hash = "sha256:a37ceb759ac0da34c48f85ab5d43d0b128ad9526f949e98b96568495c7be4187"},
  ]

  [package.dependencies]
+ onnxruntime = ">=1.19.2,<2.0.0"
  pydantic = ">=2.7.1,<3.0.0"
  pydantic-settings = ">=2.2.1,<3.0.0"
  pypdfium2 = ">=4.29.0,<5.0.0"

  [[package]]
  name = "pexpect"

  [[package]]
  name = "pydantic"
+ version = "2.9.2"
  description = "Data validation using Python type hints"
  optional = false
  python-versions = ">=3.8"
  files = [
+ {file = "pydantic-2.9.2-py3-none-any.whl", hash = "sha256:f048cec7b26778210e28a0459867920654d48e5e62db0958433636cde4254f12"},
+ {file = "pydantic-2.9.2.tar.gz", hash = "sha256:d155cef71265d1e9807ed1c32b4c8deec042a44a50a4188b25ac67ecd81a9c0f"},
  ]

  [package.dependencies]
+ annotated-types = ">=0.6.0"
+ pydantic-core = "2.23.4"
+ typing-extensions = [
+ {version = ">=4.12.2", markers = "python_version >= \"3.13\""},
+ {version = ">=4.6.1", markers = "python_version < \"3.13\""},
+ ]

  [package.extras]
  email = ["email-validator (>=2.0.0)"]
+ timezone = ["tzdata"]

  [[package]]
  name = "pydantic-core"
+ version = "2.23.4"
  description = "Core functionality for Pydantic validation and serialization"
  optional = false
  python-versions = ">=3.8"
  files = [
+ {file = "pydantic_core-2.23.4-cp310-cp310-macosx_10_12_x86_64.whl", hash = "sha256:b10bd51f823d891193d4717448fab065733958bdb6a6b351967bd349d48d5c9b"},
+ {file = "pydantic_core-2.23.4-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:4fc714bdbfb534f94034efaa6eadd74e5b93c8fa6315565a222f7b6f42ca1166"},
+ {file = "pydantic_core-2.23.4-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:63e46b3169866bd62849936de036f901a9356e36376079b05efa83caeaa02ceb"},
+ {file = "pydantic_core-2.23.4-cp310-cp310-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:ed1a53de42fbe34853ba90513cea21673481cd81ed1be739f7f2efb931b24916"},
+ {file = "pydantic_core-2.23.4-cp310-cp310-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:cfdd16ab5e59fc31b5e906d1a3f666571abc367598e3e02c83403acabc092e07"},
+ {file = "pydantic_core-2.23.4-cp310-cp310-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:255a8ef062cbf6674450e668482456abac99a5583bbafb73f9ad469540a3a232"},
+ {file = "pydantic_core-2.23.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:4a7cd62e831afe623fbb7aabbb4fe583212115b3ef38a9f6b71869ba644624a2"},
+ {file = "pydantic_core-2.23.4-cp310-cp310-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:f09e2ff1f17c2b51f2bc76d1cc33da96298f0a036a137f5440ab3ec5360b624f"},
+ {file = "pydantic_core-2.23.4-cp310-cp310-musllinux_1_1_aarch64.whl", hash = "sha256:e38e63e6f3d1cec5a27e0afe90a085af8b6806ee208b33030e65b6516353f1a3"},
+ {file = "pydantic_core-2.23.4-cp310-cp310-musllinux_1_1_x86_64.whl", hash = "sha256:0dbd8dbed2085ed23b5c04afa29d8fd2771674223135dc9bc937f3c09284d071"},
+ {file = "pydantic_core-2.23.4-cp310-none-win32.whl", hash = "sha256:6531b7ca5f951d663c339002e91aaebda765ec7d61b7d1e3991051906ddde119"},
+ {file = "pydantic_core-2.23.4-cp310-none-win_amd64.whl", hash = "sha256:7c9129eb40958b3d4500fa2467e6a83356b3b61bfff1b414c7361d9220f9ae8f"},
+ {file = "pydantic_core-2.23.4-cp311-cp311-macosx_10_12_x86_64.whl", hash = "sha256:77733e3892bb0a7fa797826361ce8a9184d25c8dffaec60b7ffe928153680ba8"},
+ {file = "pydantic_core-2.23.4-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:1b84d168f6c48fabd1f2027a3d1bdfe62f92cade1fb273a5d68e621da0e44e6d"},
+ {file = "pydantic_core-2.23.4-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:df49e7a0861a8c36d089c1ed57d308623d60416dab2647a4a17fe050ba85de0e"},
+ {file = "pydantic_core-2.23.4-cp311-cp311-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:ff02b6d461a6de369f07ec15e465a88895f3223eb75073ffea56b84d9331f607"},
+ {file = "pydantic_core-2.23.4-cp311-cp311-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:996a38a83508c54c78a5f41456b0103c30508fed9abcad0a59b876d7398f25fd"},
+ {file = "pydantic_core-2.23.4-cp311-cp311-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:d97683ddee4723ae8c95d1eddac7c192e8c552da0c73a925a89fa8649bf13eea"},
+ {file = "pydantic_core-2.23.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:216f9b2d7713eb98cb83c80b9c794de1f6b7e3145eef40400c62e86cee5f4e1e"},
+ {file = "pydantic_core-2.23.4-cp311-cp311-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:6f783e0ec4803c787bcea93e13e9932edab72068f68ecffdf86a99fd5918878b"},
+ {file = "pydantic_core-2.23.4-cp311-cp311-musllinux_1_1_aarch64.whl", hash = "sha256:d0776dea117cf5272382634bd2a5c1b6eb16767c223c6a5317cd3e2a757c61a0"},
+ {file = "pydantic_core-2.23.4-cp311-cp311-musllinux_1_1_x86_64.whl", hash = "sha256:d5f7a395a8cf1621939692dba2a6b6a830efa6b3cee787d82c7de1ad2930de64"},
+ {file = "pydantic_core-2.23.4-cp311-none-win32.whl", hash = "sha256:74b9127ffea03643e998e0c5ad9bd3811d3dac8c676e47db17b0ee7c3c3bf35f"},
+ {file = "pydantic_core-2.23.4-cp311-none-win_amd64.whl", hash = "sha256:98d134c954828488b153d88ba1f34e14259284f256180ce659e8d83e9c05eaa3"},
+ {file = "pydantic_core-2.23.4-cp312-cp312-macosx_10_12_x86_64.whl", hash = "sha256:f3e0da4ebaef65158d4dfd7d3678aad692f7666877df0002b8a522cdf088f231"},
+ {file = "pydantic_core-2.23.4-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:f69a8e0b033b747bb3e36a44e7732f0c99f7edd5cea723d45bc0d6e95377ffee"},
+ {file = "pydantic_core-2.23.4-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:723314c1d51722ab28bfcd5240d858512ffd3116449c557a1336cbe3919beb87"},
+ {file = "pydantic_core-2.23.4-cp312-cp312-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:bb2802e667b7051a1bebbfe93684841cc9351004e2badbd6411bf357ab8d5ac8"},
+ {file = "pydantic_core-2.23.4-cp312-cp312-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:d18ca8148bebe1b0a382a27a8ee60350091a6ddaf475fa05ef50dc35b5df6327"},
+ {file = "pydantic_core-2.23.4-cp312-cp312-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:33e3d65a85a2a4a0dc3b092b938a4062b1a05f3a9abde65ea93b233bca0e03f2"},
+ {file = "pydantic_core-2.23.4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:128585782e5bfa515c590ccee4b727fb76925dd04a98864182b22e89a4e6ed36"},
+ {file = "pydantic_core-2.23.4-cp312-cp312-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:68665f4c17edcceecc112dfed5dbe6f92261fb9d6054b47d01bf6371a6196126"},
+ {file = "pydantic_core-2.23.4-cp312-cp312-musllinux_1_1_aarch64.whl", hash = "sha256:20152074317d9bed6b7a95ade3b7d6054845d70584216160860425f4fbd5ee9e"},
+ {file = "pydantic_core-2.23.4-cp312-cp312-musllinux_1_1_x86_64.whl", hash = "sha256:9261d3ce84fa1d38ed649c3638feefeae23d32ba9182963e465d58d62203bd24"},
+ {file = "pydantic_core-2.23.4-cp312-none-win32.whl", hash = "sha256:4ba762ed58e8d68657fc1281e9bb72e1c3e79cc5d464be146e260c541ec12d84"},
+ {file = "pydantic_core-2.23.4-cp312-none-win_amd64.whl", hash = "sha256:97df63000f4fea395b2824da80e169731088656d1818a11b95f3b173747b6cd9"},
+ {file = "pydantic_core-2.23.4-cp313-cp313-macosx_10_12_x86_64.whl", hash = "sha256:7530e201d10d7d14abce4fb54cfe5b94a0aefc87da539d0346a484ead376c3cc"},
+ {file = "pydantic_core-2.23.4-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:df933278128ea1cd77772673c73954e53a1c95a4fdf41eef97c2b779271bd0bd"},
+ {file = "pydantic_core-2.23.4-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:0cb3da3fd1b6a5d0279a01877713dbda118a2a4fc6f0d821a57da2e464793f05"},
+ {file = "pydantic_core-2.23.4-cp313-cp313-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:42c6dcb030aefb668a2b7009c85b27f90e51e6a3b4d5c9bc4c57631292015b0d"},
+ {file = "pydantic_core-2.23.4-cp313-cp313-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:696dd8d674d6ce621ab9d45b205df149399e4bb9aa34102c970b721554828510"},
+ {file = "pydantic_core-2.23.4-cp313-cp313-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:2971bb5ffe72cc0f555c13e19b23c85b654dd2a8f7ab493c262071377bfce9f6"},
+ {file = "pydantic_core-2.23.4-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:8394d940e5d400d04cad4f75c0598665cbb81aecefaca82ca85bd28264af7f9b"},
+ {file = "pydantic_core-2.23.4-cp313-cp313-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:0dff76e0602ca7d4cdaacc1ac4c005e0ce0dcfe095d5b5259163a80d3a10d327"},
+ {file = "pydantic_core-2.23.4-cp313-cp313-musllinux_1_1_aarch64.whl", hash = "sha256:7d32706badfe136888bdea71c0def994644e09fff0bfe47441deaed8e96fdbc6"},
+ {file = "pydantic_core-2.23.4-cp313-cp313-musllinux_1_1_x86_64.whl", hash = "sha256:ed541d70698978a20eb63d8c5d72f2cc6d7079d9d90f6b50bad07826f1320f5f"},
+ {file = "pydantic_core-2.23.4-cp313-none-win32.whl", hash = "sha256:3d5639516376dce1940ea36edf408c554475369f5da2abd45d44621cb616f769"},
+ {file = "pydantic_core-2.23.4-cp313-none-win_amd64.whl", hash = "sha256:5a1504ad17ba4210df3a045132a7baeeba5a200e930f57512ee02909fc5c4cb5"},
+ {file = "pydantic_core-2.23.4-cp38-cp38-macosx_10_12_x86_64.whl", hash = "sha256:d4488a93b071c04dc20f5cecc3631fc78b9789dd72483ba15d423b5b3689b555"},
+ {file = "pydantic_core-2.23.4-cp38-cp38-macosx_11_0_arm64.whl", hash = "sha256:81965a16b675b35e1d09dd14df53f190f9129c0202356ed44ab2728b1c905658"},
+ {file = "pydantic_core-2.23.4-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:4ffa2ebd4c8530079140dd2d7f794a9d9a73cbb8e9d59ffe24c63436efa8f271"},
+ {file = "pydantic_core-2.23.4-cp38-cp38-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:61817945f2fe7d166e75fbfb28004034b48e44878177fc54d81688e7b85a3665"},
+ {file = "pydantic_core-2.23.4-cp38-cp38-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:29d2c342c4bc01b88402d60189f3df065fb0dda3654744d5a165a5288a657368"},
+ {file = "pydantic_core-2.23.4-cp38-cp38-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:5e11661ce0fd30a6790e8bcdf263b9ec5988e95e63cf901972107efc49218b13"},
+ {file = "pydantic_core-2.23.4-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:9d18368b137c6295db49ce7218b1a9ba15c5bc254c96d7c9f9e924a9bc7825ad"},
+ {file = "pydantic_core-2.23.4-cp38-cp38-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:ec4e55f79b1c4ffb2eecd8a0cfba9955a2588497d96851f4c8f99aa4a1d39b12"},
+ {file = "pydantic_core-2.23.4-cp38-cp38-musllinux_1_1_aarch64.whl", hash = "sha256:374a5e5049eda9e0a44c696c7ade3ff355f06b1fe0bb945ea3cac2bc336478a2"},
+ {file = "pydantic_core-2.23.4-cp38-cp38-musllinux_1_1_x86_64.whl", hash = "sha256:5c364564d17da23db1106787675fc7af45f2f7b58b4173bfdd105564e132e6fb"},
+ {file = "pydantic_core-2.23.4-cp38-none-win32.whl", hash = "sha256:d7a80d21d613eec45e3d41eb22f8f94ddc758a6c4720842dc74c0581f54993d6"},
+ {file = "pydantic_core-2.23.4-cp38-none-win_amd64.whl", hash = "sha256:5f5ff8d839f4566a474a969508fe1c5e59c31c80d9e140566f9a37bba7b8d556"},
+ {file = "pydantic_core-2.23.4-cp39-cp39-macosx_10_12_x86_64.whl", hash = "sha256:a4fa4fc04dff799089689f4fd502ce7d59de529fc2f40a2c8836886c03e0175a"},
+ {file = "pydantic_core-2.23.4-cp39-cp39-macosx_11_0_arm64.whl", hash = "sha256:0a7df63886be5e270da67e0966cf4afbae86069501d35c8c1b3b6c168f42cb36"},
+ {file = "pydantic_core-2.23.4-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:dcedcd19a557e182628afa1d553c3895a9f825b936415d0dbd3cd0bbcfd29b4b"},
+ {file = "pydantic_core-2.23.4-cp39-cp39-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:5f54b118ce5de9ac21c363d9b3caa6c800341e8c47a508787e5868c6b79c9323"},
+ {file = "pydantic_core-2.23.4-cp39-cp39-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:86d2f57d3e1379a9525c5ab067b27dbb8a0642fb5d454e17a9ac434f9ce523e3"},
+ {file = "pydantic_core-2.23.4-cp39-cp39-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:de6d1d1b9e5101508cb37ab0d972357cac5235f5c6533d1071964c47139257df"},
+ {file = "pydantic_core-2.23.4-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:1278e0d324f6908e872730c9102b0112477a7f7cf88b308e4fc36ce1bdb6d58c"},
+ {file = "pydantic_core-2.23.4-cp39-cp39-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:9a6b5099eeec78827553827f4c6b8615978bb4b6a88e5d9b93eddf8bb6790f55"},
+ {file = "pydantic_core-2.23.4-cp39-cp39-musllinux_1_1_aarch64.whl", hash = "sha256:e55541f756f9b3ee346b840103f32779c695a19826a4c442b7954550a0972040"},
+ {file = "pydantic_core-2.23.4-cp39-cp39-musllinux_1_1_x86_64.whl", hash = "sha256:a5c7ba8ffb6d6f8f2ab08743be203654bb1aaa8c9dcb09f82ddd34eadb695605"},
+ {file = "pydantic_core-2.23.4-cp39-none-win32.whl", hash = "sha256:37b0fe330e4a58d3c58b24d91d1eb102aeec675a3db4c292ec3928ecd892a9a6"},
+ {file = "pydantic_core-2.23.4-cp39-none-win_amd64.whl", hash = "sha256:1498bec4c05c9c787bde9125cfdcc63a41004ff167f495063191b863399b1a29"},
+ {file = "pydantic_core-2.23.4-pp310-pypy310_pp73-macosx_10_12_x86_64.whl", hash = "sha256:f455ee30a9d61d3e1a15abd5068827773d6e4dc513e795f380cdd59932c782d5"},
+ {file = "pydantic_core-2.23.4-pp310-pypy310_pp73-macosx_11_0_arm64.whl", hash = "sha256:1e90d2e3bd2c3863d48525d297cd143fe541be8bbf6f579504b9712cb6b643ec"},
+ {file = "pydantic_core-2.23.4-pp310-pypy310_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:2e203fdf807ac7e12ab59ca2bfcabb38c7cf0b33c41efeb00f8e5da1d86af480"},
+ {file = "pydantic_core-2.23.4-pp310-pypy310_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:e08277a400de01bc72436a0ccd02bdf596631411f592ad985dcee21445bd0068"},
+ {file = "pydantic_core-2.23.4-pp310-pypy310_pp73-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:f220b0eea5965dec25480b6333c788fb72ce5f9129e8759ef876a1d805d00801"},
+ {file = "pydantic_core-2.23.4-pp310-pypy310_pp73-musllinux_1_1_aarch64.whl", hash = "sha256:d06b0c8da4f16d1d1e352134427cb194a0a6e19ad5db9161bf32b2113409e728"},
+ {file = "pydantic_core-2.23.4-pp310-pypy310_pp73-musllinux_1_1_x86_64.whl", hash = "sha256:ba1a0996f6c2773bd83e63f18914c1de3c9dd26d55f4ac302a7efe93fb8e7433"},
+ {file = "pydantic_core-2.23.4-pp310-pypy310_pp73-win_amd64.whl", hash = "sha256:9a5bce9d23aac8f0cf0836ecfc033896aa8443b501c58d0602dbfd5bd5b37753"},
+ {file = "pydantic_core-2.23.4-pp39-pypy39_pp73-macosx_10_12_x86_64.whl", hash = "sha256:78ddaaa81421a29574a682b3179d4cf9e6d405a09b99d93ddcf7e5239c742e21"},
+ {file = "pydantic_core-2.23.4-pp39-pypy39_pp73-macosx_11_0_arm64.whl", hash = "sha256:883a91b5dd7d26492ff2f04f40fbb652de40fcc0afe07e8129e8ae779c2110eb"},
+ {file = "pydantic_core-2.23.4-pp39-pypy39_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:88ad334a15b32a791ea935af224b9de1bf99bcd62fabf745d5f3442199d86d59"},
+ {file = "pydantic_core-2.23.4-pp39-pypy39_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:233710f069d251feb12a56da21e14cca67994eab08362207785cf8c598e74577"},
+ {file = "pydantic_core-2.23.4-pp39-pypy39_pp73-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:19442362866a753485ba5e4be408964644dd6a09123d9416c54cd49171f50744"},
+ {file = "pydantic_core-2.23.4-pp39-pypy39_pp73-musllinux_1_1_aarch64.whl", hash = "sha256:624e278a7d29b6445e4e813af92af37820fafb6dcc55c012c834f9e26f9aaaef"},
+ {file = "pydantic_core-2.23.4-pp39-pypy39_pp73-musllinux_1_1_x86_64.whl", hash = "sha256:f5ef8f42bec47f21d07668a043f077d507e5bf4e668d5c6dfe6aaba89de1a5b8"},
+ {file = "pydantic_core-2.23.4-pp39-pypy39_pp73-win_amd64.whl", hash = "sha256:aea443fffa9fbe3af1a9ba721a87f926fe548d32cab71d188a6ede77d0ff244e"},
+ {file = "pydantic_core-2.23.4.tar.gz", hash = "sha256:2584f7cf844ac4d970fba483a717dbe10c1c1c96a969bf65d61ffe94df1b2863"},
  ]

  [package.dependencies]

  [[package]]
  name = "pydantic-settings"
+ version = "2.5.2"
  description = "Settings management using Pydantic"
  optional = false
  python-versions = ">=3.8"
  files = [
+ {file = "pydantic_settings-2.5.2-py3-none-any.whl", hash = "sha256:2c912e55fd5794a59bf8c832b9de832dcfdf4778d79ff79b708744eed499a907"},
+ {file = "pydantic_settings-2.5.2.tar.gz", hash = "sha256:f90b139682bee4d2065273d5185d71d37ea46cfe57e1b5ae184fc6a0b2484ca0"},
  ]

  [package.dependencies]

  {file = "pypdfium2-4.30.0.tar.gz", hash = "sha256:48b5b7e5566665bc1015b9d69c1ebabe21f6aee468b509531c3c8318eeee2e16"},
  ]

+ [[package]]
+ name = "pyreadline3"
+ version = "3.5.4"
+ description = "A python implementation of GNU readline."
+ optional = false
+ python-versions = ">=3.8"
+ files = [
+ {file = "pyreadline3-3.5.4-py3-none-any.whl", hash = "sha256:eaf8e6cc3c49bcccf145fc6067ba8643d1df34d604a1ec0eccbf7a18e6d3fae6"},
+ {file = "pyreadline3-3.5.4.tar.gz", hash = "sha256:8d57d53039a1c75adba8e50dd3d992b28143480816187ea5efbd5c78e6c885b7"},
+ ]
+
+ [package.extras]
+ dev = ["build", "flake8", "mypy", "pytest", "twine"]
+
  [[package]]
  name = "python-dateutil"
  version = "2.9.0.post0"

  [[package]]
  name = "scikit-learn"
+ version = "1.5.2"
  description = "A set of python modules for machine learning and data mining"
  optional = false
  python-versions = ">=3.9"
  files = [
+ {file = "scikit_learn-1.5.2-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:299406827fb9a4f862626d0fe6c122f5f87f8910b86fe5daa4c32dcd742139b6"},
+ {file = "scikit_learn-1.5.2-cp310-cp310-macosx_12_0_arm64.whl", hash = "sha256:2d4cad1119c77930b235579ad0dc25e65c917e756fe80cab96aa3b9428bd3fb0"},
+ {file = "scikit_learn-1.5.2-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:8c412ccc2ad9bf3755915e3908e677b367ebc8d010acbb3f182814524f2e5540"},
+ {file = "scikit_learn-1.5.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:3a686885a4b3818d9e62904d91b57fa757fc2bed3e465c8b177be652f4dd37c8"},
+ {file = "scikit_learn-1.5.2-cp310-cp310-win_amd64.whl", hash = "sha256:c15b1ca23d7c5f33cc2cb0a0d6aaacf893792271cddff0edbd6a40e8319bc113"},
+ {file = "scikit_learn-1.5.2-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:03b6158efa3faaf1feea3faa884c840ebd61b6484167c711548fce208ea09445"},
+ {file = "scikit_learn-1.5.2-cp311-cp311-macosx_12_0_arm64.whl", hash = "sha256:1ff45e26928d3b4eb767a8f14a9a6efbf1cbff7c05d1fb0f95f211a89fd4f5de"},
+ {file = "scikit_learn-1.5.2-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:f763897fe92d0e903aa4847b0aec0e68cadfff77e8a0687cabd946c89d17e675"},
+ {file = "scikit_learn-1.5.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:f8b0ccd4a902836493e026c03256e8b206656f91fbcc4fde28c57a5b752561f1"},
+ {file = "scikit_learn-1.5.2-cp311-cp311-win_amd64.whl", hash = "sha256:6c16d84a0d45e4894832b3c4d0bf73050939e21b99b01b6fd59cbb0cf39163b6"},
+ {file = "scikit_learn-1.5.2-cp312-cp312-macosx_10_9_x86_64.whl", hash = "sha256:f932a02c3f4956dfb981391ab24bda1dbd90fe3d628e4b42caef3e041c67707a"},
+ {file = "scikit_learn-1.5.2-cp312-cp312-macosx_12_0_arm64.whl", hash = "sha256:3b923d119d65b7bd555c73be5423bf06c0105678ce7e1f558cb4b40b0a5502b1"},
+ {file = "scikit_learn-1.5.2-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:f60021ec1574e56632be2a36b946f8143bf4e5e6af4a06d85281adc22938e0dd"},
+ {file = "scikit_learn-1.5.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:394397841449853c2290a32050382edaec3da89e35b3e03d6cc966aebc6a8ae6"},
+ {file = "scikit_learn-1.5.2-cp312-cp312-win_amd64.whl", hash = "sha256:57cc1786cfd6bd118220a92ede80270132aa353647684efa385a74244a41e3b1"},
+ {file = "scikit_learn-1.5.2-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:e9a702e2de732bbb20d3bad29ebd77fc05a6b427dc49964300340e4c9328b3f5"},
+ {file = "scikit_learn-1.5.2-cp313-cp313-macosx_12_0_arm64.whl", hash = "sha256:b0768ad641981f5d3a198430a1d31c3e044ed2e8a6f22166b4d546a5116d7908"},
+ {file = "scikit_learn-1.5.2-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:178ddd0a5cb0044464fc1bfc4cca5b1833bfc7bb022d70b05db8530da4bb3dd3"},
+ {file = "scikit_learn-1.5.2-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:f7284ade780084d94505632241bf78c44ab3b6f1e8ccab3d2af58e0e950f9c12"},
+ {file = "scikit_learn-1.5.2-cp313-cp313-win_amd64.whl", hash = "sha256:b7b0f9a0b1040830d38c39b91b3a44e1b643f4b36e36567b80b7c6bd2202a27f"},
+ {file = "scikit_learn-1.5.2-cp39-cp39-macosx_10_9_x86_64.whl", hash = "sha256:757c7d514ddb00ae249832fe87100d9c73c6ea91423802872d9e74970a0e40b9"},
+ {file = "scikit_learn-1.5.2-cp39-cp39-macosx_12_0_arm64.whl", hash = "sha256:52788f48b5d8bca5c0736c175fa6bdaab2ef00a8f536cda698db61bd89c551c1"},
+ {file = "scikit_learn-1.5.2-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:643964678f4b5fbdc95cbf8aec638acc7aa70f5f79ee2cdad1eec3df4ba6ead8"},
+ {file = "scikit_learn-1.5.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:ca64b3089a6d9b9363cd3546f8978229dcbb737aceb2c12144ee3f70f95684b7"},
+ {file = "scikit_learn-1.5.2-cp39-cp39-win_amd64.whl", hash = "sha256:3bed4909ba187aca80580fe2ef370d9180dcf18e621a27c4cf2ef10d279a7efe"},
+ {file = "scikit_learn-1.5.2.tar.gz", hash = "sha256:b4237ed7b3fdd0a4882792e68ef2545d5baa50aca3bb45aa7df468138ad8f94d"},
  ]

  [package.dependencies]
  joblib = ">=1.2.0"
  numpy = ">=1.19.5"
  scipy = ">=1.6.0"
+ threadpoolctl = ">=3.1.0"

  [package.extras]
+ benchmark = ["matplotlib (>=3.3.4)", "memory_profiler (>=0.57.0)", "pandas (>=1.1.5)"]
+ build = ["cython (>=3.0.10)", "meson-python (>=0.16.0)", "numpy (>=1.19.5)", "scipy (>=1.6.0)"]
+ docs = ["Pillow (>=7.1.2)", "matplotlib (>=3.3.4)", "memory_profiler (>=0.57.0)", "numpydoc (>=1.2.0)", "pandas (>=1.1.5)", "plotly (>=5.14.0)", "polars (>=0.20.30)", "pooch (>=1.6.0)", "pydata-sphinx-theme (>=0.15.3)", "scikit-image (>=0.17.2)", "seaborn (>=0.9.0)", "sphinx (>=7.3.7)", "sphinx-copybutton (>=0.5.2)", "sphinx-design (>=0.5.0)", "sphinx-design (>=0.6.0)", "sphinx-gallery (>=0.16.0)", "sphinx-prompt (>=1.4.0)", "sphinx-remove-toctrees (>=1.0.0.post1)", "sphinxcontrib-sass (>=0.3.4)", "sphinxext-opengraph (>=0.9.1)"]
  examples = ["matplotlib (>=3.3.4)", "pandas (>=1.1.5)", "plotly (>=5.14.0)", "pooch (>=1.6.0)", "scikit-image (>=0.17.2)", "seaborn (>=0.9.0)"]
+ install = ["joblib (>=1.2.0)", "numpy (>=1.19.5)", "scipy (>=1.6.0)", "threadpoolctl (>=3.1.0)"]
+ maintenance = ["conda-lock (==2.5.6)"]
+ tests = ["black (>=24.3.0)", "matplotlib (>=3.3.4)", "mypy (>=1.9)", "numpydoc (>=1.2.0)", "pandas (>=1.1.5)", "polars (>=0.20.30)", "pooch (>=1.6.0)", "pyamg (>=4.0.0)", "pyarrow (>=12.0.0)", "pytest (>=7.1.2)", "pytest-cov (>=2.9.0)", "ruff (>=0.2.1)", "scikit-image (>=0.17.2)"]

  [[package]]
  name = "scipy"
+ version = "1.14.1"
  description = "Fundamental algorithms for scientific computing in Python"
  optional = false
+ python-versions = ">=3.10"
+ files = [
+ {file = "scipy-1.14.1-cp310-cp310-macosx_10_13_x86_64.whl", hash = "sha256:b28d2ca4add7ac16ae8bb6632a3c86e4b9e4d52d3e34267f6e1b0c1f8d87e389"},
+ {file = "scipy-1.14.1-cp310-cp310-macosx_12_0_arm64.whl", hash = "sha256:d0d2821003174de06b69e58cef2316a6622b60ee613121199cb2852a873f8cf3"},
+ {file = "scipy-1.14.1-cp310-cp310-macosx_14_0_arm64.whl", hash = "sha256:8bddf15838ba768bb5f5083c1ea012d64c9a444e16192762bd858f1e126196d0"},
+ {file = "scipy-1.14.1-cp310-cp310-macosx_14_0_x86_64.whl", hash = "sha256:97c5dddd5932bd2a1a31c927ba5e1463a53b87ca96b5c9bdf5dfd6096e27efc3"},
+ {file = "scipy-1.14.1-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:2ff0a7e01e422c15739ecd64432743cf7aae2b03f3084288f399affcefe5222d"},
+ {file = "scipy-1.14.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:8e32dced201274bf96899e6491d9ba3e9a5f6b336708656466ad0522d8528f69"},
+ {file = "scipy-1.14.1-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:8426251ad1e4ad903a4514712d2fa8fdd5382c978010d1c6f5f37ef286a713ad"},
+ {file = "scipy-1.14.1-cp310-cp310-win_amd64.whl", hash = "sha256:a49f6ed96f83966f576b33a44257d869756df6cf1ef4934f59dd58b25e0327e5"},
+ {file = "scipy-1.14.1-cp311-cp311-macosx_10_13_x86_64.whl", hash = "sha256:2da0469a4ef0ecd3693761acbdc20f2fdeafb69e6819cc081308cc978153c675"},
+ {file = "scipy-1.14.1-cp311-cp311-macosx_12_0_arm64.whl", hash = "sha256:c0ee987efa6737242745f347835da2cc5bb9f1b42996a4d97d5c7ff7928cb6f2"},
+ {file = "scipy-1.14.1-cp311-cp311-macosx_14_0_arm64.whl", hash = "sha256:3a1b111fac6baec1c1d92f27e76511c9e7218f1695d61b59e05e0fe04dc59617"},
+ {file = "scipy-1.14.1-cp311-cp311-macosx_14_0_x86_64.whl", hash = "sha256:8475230e55549ab3f207bff11ebfc91c805dc3463ef62eda3ccf593254524ce8"},
+ {file = "scipy-1.14.1-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:278266012eb69f4a720827bdd2dc54b2271c97d84255b2faaa8f161a158c3b37"},
+ {file = "scipy-1.14.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:fef8c87f8abfb884dac04e97824b61299880c43f4ce675dd2cbeadd3c9b466d2"},
+ {file = "scipy-1.14.1-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:b05d43735bb2f07d689f56f7b474788a13ed8adc484a85aa65c0fd931cf9ccd2"},
+ {file = "scipy-1.14.1-cp311-cp311-win_amd64.whl", hash = "sha256:716e389b694c4bb564b4fc0c51bc84d381735e0d39d3f26ec1af2556ec6aad94"},
+ {file = "scipy-1.14.1-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:631f07b3734d34aced009aaf6fedfd0eb3498a97e581c3b1e5f14a04164a456d"},
+ {file = "scipy-1.14.1-cp312-cp312-macosx_12_0_arm64.whl", hash = "sha256:af29a935803cc707ab2ed7791c44288a682f9c8107bc00f0eccc4f92c08d6e07"},
+ {file = "scipy-1.14.1-cp312-cp312-macosx_14_0_arm64.whl", hash = "sha256:2843f2d527d9eebec9a43e6b406fb7266f3af25a751aa91d62ff416f54170bc5"},
+ {file = "scipy-1.14.1-cp312-cp312-macosx_14_0_x86_64.whl", hash = "sha256:eb58ca0abd96911932f688528977858681a59d61a7ce908ffd355957f7025cfc"},
+ {file = "scipy-1.14.1-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:30ac8812c1d2aab7131a79ba62933a2a76f582d5dbbc695192453dae67ad6310"},
+ {file = "scipy-1.14.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:8f9ea80f2e65bdaa0b7627fb00cbeb2daf163caa015e59b7516395fe3bd1e066"},
+ {file = "scipy-1.14.1-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:edaf02b82cd7639db00dbff629995ef185c8df4c3ffa71a5562a595765a06ce1"},
+ {file = "scipy-1.14.1-cp312-cp312-win_amd64.whl", hash = "sha256:2ff38e22128e6c03ff73b6bb0f85f897d2362f8c052e3b8ad00532198fbdae3f"},
+ {file = "scipy-1.14.1-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:1729560c906963fc8389f6aac023739ff3983e727b1a4d87696b7bf108316a79"},
+ {file = "scipy-1.14.1-cp313-cp313-macosx_12_0_arm64.whl", hash = "sha256:4079b90df244709e675cdc8b93bfd8a395d59af40b72e339c2287c91860deb8e"},
+ {file = "scipy-1.14.1-cp313-cp313-macosx_14_0_arm64.whl", hash = "sha256:e0cf28db0f24a38b2a0ca33a85a54852586e43cf6fd876365c86e0657cfe7d73"},
+ {file = "scipy-1.14.1-cp313-cp313-macosx_14_0_x86_64.whl", hash = "sha256:0c2f95de3b04e26f5f3ad5bb05e74ba7f68b837133a4492414b3afd79dfe540e"},
+ {file = "scipy-1.14.1-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:b99722ea48b7ea25e8e015e8341ae74624f72e5f21fc2abd45f3a93266de4c5d"},
+ {file = "scipy-1.14.1-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:5149e3fd2d686e42144a093b206aef01932a0059c2a33ddfa67f5f035bdfe13e"},
+ {file = "scipy-1.14.1-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:e4f5a7c49323533f9103d4dacf4e4f07078f360743dec7f7596949149efeec06"},
+ {file = "scipy-1.14.1-cp313-cp313-win_amd64.whl", hash = "sha256:baff393942b550823bfce952bb62270ee17504d02a1801d7fd0719534dfb9c84"},
+ {file = "scipy-1.14.1.tar.gz", hash = "sha256:5a275584e726026a5699459aa72f828a610821006228e841b94275c4a7c08417"},
  ]

  [package.dependencies]
+ numpy = ">=1.23.5,<2.3"

  [package.extras]
+ dev = ["cython-lint (>=0.12.2)", "doit (>=0.36.0)", "mypy (==1.10.0)", "pycodestyle", "pydevtool", "rich-click", "ruff (>=0.0.292)", "types-psutil", "typing_extensions"]
3917
+ doc = ["jupyterlite-pyodide-kernel", "jupyterlite-sphinx (>=0.13.1)", "jupytext", "matplotlib (>=3.5)", "myst-nb", "numpydoc", "pooch", "pydata-sphinx-theme (>=0.15.2)", "sphinx (>=5.0.0,<=7.3.7)", "sphinx-design (>=0.4.0)"]
3918
+ test = ["Cython", "array-api-strict (>=2.0)", "asv", "gmpy2", "hypothesis (>=6.30)", "meson", "mpmath", "ninja", "pooch", "pytest", "pytest-cov", "pytest-timeout", "pytest-xdist", "scikit-umfpack", "threadpoolctl"]
3919
 
3920
  [[package]]
3921
  name = "send2trash"
 
 
  [[package]]
  name = "surya-ocr"
+ version = "0.6.3"
+ description = "OCR, layout, reading order, and table recognition in 90+ languages"
  optional = false
+ python-versions = ">=3.10"
  files = [
+ {file = "surya_ocr-0.6.3-py3-none-any.whl", hash = "sha256:f4d98e643ed6003a1fed2a758bed391ffc7be908c849d3ab741b05c4d6a714a2"},
+ {file = "surya_ocr-0.6.3.tar.gz", hash = "sha256:cf0e9382352eaf96ff74fe0ca5daff30f96f0897bb481ff418a8ae1a7ce31534"},
  ]
 
  [package.dependencies]
  filetype = ">=1.2.0,<2.0.0"
  ftfy = ">=6.1.3,<7.0.0"
  opencv-python = ">=4.9.0.80,<5.0.0.0"
+ pdftext = ">=0.3.12,<0.4.0"
  pillow = ">=10.2.0,<11.0.0"
  pydantic = ">=2.5.3,<3.0.0"
  pydantic-settings = ">=2.1.0,<3.0.0"
 
  [package.extras]
  dev = ["hypothesis (>=6.70.0)", "pytest (>=7.1.0)"]
 
+ [[package]]
+ name = "tabled-pdf"
+ version = "0.1.0"
+ description = "Detect and recognize tables in PDFs and images."
+ optional = false
+ python-versions = "<4.0,>=3.10"
+ files = [
+ {file = "tabled_pdf-0.1.0-py3-none-any.whl", hash = "sha256:95e3e5863cfbe829c9f233e3e9dc31be8c5f24ffd2367f57e983e710aeee659e"},
+ {file = "tabled_pdf-0.1.0.tar.gz", hash = "sha256:63a2c7d3ae55b3e7e467c2fbad9d78c7c57e31810324fc584cbf322e8e026890"},
+ ]
+
+ [package.dependencies]
+ click = ">=8.1.7,<9.0.0"
+ pydantic = ">=2.9.2,<3.0.0"
+ pydantic-settings = ">=2.5.2,<3.0.0"
+ pypdfium2 = ">=4.30.0,<5.0.0"
+ python-dotenv = ">=1.0.1,<2.0.0"
+ scikit-learn = ">=1.5.2,<2.0.0"
+ surya-ocr = ">=0.6.3,<0.7.0"
+ tabulate = ">=0.9.0,<0.10.0"
+
  [[package]]
  name = "tabulate"
  version = "0.9.0"
 
  idna = ">=2.0"
  multidict = ">=4.0"
 
  [metadata]
  lock-version = "2.0"
+ python-versions = "^3.10"
+ content-hash = "887985e53de36c13b8f82a96b1a93fea4ca6762db31bdcf9aa8147572c8a4771"
pyproject.toml CHANGED
@@ -20,25 +20,23 @@ include = [
  ]
 
  [tool.poetry.dependencies]
- python = ">=3.9,<3.13,!=3.9.7"
- scikit-learn = "^1.3.2,<=1.4.2"
+ python = "^3.10"
  Pillow = "^10.1.0"
  pydantic = "^2.4.2"
  pydantic-settings = "^2.0.3"
- transformers = "^4.36.2"
- numpy = "^1.26.1"
+ transformers = "^4.45.2"
  python-dotenv = "^1.0.0"
- torch = "^2.2.2" # Issue with torch 2.3.0 and vision models - https://github.com/pytorch/pytorch/issues/121834
+ torch = "^2.4.1"
  tqdm = "^4.66.1"
  tabulate = "^0.9.0"
  ftfy = "^6.1.1"
- texify = "^0.1.10"
+ texify = "^0.2.0"
  rapidfuzz = "^3.8.1"
- surya-ocr = "^0.5.0"
+ surya-ocr = "^0.6.3"
  filetype = "^1.2.0"
  regex = "^2024.4.28"
- pdftext = "^0.3.10"
- grpcio = "^1.63.0"
+ pdftext = "^0.3.13"
+ tabled-pdf = "^0.1.0"
 
  [tool.poetry.group.dev.dependencies]
  jupyter = "^1.0.0"