HURIDOCS/pdf-document-layout-analysis

ali6parmak committed on Jul 21, 2024

Commit acc26ee · verified · 1 parent: 7f89ccb

Update README.md

Files changed (1)

README.md +8 -11

README.md CHANGED

@@ -40,20 +40,15 @@ Clone the service:
 Start the service:
-    # With GPU support:
     make start
-    # Without GPU support [if you do not have a GPU on your system]
-    make start_no_gpu
 Get the segments of a PDF:
     # With visual models
     curl -X POST -F 'file=@/PATH/TO/PDF/pdf_name.pdf' localhost:5060
     # With non-visual models [with the models in this model card]
-    curl -X POST -F 'file=@/PATH/TO/PDF/pdf_name.pdf' localhost:5060/fast
+    curl -X POST -F 'file=@/PATH/TO/PDF/pdf_name.pdf' -F "fast=true" localhost:5060
 To stop the server:
@@ -104,11 +99,11 @@ Also for training the LightGBM models, we again used this dataset. There are 11
        1: "Caption"
        2: "Footnote"
        3: "Formula"
-       4: "ListItem"
-       5: "PageFooter"
-       6: "PageHeader"
+       4: "List item"
+       5: "Page footer"
+       6: "Page header"
        7: "Picture"
-       8: "SectionHeader"
+       8: "Section header"
        9: "Table"
        10: "Text"
        11: "Title"
@@ -126,7 +121,7 @@ As we mentioned at the [Quick Start](#quick-start), you can use the service simp
 This command runs the visual models, so be prepared for it to use a lot of resources. If you want to use the
 non-visual models, which are the LightGBM models, you can use this command:
-    curl -X POST -F 'file=@/PATH/TO/PDF/pdf_name.pdf' localhost:5060/fast
+    curl -X POST -F 'file=@/PATH/TO/PDF/pdf_name.pdf' -F "fast=true" localhost:5060
 The response has the same shape for both commands.
@@ -139,6 +134,8 @@ When the process is done, the output will include a list of SegmentBox elements
             "width": Width of the segment
             "height": Height of the segment
             "page_number": Page number which the segment belongs to
+            "page_width": Width of the page which the segment belongs to
+            "page_height": Height of the page which the segment belongs to
             "text": Text inside the segment
             "type": Type of the segment (one of the categories mentioned above)
         }
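After this change, the non-visual (LightGBM) models are selected by sending a `fast=true` form field to the same endpoint rather than by posting to `/fast`. For callers who prefer Python over `curl`, here is a minimal standard-library sketch of the same multipart/form-data request. The helper names `build_multipart` and `get_segments` are hypothetical. The `http://localhost:5060` URL comes from the commands in the diff. Decoding the reply as JSON is an assumption about the service's response format.

```python
from __future__ import annotations

import json
import os
import urllib.request
import uuid


def build_multipart(fields: dict[str, str], file_field: str, filename: str,
                    file_bytes: bytes, boundary: str | None = None) -> tuple[bytes, str]:
    """Encode text fields plus one file part as multipart/form-data (what `curl -F` sends)."""
    boundary = boundary or uuid.uuid4().hex
    parts: list[bytes] = []
    for name, value in fields.items():
        # One part per text field, e.g. the `fast=true` flag.
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    # The PDF itself, sent under the `file` field like `-F 'file=@...'`.
    header = (f'--{boundary}\r\nContent-Disposition: form-data; name="{file_field}"; '
              f'filename="{filename}"\r\nContent-Type: application/pdf\r\n\r\n')
    parts.append(header.encode() + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())  # closing delimiter
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def get_segments(pdf_path: str, fast: bool = False,
                 url: str = "http://localhost:5060") -> list[dict]:
    """POST a PDF to the service; fast=True mirrors `-F "fast=true"` (non-visual models)."""
    with open(pdf_path, "rb") as f:
        pdf_bytes = f.read()
    fields = {"fast": "true"} if fast else {}
    body, content_type = build_multipart(fields, "file", os.path.basename(pdf_path), pdf_bytes)
    request = urllib.request.Request(url, data=body, method="POST",
                                     headers={"Content-Type": content_type})
    with urllib.request.urlopen(request) as response:
        # Assumption: the service answers with a JSON list of segment objects.
        return json.loads(response.read())
```

Calling `get_segments("/PATH/TO/PDF/pdf_name.pdf", fast=True)` corresponds to the `-F "fast=true"` command. Building the body by hand keeps the sketch dependency-free. With `requests` installed, `requests.post(url, files={"file": f}, data={"fast": "true"})` sends an equivalent form.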
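This commit respells four labels in the category list, changing them from CamelCase to separate words: "ListItem" becomes "List item", and "PageFooter", "PageHeader", and "SectionHeader" change the same way. Each segment's `type` is described as one of these categories, so client code that matches `type` against fixed strings may need the new spellings. The sketch below is minimal and assumes `type` carries the label text exactly as now listed. It contains the id-to-label table from the README, plus a hypothetical `OLD_TO_NEW` shim for code written against the previous spellings. The shim and the helper functions are illustrative, not part of the project.

```python
from __future__ import annotations

from collections import Counter

# Category ids and labels as listed in the updated README.
CATEGORIES: dict[int, str] = {
    1: "Caption", 2: "Footnote", 3: "Formula", 4: "List item",
    5: "Page footer", 6: "Page header", 7: "Picture", 8: "Section header",
    9: "Table", 10: "Text", 11: "Title",
}

# Hypothetical shim: the spellings the README used before this commit.
OLD_TO_NEW: dict[str, str] = {
    "ListItem": "List item", "PageFooter": "Page footer",
    "PageHeader": "Page header", "SectionHeader": "Section header",
}


def normalize_type(label: str) -> str:
    """Map an old CamelCase label to the new spelling; other labels pass through."""
    return OLD_TO_NEW.get(label, label)


def count_by_type(segments: list[dict]) -> Counter[str]:
    """Count segments per category, rejecting labels that are not in CATEGORIES."""
    known = set(CATEGORIES.values())
    counts: Counter[str] = Counter()
    for segment in segments:
        label = normalize_type(segment["type"])
        if label not in known:
            raise ValueError(f"unexpected segment type: {label!r}")
        counts[label] += 1
    return counts
```

Raising on unknown labels makes a future renaming fail loudly instead of silently dropping segments from the counts.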
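The two added fields, `page_width` and `page_height`, let a client relate a segment's size to the page it sits on without knowing the page dimensions in advance. The sketch below is minimal. It assumes each segment arrives as a JSON object keyed by the field names listed in the diff, and that segment and page dimensions share the same units. It defines a hypothetical `SegmentBox` dataclass, named after the README's SegmentBox elements but not the project's own class. The dataclass keeps only the fields visible in this hunk and reports a segment's size as a fraction of its page.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass
class SegmentBox:
    """Client-side stand-in for one response segment (hypothetical, not the project's class)."""
    width: float
    height: float
    page_number: int
    page_width: float
    page_height: float
    text: str
    type: str

    @classmethod
    def from_dict(cls, data: dict) -> SegmentBox:
        # Keep only the fields declared above; any other keys in the response are ignored.
        return cls(**{f.name: data[f.name] for f in fields(cls)})

    def relative_size(self) -> tuple[float, float]:
        """Segment width and height as fractions of its page's width and height."""
        return self.width / self.page_width, self.height / self.page_height

    def page_area_fraction(self) -> float:
        """Share of the page area covered by the segment's bounding box."""
        rel_width, rel_height = self.relative_size()
        return rel_width * rel_height
```

`from_dict` ignores unknown keys, so the sketch keeps working if the response carries more fields than the ones shown in this hunk.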