Update README.md
README.md
CHANGED
@@ -40,20 +40,15 @@ Clone the service:

 Start the service:

-# With GPU support:
 make start

-# Without GPU support [if you do not have a GPU on your system]
-make start_no_gpu
-
-
 Get the segments of a PDF:

 # With visual models
 curl -X POST -F 'file=@/PATH/TO/PDF/pdf_name.pdf' localhost:5060

 # With non-visual models [with the models in this model card]
-curl -X POST -F 'file=@/PATH/TO/PDF/pdf_name.pdf' localhost:5060
+curl -X POST -F 'file=@/PATH/TO/PDF/pdf_name.pdf' -F "fast=true" localhost:5060


 To stop the server:
@@ -104,11 +99,11 @@ Also for training the LightGBM models, we again used this dataset. There are 11
 1: "Caption"
 2: "Footnote"
 3: "Formula"
-4: "
-5: "
-6: "
+4: "List item"
+5: "Page footer"
+6: "Page header"
 7: "Picture"
-8: "
+8: "Section header"
 9: "Table"
 10: "Text"
 11: "Title"
@@ -126,7 +121,7 @@ As we mentioned at the [Quick Start](#quick-start), you can use the service simp
 This command will run the code on the visual model, so be prepared for it to use a lot of resources. But if you
 want to use the non-visual models, which are the LightGBM models, you can use this command:

-curl -X POST -F 'file=@/PATH/TO/PDF/pdf_name.pdf' localhost:5060
+curl -X POST -F 'file=@/PATH/TO/PDF/pdf_name.pdf' -F "fast=true" localhost:5060

 The shape of the response will be the same in both of these commands.

@@ -139,6 +134,8 @@ When the process is done, the output will include a list of SegmentBox elements
 "width": Width of the segment
 "height": Height of the segment
 "page_number": Page number which the segment belongs to
+"page_width": Width of the page which the segment belongs to
+"page_height": Height of the page which the segment belongs to
 "text": Text inside the segment
 "type": Type of the segment (one of the categories mentioned above)
 }
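For reference, the two curl calls in this change can be approximated from Python with the standard library alone. This is a minimal sketch, not part of the service: `build_multipart`, `get_segments`, and `relative_size` are hypothetical helper names, and it assumes the response body is a JSON list of objects carrying the keys listed in the README and that segment and page dimensions use the same units.

```python
import json
import urllib.request
import uuid
from pathlib import Path


def build_multipart(pdf_name, pdf_bytes, fast):
    """Encode the PDF, and optionally the fast=true field, as multipart/form-data.

    Returns the request body (bytes) and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    chunks = [
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{pdf_name}"\r\n'.encode(),
        b"Content-Type: application/pdf\r\n\r\n",
        pdf_bytes,
        b"\r\n",
    ]
    if fast:
        # Mirrors `-F "fast=true"`, which selects the non-visual (LightGBM) models.
        chunks += [
            f"--{boundary}\r\n".encode(),
            b'Content-Disposition: form-data; name="fast"\r\n\r\n',
            b"true\r\n",
        ]
    chunks.append(f"--{boundary}--\r\n".encode())
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def get_segments(pdf_path, fast=False, url="http://localhost:5060"):
    """POST a PDF to the running service and return the decoded JSON response."""
    path = Path(pdf_path)
    body, content_type = build_multipart(path.name, path.read_bytes(), fast)
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


def relative_size(segment):
    """Express a segment's size as fractions of its page, using the new page fields."""
    return (
        segment["width"] / segment["page_width"],
        segment["height"] / segment["page_height"],
    )


# Usage (requires the service to be running, e.g. after `make start`):
#   segments = get_segments("/PATH/TO/PDF/pdf_name.pdf", fast=True)
#   for segment in segments:
#       print(segment["type"], relative_size(segment))
```

The `page_width` and `page_height` fields added in this change are what make `relative_size` possible without opening the PDF a second time.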