HURIDOCS/pdf-segmentation


gabriel-p committed on Nov 10, 2022

Commit 91c98bb · 1 Parent(s): c9e8865

Update README.md

Files changed (1)

README.md +34 -0


@@ -1,3 +1,37 @@
 ---
 license: openrail
 ---

+<h3 align="center">PDF Paragraphs Extraction</h3>
+<p align="center">A model for extracting paragraphs from PDFs</p>
+This model uses features from a PDF to extract its text and paragraphs. It can be run as a service.
+Each extracted paragraph includes its page number, its position on the page, its size, and its text.
+## Quick Start
+Download the service that uses the model:
+    git clone https://github.com/huridocs/pdf_paragraphs_extraction.git
+    cd pdf_paragraphs_extraction
+Start the service:
+    ./run start
+Get the paragraphs from a PDF:
+    curl -X GET -F 'file=@/PATH/TO/PDF/pdf_name.pdf' localhost:5051
+To stop the server:
+    ./run stop
+## Performance
+Accuracy: 93.9%
+Speed: TBD
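
The Quick Start `curl` call in this README could also be issued from a script. The following is a minimal Python sketch, not part of the commit, that mirrors that command using only the standard library. It assumes the service is running on `localhost:5051`, that it accepts the upload in a form field named `file`, and that the HTTP method matches the `-X GET` shown in the README. The function names `build_paragraphs_request` and `extract_paragraphs` are hypothetical, and since the README does not describe the response format, the body is returned as raw text.

```python
import urllib.request
import uuid


def build_paragraphs_request(pdf_name, pdf_bytes,
                             url="http://localhost:5051", method="GET"):
    """Build a multipart/form-data request with the PDF in a 'file' field.

    The URL, field name, and method are taken from the README's curl
    command; everything else here is an illustrative assumption.
    """
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; '
        f'filename="{pdf_name}"\r\n'.encode(),
        b"Content-Type: application/pdf\r\n\r\n",
        pdf_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return urllib.request.Request(url, data=body, headers=headers,
                                  method=method)


def extract_paragraphs(pdf_path, url="http://localhost:5051"):
    """Send a PDF to a running service and return the raw response text."""
    with open(pdf_path, "rb") as pdf_file:
        pdf_name = pdf_path.rsplit("/", 1)[-1]
        request = build_paragraphs_request(pdf_name, pdf_file.read(), url)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

With the service started via `./run start`, calling `extract_paragraphs("/PATH/TO/PDF/pdf_name.pdf")` would send the same upload as the `curl` command. Splitting request construction from sending keeps the first function usable without a running server.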