| ================================== |
| Installation |
| ================================== |
|
|
| This section explains how to install PDF-Extract-Kit. |
|
|
| Best Practices |
| ============== |
|
|
| We recommend installing PDF-Extract-Kit in a dedicated conda virtual environment running Python 3.10. |
|
|
| **Step 1.** Create a Python 3.10 virtual environment using conda. |
|
|
| .. code-block:: console |
|
|
| $ conda create -n pdf-extract-kit-1.0 python=3.10 -y |
| $ conda activate pdf-extract-kit-1.0 |
|
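As a quick sanity check after activating the environment, the short Python sketch below confirms that the active interpreter is the recommended 3.10 release. The helper name ``is_recommended_python`` is hypothetical and is not part of PDF-Extract-Kit.

```python
import sys

# Version recommended by this guide (major, minor).
RECOMMENDED = (3, 10)


def is_recommended_python(version_info=sys.version_info):
    """Return True when the major.minor version matches RECOMMENDED.

    Hypothetical helper for illustration; not part of PDF-Extract-Kit.
    """
    return tuple(version_info[:2]) == RECOMMENDED


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if is_recommended_python():
        print(f"Python {current}: matches the recommended 3.10")
    else:
        print(f"Python {current}: differs from the recommended 3.10")
```

Running the script inside the activated environment reports whether the interpreter matches; a mismatch usually means the environment was not activated.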
|
| **Step 2.** Install the dependencies for PDF-Extract-Kit. |
|
|
| .. code-block:: console |
|
|
| $ # For GPU devices |
| $ pip install -r requirements.txt |
| $ # For CPU-only devices (run this instead of the command above) |
| $ pip install -r requirements-cpu.txt |
|
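The choice between the two requirements files can be sketched in Python as follows. This is an illustration only and not part of PDF-Extract-Kit: the function names are hypothetical, and treating the presence of the ``nvidia-smi`` tool on ``PATH`` as a sign of a usable NVIDIA GPU is a simplifying assumption that may not hold on every system.

```python
import shutil
import sys


def pick_requirements_file(has_gpu):
    """Map the device type to the requirements file named in Step 2."""
    return "requirements.txt" if has_gpu else "requirements-cpu.txt"


def looks_like_gpu_machine():
    """Heuristic (assumption): an NVIDIA GPU is likely if nvidia-smi is on PATH."""
    return shutil.which("nvidia-smi") is not None


def build_install_command(has_gpu):
    """Build the pip command for the current interpreter without running it."""
    return [
        sys.executable, "-m", "pip", "install",
        "-r", pick_requirements_file(has_gpu),
    ]


if __name__ == "__main__":
    # Only print the command so it can be reviewed before anything is installed.
    command = build_install_command(looks_like_gpu_machine())
    print(" ".join(command))
```

The script only prints the command. To perform the installation, the same list could be passed to ``subprocess.run(command, check=True)`` from the repository root, where the requirements files are expected to live.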
|
| .. note:: |
|
|
| To keep environment setup simple, ``requirements.txt`` includes only the dependencies required by the current best models: |
| |
| - Layout Detection: YOLO series (YOLOv10, DocLayout-YOLO) |
| - Formula Detection: YOLO series (YOLOv8) |
| - Formula Recognition: UniMERNet |
| - OCR: PaddleOCR |
|
|
| Other models, such as LayoutLMv3, require additional environment setup. For details, see :ref:`Layout Detection Algorithms <algorithm_layout_detection>`. |