Usage Examples
This directory contains examples of running marker in different contexts.
Usage with Modal
This self-contained example shows how to deploy marker on Modal: it provisions a container with a GPU and exposes it through an API, so you can submit PDFs for conversion into Markdown, HTML, or JSON.
It's a limited example that you can extend to other use cases.
Prerequisites
Make sure the Modal client is installed by following Modal's installation instructions.
Modal's Starter Plan includes $30 of free compute each month. Modal is serverless, so you only pay for resources when you are using them.
Running the example
Once Modal is configured, deploy the example to your workspace by running:
modal deploy marker_modal_deployment.py
Notes:
marker uses a few models. By default, the endpoint checks whether these models are loaded and downloads them if not, so the first request will be slow. You can avoid this by running:
modal run marker_modal_deployment.py::download_models
This creates a Modal Volume that stores the models for reuse.
Once the deploy is finished, you can:
- Test a file upload locally through your CLI using an invoke_conversion command we expose through Modal's local_entrypoint
- Get the URL of your endpoint and make a request through a client of your choice.
Test from your CLI with invoke_conversion
If your endpoint is live, simply run this command:
$ modal run marker_modal_deployment.py::invoke_conversion --pdf-file <PDF_FILE_PATH> --output-format markdown
This automatically detects the URL of your new endpoint using .get_web_url(), checks that it's healthy, submits your file, and stores the output on your machine (in the same directory).
Making a request using your own client
If you want to make requests from elsewhere, e.g. with cURL or a client like Insomnia, you'll need the endpoint URL.
When your modal deploy command from earlier finishes, it prints your endpoint URL at the end. For example:
$ modal deploy marker_modal_deployment.py
...
✓ Created objects.
├── 🔨 Created mount /marker/examples/marker_modal_deployment.py
├── 🔨 Created function download_models.
├── 🔨 Created function MarkerModalDemoService.*.
└── 🔨 Created web endpoint for MarkerModalDemoService.fastapi_app => <YOUR_ENDPOINT_URL>
✓ App deployed in 149.877s! 🎉
If you accidentally close your terminal session, you can also always go into Modal's dashboard and:
- Find the app (default name: datalab-marker-modal-demo)
- Click on MarkerModalDemoService
- Find your endpoint URL
Once you have your URL, make a request to {YOUR_ENDPOINT_URL}/convert like this (you can also use Insomnia, etc.):
curl --request POST \
--url {YOUR_ENDPOINT_URL}/convert \
--header 'Content-Type: multipart/form-data' \
--form file=@/Users/cooldev/sample.pdf \
--form output_format=html
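The same request can also be sent from Python. The following is a minimal sketch that uses only the standard library. The helper names build_multipart and convert_pdf are illustrative and not part of marker or Modal, the 600-second timeout is an assumption chosen to tolerate a slow first request, and the form field names (file, output_format) are taken from the cURL command above.

```python
import json
import urllib.request
import uuid
from pathlib import Path


def build_multipart(fields, file_field, filename, file_bytes):
    """Encode plain form fields plus one file as multipart/form-data.

    Hypothetical helper (not part of marker). Returns (body, content_type).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode()
        )
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\n'
        f'Content-Type: application/pdf\r\n\r\n'.encode()
        + file_bytes
        + b'\r\n'
    )
    # Closing boundary marks the end of the form data.
    parts.append(f'--{boundary}--\r\n'.encode())
    return b''.join(parts), f'multipart/form-data; boundary={boundary}'


def convert_pdf(base_url, pdf_path, output_format="html", timeout=600):
    """POST a PDF to {base_url}/convert and return the decoded JSON response."""
    pdf_path = Path(pdf_path)
    body, content_type = build_multipart(
        {"output_format": output_format},
        "file",
        pdf_path.name,
        pdf_path.read_bytes(),
    )
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/convert",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

To use it, call convert_pdf with the endpoint URL printed by modal deploy and the path to a local PDF, for example convert_pdf("<YOUR_ENDPOINT_URL>", "sample.pdf", output_format="html").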
You should get a response like this:
{
"success": true,
"filename": "sample.pdf",
"output_format": "html",
"json": null,
"html": "<YOUR_RESPONSE_CONTENT>",
"markdown": null,
"images": {},
"metadata": {... page level metadata ...},
"page_count": 2
}
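In the response above, the converted content sits under the key named by output_format, while the other format keys are null. The sketch below pulls that content out and writes it to disk. It is illustrative only: extract_content and save_output are hypothetical helper names, the file extensions are assumptions, and the behavior on a failed conversion is a guess, since the example only shows a successful response.

```python
import json
from pathlib import Path

# Assumed mapping from output format to file extension (not defined by the example).
EXTENSIONS = {"markdown": ".md", "html": ".html", "json": ".json"}


def extract_content(response):
    """Return the converted content stored under the key named by output_format."""
    if not response.get("success"):
        raise RuntimeError(f"Conversion failed: {response}")
    output_format = response["output_format"]
    content = response.get(output_format)
    if content is None:
        raise KeyError(f"No content found under key {output_format!r}")
    return content


def save_output(response, directory="."):
    """Write the converted content to <input stem><extension> and return the path."""
    content = extract_content(response)
    # JSON output may arrive as a parsed object rather than a string.
    if not isinstance(content, str):
        content = json.dumps(content, indent=2)
    stem = Path(response["filename"]).stem
    path = Path(directory) / f"{stem}{EXTENSIONS[response['output_format']]}"
    path.write_text(content, encoding="utf-8")
    return path
```

With the sample response shown above, save_output(response) would write the HTML string to sample.html in the current directory.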
Modal makes deploying and scaling models and inference workloads much easier.
If you're interested in Datalab's managed API or on-prem document intelligence solution, check out our platform.