---
title: MinerUapi
emoji: 📄
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
pinned: false
---
MinerU PDF Converter
This Space provides a service for converting PDF files to Markdown and JSON formats using the MinerU PDF extraction tool.
Features
- Web interface for uploading and converting PDF files
- RESTful API for programmatic access
- Health monitoring endpoint
- High-quality PDF extraction with support for tables, formulas, and complex layouts
- Output in both Markdown and structured JSON formats
- Error handling with an automatic PyMuPDF fallback when MinerU is unavailable
API Usage
The service exposes two API endpoints for programmatic access:
1. PDF Conversion Endpoint
POST /api/convert
Request:
- Content-Type: multipart/form-data
- Body: form field 'file' containing the PDF file
Response:
```json
{
  "success": true,
  "message": "PDF conversion successful",
  "job_id": "uuid",
  "base_filename": "filename",
  "file_info": {
    "original_filename": "document.pdf",
    "size_bytes": 42950,
    "content_type": "application/pdf"
  },
  "markdown": "# Converted markdown content...",
  "json": {
    "title": "Document Title",
    "sections": [...]
  },
  "log": "Processing log...",
  "files": {
    "markdown_path": "document.md",
    "json_path": "document.json"
  }
}
```
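For clients that cannot shell out to curl, the request and response described above can be handled with the Python standard library alone. This is an illustrative sketch, not the code in api_client.py: the helper names (encode_multipart, ConvertResult, parse_convert_response, convert_pdf) are hypothetical, and the parser assumes the response fields match the example above.

```python
import json
import urllib.request
import uuid
from dataclasses import dataclass
from pathlib import Path


def encode_multipart(field: str, filename: str, data: bytes,
                     content_type: str = "application/pdf") -> tuple[bytes, str]:
    """Hypothetical helper: build a multipart/form-data body with one file field."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


@dataclass
class ConvertResult:
    """Hypothetical container for the fields a client usually needs."""
    job_id: str
    markdown: str
    document: dict
    log: str


def parse_convert_response(payload: dict) -> ConvertResult:
    """Unpack an /api/convert response (assumed shape), raising if it reports failure."""
    if not payload.get("success"):
        raise RuntimeError(payload.get("message", "conversion failed"))
    return ConvertResult(
        job_id=payload["job_id"],
        markdown=payload.get("markdown", ""),
        document=payload.get("json", {}),
        log=payload.get("log", ""),
    )


def convert_pdf(api_url: str, pdf_path: str, timeout: float = 300.0) -> ConvertResult:
    """POST a PDF in the 'file' form field to /api/convert (performs network I/O)."""
    path = Path(pdf_path)
    body, content_type = encode_multipart("file", path.name, path.read_bytes())
    request = urllib.request.Request(
        f"{api_url.rstrip('/')}/api/convert",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_convert_response(json.load(response))
```

A call might look like convert_pdf("https://marcosremar2-mineruapi.hf.space", "document.pdf"); it raises urllib.error.HTTPError on a non-2xx status. The standard library was chosen so the sketch has no dependencies; with requests installed, requests.post(url, files={"file": fh}) performs the multipart encoding for you.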
2. Health Check Endpoint
GET /health
Response:
```json
{
  "status": "healthy",
  "version": "1.1.0",
  "environment": {
    "python_version": "3.10.12",
    "platform": "Linux-6.1.58+-x86_64-with-glibc2.35",
    "processor": "x86_64"
  },
  "configuration": {
    "upload_folder_exists": true,
    "output_folder_exists": true,
    "magic_pdf_installed": true
  }
}
```
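The health fields map naturally onto Python's platform, os, and importlib modules. The following is a hedged, server-side sketch of how such a payload could be assembled; build_health_payload is a hypothetical name rather than this Space's actual handler, it assumes the magic_pdf_installed flag checks for MinerU's magic_pdf import name, and it always reports "healthy" when it can run (the real service may apply other criteria).

```python
import importlib.util
import os
import platform


def build_health_payload(upload_folder: str, output_folder: str,
                         version: str = "1.1.0") -> dict:
    """Hypothetical: assemble a /health-style payload from the interpreter and filesystem."""
    return {
        "status": "healthy",  # assumption: reachable means healthy in this sketch
        "version": version,
        "environment": {
            "python_version": platform.python_version(),
            "platform": platform.platform(),
            "processor": platform.processor(),
        },
        "configuration": {
            "upload_folder_exists": os.path.isdir(upload_folder),
            "output_folder_exists": os.path.isdir(output_folder),
            # find_spec locates the package without importing it; None means absent.
            "magic_pdf_installed": importlib.util.find_spec("magic_pdf") is not None,
        },
    }
```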
Client Example
A Python client script (api_client.py) is included in this repository for easy integration:
```bash
# Example usage
python api_client.py path/to/your/document.pdf --api-url https://marcosremar2-mineruapi.hf.space
```
The client includes features such as:
- Automatic health check to verify API status
- Retry logic for failed requests
- Progress tracking
- Comprehensive error handling
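The retry and health-check behavior listed above might look roughly like the following. This is a hypothetical sketch, not an excerpt from api_client.py: call_with_retries and is_service_ready are illustrative names, the exponential backoff schedule is an assumption, and the readiness rule deliberately ignores magic_pdf_installed because the service can fall back to PyMuPDF.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(func: Callable[[], T], attempts: int = 3,
                      base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Hypothetical: call func, retrying on exceptions with exponential backoff."""
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # base, 2*base, 4*base, ...
    raise ValueError("attempts must be at least 1")


def is_service_ready(health: dict) -> bool:
    """Hypothetical: True if /health says healthy and both working folders exist."""
    config = health.get("configuration", {})
    return health.get("status") == "healthy" and all(
        config.get(key, False) for key in ("upload_folder_exists", "output_folder_exists")
    )
```

Passing sleep as a parameter keeps the retry logic testable without real delays; in normal use the default time.sleep applies. A client could wrap the health request in call_with_retries and only upload once is_service_ready returns True.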
You can also use curl:
```bash
curl -X POST -F "file=@path/to/your/document.pdf" https://marcosremar2-mineruapi.hf.space/api/convert
```
And check health with:
```bash
curl https://marcosremar2-mineruapi.hf.space/health
```
Web Interface
The Space also provides a web interface where you can:
- Upload PDF files for conversion
- View the generated Markdown and JSON
- Download the converted files
- View processing logs
Implementation Details
This service uses:
- MinerU for high-quality PDF extraction
- PyMuPDF as a fallback conversion method
- Flask web server for the interface and API
- Docker container for deployment on Hugging Face Spaces
Error Handling
The service includes robust error handling:
- Automatic fallback to local, PyMuPDF-based PDF conversion if MinerU is unavailable
- Detailed error messages and logs
- API responses include the processing log (the log field) and file metadata for debugging
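The fallback described above can be expressed as a small wrapper that tries one converter and switches to another on failure. This is a sketch under assumptions: convert_with_fallback and pymupdf_to_markdown are hypothetical names, the MinerU call is left as an injected function because its exact API is not shown in this README, and the PyMuPDF fallback here only extracts plain text per page (no table or formula handling).

```python
from typing import Callable, List


def pymupdf_to_markdown(pdf_path: str) -> str:
    """Hypothetical rough fallback: plain text of each page via PyMuPDF."""
    import fitz  # PyMuPDF; imported lazily so this module loads without it

    with fitz.open(pdf_path) as doc:
        return "\n\n".join(page.get_text() for page in doc)


def convert_with_fallback(pdf_path: str,
                          primary: Callable[[str], str],
                          fallback: Callable[[str], str],
                          log: List[str]) -> str:
    """Hypothetical: try the primary converter; on any failure, log it and use the fallback."""
    try:
        markdown = primary(pdf_path)
        log.append("primary converter succeeded")
        return markdown
    except Exception as exc:  # e.g. ImportError when magic_pdf is not installed
        log.append(f"primary converter failed ({exc!r}); using fallback")
        return fallback(pdf_path)
```

In a deployment along these lines, primary would wrap the MinerU pipeline and fallback would be pymupdf_to_markdown; the accumulated log list is the kind of information that could be returned in the log field of the API response.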
Learn More
For more information about MinerU, visit the MinerU repository (https://github.com/opendatalab/MinerU).