| title: MinerUapi | |
| emoji: ๐ | |
| colorFrom: blue | |
| colorTo: green | |
| sdk: docker | |
| app_port: 7860 | |
| pinned: false | |
| # MinerU PDF Converter | |
| This Space provides a service for converting PDF files to Markdown and JSON formats using the MinerU PDF extraction tool. | |
| ## Features | |
| - Web interface for uploading and converting PDF files | |
| - RESTful API for programmatic access | |
| - Health monitoring endpoint | |
| - High-quality PDF extraction with support for tables, formulas, and complex layouts | |
| - Output in both Markdown and structured JSON formats | |
| - Comprehensive error handling and fallback mechanisms | |
| ## API Usage | |
| The service exposes the following API endpoints for programmatic access: | |
| ### 1. PDF Conversion Endpoint | |
| ``` | |
| POST /api/convert | |
| ``` | |
| **Request:** | |
| - Content-Type: multipart/form-data | |
| - Body: form field 'file' containing the PDF file | |
| **Response:** | |
| ```json | |
| { | |
| "success": true, | |
| "message": "PDF conversion successful", | |
| "job_id": "uuid", | |
| "base_filename": "filename", | |
| "file_info": { | |
| "original_filename": "document.pdf", | |
| "size_bytes": 42950, | |
| "content_type": "application/pdf" | |
| }, | |
| "markdown": "# Converted markdown content...", | |
| "json": { | |
| "title": "Document Title", | |
| "sections": [...] | |
| }, | |
| "log": "Processing log...", | |
| "files": { | |
| "markdown_path": "document.md", | |
| "json_path": "document.json" | |
| } | |
| } | |
| ``` | |
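As a minimal sketch of how a caller might consume this response, the helper below writes the `markdown` and `json` fields to disk under the names given in `files`. The function name `save_conversion_result` is hypothetical, and the assumption that a failed conversion returns `"success": false` together with `message` and `log` is not confirmed by this README.

```python
import json
from pathlib import Path


def save_conversion_result(result: dict, out_dir: str) -> dict:
    """Write the Markdown and JSON parts of a /api/convert response to out_dir."""
    # Assumption: failures are reported with "success": false plus "message"/"log".
    if not result.get("success"):
        raise RuntimeError(
            f"Conversion failed: {result.get('message')}\n{result.get('log', '')}"
        )

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Keep only the final path component so a server-supplied name stays inside out_dir.
    files = result.get("files", {})
    md_path = out / Path(files.get("markdown_path", "output.md")).name
    json_path = out / Path(files.get("json_path", "output.json")).name

    md_path.write_text(result.get("markdown", ""), encoding="utf-8")
    json_path.write_text(
        json.dumps(result.get("json", {}), indent=2, ensure_ascii=False),
        encoding="utf-8",
    )
    return {"markdown": md_path, "json": json_path}
```

Calling `save_conversion_result(response_dict, "converted")` returns the two paths that were written.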
| ### 2. Health Check Endpoint | |
| ``` | |
| GET /health | |
| ``` | |
| **Response:** | |
| ```json | |
| { | |
| "status": "healthy", | |
| "version": "1.1.0", | |
| "environment": { | |
| "python_version": "3.10.12", | |
| "platform": "Linux-6.1.58+-x86_64-with-glibc2.35", | |
| "processor": "x86_64" | |
| }, | |
| "configuration": { | |
| "upload_folder_exists": true, | |
| "output_folder_exists": true, | |
| "magic_pdf_installed": true | |
| } | |
| } | |
| ``` | |
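A client can inspect this payload before submitting work. The sketch below lists anything in the health response that looks wrong; the helper name `missing_requirements` is hypothetical, and treating every `configuration` flag as required is an assumption rather than documented behaviour.

```python
def missing_requirements(health: dict) -> list:
    """Return human-readable problems found in a /health response (empty if none)."""
    problems = []
    if health.get("status") != "healthy":
        problems.append(f"status is {health.get('status')!r}")

    # Assumption: all three configuration flags should be true on a working deployment.
    config = health.get("configuration", {})
    for flag in ("upload_folder_exists", "output_folder_exists", "magic_pdf_installed"):
        if not config.get(flag):
            problems.append(f"{flag} is not true")
    return problems
```

An empty list means the response matches the healthy example shown above.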
| ### Client Example | |
| A Python client script (`api_client.py`) is included in this repository for easy integration: | |
| ```bash | |
| # Example usage | |
| python api_client.py path/to/your/document.pdf --api-url https://marcosremar2-mineruapi.hf.space | |
| ``` | |
| The client includes features such as: | |
| - Automatic health check to verify API status | |
| - Retry logic for failed requests | |
| - Progress tracking | |
| - Comprehensive error handling | |
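To illustrate what retry logic of this kind can look like, here is a generic sketch with exponential backoff. It is not taken from `api_client.py`; the name `with_retries` and its parameters are assumptions made for this example.

```python
import time


def with_retries(func, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func() until it succeeds, waiting base_delay * 2**n between failures."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    last_error = None
    for attempt in range(attempts):
        try:
            return func()
        except Exception as exc:  # a real client would catch narrower error types
            last_error = exc
            # No wait after the final attempt; the error is re-raised instead.
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    raise last_error
```

The `sleep` parameter is injectable so the backoff schedule can be checked in tests without actually waiting.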
| You can also use curl: | |
| ```bash | |
| curl -X POST -F "file=@path/to/your/document.pdf" https://marcosremar2-mineruapi.hf.space/api/convert | |
| ``` | |
| And check health with: | |
| ```bash | |
| curl https://marcosremar2-mineruapi.hf.space/health | |
| ``` | |
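The curl upload above can also be reproduced from Python using only the standard library. This is a sketch, not the bundled `api_client.py`: the helper names `build_multipart` and `convert_pdf` and the 300-second timeout are assumptions, while the `file` form field and the `/api/convert` path come from the endpoint description.

```python
import json
import urllib.request
import uuid
from pathlib import Path


def build_multipart(field, filename, data, content_type="application/pdf"):
    """Encode one file as a multipart/form-data body; return (body, Content-Type)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def convert_pdf(pdf_path, api_url, timeout=300):
    """POST a PDF to <api_url>/api/convert and return the decoded JSON response."""
    path = Path(pdf_path)
    body, content_type = build_multipart("file", path.name, path.read_bytes())
    request = urllib.request.Request(
        f"{api_url.rstrip('/')}/api/convert",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, `convert_pdf("path/to/your/document.pdf", "https://marcosremar2-mineruapi.hf.space")` would send the same request as the curl command.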
| ## Web Interface | |
| The Space also provides a web interface where you can: | |
| - Upload PDF files for conversion | |
| - View the generated Markdown and JSON | |
| - Download the converted files | |
| - View processing logs | |
| ## Implementation Details | |
| This service uses: | |
| - MinerU for high-quality PDF extraction | |
| - PyMuPDF as a fallback conversion method | |
| - Flask web server for the interface and API | |
| - Docker container for deployment on Hugging Face Spaces | |
| ## Error Handling | |
| The service includes robust error handling: | |
| - Automatic fallback to local PDF conversion (PyMuPDF) if MinerU is unavailable | |
| - Detailed error messages and logs | |
| - API responses include comprehensive details for debugging | |
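The fallback behaviour described above follows a common pattern: try the primary converter, and on failure record the error and run a secondary one. The sketch below shows that pattern with two arbitrary callables; it is not the service's actual code, and `convert_with_fallback`, `primary`, and `fallback` are hypothetical names.

```python
def convert_with_fallback(pdf_path, primary, fallback):
    """Run primary(pdf_path); if it raises, log the error and run fallback(pdf_path)."""
    log = []
    try:
        result = primary(pdf_path)
        log.append("primary converter succeeded")
        return result, log
    except Exception as exc:
        # Keep the failure reason so it can be surfaced in the response log.
        log.append(f"primary converter failed: {exc}")

    # Errors from the fallback propagate to the caller unchanged.
    result = fallback(pdf_path)
    log.append("fallback converter succeeded")
    return result, log
```

Returning the log alongside the result keeps the reason for the fallback visible to whoever debugs the response.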
| ## Learn More | |
| For more information about MinerU, visit [the MinerU repository](https://github.com/opendatalab/MinerU). | |