---
title: MinerUapi
emoji: 📄
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
pinned: false
---
# MinerU PDF Converter
This Space converts PDF files to Markdown and JSON using the MinerU PDF extraction tool.
## Features
- Web interface for uploading and converting PDF files
- RESTful API for programmatic access
- Health monitoring endpoint
- High-quality PDF extraction with support for tables, formulas, and complex layouts
- Output in both Markdown and structured JSON formats
- Error handling with automatic fallback to PyMuPDF when MinerU is unavailable
## API Usage
The service exposes several API endpoints for programmatic access:
### 1. PDF Conversion Endpoint
```
POST /api/convert
```
**Request:**
- Content-Type: multipart/form-data
- Body: form field 'file' containing the PDF file
**Response:**
```json
{
"success": true,
"message": "PDF conversion successful",
"job_id": "uuid",
"base_filename": "filename",
"file_info": {
"original_filename": "document.pdf",
"size_bytes": 42950,
"content_type": "application/pdf"
},
"markdown": "# Converted markdown content...",
"json": {
"title": "Document Title",
"sections": [...]
},
"log": "Processing log...",
"files": {
"markdown_path": "document.md",
"json_path": "document.json"
}
}
```
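As a minimal sketch of consuming this response, the helper below writes the `markdown` and `json` fields to disk under the names reported in `files`. The function name `save_conversion_result` is hypothetical, and the handling of failures assumes that an unsuccessful conversion returns `"success": false` with a `message` field, which the response above does not confirm.

```python
import json
from pathlib import Path


def save_conversion_result(result: dict, out_dir: str) -> tuple[Path, Path]:
    """Write the Markdown and JSON parts of an /api/convert response to out_dir."""
    # Assumption: a failed conversion returns "success": false plus a "message".
    if not result.get("success"):
        raise RuntimeError(result.get("message", "conversion failed"))

    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)

    # Keep only the file name so a response cannot write outside out_dir.
    md_path = target / Path(result["files"]["markdown_path"]).name
    json_path = target / Path(result["files"]["json_path"]).name

    md_path.write_text(result["markdown"], encoding="utf-8")
    json_path.write_text(
        json.dumps(result["json"], indent=2, ensure_ascii=False),
        encoding="utf-8",
    )
    return md_path, json_path
```

Calling `save_conversion_result(response_dict, "output")` on the example response would produce `output/document.md` and `output/document.json`.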
### 2. Health Check Endpoint
```
GET /health
```
**Response:**
```json
{
"status": "healthy",
"version": "1.1.0",
"environment": {
"python_version": "3.10.12",
"platform": "Linux-6.1.58+-x86_64-with-glibc2.35",
"processor": "x86_64"
},
"configuration": {
"upload_folder_exists": true,
"output_folder_exists": true,
"magic_pdf_installed": true
}
}
```
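One way a client might interpret this payload is sketched below. The function name `summarize_health` is hypothetical. Reading `magic_pdf_installed: false` as "MinerU is unavailable, so the fallback converter will be used" is an assumption based on the fallback behaviour described in this README, not on documented endpoint semantics.

```python
def summarize_health(health: dict) -> dict:
    """Reduce a /health response to two flags a client can act on."""
    config = health.get("configuration", {})
    ready = (
        health.get("status") == "healthy"
        and bool(config.get("upload_folder_exists"))
        and bool(config.get("output_folder_exists"))
    )
    return {
        "ready": ready,
        # Assumption: False here means conversions run through the fallback path.
        "mineru_available": bool(config.get("magic_pdf_installed")),
    }
```

A client could refuse to upload when `ready` is false and log a warning when `mineru_available` is false, since output quality may then differ.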
### Client Example
A Python client script (`api_client.py`) is included in this repository for easy integration:
```bash
# Example usage
python api_client.py path/to/your/document.pdf --api-url https://marcosremar2-mineruapi.hf.space
```
The client includes features such as:
- Automatic health check to verify API status
- Retry logic for failed requests
- Progress tracking
- Comprehensive error handling
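The retry behaviour listed above could look roughly like the following sketch. It is not the code in `api_client.py`: the helper name `with_retries`, the default of three attempts, and the exponential backoff schedule are all assumptions chosen for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying on network-level errors with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except OSError:
            # OSError covers ConnectionError, TimeoutError and urllib's URLError.
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # wait 1s, 2s, 4s, ... by default
    raise AssertionError("unreachable")
```

Any zero-argument callable that performs the request can be wrapped this way. The `sleep` parameter exists only so that the delay can be replaced in tests.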
You can also use curl:
```bash
curl -X POST -F "file=@path/to/your/document.pdf" https://marcosremar2-mineruapi.hf.space/api/convert
```
And check health with:
```bash
curl https://marcosremar2-mineruapi.hf.space/health
```
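For callers who prefer Python without extra dependencies, the curl upload can be approximated with the standard library. This is a sketch under stated assumptions: the helper names `build_multipart` and `convert_pdf` are hypothetical, the 300-second timeout is a guess at how long a conversion might take, and the filename is assumed to contain no double quotes.

```python
import json
import urllib.request
import uuid
from pathlib import Path

API_URL = "https://marcosremar2-mineruapi.hf.space"


def build_multipart(
    field: str, filename: str, data: bytes, content_type: str = "application/pdf"
) -> tuple[bytes, str]:
    """Encode one file as a multipart/form-data body; return (body, header value)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def convert_pdf(pdf_path: str, api_url: str = API_URL, timeout: float = 300.0) -> dict:
    """POST a PDF to /api/convert in the form field 'file' and return the parsed JSON."""
    path = Path(pdf_path)
    body, content_type = build_multipart("file", path.name, path.read_bytes())
    request = urllib.request.Request(
        f"{api_url}/api/convert",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

`convert_pdf("path/to/your/document.pdf")` returns the response as a dictionary, so the converted text is available as `result["markdown"]`.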
## Web Interface
The Space also provides a web interface where you can:
- Upload PDF files for conversion
- View the generated Markdown and JSON
- Download the converted files
- View processing logs
## Implementation Details
This service uses:
- MinerU for high-quality PDF extraction
- PyMuPDF as a fallback conversion method
- Flask web server for the interface and API
- Docker container for deployment on Hugging Face Spaces
## Error Handling
The service includes robust error handling:
- Automatic fallback to local PDF conversion with PyMuPDF if MinerU is unavailable
- Detailed error messages and logs
- API responses include comprehensive details for debugging
## Learn More
For more information about MinerU, visit [the MinerU repository](https://github.com/opendatalab/MinerU).