---
title: MinerUapi
emoji: 📄
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
pinned: false
---
# MinerU PDF Converter
This Space provides a service for converting PDF files to Markdown and JSON formats using the MinerU PDF extraction tool.
## Features
- Web interface for uploading and converting PDF files
- RESTful API for programmatic access
- Health monitoring endpoint
- High-quality PDF extraction with support for tables, formulas, and complex layouts
- Output in both Markdown and structured JSON formats
- Comprehensive error handling and fallback mechanisms
## API Usage
The service exposes the following API endpoints for programmatic access:
### 1. PDF Conversion Endpoint
```
POST /api/convert
```
**Request:**
- Content-Type: multipart/form-data
- Body: form field 'file' containing the PDF file
**Response:**
```json
{
"success": true,
"message": "PDF conversion successful",
"job_id": "uuid",
"base_filename": "filename",
"file_info": {
"original_filename": "document.pdf",
"size_bytes": 42950,
"content_type": "application/pdf"
},
"markdown": "# Converted markdown content...",
"json": {
"title": "Document Title",
"sections": [...]
},
"log": "Processing log...",
"files": {
"markdown_path": "document.md",
"json_path": "document.json"
}
}
```
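As a rough illustration of consuming this response, the sketch below writes the `markdown` and `json` fields to disk using the file names reported under `files`. The helper name `save_conversion_result` is hypothetical, and the shape of a failed response is not documented here, so the error branch only assumes that `success` and `message` are present.

```python
import json
import tempfile
from pathlib import Path


def save_conversion_result(result: dict, out_dir: str) -> dict:
    """Write the Markdown and JSON from a /api/convert response to out_dir."""
    if not result.get("success"):
        # Assumption: a failed response still carries a "message" field.
        raise RuntimeError(result.get("message", "conversion failed"))

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Keep only the file name component so nothing is written outside out_dir.
    md_path = out / Path(result["files"]["markdown_path"]).name
    json_path = out / Path(result["files"]["json_path"]).name

    md_path.write_text(result["markdown"], encoding="utf-8")
    json_path.write_text(
        json.dumps(result["json"], indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return {"markdown": md_path, "json": json_path}


# Sample payload trimmed from the response documented above.
sample = {
    "success": True,
    "message": "PDF conversion successful",
    "markdown": "# Converted markdown content...",
    "json": {"title": "Document Title"},
    "files": {"markdown_path": "document.md", "json_path": "document.json"},
}

with tempfile.TemporaryDirectory() as tmp:
    paths = save_conversion_result(sample, tmp)
    saved_markdown = paths["markdown"].read_text(encoding="utf-8")
    saved_json = json.loads(paths["json"].read_text(encoding="utf-8"))
```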
### 2. Health Check Endpoint
```
GET /health
```
**Response:**
```json
{
"status": "healthy",
"version": "1.1.0",
"environment": {
"python_version": "3.10.12",
"platform": "Linux-6.1.58+-x86_64-with-glibc2.35",
"processor": "x86_64"
},
"configuration": {
"upload_folder_exists": true,
"output_folder_exists": true,
"magic_pdf_installed": true
}
}
```
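One way a client might interpret this payload is sketched below. The function name `failed_checks` is hypothetical, and treating any `false` flag under `configuration` as a problem is an assumption: the service may still work through its fallback path when, for example, `magic_pdf_installed` is `false`.

```python
def failed_checks(health: dict) -> list[str]:
    """Return the names of health fields that look unhealthy (empty list = OK)."""
    problems = []
    # Assumption: "healthy" is the only status value that means the service is fine.
    if health.get("status") != "healthy":
        problems.append("status")
    # Assumption: every flag under "configuration" should be true.
    for name, ok in health.get("configuration", {}).items():
        if ok is not True:
            problems.append(f"configuration.{name}")
    return problems


# Payload copied from the documented /health response (environment omitted).
healthy = {
    "status": "healthy",
    "version": "1.1.0",
    "configuration": {
        "upload_folder_exists": True,
        "output_folder_exists": True,
        "magic_pdf_installed": True,
    },
}
print(failed_checks(healthy))  # prints []
```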
### Client Example
A Python client script (`api_client.py`) is included in this repository for easy integration:
```bash
# Example usage
python api_client.py path/to/your/document.pdf --api-url https://marcosremar2-mineruapi.hf.space
```
The client includes features such as:
- Automatic health check to verify API status
- Retry logic for failed requests
- Progress tracking
- Comprehensive error handling
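The retry behaviour listed above could look something like the following. This is a generic sketch with exponential backoff, not the code in `api_client.py`; the name `call_with_retries`, the default of three attempts, and the choice to retry only on `OSError` (which covers connection failures and `urllib` network errors) are all assumptions.

```python
import time


def call_with_retries(func, attempts=3, base_delay=1.0,
                      retry_on=(OSError,), sleep=time.sleep):
    """Call func() until it succeeds, waiting base_delay * 2**n between tries."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                # Out of attempts: let the last error propagate to the caller.
                raise
            sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter exists only so the waiting can be replaced in tests; in normal use it can be left at its default.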
You can also use curl:
```bash
curl -X POST -F "file=@path/to/your/document.pdf" https://marcosremar2-mineruapi.hf.space/api/convert
```
And check health with:
```bash
curl https://marcosremar2-mineruapi.hf.space/health
```
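For readers who prefer Python without extra dependencies, the curl upload above can be approximated with the standard library. This is a minimal sketch: `build_multipart` and `convert_pdf` are hypothetical helper names, the 300-second timeout is an arbitrary assumption, and file names containing double quotes are not escaped.

```python
import json
import os
import urllib.request
import uuid


def build_multipart(field_name, filename, data, content_type="application/pdf"):
    """Encode one file as multipart/form-data; returns (body, Content-Type header)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def convert_pdf(pdf_path, api_url, timeout=300):
    """POST a PDF to /api/convert in the 'file' form field and return the parsed JSON."""
    with open(pdf_path, "rb") as fh:
        body, content_type = build_multipart("file", os.path.basename(pdf_path), fh.read())
    request = urllib.request.Request(
        f"{api_url.rstrip('/')}/api/convert",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

A call would then look like `convert_pdf("path/to/your/document.pdf", "https://marcosremar2-mineruapi.hf.space")`.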
## Web Interface
The Space also provides a web interface where you can:
- Upload PDF files for conversion
- View the generated Markdown and JSON
- Download the converted files
- View processing logs
## Implementation Details
This service uses:
- MinerU for high-quality PDF extraction
- PyMuPDF as a fallback conversion method
- Flask web server for the interface and API
- Docker container for deployment on Hugging Face Spaces
## Error Handling
The service includes robust error handling:
- Automatic fallback to local PDF conversion with PyMuPDF if MinerU is unavailable
- Detailed error messages and logs
- API responses include comprehensive details for debugging
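The fallback pattern described above can be sketched in a few lines. This is an illustration rather than the service's actual code: `convert_with_fallback` is a hypothetical name, the two converters are passed in as plain callables instead of calling MinerU or PyMuPDF directly, and the shape of the failure payload is assumed.

```python
def convert_with_fallback(pdf_path, primary, fallback):
    """Try the primary converter, then the fallback, keeping a log of both."""
    log = []
    for name, converter in (("primary", primary), ("fallback", fallback)):
        try:
            markdown = converter(pdf_path)
        except Exception as exc:
            # Record the failure and move on to the next converter, if any.
            log.append(f"{name} converter failed: {exc}")
            continue
        log.append(f"{name} converter succeeded")
        return {"success": True, "markdown": markdown, "log": "\n".join(log)}
    # Assumption: failures are reported with success=False and a message.
    return {"success": False, "message": "all converters failed", "log": "\n".join(log)}
```

Keeping the log in the result mirrors the `log` field of the conversion response, so a caller can see which path produced the output.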
## Learn More
For more information about MinerU, visit [the MinerU repository](https://github.com/opendatalab/MinerU).