---
title: MinerUapi
emoji: 📄
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
pinned: false
---

# MinerU PDF Converter

This Space provides a service for converting PDF files to Markdown and JSON using the MinerU PDF extraction tool.

## Features

- Web interface for uploading and converting PDF files
- RESTful API for programmatic access
- Health monitoring endpoint
- High-quality PDF extraction with support for tables, formulas, and complex layouts
- Output in both Markdown and structured JSON formats
- Error handling with a fallback conversion path when MinerU is unavailable

## API Usage

The service exposes two API endpoints for programmatic access:

### 1. PDF Conversion Endpoint

```
POST /api/convert
```

**Request:**

- Content-Type: `multipart/form-data`
- Body: a form field named `file` containing the PDF file

**Response:**

```json
{
  "success": true,
  "message": "PDF conversion successful",
  "job_id": "uuid",
  "base_filename": "filename",
  "file_info": {
    "original_filename": "document.pdf",
    "size_bytes": 42950,
    "content_type": "application/pdf"
  },
  "markdown": "# Converted markdown content...",
  "json": {
    "title": "Document Title",
    "sections": [...]
  },
  "log": "Processing log...",
  "files": {
    "markdown_path": "document.md",
    "json_path": "document.json"
  }
}
```

### 2. Health Check Endpoint

```
GET /health
```

**Response:**

```json
{
  "status": "healthy",
  "version": "1.1.0",
  "environment": {
    "python_version": "3.10.12",
    "platform": "Linux-6.1.58+-x86_64-with-glibc2.35",
    "processor": "x86_64"
  },
  "configuration": {
    "upload_folder_exists": true,
    "output_folder_exists": true,
    "magic_pdf_installed": true
  }
}
```

### Client Example

A Python client script (`api_client.py`) is included in this repository for easy integration:

```bash
# Example usage
python api_client.py path/to/your/document.pdf --api-url https://marcosremar2-mineruapi.hf.space
```

The client includes:

- An automatic health check to verify API status
- Retry logic for failed requests
- Progress tracking
- Error handling

You can also use curl:

```bash
curl -X POST -F "file=@path/to/your/document.pdf" https://marcosremar2-mineruapi.hf.space/api/convert
```

And check health with:

```bash
curl https://marcosremar2-mineruapi.hf.space/health
```

## Web Interface

The Space also provides a web interface where you can:

- Upload PDF files for conversion
- View the generated Markdown and JSON
- Download the converted files
- View processing logs

## Implementation Details

This service uses:

- MinerU for high-quality PDF extraction
- PyMuPDF as a fallback conversion method
- A Flask web server for the interface and API
- A Docker container for deployment on Hugging Face Spaces

## Error Handling

The service handles errors as follows:

- Automatic fallback to local PDF conversion with PyMuPDF if MinerU is unavailable
- Detailed error messages and logs
- API responses include details for debugging, such as the processing `log`

## Learn More

For more information about MinerU, visit [the MinerU repository](https://github.com/opendatalab/MinerU).
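The `GET /health` and `POST /api/convert` calls documented in the API Usage section can also be made without third-party packages. The following is a minimal sketch using only Python's standard library. It is not the bundled `api_client.py`, whose internals are not described here. The helper names (`encode_multipart`, `with_retries`, `check_health`, `convert_pdf`), the linear backoff, and the timeout values are illustrative assumptions.

```python
import json
import time
import urllib.error
import urllib.request
import uuid
from pathlib import Path

# Base URL of the Space, as used in the curl examples.
DEFAULT_API_URL = "https://marcosremar2-mineruapi.hf.space"


def encode_multipart(field: str, filename: str, data: bytes,
                     content_type: str = "application/pdf") -> tuple[bytes, str]:
    """Build a multipart/form-data body containing a single file part.

    Returns the encoded body and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def with_retries(call, attempts: int = 3, delay: float = 2.0, sleep=time.sleep):
    """Run call(), retrying connection failures and 5xx responses.

    Waits delay * attempt seconds between tries (linear backoff, an assumption).
    """
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except urllib.error.HTTPError as err:
            # 4xx errors will not succeed on retry, so re-raise immediately.
            if err.code < 500 or attempt == attempts:
                raise
        except urllib.error.URLError:
            # Connection-level failures (DNS lookup, refused connection, ...).
            if attempt == attempts:
                raise
        sleep(delay * attempt)


def check_health(api_url: str = DEFAULT_API_URL, timeout: float = 30) -> dict:
    """GET /health and return the decoded JSON body."""
    with urllib.request.urlopen(f"{api_url}/health", timeout=timeout) as resp:
        return json.load(resp)


def convert_pdf(pdf_path: str, api_url: str = DEFAULT_API_URL,
                timeout: float = 600) -> dict:
    """POST a PDF to /api/convert as form field 'file' and return the JSON body."""
    path = Path(pdf_path)
    body, content_type = encode_multipart("file", path.name, path.read_bytes())
    request = urllib.request.Request(
        f"{api_url}/api/convert",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )

    def send() -> dict:
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return json.load(resp)

    return with_retries(send)
```

For example, `check_health()["status"]` should be `"healthy"` when the Space is running, and `convert_pdf("path/to/your/document.pdf")["markdown"]` holds the converted text. Retrying only connection failures and 5xx responses avoids resubmitting a file the server has already rejected as invalid.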
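A response from `/api/convert` carries the converted content in its `markdown` and `json` fields, so a client only needs to write them out under `base_filename`. The sketch below assumes that a failed conversion returns the same shape with `"success": false` plus `message` and `log`; only the success response is documented above. `save_outputs`, `ConversionError`, and the default `output` directory are hypothetical names.

```python
import json
from pathlib import Path


class ConversionError(RuntimeError):
    """Raised when a response reports success=false (assumed error shape)."""


def save_outputs(result: dict, out_dir: str = "output") -> tuple[Path, Path]:
    """Write the Markdown and JSON from an /api/convert response to disk.

    Returns the paths of the written .md and .json files.
    """
    if not result.get("success"):
        # Surface the server's message and processing log for debugging.
        message = result.get("message", "conversion failed")
        raise ConversionError(f"{message}\n{result.get('log', '')}")

    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    # Keep only the final path component so a crafted name cannot escape out_dir.
    base = Path(result.get("base_filename") or "document").name

    md_path = target / f"{base}.md"
    json_path = target / f"{base}.json"
    md_path.write_text(result.get("markdown", ""), encoding="utf-8")
    json_path.write_text(
        json.dumps(result.get("json", {}), ensure_ascii=False, indent=2),
        encoding="utf-8",
    )
    return md_path, json_path
```

Raising on `success: false` rather than writing empty files keeps a failed job from silently producing output that looks valid.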