Spaces:
Sleeping
Sleeping
| # DocStrange Web Interface | |
| A beautiful, modern web interface for the DocStrange document extraction library, inspired by the data-extraction-apis project design. | |
| ## Features | |
| - **Modern UI**: Clean, responsive design with drag-and-drop file upload | |
| - **Multiple Formats**: Support for PDF, Word, Excel, PowerPoint, images, and more | |
| - **Output Options**: Convert to Markdown, HTML, JSON, CSV, or Flat JSON | |
| - **Real-time Processing**: Live extraction with progress indicators | |
| - **Download Results**: Save extracted content in various formats | |
| - **Mobile Friendly**: Responsive design that works on all devices | |
| ## Quick Start | |
| ### 1. Install Dependencies | |
| ```bash | |
| pip install docstrange[web] | |
| ``` | |
| ### 2. Start the Web Interface | |
| ```bash | |
| docstrange web | |
| ``` | |
| ### 3. Open Your Browser | |
| Navigate to: http://localhost:8000 | |
| ## Usage | |
| ### File Upload | |
| 1. **Drag & Drop**: Simply drag your file onto the upload area | |
| 2. **Click to Browse**: Click the upload area to select a file from your computer | |
| 3. **Supported Formats**: PDF, Word (.docx, .doc), Excel (.xlsx, .xls), PowerPoint (.pptx, .ppt), HTML, CSV, Text, Images (PNG, JPG, TIFF, BMP) | |
| ### Output Format Selection | |
| Choose from multiple output formats: | |
| - **Markdown**: Clean, structured markdown text | |
| - **HTML**: Formatted HTML with styling | |
| - **JSON**: Structured JSON data | |
| - **CSV**: Table data in CSV format | |
| - **Flat JSON**: Simplified JSON structure | |
| ### Results View | |
| After processing, you can: | |
| - **Preview**: View formatted content in the preview tab | |
| - **Raw Output**: See the raw extracted text | |
| - **Download**: Save results as text or JSON files | |
| ## API Endpoints | |
| The web interface also provides REST API endpoints: | |
| ### Health Check | |
| ``` | |
| GET /api/health | |
| ``` | |
| ### Get Supported Formats | |
| ``` | |
| GET /api/supported-formats | |
| ``` | |
| ### Extract Document | |
| ``` | |
| POST /api/extract | |
| Content-Type: multipart/form-data | |
| Parameters: | |
| - file: The document file to extract | |
| - output_format: markdown, html, json, csv, flat-json | |
| ``` | |
| ## Configuration | |
| ### Environment Variables | |
| - `FLASK_ENV`: Set to `development` for debug mode | |
| - `MAX_CONTENT_LENGTH`: Maximum file size (default: 100MB) | |
| ### Customization | |
| The web interface uses a modular design system: | |
| - **CSS Variables**: Easy theming via CSS custom properties | |
| - **Responsive Design**: Mobile-first approach | |
| - **Component-based**: Reusable UI components | |
| ## Development | |
| ### Running in Development Mode | |
| ```bash | |
| # Install development dependencies | |
| pip install -e . | |
| # Start with debug mode | |
| python -m docstrange.web_app | |
| ``` | |
| ### File Structure | |
| ``` | |
| docstrange/ | |
| βββ web_app.py # Flask application | |
| βββ templates/ | |
| β βββ index.html # Main HTML template | |
| βββ static/ | |
| βββ styles.css # Design system CSS | |
| βββ script.js # Frontend JavaScript | |
| ``` | |
| ### Testing | |
| ```bash | |
| # Run the test script | |
| python test_web_interface.py | |
| ``` | |
| ## Troubleshooting | |
| ### Common Issues | |
| 1. **Port Already in Use** | |
| ```bash | |
| # Use a different port | |
| docstrange web --port 8080 | |
| ``` | |
| 2. **File Upload Fails** | |
| - Check file size (max 100MB) | |
| - Verify file format is supported | |
| - Ensure proper file permissions | |
| 3. **Extraction Errors** | |
| - Check console logs for detailed error messages | |
| - Verify document is not corrupted | |
| - Try different output formats | |
| ### Logs | |
| The web interface logs to the console. Check for: | |
| - File upload events | |
| - Processing status | |
| - Error messages | |
| - API request details | |
| ## Contributing | |
| To contribute to the web interface: | |
| 1. Fork the repository | |
| 2. Create a feature branch | |
| 3. Make your changes | |
| 4. Test thoroughly | |
| 5. Submit a pull request | |
| ## License | |
| This web interface is part of the DocStrange project and is licensed under the MIT License. |