DocStrange Web Interface

A browser-based interface for the DocStrange document extraction library, with a design inspired by the data-extraction-apis project.

Features

  • Modern UI: Clean, responsive design with drag-and-drop file upload
  • Multiple Formats: Support for PDF, Word, Excel, PowerPoint, images, and more
  • Output Options: Convert to Markdown, HTML, JSON, CSV, or Flat JSON
  • Real-time Processing: Live extraction with progress indicators
  • Download Results: Save extracted content in various formats
  • Mobile Friendly: Responsive layout that adapts to smaller screens

Quick Start

1. Install Dependencies

pip install "docstrange[web]"

2. Start the Web Interface

docstrange web

3. Open Your Browser

Navigate to: http://localhost:8000
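If you script the startup, it can help to wait until the server is actually accepting connections before opening the browser. The sketch below assumes only the default address from step 3; `wait_for_server` is a hypothetical helper, not part of DocStrange, and a successful TCP connect proves that something is listening, not that it is DocStrange.

```python
import socket
import time


def wait_for_server(host: str = "localhost", port: int = 8000,
                    timeout: float = 30.0, interval: float = 0.25) -> bool:
    """Poll host:port until a TCP connection succeeds or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # Connect and immediately close; we only care that a listener exists.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused or timed out: the server is not up yet, try again shortly.
            time.sleep(interval)
    return False
```

For example, after starting `docstrange web` in another terminal, `if wait_for_server(): webbrowser.open("http://localhost:8000")` opens the page once the port answers.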

Usage

File Upload

  1. Drag & Drop: Drag your file onto the upload area
  2. Click to Browse: Click the upload area to select a file from your computer
  3. Supported Formats: PDF, Word (.docx, .doc), Excel (.xlsx, .xls), PowerPoint (.pptx, .ppt), HTML, CSV, Text, Images (PNG, JPG, TIFF, BMP)
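When preparing a batch of files, a client-side extension check can catch unsupported files before they are uploaded. This is a minimal sketch that transcribes the list above into extensions; the spellings chosen for "HTML", "Text", "JPG" and "TIFF" are assumptions, and the server's own answer from `GET /api/supported-formats` should be treated as authoritative.

```python
from pathlib import Path

# Transcribed from the Supported Formats list. The extensions used for
# "HTML", "Text", "JPG" and "TIFF" (.html, .txt, .jpg/.jpeg, .tif/.tiff) are assumptions.
SUPPORTED_EXTENSIONS = frozenset({
    ".pdf", ".docx", ".doc", ".xlsx", ".xls", ".pptx", ".ppt",
    ".html", ".csv", ".txt",
    ".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp",
})


def is_supported(filename: str) -> bool:
    """Case-insensitive extension check; it does not validate the file's contents."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```

For example, `[p for p in Path("inbox").iterdir() if is_supported(p.name)]` keeps only the files worth uploading from a hypothetical `inbox` folder.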

Output Format Selection

Choose from multiple output formats:

  • Markdown: Clean, structured markdown text
  • HTML: Formatted HTML with styling
  • JSON: Structured JSON data
  • CSV: Table data in CSV format
  • Flat JSON: Simplified JSON structure

Results View

After processing, you can:

  • Preview: View formatted content in the preview tab
  • Raw Output: See the raw extracted text
  • Download: Save results as text or JSON files

API Endpoints

The web interface also provides REST API endpoints:

Health Check

GET /api/health

Get Supported Formats

GET /api/supported-formats
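The two GET endpoints can be called from Python with only the standard library. This sketch assumes both endpoints return JSON, which this page does not state, and it returns the decoded body unchanged because the response schema is not documented here. The helper names are hypothetical; the `opener` parameter exists so the functions can be exercised without a live server.

```python
import json
from typing import Any, Callable
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # default address from Quick Start


def get_json(path: str, base_url: str = BASE_URL,
             opener: Callable[..., Any] = urlopen) -> Any:
    """GET base_url + path and decode the response body as JSON (assumed format)."""
    with opener(base_url.rstrip("/") + path, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def health(base_url: str = BASE_URL, **kwargs: Any) -> Any:
    """Call GET /api/health."""
    return get_json("/api/health", base_url, **kwargs)


def supported_formats(base_url: str = BASE_URL, **kwargs: Any) -> Any:
    """Call GET /api/supported-formats."""
    return get_json("/api/supported-formats", base_url, **kwargs)
```

With the server running, `print(health())` and `print(supported_formats())` show what your installed version actually reports.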

Extract Document

POST /api/extract
Content-Type: multipart/form-data

Parameters:
- file: The document file to extract
- output_format: markdown, html, json, csv, flat-json
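The extract endpoint takes a multipart/form-data body with the `file` and `output_format` fields listed above. The sketch below builds such a request with the standard library only. `build_extract_request` is a hypothetical helper; sending every file as `application/octet-stream` is an assumption (the server may not need a precise MIME type), and the shape of the response is not documented on this page.

```python
import uuid
from urllib.request import Request

# Values accepted by output_format, per the parameter list above.
OUTPUT_FORMATS = {"markdown", "html", "json", "csv", "flat-json"}


def build_extract_request(filename: str, data: bytes, output_format: str = "markdown",
                          base_url: str = "http://localhost:8000") -> Request:
    """Build (but do not send) a multipart/form-data POST to /api/extract."""
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"unsupported output_format: {output_format!r}")
    boundary = uuid.uuid4().hex
    safe_name = filename.replace('"', "%22")  # keep the header's quoted string intact
    parts = [
        # Plain form field carrying the requested output format.
        f"--{boundary}".encode(),
        b'Content-Disposition: form-data; name="output_format"',
        b"",
        output_format.encode(),
        # File field named "file", sent with a generic content type (assumption).
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{safe_name}"'.encode(),
        b"Content-Type: application/octet-stream",
        b"",
        data,
        # Closing boundary; the trailing b"" adds the final CRLF.
        f"--{boundary}--".encode(),
        b"",
    ]
    return Request(
        base_url.rstrip("/") + "/api/extract",
        data=b"\r\n".join(parts),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

To send it, pass the request to `urllib.request.urlopen`, e.g. `urlopen(build_extract_request("invoice.pdf", Path("invoice.pdf").read_bytes(), "json"))`, then inspect the body your server returns.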

Configuration

Environment Variables

  • FLASK_ENV: Set to development for debug mode
  • MAX_CONTENT_LENGTH: Maximum file size (default: 100MB)
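A client can mirror the upload limit to reject oversized files before sending them. This sketch assumes that `MAX_CONTENT_LENGTH` holds a plain byte count, as Flask's configuration key of the same name does, and reads the "100MB" default as 100 MiB; neither detail is confirmed by this page. The helper names are hypothetical.

```python
import os
from typing import Mapping, Optional

# "100MB" default, interpreted here as 100 MiB (assumption).
DEFAULT_MAX_BYTES = 100 * 1024 * 1024


def max_upload_bytes(environ: Optional[Mapping[str, str]] = None) -> int:
    """Return MAX_CONTENT_LENGTH as an int (assumed to be bytes), or the default."""
    env = os.environ if environ is None else environ
    raw = env.get("MAX_CONTENT_LENGTH", "").strip()
    if not raw:
        return DEFAULT_MAX_BYTES
    value = int(raw)  # raises ValueError for values like "100MB"
    if value <= 0:
        raise ValueError("MAX_CONTENT_LENGTH must be a positive byte count")
    return value


def fits_upload_limit(path: str, environ: Optional[Mapping[str, str]] = None) -> bool:
    """True if the file at `path` is no larger than the configured limit."""
    return os.path.getsize(path) <= max_upload_bytes(environ)
```

Parsing strictly (rather than accepting suffixes like "MB") keeps the client's interpretation from silently drifting away from what the server enforces.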

Customization

The web interface uses a modular design system:

  • CSS Variables: Easy theming via CSS custom properties
  • Responsive Design: Mobile-first approach
  • Component-based: Reusable UI components

Development

Running in Development Mode

# Install the package from your checkout in editable mode, including the web extra
pip install -e ".[web]"

# Start with debug mode
python -m docstrange.web_app

File Structure

docstrange/
├── web_app.py          # Flask application
├── templates/
│   └── index.html      # Main HTML template
└── static/
    ├── styles.css      # Design system CSS
    └── script.js       # Frontend JavaScript

Testing

# Run the test script
python test_web_interface.py

Troubleshooting

Common Issues

  1. Port Already in Use

    # Use a different port
    docstrange web --port 8080
    
  2. File Upload Fails

    • Check the file size (100MB by default; see MAX_CONTENT_LENGTH)
    • Verify the file format is supported
    • Make sure the file is readable by your user account
  3. Extraction Errors

    • Check the server's console output for detailed error messages
    • Verify the document is not corrupted
    • Try a different output format
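Before passing a new value to `--port`, you can check which candidate ports are free. This is a minimal sketch with hypothetical helpers; it tests whether a port can be bound on 127.0.0.1 at this moment, so another process could still claim the port before the server starts, and a server bound to a different interface may not be detected on every platform.

```python
import socket
from typing import Iterable


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if host:port can be bound right now (a snapshot, not a reservation)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            # EADDRINUSE (or similar): something already holds the port.
            return False
    return True


def first_free_port(candidates: Iterable[int] = (8000, 8080, 8081, 8888),
                    host: str = "127.0.0.1") -> int:
    """Return the first candidate port that is currently free."""
    for port in candidates:
        if port_is_free(port, host):
            return port
    raise RuntimeError("none of the candidate ports are free")
```

The result can then be used as `docstrange web --port <port>`.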

Logs

The web interface logs to the console. Check for:

  • File upload events
  • Processing status
  • Error messages
  • API request details

Contributing

To contribute to the web interface:

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Test thoroughly
  5. Submit a pull request

License

This web interface is part of the DocStrange project and is licensed under the MIT License.