DocStrange Web Interface
A modern web interface for the DocStrange document extraction library. Its design is inspired by the data-extraction-apis project.
Features
- Modern UI: Clean, responsive design with drag-and-drop file upload
- Multiple Formats: Support for PDF, Word, Excel, PowerPoint, images, and more
- Output Options: Convert to Markdown, HTML, JSON, CSV, or Flat JSON
- Real-time Processing: Live extraction with progress indicators
- Download Results: Save extracted content in various formats
- Mobile Friendly: Responsive design that works on all devices
Quick Start
1. Install Dependencies
pip install docstrange[web]
2. Start the Web Interface
docstrange web
3. Open Your Browser
Navigate to: http://localhost:8000
Usage
File Upload
- Drag & Drop: Simply drag your file onto the upload area
- Click to Browse: Click the upload area to select a file from your computer
- Supported Formats: PDF, Word (.docx, .doc), Excel (.xlsx, .xls), PowerPoint (.pptx, .ppt), HTML, CSV, Text, Images (PNG, JPG, TIFF, BMP)
Output Format Selection
Choose from multiple output formats:
- Markdown: Clean, structured markdown text
- HTML: Formatted HTML with styling
- JSON: Structured JSON data
- CSV: Table data in CSV format
- Flat JSON: Simplified JSON structure
Results View
After processing, you can:
- Preview: View formatted content in the preview tab
- Raw Output: See the raw extracted text
- Download: Save results as text or JSON files
API Endpoints
The web interface also provides REST API endpoints:
Health Check
GET /api/health
Get Supported Formats
GET /api/supported-formats
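Both GET endpoints can be called from any HTTP client. The following is a minimal sketch using only the Python standard library. It assumes the server is running at http://localhost:8000 (the Quick Start address) and that both endpoints return JSON; the response fields are not documented here, so the helper simply returns whatever was decoded. The names api_url and get_json are illustrative and not part of DocStrange.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # default address from Quick Start


def api_url(path, base_url=BASE_URL):
    """Join the base URL and an endpoint path with exactly one slash."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def get_json(path, base_url=BASE_URL, timeout=10):
    """GET an endpoint and decode the body as JSON (assumed response type)."""
    with urllib.request.urlopen(api_url(path, base_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, get_json("/api/health") and get_json("/api/supported-formats") return the decoded responses. If the server was started on another port, pass that address as base_url.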
Extract Document
POST /api/extract
Content-Type: multipart/form-data
Parameters:
- file: The document file to extract
- output_format: markdown, html, json, csv, flat-json
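As an illustration of the request shape, the sketch below builds the multipart/form-data body by hand with the Python standard library and posts it to /api/extract. It assumes the server is running at http://localhost:8000 and uses only the two documented fields, file and output_format. The response format is not specified here, so the body is returned as plain text. The helper names build_multipart and extract are hypothetical, not part of DocStrange.

```python
import mimetypes
import urllib.request
import uuid
from pathlib import Path

EXTRACT_URL = "http://localhost:8000/api/extract"  # default address from Quick Start


def build_multipart(filename, data, output_format, boundary=None):
    """Encode the two documented fields as a multipart/form-data body.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = boundary or uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="output_format"\r\n\r\n'
        f"{output_format}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def extract(path, output_format="markdown", url=EXTRACT_URL, timeout=120):
    """POST a document to the extract endpoint and return the raw response text."""
    path = Path(path)
    body, content_type = build_multipart(path.name, path.read_bytes(), output_format)
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, extract("report.pdf", "json") would upload a local file named report.pdf and request JSON output. Any HTTP client that supports multipart uploads can send the same request.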
Configuration
Environment Variables
- FLASK_ENV: Set to development for debug mode
- MAX_CONTENT_LENGTH: Maximum file size (default: 100MB)
Customization
The web interface uses a modular design system:
- CSS Variables: Easy theming via CSS custom properties
- Responsive Design: Mobile-first approach
- Component-based: Reusable UI components
Development
Running in Development Mode
# Install development dependencies
pip install -e .
# Start with debug mode
python -m docstrange.web_app
File Structure
docstrange/
├── web_app.py          # Flask application
├── templates/
│   └── index.html      # Main HTML template
└── static/
    ├── styles.css      # Design system CSS
    └── script.js       # Frontend JavaScript
Testing
# Run the test script
python test_web_interface.py
Troubleshooting
Common Issues
Port Already in Use
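When the default port is taken, it helps to know which nearby port is free before restarting the server. The sketch below is a generic standard-library check, not a DocStrange feature: it tries to bind each candidate port on the loopback interface and reports the first one that succeeds. The function names and the 8000-8099 search range are assumptions chosen for illustration.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can currently be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
        return True


def first_free_port(start=8000, end=8100, host="127.0.0.1"):
    """Return the first free port in [start, end), or None if all are taken."""
    for port in range(start, end):
        if port_is_free(port, host):
            return port
    return None
```

The result is only a snapshot: another process can claim the port between the check and the server start, so a retry may still be needed.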
# Use a different port
docstrange web --port 8080
File Upload Fails
- Check file size (max 100MB)
- Verify file format is supported
- Ensure proper file permissions
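The three checks above can be run locally before uploading. The following sketch makes several assumptions: that "100MB" means 100 MiB, that the accepted extensions match the Supported Formats list (with .txt standing in for "Text"), and that the server rejects anything else. The authoritative list comes from GET /api/supported-formats. The name upload_problems is hypothetical.

```python
from pathlib import Path

MAX_UPLOAD_BYTES = 100 * 1024 * 1024  # assumption: "100MB" means 100 MiB

# Assumption: derived from the Supported Formats list; the server may accept more.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".docx", ".doc", ".xlsx", ".xls", ".pptx", ".ppt",
    ".html", ".csv", ".txt", ".png", ".jpg", ".tiff", ".bmp",
}


def upload_problems(path, max_bytes=MAX_UPLOAD_BYTES):
    """Return a list of likely reasons an upload would fail (empty if none found)."""
    path = Path(path)
    if not path.is_file():
        return [f"{path} does not exist or is not a regular file"]

    problems = []
    if path.suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {path.suffix or '(none)'}")
    size = path.stat().st_size
    if size > max_bytes:
        problems.append(f"file is {size} bytes, limit is {max_bytes}")
    try:
        # Opening for reading is the simplest portable permission check.
        with path.open("rb"):
            pass
    except OSError as exc:
        problems.append(f"file is not readable: {exc}")
    return problems
```

An empty list means none of these three causes applies, which points toward the server logs as the next place to look.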
Extraction Errors
- Check console logs for detailed error messages
- Verify document is not corrupted
- Try different output formats
Logs
The web interface logs to the console. Check for:
- File upload events
- Processing status
- Error messages
- API request details
Contributing
To contribute to the web interface:
- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly
- Submit a pull request
License
This web interface is part of the DocStrange project and is licensed under the MIT License.