# DocStrange Web Interface
A modern web interface for the DocStrange document extraction library, with a design inspired by the data-extraction-apis project.
## Features
- **Modern UI**: Clean, responsive design with drag-and-drop file upload
- **Multiple Formats**: Support for PDF, Word, Excel, PowerPoint, images, and more
- **Output Options**: Convert to Markdown, HTML, JSON, CSV, or Flat JSON
- **Real-time Processing**: Live extraction with progress indicators
- **Download Results**: Save extracted content in various formats
- **Mobile Friendly**: Responsive design that works on all devices
## Quick Start
### 1. Install Dependencies
```bash
pip install docstrange[web]
```
### 2. Start the Web Interface
```bash
docstrange web
```
### 3. Open Your Browser
Navigate to: http://localhost:8000
## Usage
### File Upload
1. **Drag & Drop**: Simply drag your file onto the upload area
2. **Click to Browse**: Click the upload area to select a file from your computer
3. **Supported Formats**: PDF, Word (.docx, .doc), Excel (.xlsx, .xls), PowerPoint (.pptx, .ppt), HTML, CSV, Text, Images (PNG, JPG, TIFF, BMP)
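Before uploading, a script can check a file against the supported formats and the size limit. The following is a minimal sketch, not part of DocStrange: `check_upload` is a hypothetical helper, the extension set is derived from the list above (`.htm`, `.txt`, `.jpeg`, and `.tif` are assumed spellings), and the 100MB limit from the Configuration section is assumed to mean 100 * 1024 * 1024 bytes.

```python
from pathlib import Path

# Extensions derived from the supported-formats list above.
# ".htm", ".txt", ".jpeg", and ".tif" are assumptions about accepted spellings.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".docx", ".doc", ".xlsx", ".xls", ".pptx", ".ppt",
    ".html", ".htm", ".csv", ".txt",
    ".png", ".jpg", ".jpeg", ".tiff", ".tif", ".bmp",
}

# Assumption: the documented "100MB" default means 100 * 1024 * 1024 bytes.
MAX_UPLOAD_BYTES = 100 * 1024 * 1024


def check_upload(name: str, size_bytes: int) -> list:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    suffix = Path(name).suffix.lower()
    if suffix not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {suffix or '(none)'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append(
            f"file is {size_bytes} bytes, over the {MAX_UPLOAD_BYTES}-byte limit"
        )
    return problems
```

For a file on disk, `check_upload(path.name, path.stat().st_size)` returns an empty list when both checks pass. The server remains the authority on what it accepts; this only catches obvious mismatches early.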
### Output Format Selection
Choose from multiple output formats:
- **Markdown**: Clean, structured markdown text
- **HTML**: Formatted HTML with styling
- **JSON**: Structured JSON data
- **CSV**: Table data in CSV format
- **Flat JSON**: Simplified JSON structure
### Results View
After processing, you can:
- **Preview**: View formatted content in the preview tab
- **Raw Output**: See the raw extracted text
- **Download**: Save results as text or JSON files
## API Endpoints
The web interface also provides REST API endpoints:
### Health Check
```
GET /api/health
```
### Get Supported Formats
```
GET /api/supported-formats
```
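The two GET endpoints above can be called from Python with the standard library alone. This is a rough sketch under assumptions: `endpoint` and `get_json` are hypothetical helper names, the base URL is the Quick Start default, and the responses are assumed to be JSON (the response schema is not documented here).

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # default address from Quick Start


def endpoint(path: str, base_url: str = BASE_URL) -> str:
    """Join the server base URL with an API path such as 'api/health'."""
    return urljoin(base_url.rstrip("/") + "/", path.lstrip("/"))


def get_json(path: str, base_url: str = BASE_URL, timeout: float = 5.0):
    """GET an endpoint and decode the body as JSON (assumed response type)."""
    with urlopen(endpoint(path, base_url), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `get_json("/api/health")` and `get_json("/api/supported-formats")` return whatever the server sends back, decoded into Python objects.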
### Extract Document
```
POST /api/extract
Content-Type: multipart/form-data
Parameters:
- file: The document file to extract
- output_format: markdown, html, json, csv, flat-json
```
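As an illustration of the request shape described above, the sketch below builds a `multipart/form-data` POST using only the standard library. `build_extract_request` is a hypothetical helper, the base URL is the Quick Start default, and the part content type `application/octet-stream` is an assumption; the field names `file` and `output_format` come from the parameter list above.

```python
import uuid
from urllib.request import Request

# Output formats listed in the parameter description above.
OUTPUT_FORMATS = {"markdown", "html", "json", "csv", "flat-json"}


def build_extract_request(filename: str, file_bytes: bytes,
                          output_format: str = "markdown",
                          base_url: str = "http://localhost:8000") -> Request:
    """Build (but do not send) a multipart/form-data POST to /api/extract."""
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"unknown output_format: {output_format!r}")

    boundary = uuid.uuid4().hex
    # First part: the output_format field. Second part: the file itself.
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="output_format"\r\n'
        "\r\n"
        f"{output_format}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")

    return Request(
        base_url.rstrip("/") + "/api/extract",
        data=head + file_bytes + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

To send it, pass the result to `urllib.request.urlopen` and read the response body. The sketch does not escape quotes in `filename`, so it suits simple file names only.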
## Configuration
### Environment Variables
- `FLASK_ENV`: Set to `development` for debug mode
- `MAX_CONTENT_LENGTH`: Maximum file size (default: 100MB)
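When the server is started from a script, the environment can be prepared in Python. This is a minimal sketch: `launch_command` and `start_server` are hypothetical helpers, and `--port` is the flag shown under Troubleshooting. Only `FLASK_ENV` is set, because the expected value format for `MAX_CONTENT_LENGTH` (for example bytes versus a string like `100MB`) is not stated here.

```python
import os
import subprocess


def launch_command(port: int = 8000, debug: bool = False):
    """Return (argv, env) for starting the web interface as a child process."""
    argv = ["docstrange", "web", "--port", str(port)]
    env = dict(os.environ)  # copy, so the parent environment is not modified
    if debug:
        env["FLASK_ENV"] = "development"
    return argv, env


def start_server(port: int = 8000, debug: bool = False) -> subprocess.Popen:
    """Start the server in the background and return the process handle."""
    argv, env = launch_command(port, debug)
    return subprocess.Popen(argv, env=env)
```

Calling `start_server(debug=True)` returns a `Popen` handle; call `terminate()` on it to stop the server.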
### Customization
The web interface uses a modular design system:
- **CSS Variables**: Easy theming via CSS custom properties
- **Responsive Design**: Mobile-first approach
- **Component-based**: Reusable UI components
## Development
### Running in Development Mode
```bash
# Install the package in editable mode (run from the repository root)
pip install -e .
# Start with debug mode
python -m docstrange.web_app
```
### File Structure
```
docstrange/
├── web_app.py              # Flask application
├── templates/
│   └── index.html          # Main HTML template
└── static/
    ├── styles.css          # Design system CSS
    └── script.js           # Frontend JavaScript
```
### Testing
```bash
# Run the test script
python test_web_interface.py
```
## Troubleshooting
### Common Issues
1. **Port Already in Use**
   ```bash
   # Use a different port
   docstrange web --port 8080
   ```
2. **File Upload Fails**
   - Check file size (max 100MB)
   - Verify file format is supported
   - Ensure proper file permissions
3. **Extraction Errors**
   - Check console logs for detailed error messages
   - Verify document is not corrupted
   - Try different output formats
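For the port conflict case, a short standard-library check can find a free port before you pass it to `--port`. This is a sketch, not part of DocStrange: `port_is_free` and `first_free_port` are hypothetical helpers, and the candidate ports are arbitrary examples.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if this process can bind host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
        return True


def first_free_port(candidates=(8000, 8080, 8888)):
    """Return the first free port from candidates, or None if all are taken."""
    for port in candidates:
        if port_is_free(port):
            return port
    return None
```

The check is a snapshot: another process could still claim the port between the check and the server start, so treat the result as a hint.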
### Logs
The web interface logs to the console. Check for:
- File upload events
- Processing status
- Error messages
- API request details
## Contributing
To contribute to the web interface:
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Test thoroughly
5. Submit a pull request
## License
This web interface is part of the DocStrange project and is licensed under the MIT License.