# DocStrange Web Interface

A modern web interface for the DocStrange document extraction library, with a design inspired by the data-extraction-apis project.

## Features

- **Modern UI**: Clean, responsive design with drag-and-drop file upload
- **Multiple Formats**: Support for PDF, Word, Excel, PowerPoint, images, and more
- **Output Options**: Convert to Markdown, HTML, JSON, CSV, or Flat JSON
- **Real-time Processing**: Live extraction with progress indicators
- **Download Results**: Save extracted content in various formats
- **Mobile Friendly**: Responsive design that works on all devices

## Quick Start

### 1. Install Dependencies

```bash
# Quotes keep shells such as zsh from treating the brackets as a glob pattern
pip install "docstrange[web]"
```

### 2. Start the Web Interface

```bash
docstrange web
```

### 3. Open Your Browser

Navigate to: http://localhost:8000

## Usage

### File Upload

1. **Drag & Drop**: Simply drag your file onto the upload area
2. **Click to Browse**: Click the upload area to select a file from your computer
3. **Supported Formats**: PDF, Word (.docx, .doc), Excel (.xlsx, .xls), PowerPoint (.pptx, .ppt), HTML, CSV, Text, Images (PNG, JPG, TIFF, BMP)

### Output Format Selection

Choose from multiple output formats:

- **Markdown**: Clean, structured markdown text
- **HTML**: Formatted HTML with styling
- **JSON**: Structured JSON data
- **CSV**: Table data in CSV format
- **Flat JSON**: Simplified JSON structure

### Results View

After processing, you can:

- **Preview**: View formatted content in the preview tab
- **Raw Output**: See the raw extracted text
- **Download**: Save results as text or JSON files

## API Endpoints

The web interface also provides REST API endpoints:

### Health Check
```
GET /api/health
```

### Get Supported Formats
```
GET /api/supported-formats
```
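
As a rough sketch, both read-only endpoints can be called from Python using only the standard library. The base URL assumes the default address from Quick Start, the helper names `api_url`, `get_json`, and `main` are illustrative, and the responses are assumed to be JSON, which this document does not confirm.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: default address from Quick Start


def api_url(path, base_url=BASE_URL):
    """Join the server base URL and an endpoint path with exactly one slash."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def get_json(path, base_url=BASE_URL, timeout=10):
    """GET an endpoint and decode the body as JSON (assumed response type)."""
    with urllib.request.urlopen(api_url(path, base_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def main():
    # Requires a running server started with `docstrange web`.
    print(get_json("/api/health"))
    print(get_json("/api/supported-formats"))
```

Call `main()` once the server is running. If the server was started on another port, pass a different `base_url` to `get_json`.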

### Extract Document
```
POST /api/extract
Content-Type: multipart/form-data

Parameters:
- file: The document file to extract
- output_format: markdown, html, json, csv, flat-json
```
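
The request above could be sent from Python roughly as follows. This is a minimal standard-library sketch, not the project's own client: `build_multipart` and `extract` are hypothetical helper names, the base URL assumes the Quick Start default, and the response is returned as raw bytes because its structure is not documented here.

```python
import mimetypes
import urllib.request
import uuid
from pathlib import Path


def build_multipart(fields, file_field, filename, file_bytes):
    """Encode text fields plus one file as a multipart/form-data body.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    parts = []
    for name, value in fields.items():
        parts.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f"{value}\r\n").encode("utf-8")
        )
    parts.append(
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
         f"Content-Type: {mime}\r\n\r\n").encode("utf-8")
        + file_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))  # closing boundary
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def extract(path, output_format="markdown", base_url="http://localhost:8000"):
    """POST a document to /api/extract and return the raw response bytes."""
    p = Path(path)
    body, content_type = build_multipart(
        {"output_format": output_format}, "file", p.name, p.read_bytes()
    )
    req = urllib.request.Request(
        base_url.rstrip("/") + "/api/extract",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return resp.read()
```

With the server running, `extract("report.pdf", "json")` would upload a local file named `report.pdf` (a placeholder) and return the response body.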

## Configuration

### Environment Variables

- `FLASK_ENV`: Set to `development` for debug mode
- `MAX_CONTENT_LENGTH`: Maximum file size (default: 100MB)
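
In Flask, `MAX_CONTENT_LENGTH` is a configuration key measured in bytes. How DocStrange reads the environment variable is not shown here, so the following is only a sketch of one plausible approach: it assumes the variable holds a plain integer byte count and that the 100MB default means 100 × 1024 × 1024 bytes. The function name `max_content_length` is hypothetical.

```python
import os

# Assumption: "100MB" is interpreted as 100 * 1024 * 1024 bytes.
DEFAULT_MAX_BYTES = 100 * 1024 * 1024


def max_content_length(environ=os.environ):
    """Return the upload limit in bytes, falling back to the documented default.

    Assumption: the variable, when set, holds a plain integer byte count.
    """
    raw = environ.get("MAX_CONTENT_LENGTH", "").strip()
    if not raw:
        return DEFAULT_MAX_BYTES
    return int(raw)


# A Flask application would typically apply the value like this:
#   app.config["MAX_CONTENT_LENGTH"] = max_content_length()
```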

### Customization

The web interface uses a modular design system:

- **CSS Variables**: Easy theming via CSS custom properties
- **Responsive Design**: Mobile-first approach
- **Component-based**: Reusable UI components

## Development

### Running in Development Mode

```bash
# Install the package in editable mode (run from the repository root)
pip install -e .

# Start with debug mode
python -m docstrange.web_app
```

### File Structure

```
docstrange/
├── web_app.py          # Flask application
├── templates/
│   └── index.html      # Main HTML template
└── static/
    ├── styles.css      # Design system CSS
    └── script.js       # Frontend JavaScript
```

### Testing

```bash
# Run the test script
python test_web_interface.py
```

## Troubleshooting

### Common Issues

1. **Port Already in Use**
   ```bash
   # Use a different port
   docstrange web --port 8080
   ```

2. **File Upload Fails**
   - Check file size (max 100MB)
   - Verify file format is supported
   - Ensure proper file permissions

3. **Extraction Errors**
   - Check console logs for detailed error messages
   - Verify document is not corrupted
   - Try different output formats
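
The upload checks listed under "File Upload Fails" can be run locally before sending a file. The sketch below is illustrative only: `check_upload` is a hypothetical helper, the extension set is inferred from the Supported Formats list in this document rather than taken from the server, and the 100MB limit is read as 100 × 1024 × 1024 bytes.

```python
import os
from pathlib import Path

# Assumption: documented 100MB limit interpreted as 100 * 1024 * 1024 bytes.
MAX_BYTES = 100 * 1024 * 1024

# Assumption: extensions inferred from the Supported Formats list, not from the server.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".docx", ".doc", ".xlsx", ".xls", ".pptx", ".ppt",
    ".html", ".csv", ".txt", ".png", ".jpg", ".jpeg", ".tiff", ".bmp",
}


def check_upload(path, max_bytes=MAX_BYTES):
    """Return a list of problems found; an empty list means none were detected."""
    p = Path(path)
    if not p.is_file():
        return [f"{p} does not exist or is not a regular file"]
    problems = []
    if p.suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {p.suffix or '(none)'}")
    size = p.stat().st_size
    if size > max_bytes:
        problems.append(f"file size {size} bytes exceeds limit of {max_bytes} bytes")
    if not os.access(p, os.R_OK):
        problems.append("file is not readable by the current user")
    return problems
```

An empty result only means these local checks passed; the server may still reject a file for other reasons, such as a corrupted document.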

### Logs

The web interface logs to the console. Check for:
- File upload events
- Processing status
- Error messages
- API request details

## Contributing

To contribute to the web interface:

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Test thoroughly
5. Submit a pull request

## License

This web interface is part of the DocStrange project and is licensed under the MIT License.