Spaces:
Sleeping
Sleeping
| title: expiryprocess | |
| app_file: gradio_app.py | |
| sdk: gradio | |
| sdk_version: 5.20.1 | |
| # Invoice Processing System with Gradio UI | |
| This system processes invoice files (PDF, Excel, Word, Text) and extracts structured data using a combination of OCR, regex patterns, and LLM-based extraction. The extracted data can be downloaded as CSV. | |
| ## Features | |
| - **Multiple File Formats**: Supports PDF, Excel (.xlsx, .xls), Word (.doc, .docx), and Text (.txt) files | |
| - **Document Conversion**: Automatically converts Word and Text files to PDF for processing | |
| - **LLM-Enhanced Extraction**: Uses Google's Generative AI for improved extraction accuracy (optional) | |
| - **Web Interface**: Easy-to-use Gradio UI for uploading files and downloading results | |
| - **CSV Export**: Download extracted data as CSV for further analysis | |
| ## Installation | |
| 1. Clone this repository: | |
| ```bash | |
| git clone <repository-url> | |
| cd invoice-processing-system | |
| ``` | |
| 2. Install dependencies: | |
| ```bash | |
| pip install -r requirements.txt | |
| ``` | |
| 3. Set up environment variables: | |
| - Create a `.env` file in the project root | |
| - Add your Google API key for LLM processing: | |
| ``` | |
| GOOGLE_API_KEY=your_api_key_here | |
| ``` | |
| ## Usage | |
| ### Web Interface (Gradio UI) | |
| 1. Start the Gradio web interface: | |
| ```bash | |
| python gradio_app.py | |
| ``` | |
| 2. Open your browser and navigate to the URL shown in the terminal (typically http://127.0.0.1:7860) | |
| 3. Upload an invoice file using the file upload button | |
| 4. Click "Process Invoice" to extract data from the file | |
| 5. View the extracted data in the table and download as CSV using the download button | |
| ### Command Line Interface | |
| You can also use the command line interface: | |
| ```bash | |
| # Process a file with default settings (using LLM if available) | |
| python process_invoice.py path/to/invoice.pdf | |
| # Process without using LLM | |
| python process_invoice.py path/to/invoice.xlsx --no-llm | |
| # Process without saving JSON output | |
| python process_invoice.py path/to/invoice.docx --no-json | |
| ``` | |
| ## Requirements | |
| - Python 3.8+ | |
| - Google API key (for LLM-enhanced extraction) | |
| - LibreOffice (for converting .doc/.docx files to PDF) | |
| - Tesseract OCR (for PDF processing) | |
| ## Troubleshooting | |
| - **LLM Processing Not Available**: Ensure your Google API key is correctly set in the `.env` file | |
| - **PDF Conversion Issues**: Make sure LibreOffice is installed and accessible in your PATH | |
| - **OCR Quality Issues**: Ensure Tesseract OCR is properly installed and configured | |
| ## License | |
| [MIT License](LICENSE) |