---
title: SmartReceipt-AI
emoji: 🧾
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: 1.49.1
app_file: app.py
pinned: false
license: apache-2.0
---
# SmartReceipt AI
**SmartReceipt AI** is a multimodal receipt OCR extractor built with **Streamlit**, **Google Gemini (via LangChain)**, and **Groq Whisper** for audio transcription.
Users upload a receipt image and can add typed or spoken instructions. The app converts the receipt into a **structured plain-text receipt format** that preserves store info, order details, items, totals, gratuity, and footers, and optionally splits the bill among guests.
---
## Features
* Upload receipt images (`.jpg`, `.jpeg`, `.png`) or provide voice input for instructions.
* Transcribe speech into English using **Groq Whisper**.
* Extract **all visible text** from receipts using a **Google Gemini multimodal model**.
* Convert unstructured OCR into a **receipt-style structured layout**.
* Preserve:
  * Store details
  * Order information (order #, table, party size, server, date/time)
  * Items with quantity and price
  * Subtotals, tax, TOTAL
  * Extra sections (gratuity, discounts, payment method)
  * Footer messages (e.g., “Thank you”, “Visit again”)
* **Split the bill** automatically when requested, supporting both numeric and word formats (`4`, `four`, `five persons`, `guest 3`, etc.).
* Chat-like interface with conversation memory and continuous input.
* Export extracted receipts to `.txt` files for easy use.
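
As an illustration of the receipt-style layout described above, here is a minimal sketch in plain Python. In the app itself the layout is produced by the Gemini model, so everything here is an assumption made for illustration: the names `format_line` and `format_receipt`, the 32-character `WIDTH`, and the exact line order are hypothetical, not taken from `ocr_utils.py`.

```python
from decimal import Decimal

WIDTH = 32  # hypothetical column width for the plain-text layout


def format_line(label: str, amount: Decimal, width: int = WIDTH) -> str:
    """Left-align the label and right-align the amount on one line."""
    price = f"{amount:.2f}"
    return f"{label:<{width - len(price)}}{price}"


def format_receipt(store: str, items: list[tuple[int, str, Decimal]], tax: Decimal) -> str:
    """Render store name, items, subtotal, tax, and TOTAL as plain text."""
    subtotal = sum((qty * price for qty, _, price in items), Decimal("0"))
    lines = [store.center(WIDTH), "-" * WIDTH]
    for qty, name, price in items:
        # Each item line shows quantity, name, and the line total.
        lines.append(format_line(f"{qty} x {name}", qty * price))
    lines.append("-" * WIDTH)
    lines.append(format_line("Subtotal", subtotal))
    lines.append(format_line("Tax", tax))
    lines.append(format_line("TOTAL", subtotal + tax))
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_receipt(
        "CAFE",
        [(2, "Coffee", Decimal("3.50")), (1, "Bagel", Decimal("2.25"))],
        Decimal("0.74"),
    ))
```

The sketch uses `Decimal` rather than `float` so that currency sums stay exact. Labels longer than the available column are not truncated here, so such lines would exceed `WIDTH`.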
---
## Project Structure
```
.
├── app.py # Streamlit UI: upload, audio input, display, export
├── ocr_utils.py # Gemini OCR + Groq Whisper transcription + split bill logic
├── requirements.txt # Python dependencies
├── .env # Environment variables (API keys)
└── README.md # Project documentation
```
---
## Requirements
* Python 3.10 or higher
* Google Gemini API key (obtain from [https://aistudio.google.com/](https://aistudio.google.com/))
* Groq API key (for Whisper transcription)
---
## Installation
1. Clone the repository:
```bash
git clone https://github.com/your-username/receipt-ocr-bot.git
cd receipt-ocr-bot
```
2. Create and activate a virtual environment (recommended):
```bash
python -m venv venv
source venv/bin/activate   # Linux/macOS
# venv\Scripts\activate    # Windows (run this instead of the line above)
```
3. Install dependencies:
```bash
pip install -r requirements.txt
```
4. Create a `.env` file in the project root and add your API keys:
```
GOOGLE_API_KEY=your_google_gemini_api_key_here
GROQ_API_KEY=your_groq_api_key_here
```
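
A quick way to confirm that both keys are available before starting the app is sketched below. It assumes the values from `.env` have already been loaded into the process environment (for example by a loader such as python-dotenv; this README does not say which mechanism the app uses). The helper name `missing_keys` is hypothetical.

```python
import os

# The two variable names come from the .env example above.
REQUIRED_KEYS = ("GOOGLE_API_KEY", "GROQ_API_KEY")


def missing_keys(env=None) -> list[str]:
    """Return the required keys that are unset or blank in the environment."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing keys:", ", ".join(missing))
    else:
        print("All API keys are set.")
```

Passing a plain dictionary instead of `os.environ` makes the check easy to exercise without touching real credentials.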
---
## Running the Application
Start the Streamlit app:
```bash
streamlit run app.py
```
The app will launch in your browser at:
```
http://localhost:8501
```
---
## Usage
1. **Text or Voice Input**:
   * Type your instructions (e.g., “Split the bill among 4”).
   * Optionally, record speech using the mini recorder; the app transcribes it to English automatically.
2. **Upload Receipt**:
   * Upload a receipt image (`.jpg`, `.jpeg`, `.png`).
3. **Process OCR**:
   * Click **Analyze Receipt**.
   * The app extracts all receipt details and formats them in a structured plain-text layout.
4. **Split Bill (Optional)**:
   * If you requested a split in your typed or spoken instructions, the output shows per-person amounts at the end of the receipt.
5. **Download Result**:
   * Use the **Download as TXT** button to export the structured receipt.
---
## Notes
* The system prompt is strictly tuned for **receipts only**.
* The `TOTAL` label is always displayed in uppercase.
* Bill splitting supports **both numbers and words** (`4`, `four`, `three people`, `guest 2`, etc.).
* Model output is **plain text**; no JSON or Markdown.
* If no receipt is detected, the model will return: `No receipt detected`.
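
The app delegates bill splitting to the model through its system prompt. For readers who want to see the arithmetic, the sketch below shows one way the same behaviour could be reproduced locally. It is not the app's implementation: `parse_party_size`, `split_total`, and the `WORDS` table are hypothetical names, and treating the first number in the text as the head count is a simplifying assumption (so `guest 3` is read as three people).

```python
import re
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical lookup table; "one" is left out to avoid false matches in prose.
WORDS = {"two": 2, "three": 3, "four": 4, "five": 5, "six": 6,
         "seven": 7, "eight": 8, "nine": 9, "ten": 10}


def parse_party_size(text: str):
    """Return the first head count found as a digit or number word, else None."""
    for token in re.findall(r"[a-z]+|\d+", text.lower()):
        if token.isdigit():
            return int(token)
        if token in WORDS:
            return WORDS[token]
    return None


def split_total(total: Decimal, people: int) -> list[Decimal]:
    """Split total into per-person shares that sum exactly to total."""
    if people < 1:
        raise ValueError("people must be at least 1")
    cents = int((total * 100).to_integral_value(ROUND_HALF_UP))
    base, extra = divmod(cents, people)
    # The first `extra` guests pay one cent more so no money is lost to rounding.
    return [Decimal(base + (1 if i < extra else 0)) / 100 for i in range(people)]


if __name__ == "__main__":
    count = parse_party_size("Split the bill among four")
    print(count, split_total(Decimal("100.00"), count))
```

Working in integer cents keeps the shares exact, at the cost of a one-cent difference between guests when the total does not divide evenly.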
---
## Production Workflow
1. **Audio Input (Optional)** → Transcribed by **Groq Whisper** → Text prompt.
2. **Receipt Image Upload** → OCR by **Google Gemini** → Raw text.
3. **Structured Formatting** → Apply receipt layout rules and alignment.
4. **Split Bill Logic** → Handled automatically by the system prompt when requested.
5. **Display & Export** → Streamlit shows structured receipt + download option.
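
The workflow above can be sketched as a single function that takes the transcription and OCR steps as callables. This is an outline under stated assumptions, not the code in `ocr_utils.py`: `run_pipeline`, its parameters, and the idea that one model call covers OCR, formatting, and the optional split are all hypothetical. Real Groq and Gemini clients would be wrapped and passed in as `transcribe` and `ocr`.

```python
from typing import Callable, Optional

NO_RECEIPT = "No receipt detected"  # sentinel text quoted in the Notes section


def run_pipeline(
    image: bytes,
    ocr: Callable[[bytes, str], str],
    audio: Optional[bytes] = None,
    transcribe: Optional[Callable[[bytes], str]] = None,
    typed_prompt: str = "",
) -> Optional[str]:
    """Return the structured receipt text, or None when no receipt was found."""
    prompt = typed_prompt
    if audio is not None and transcribe is not None:
        # Step 1: transcribed speech is appended to the text prompt.
        prompt = f"{prompt} {transcribe(audio)}".strip()
    # Steps 2-4 (assumed to be one model call): OCR, formatting, optional split.
    result = ocr(image, prompt).strip()
    if result.lower().startswith(NO_RECEIPT.lower()):
        return None
    # Step 5 is left to the caller: display the text and offer it for download.
    return result
```

Injecting the two callables keeps the control flow testable with simple stand-in functions, without network access or API keys.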
---
## Support
For issues, questions, or collaboration, contact:
**[syaeem26s@gmail.com](mailto:syaeem26s@gmail.com)**