---
title: SmartReceipt-AI
emoji: 🧾
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: 1.49.1
app_file: app.py
pinned: false
license: apache-2.0
---
# SmartReceipt AI
**SmartReceipt AI** is a multimodal receipt OCR extractor built with **Streamlit**, **Google Gemini (via LangChain)**, and **Groq Whisper** for audio transcription.
Users upload a receipt image and can add typed or spoken instructions; the app converts the receipt into a **structured plain-text receipt format** that preserves store info, order details, items, totals, gratuity, and footers, and it can split the bill among guests on request.

---
## Features
* Upload receipt images (`.jpg`, `.jpeg`, `.png`) or provide voice input for instructions.
* Transcribe speech into English using **Groq Whisper**.
* Extract **all visible text** from receipts using a **Google Gemini multimodal model**.
* Convert unstructured OCR into a **receipt-style structured layout**.
* Preserve:
  * Store details
  * Order information (order #, table, party size, server, date/time)
  * Items with quantity and price
  * Subtotals, tax, and TOTAL
  * Extra sections (gratuity, discounts, payment method)
  * Footer messages (e.g., “Thank you”, “Visit again”)
* **Split the bill** automatically when requested, supporting both numeric and word formats (`4`, `four`, `five persons`, `guest 3`, etc.).
* Chat-like interface with conversation memory and continuous input.
* Export extracted receipts to `.txt` files for easy use.
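
The README does not show how `ocr_utils.py` recognizes a split request, and the split itself is handled by the Gemini system prompt. As a minimal sketch only, assuming you wanted a deterministic pre-check before calling the model, the hypothetical helper `parse_split_count` below pulls a head count out of an instruction written with digits (`4`, `guest 3`) or number words (`four`, `five persons`). The number-word table is an illustrative assumption and stops at twelve.

```python
import re
from typing import Optional

# Hypothetical number-word table; extend it if larger parties are common.
NUMBER_WORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6,
    "seven": 7, "eight": 8, "nine": 9, "ten": 10, "eleven": 11, "twelve": 12,
}


def parse_split_count(instruction: str) -> Optional[int]:
    """Return how many people to split among, or None if no count is found."""
    text = instruction.lower()
    # Prefer an explicit digit, e.g. "split among 4" or "guest 3".
    match = re.search(r"\b(\d+)\b", text)
    if match:
        return int(match.group(1))
    # Otherwise look for a whole number word, e.g. "four" or "five persons".
    for word in re.findall(r"[a-z]+", text):
        if word in NUMBER_WORDS:
            return NUMBER_WORDS[word]
    return None
```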
---
## Project Structure
```
.
├── app.py            # Streamlit UI: upload, audio input, display, export
├── ocr_utils.py      # Gemini OCR + Groq Whisper transcription + split-bill logic
├── requirements.txt  # Python dependencies
├── .env              # Environment variables (API keys)
└── README.md         # Project documentation
```
---
## Requirements
* Python 3.10 or higher
* Google Gemini API key (obtain from [https://aistudio.google.com/](https://aistudio.google.com/))
* Groq API key (for Whisper transcription)
---
## Installation
1. Clone the repository:
   ```bash
   git clone https://github.com/your-username/receipt-ocr-bot.git
   cd receipt-ocr-bot
   ```
2. Create and activate a virtual environment (recommended):
   ```bash
   python -m venv venv
   source venv/bin/activate  # Linux/macOS
   venv\Scripts\activate     # Windows
   ```
3. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```
4. Create a `.env` file in the project root and add your API keys:
   ```
   GOOGLE_API_KEY=your_google_gemini_api_key_here
   GROQ_API_KEY=your_groq_api_key_here
   ```
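
The README does not say how `app.py` reads `.env`; projects like this commonly call `load_dotenv()` from the `python-dotenv` package. The stdlib-only sketch below is an assumption, not the app's code: the hypothetical `parse_env` handles simple `KEY=VALUE` lines, and `missing_keys` reports which of the two keys above are absent or empty, a cheap startup check before any API call is made.

```python
import os
from typing import Dict, Iterable, List, Mapping

# The two keys this README asks for.
REQUIRED_KEYS = ("GOOGLE_API_KEY", "GROQ_API_KEY")


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env: Mapping[str, str], required: Iterable[str] = REQUIRED_KEYS) -> List[str]:
    """Return the required keys that are absent or empty in `env`."""
    return [key for key in required if not env.get(key)]


if __name__ == "__main__":
    # Variables already exported in the shell win; .env only fills gaps.
    if os.path.exists(".env"):
        with open(".env", encoding="utf-8") as handle:
            for key, value in parse_env(handle.read()).items():
                os.environ.setdefault(key, value)
    problems = missing_keys(os.environ)
    if problems:
        print("Missing API keys: " + ", ".join(problems))
    else:
        print("All API keys found.")
```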
---
## Running the Application
Start the Streamlit app:
```bash
streamlit run app.py
```
The app will launch in your browser at:
```
http://localhost:8501
```
---
## Usage
1. **Text or Voice Input**:
   * Type instructions (e.g., “Split the bill among 4”).
   * Optionally, record your instructions with the mini recorder; the app transcribes them to English automatically.
2. **Upload Receipt**:
   * Upload a receipt image (`.jpg`, `.jpeg`, `.png`).
3. **Process OCR**:
   * Click **Analyze Receipt**.
   * The app extracts all receipt details and formats them in a structured plain-text layout.
4. **Split Bill (Optional)**:
   * If you requested a split by text or voice, the output shows per-person amounts at the end of the receipt.
5. **Download Result**:
   * Use the **Download as TXT** button to export the structured receipt.
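
Per-person amounts are produced by the model through the system prompt. For reference, here is a sketch of the arithmetic a correct even split should satisfy, assuming the TOTAL is given in whole cents: the hypothetical `split_total` uses `decimal.Decimal` to avoid float rounding error, rounds each share down to the cent, and hands any leftover cents to the first guests so the shares add up exactly to the TOTAL. For example, a TOTAL of 100.00 split three ways yields 33.34, 33.33, and 33.33.

```python
from decimal import ROUND_DOWN, Decimal
from typing import List

CENT = Decimal("0.01")


def split_total(total: str, people: int) -> List[Decimal]:
    """Split a TOTAL evenly into per-person shares that sum exactly to it."""
    if people < 1:
        raise ValueError("people must be at least 1")
    amount = Decimal(total)
    # Round every base share down to whole cents...
    base = (amount / people).quantize(CENT, rounding=ROUND_DOWN)
    # ...then give the leftover cents, one each, to the first guests.
    leftover_cents = int((amount - base * people) / CENT)
    return [base + CENT if i < leftover_cents else base for i in range(people)]


if __name__ == "__main__":
    for guest, share in enumerate(split_total("100.00", 3), start=1):
        print(f"Guest {guest}: {share}")
```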
---
## Notes
* The system prompt is tuned strictly for **receipts only**.
* The TOTAL label is always displayed in uppercase.
* Bill splitting supports **both numbers and words** (`4`, `four`, `three people`, `guest 2`, etc.).
* Model output is **plain text**; no JSON or Markdown.
* If no receipt is detected, the model will return: `No receipt detected`.
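
The layout itself comes from Gemini via the system prompt. The sketch below only illustrates the target shape these notes describe (plain text, labels on the left, amounts on the right, an uppercase TOTAL line); the 32-character width and the helper names `format_line` and `format_totals` are hypothetical, and such a helper could, for instance, render a reference receipt when testing prompt changes.

```python
from typing import Iterable, Tuple

WIDTH = 32  # hypothetical line width in characters


def format_line(label: str, amount: str, width: int = WIDTH) -> str:
    """Left-align the label and right-align the amount on one line."""
    gap = max(1, width - len(label) - len(amount))
    return f"{label}{' ' * gap}{amount}"


def format_totals(rows: Iterable[Tuple[str, str]], total: str, width: int = WIDTH) -> str:
    """Render (label, amount) rows, a rule, and an uppercase TOTAL line."""
    lines = [format_line(label, amount, width) for label, amount in rows]
    lines.append("-" * width)
    lines.append(format_line("TOTAL", total, width))
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_totals([("2 x Burger", "24.00"), ("Subtotal", "24.00"), ("Tax", "1.92")], "25.92"))
```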
---
## Production Workflow
1. **Audio Input (Optional)** → Transcribed by **Groq Whisper** → Text prompt.
2. **Receipt Image Upload** → OCR by **Google Gemini** → Raw text.
3. **Structured Formatting** → Apply receipt layout rules and alignment.
4. **Split Bill Logic** → Handled automatically by the system prompt when requested.
5. **Display & Export** → Streamlit shows the structured receipt plus a download option.
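
The workflow above can be sketched as a single function with the two model calls injected as parameters, so the flow can be read and tested without API keys. This is an assumption-laden outline, not the contents of `app.py` or `ocr_utils.py`: in the real app `transcribe` would wrap a Groq Whisper request and `extract` a Gemini call made through LangChain with the receipt system prompt, and steps 2 to 4 are assumed to happen in that one multimodal call. `process_receipt` and `ReceiptResult` are hypothetical names; the `No receipt detected` sentinel comes from the Notes section.

```python
from dataclasses import dataclass
from typing import Callable, Optional

NO_RECEIPT = "No receipt detected"  # sentinel described in the Notes section


@dataclass
class ReceiptResult:
    prompt: str       # combined typed and transcribed instructions
    text: str         # plain-text receipt returned by the model
    is_receipt: bool  # False when the model answered with NO_RECEIPT


def process_receipt(
    image_bytes: bytes,
    *,
    extract: Callable[[bytes, str], str],
    transcribe: Optional[Callable[[bytes], str]] = None,
    typed_prompt: str = "",
    audio_bytes: Optional[bytes] = None,
) -> ReceiptResult:
    """Run the workflow with the model calls passed in as plain functions."""
    # Step 1: optional audio is transcribed and appended to the typed prompt.
    prompt = typed_prompt.strip()
    if audio_bytes is not None:
        if transcribe is None:
            raise ValueError("audio was supplied without a transcribe function")
        prompt = f"{prompt} {transcribe(audio_bytes).strip()}".strip()
    # Steps 2-4: one multimodal call does OCR, layout, and any requested split.
    text = extract(image_bytes, prompt).strip()
    # Step 5 (display and .txt export) is left to the Streamlit UI.
    return ReceiptResult(prompt=prompt, text=text, is_receipt=text != NO_RECEIPT)
```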
---
## Support
For issues, questions, or collaboration, contact:
**[syaeem26s@gmail.com](mailto:syaeem26s@gmail.com)**