metadata

title: SmartReceipt-AI
emoji: 🧾
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: 1.49.1
app_file: app.py
pinned: false
license: apache-2.0

SmartReceipt AI

SmartReceipt AI is a multimodal receipt OCR extractor built with Streamlit, Google Gemini (via LangChain), and Groq Whisper for audio transcription. Users upload a receipt image, optionally with typed or spoken instructions, and the app converts the receipt into a structured plain-text format that preserves store info, order details, items, totals, gratuity, and footers. On request, it also splits the bill among guests.

Features

Upload receipt images (.jpg, .jpeg, .png) or provide voice input for instructions.
Transcribe speech into English using Groq Whisper.
Extract all visible text from receipts using a Google Gemini multimodal model.
Convert unstructured OCR into a receipt-style structured layout.
Preserve:
- Store details
- Order information (order #, table, party size, server, date/time)
- Items with quantity and price
- Subtotals, tax, TOTAL
- Extra sections (gratuity, discounts, payment method)
- Footer messages (e.g., “Thank you”, “Visit again”)
Split the bill automatically when requested, supporting both numeric and word formats (4, four, five persons, guest 3, etc.).
Chat-like interface with conversation memory and continuous input.
Export extracted receipts to .txt files for easy use.
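
The split request can be phrased with digits or number words. As a minimal sketch of how such a request could be recognized before it reaches the model, the helper below extracts a guest count from an instruction. The names `parse_guest_count` and `WORD_NUMBERS` are hypothetical and do not come from this project, which (per the workflow described in this README) leaves split handling to the system prompt.

```python
import re
from typing import Optional

# Hypothetical word-to-number table; extend it as needed.
WORD_NUMBERS = {
    "two": 2, "three": 3, "four": 4, "five": 5, "six": 6,
    "seven": 7, "eight": 8, "nine": 9, "ten": 10,
}

def parse_guest_count(instruction: str) -> Optional[int]:
    """Return the number of guests to split among, or None if no split is requested."""
    text = instruction.lower()
    if "split" not in text:
        return None
    # Digits first: "split among 4", "split for guest 3".
    match = re.search(r"\b(\d+)\b", text)
    if match:
        count = int(match.group(1))
        return count if count > 0 else None
    # Then number words: "four", "five persons".
    for word in re.findall(r"[a-z]+", text):
        if word in WORD_NUMBERS:
            return WORD_NUMBERS[word]
    return None
```

A check like this could be used to validate the model's output, for example by confirming that the number of per-person lines matches the requested count.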

Project Structure

```
.
├── app.py            # Streamlit UI: upload, audio input, display, export
├── ocr_utils.py      # Gemini OCR + Groq Whisper transcription + split bill logic
├── requirements.txt  # Python dependencies
├── .env              # Environment variables (API keys)
└── README.md         # Project documentation
```

Requirements

Python 3.10 or higher
Google Gemini API key (obtain from https://aistudio.google.com/)
Groq API key (for Whisper transcription)

Installation

Clone the repository:

```
git clone https://github.com/your-username/receipt-ocr-bot.git
cd receipt-ocr-bot
```

Create and activate a virtual environment (recommended):

```
python -m venv venv
source venv/bin/activate      # Linux/Mac
venv\Scripts\activate         # Windows
```

Install dependencies:
```
pip install -r requirements.txt
```

Create a .env file in the project root and add your API keys:

```
GOOGLE_API_KEY=your_google_gemini_api_key_here
GROQ_API_KEY=your_groq_api_key_here
```
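
A missing key typically only surfaces as an API error at request time, so it can help to check both variables at startup. The sketch below assumes the keys are already present in the process environment (for example, loaded from `.env` by whatever mechanism `app.py` uses, which this README does not show). The function name `missing_api_keys` is hypothetical.

```python
import os

# Key names taken from the .env example in this README.
REQUIRED_KEYS = ("GOOGLE_API_KEY", "GROQ_API_KEY")

def missing_api_keys(env=None):
    """Return the required key names that are unset or empty."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]

if __name__ == "__main__":
    missing = missing_api_keys()
    if missing:
        print("Missing API keys:", ", ".join(missing))
```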

Running the Application

Start the Streamlit app:

```
streamlit run app.py
```

The app will launch in your browser at:

http://localhost:8501

Usage

Text or Voice Input:
- Type instructions (e.g., “Split the bill among 4”).
- Optionally, record speech using the mini recorder; the app transcribes it to English automatically.
Upload Receipt:
- Upload a receipt image (.jpg, .jpeg, .png).
Process OCR:
- Click Analyze Receipt.
- The app extracts all receipt details and formats them in a structured plain-text layout.
Split Bill (Optional):
- If the user requested a split in text/speech, the output automatically shows per-person amounts at the end of the receipt.
Download Result:
- Use the Download as TXT button to export the structured receipt.
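
The per-person amounts in the app come from the model, so a deterministic cross-check can be useful. The sketch below is one way to compute shares that always sum back to the total, assuming an even split and two-decimal currency. `split_total` is a hypothetical helper, not part of this project.

```python
from decimal import Decimal, ROUND_DOWN
from typing import List

CENT = Decimal("0.01")

def split_total(total: str, guests: int) -> List[Decimal]:
    """Split a total into per-person shares that sum exactly to the total."""
    if guests < 1:
        raise ValueError("guests must be at least 1")
    amount = Decimal(total).quantize(CENT)
    # Round each share down, then hand out the leftover cents one by one.
    base = (amount / guests).quantize(CENT, rounding=ROUND_DOWN)
    leftover_cents = int((amount - base * guests) * 100)
    return [base + CENT if i < leftover_cents else base for i in range(guests)]
```

For example, `split_total("100.00", 3)` yields shares of 33.34, 33.33, and 33.33, so no cent is lost to rounding.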

Notes

The system prompt is strictly tuned for receipts only.
The TOTAL label is always displayed in uppercase.
Bill splitting supports both numbers and words (4, four, three people, guest 2, etc.).
Model output is plain text; no JSON or Markdown.
If no receipt is detected, the model will return: No receipt detected.
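
To illustrate the plain-text conventions in these notes, here is a small sketch of an aligned receipt layout with an uppercase TOTAL line and the "No receipt detected." fallback. In the app itself the layout is produced by the model, so `format_receipt`, its parameters, and the 32-character width are assumptions made for this example only.

```python
NO_RECEIPT = "No receipt detected."

def format_receipt(store, items, tax, width=32):
    """Render (name, qty, unit_price) items as aligned plain text."""
    if not items:
        return NO_RECEIPT

    def row(label, amount):
        # Left-align the label and right-align the price within the width.
        price = f"{amount:.2f}"
        return label.ljust(width - len(price)) + price

    subtotal = sum(qty * price for _, qty, price in items)
    lines = [store.center(width), "-" * width]
    lines += [row(f"{qty} x {name}", qty * price) for name, qty, price in items]
    lines += ["-" * width, row("Subtotal", subtotal), row("Tax", tax)]
    lines.append(row("TOTAL", subtotal + tax))
    return "\n".join(lines)

if __name__ == "__main__":
    print(format_receipt("CAFE", [("Latte", 2, 3.50), ("Bagel", 1, 4.25)], 0.90))
```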

Production Workflow

Audio Input (Optional) → Transcribed by Groq Whisper → Text prompt.
Receipt Image Upload → OCR by Google Gemini → Raw text.
Structured Formatting → Apply receipt layout rules and alignment.
Split Bill Logic → Handled automatically by the system prompt when requested.
Display & Export → Streamlit shows structured receipt + download option.
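
The workflow above can be sketched as a single function that chains the stages. The transcription and OCR steps are passed in as callables so the sketch stays independent of any specific SDK. `run_pipeline` and its parameters are hypothetical, and the real `app.py` and `ocr_utils.py` may be organized differently.

```python
from typing import Callable, Optional

def run_pipeline(
    image_bytes: bytes,
    ocr: Callable[[bytes, str], str],
    audio_bytes: Optional[bytes] = None,
    transcribe: Optional[Callable[[bytes], str]] = None,
    typed_prompt: str = "",
) -> str:
    """Chain the stages: optional transcription, then OCR with the prompt."""
    prompt = typed_prompt
    if audio_bytes is not None and transcribe is not None:
        # Spoken instructions are appended to any typed text.
        prompt = f"{prompt} {transcribe(audio_bytes)}".strip()
    # Following the workflow above, one model call is assumed to extract,
    # format, and (when requested) split the bill.
    return ocr(image_bytes, prompt)
```

Passing the stages in as arguments also makes the flow easy to test with stub functions before wiring in the Gemini and Groq clients.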

Support

For issues, questions, or collaboration, contact: syaeem26s@gmail.com
