---
title: SmartReceipt-AI
emoji: 🧾
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: 1.49.1
app_file: app.py
pinned: false
license: apache-2.0
---

# SmartReceipt AI

**SmartReceipt AI** is a multimodal receipt OCR extractor built with **Streamlit**, **Google Gemini (via LangChain)**, and **Groq Whisper** for audio transcription.
Users upload a receipt image and give instructions by text or speech. The app converts the receipt into a **structured plain-text receipt format** that preserves store info, order details, items, totals, gratuity, and footers, and it can optionally split the bill among guests.

---

## Features

* Upload receipt images (`.jpg`, `.jpeg`, `.png`) or provide voice input for instructions.
* Transcribe speech into English using **Groq Whisper**.
* Extract **all visible text** from receipts using a **Google Gemini** multimodal model.
* Convert unstructured OCR into a **receipt-style structured layout**.
* Preserve:

  * Store details
  * Order information (order #, table, party size, server, date/time)
  * Items with quantity and price
  * Subtotals, tax, TOTAL
  * Extra sections (gratuity, discounts, payment method)
  * Footer messages (e.g., “Thank you”, “Visit again”)
* **Split the bill** automatically when requested, supporting both numeric and word formats (`4`, `four`, `five persons`, `guest 3`, etc.).
* Chat-like interface with conversation memory and continuous input.
* Export extracted receipts to `.txt` files for easy use.
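
To illustrate the input formats the split feature accepts, here is a minimal sketch of pulling a head count out of an instruction. It is not the app's implementation (the README attributes splitting to the system prompt); `parse_party_size` and the one-to-twelve word table are hypothetical names and limits chosen for this example.

```python
import re
from typing import Optional

# Hypothetical lookup table; limiting words to one..twelve is an assumption.
_WORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6,
    "seven": 7, "eight": 8, "nine": 9, "ten": 10, "eleven": 11, "twelve": 12,
}

# A token is either a run of digits or a run of letters.
_TOKEN = re.compile(r"\d+|[a-z]+")


def parse_party_size(text: str) -> Optional[int]:
    """Return the first positive head count mentioned in `text`, else None."""
    for token in _TOKEN.findall(text.lower()):
        count = int(token) if token.isdigit() else _WORDS.get(token)
        if count:  # skips unknown words (None) and a literal 0
            return count
    return None


print(parse_party_size("Split the bill among 4"))  # 4
print(parse_party_size("five persons"))            # 5
```

A keyword scan like this is deliberately naive: it would also read "no one" as a count of 1, which is one reason to leave the interpretation to the model.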

---

## Project Structure

```
.
├── app.py            # Streamlit UI: upload, audio input, display, export
├── ocr_utils.py      # Gemini OCR + Groq Whisper transcription + split bill logic
├── requirements.txt  # Python dependencies
├── .env              # Environment variables (API keys)
└── README.md         # Project documentation
```

---

## Requirements

* Python 3.10 or higher
* Google Gemini API key (obtain from [https://aistudio.google.com/](https://aistudio.google.com/))
* Groq API key (for Whisper transcription)

---

## Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/your-username/receipt-ocr-bot.git
   cd receipt-ocr-bot
   ```

2. Create and activate a virtual environment (recommended):

   ```bash
   python -m venv venv
   source venv/bin/activate      # Linux/macOS
   venv\Scripts\activate         # Windows (run this instead of the line above)
   ```

3. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

4. Create a `.env` file in the project root and add your API keys:

   ```
   GOOGLE_API_KEY=your_google_gemini_api_key_here
   GROQ_API_KEY=your_groq_api_key_here
   ```

---

## Running the Application

Start the Streamlit app:

```bash
streamlit run app.py
```

The app will launch in your browser at:

```
http://localhost:8501
```

---

## Usage

1. **Text or Voice Input**:

   * Type your instructions (e.g., “Split the bill among 4”).
   * Alternatively, record speech using the mini recorder; the app transcribes it to English automatically.
2. **Upload Receipt**:

   * Upload a receipt image (`.jpg`, `.jpeg`, `.png`).
3. **Process OCR**:

   * Click **Analyze Receipt**.
   * The app extracts all receipt details and formats them in a structured plain-text layout.
4. **Split Bill (Optional)**:

   * If the user requested a split in text/speech, the output automatically shows per-person amounts at the end of the receipt.
5. **Download Result**:

   * Use the **Download as TXT** button to export the structured receipt.
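
The per-person amounts in step 4 come from the model, so it can be useful to check them independently. The following is a sketch of an even split that works in whole cents so the shares always add up to the total; `split_total` is a hypothetical helper, and giving the leftover cents to the first guests is an assumption, not the app's documented rule.

```python
from decimal import Decimal
from typing import List


def split_total(total: str, guests: int) -> List[Decimal]:
    """Split `total` evenly among `guests`, distributing leftover cents."""
    if guests < 1:
        raise ValueError("guests must be at least 1")
    # Work in integer cents to avoid floating-point drift.
    cents = int((Decimal(total) * 100).to_integral_value())
    base, extra = divmod(cents, guests)
    # The first `extra` guests each pay one additional cent.
    return [Decimal(base + (1 if i < extra else 0)) / 100 for i in range(guests)]


shares = split_total("100.00", 3)
print(shares)       # [Decimal('33.34'), Decimal('33.33'), Decimal('33.33')]
print(sum(shares))  # 100.00
```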

---

## Notes

* The system prompt is strictly tuned for **receipts only**.
* TOTAL amounts are always displayed in uppercase.
* Bill splitting supports **both numbers and words** (`4`, `four`, `three people`, `guest 2`, etc.).
* Model output is **plain text**; no JSON or Markdown.
* If no receipt is detected, the model will return: `No receipt detected`.
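
Because the output is plain text with an uppercase TOTAL line and a fixed `No receipt detected` message, downstream code can read it with simple string checks. This is a sketch under assumptions: the exact line layout (a line starting with `TOTAL`, an optional currency symbol, an amount without thousands separators) is guessed for illustration, and `read_total` is a hypothetical helper.

```python
import re
from decimal import Decimal
from typing import Optional

NO_RECEIPT = "No receipt detected"  # sentinel text from the notes above

# Assumed layout: "TOTAL", optional filler such as spaces or "$", then the amount.
_TOTAL_LINE = re.compile(r"^\s*TOTAL\b[^\d\n]*(\d+(?:\.\d{1,2})?)\s*$", re.MULTILINE)


def read_total(model_output: str) -> Optional[Decimal]:
    """Return the TOTAL amount, or None if there is no receipt or no TOTAL line."""
    if model_output.strip() == NO_RECEIPT:
        return None
    match = _TOTAL_LINE.search(model_output)
    return Decimal(match.group(1)) if match else None


sample = "SUBTOTAL 9.00\nTAX 1.00\nTOTAL   $10.00\nThank you"
print(read_total(sample))                 # 10.00
print(read_total("No receipt detected"))  # None
```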

---

## Production Workflow

1. **Audio Input (Optional)** → Transcribed by **Groq Whisper** → Text prompt.
2. **Receipt Image Upload** → OCR by **Google Gemini** → Raw text.
3. **Structured Formatting** → Apply receipt layout rules and alignment.
4. **Split Bill Logic** → Handled automatically by the system prompt when requested.
5. **Display & Export** → Streamlit shows structured receipt + download option.
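
The workflow above can be sketched as a single function that takes the transcription and OCR steps as injected callables. This keeps the example independent of any particular Gemini or Groq client API; `process_request`, its parameters, and the rule that typed text takes priority over audio are all assumptions made for illustration.

```python
from typing import Callable, Optional

# Hypothetical signatures for the two external services.
Transcriber = Callable[[bytes], str]     # audio bytes -> English text
Extractor = Callable[[bytes, str], str]  # image bytes + instruction -> receipt text


def process_request(
    image: bytes,
    extract: Extractor,
    typed_text: str = "",
    audio: Optional[bytes] = None,
    transcribe: Optional[Transcriber] = None,
) -> str:
    """Run steps 1-4: build the instruction, then extract the structured receipt."""
    instruction = typed_text.strip()
    # Step 1: fall back to transcribed speech when nothing was typed (assumption).
    if not instruction and audio is not None and transcribe is not None:
        instruction = transcribe(audio).strip()
    # Steps 2-4: OCR, formatting, and any split are delegated to the extractor.
    return extract(image, instruction).strip()


# Usage with stand-in callables instead of real API clients.
result = process_request(
    b"fake-image",
    extract=lambda img, instruction: f"TOTAL 10.00 ({instruction})",
    audio=b"fake-audio",
    transcribe=lambda a: "split among 4",
)
print(result)  # TOTAL 10.00 (split among 4)
```

Passing the services in as parameters makes the flow testable without network access or API keys.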

---

## Support

For issues, questions, or collaboration, contact:
**[syaeem26s@gmail.com](mailto:syaeem26s@gmail.com)**
