Space: Sayeem26s/SmartReceipt-AI

Sayeem26s committed on Sep 18, 2025

Commit 85a47a4

1 Parent(s): 2de8174

Upload 6 files

Files changed (6)

.env +4 -0
.gitignore +68 -0
README.md +151 -20
app.py +72 -0
ocr_utils.py +89 -0
requirements.txt +21 -3

.env ADDED

	@@ -0,0 +1,4 @@

+GROQ_API_KEY=<redacted>
+GOOGLE_API_KEY=<redacted>
+#<redacted>
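
Review note on `.env`: `ocr_utils.py` reads both keys with `os.getenv`, which silently returns `None` when a variable is unset, so a missing key only surfaces later as an API error. Since `.gitignore` in this same commit excludes `.env`, the keys are presumably meant to come from the environment rather than the repository. A minimal sketch of a fail-fast lookup, assuming a hypothetical helper named `require_env` (not part of this commit), might look like this:

```python
import os


def require_env(name, env=None):
    """Return a required environment variable, or raise if it is unset or blank.

    `require_env` is a hypothetical helper for illustration; `env` defaults to
    os.environ and can be replaced by a plain dict in tests.
    """
    env = os.environ if env is None else env
    value = (env.get(name) or "").strip()
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value
```

Under that assumption, `ocr_utils.py` could call `require_env("GOOGLE_API_KEY")` and `require_env("GROQ_API_KEY")` right after `load_dotenv()`.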

.gitignore ADDED

	@@ -0,0 +1,68 @@

+# Environment variables
+.env
+.env.*
+# Virtual environments
+venv/
+env/
+.venv/
+.venv*/
+# Byte-compiled files
+__pycache__/
+*.py[cod]
+# Distribution / packaging
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+# Logs and databases
+*.log
+*.sqlite3
+*.db
+# IDE-specific files
+.vscode/
+.idea/
+*.iml
+# OS-specific files
+.DS_Store
+Thumbs.db
+desktop.ini
+# Test coverage
+htmlcov/
+.tox/
+.nox/
+.coverage
+.coverage.*
+.cache
+.pytest_cache/
+nosetests.xml
+coverage.xml
+*.cover
+.hypothesis/
+# Jupyter Notebook checkpoints
+.ipynb_checkpoints/
+# Local configuration files
+*.local
+# History
+.history

README.md CHANGED

@@ -1,20 +1,151 @@
----
-title: SmartReceipt AI
-emoji: 🚀
-colorFrom: red
-colorTo: red
-sdk: docker
-app_port: 8501
-tags:
-- streamlit
-pinned: false
-short_description: Streamlit template space
-license: apache-2.0
----
-# Welcome to Streamlit!
-Edit `/src/streamlit_app.py` to customize this app to your heart's desire. :heart:
-If you have any questions, checkout our [documentation](https://docs.streamlit.io) and [community
-forums](https://discuss.streamlit.io).

+# SmartReceipt AI
+**SmartReceipt AI** is a multimodal receipt OCR extractor built with **Streamlit**, **Google Gemini (via LangChain)**, and **Groq Whisper** for audio transcription.
+It allows users to upload receipt images or provide speech input and converts them into a **structured plain-text receipt format**, preserving store info, order details, items, totals, gratuity, footers, and optionally splitting bills among guests.
+---
+## Features
+* Upload receipt images (`.jpg`, `.jpeg`, `.png`) or provide voice input for instructions.
+* Transcribe speech into English using **Groq Whisper**.
+* Extract **all visible text** from receipts using the **Google Gemini** multimodal model.
+* Convert unstructured OCR into a **receipt-style structured layout**.
+* Preserve:
+  * Store details
+  * Order information (order #, table, party size, server, date/time)
+  * Items with quantity and price
+  * Subtotals, tax, TOTAL
+  * Extra sections (gratuity, discounts, payment method)
+  * Footer messages (e.g., “Thank you”, “Visit again”)
+* **Split the bill** automatically when requested, supporting both numeric and word formats (`4`, `four`, `five persons`, `guest 3`, etc.).
+* Chat-like interface with conversation memory and continuous input.
+* Export extracted receipts to `.txt` files for easy use.
+---
+## Project Structure
+```
+.
+├── app.py            # Streamlit UI: upload, audio input, display, export
+├── ocr_utils.py      # Gemini OCR + Groq Whisper transcription + split bill logic
+├── requirements.txt  # Python dependencies
+├── .env              # Environment variables (API keys)
+└── README.md         # Project documentation
+```
+---
+## Requirements
+* Python 3.10 or higher
+* Google Gemini API key (obtain from [https://aistudio.google.com/](https://aistudio.google.com/))
+* Groq API key (for Whisper transcription)
+---
+## Installation
+1. Clone the repository:
+   ```bash
+   git clone https://github.com/your-username/receipt-ocr-bot.git
+   cd receipt-ocr-bot
+   ```
+2. Create and activate a virtual environment (recommended):
+   ```bash
+   python -m venv venv
+   source venv/bin/activate      # Linux/Mac
+   venv\Scripts\activate         # Windows
+   ```
+3. Install dependencies:
+   ```bash
+   pip install -r requirements.txt
+   ```
+4. Create a `.env` file in the project root and add your API keys:
+   ```
+   GOOGLE_API_KEY=your_google_gemini_api_key_here
+   GROQ_API_KEY=your_groq_api_key_here
+   ```
+---
+## Running the Application
+Start the Streamlit app:
+```bash
+streamlit run app.py
+```
+The app will launch in your browser at:
+```
+http://localhost:8501
+```
+---
+## Usage
+1. **Text or Voice Input**:
+   * Type your instructions (e.g., “Split the bill among 4”).
+   * Optionally, record speech using the mini recorder; the app transcribes it to English automatically.
+2. **Upload Receipt**:
+   * Upload a receipt image (`.jpg`, `.jpeg`, `.png`).
+3. **Process OCR**:
+   * Click **Analyze Receipt**.
+   * The app extracts all receipt details and formats them in a structured plain-text layout.
+4. **Split Bill (Optional)**:
+   * If the user requested a split in text/speech, the output automatically shows per-person amounts at the end of the receipt.
+5. **Download Result**:
+   * Use the **Download as TXT** button to export the structured receipt.
+---
+## Notes
+* The system prompt is strictly tuned for **receipts only**.
+* The `TOTAL` label is always displayed in uppercase.
+* Bill splitting supports **both numbers and words** (`4`, `four`, `three people`, `guest 2`, etc.).
+* Model output is **plain text**; no JSON or Markdown.
+* If no receipt is detected, the model will return: `No receipt detected`.
+---
+## Production Workflow
+1. **Audio Input (Optional)** → Transcribed by **Groq Whisper** → Text prompt.
+2. **Receipt Image Upload** → OCR by **Google Gemini** → Raw text.
+3. **Structured Formatting** → Apply receipt layout rules and alignment.
+4. **Split Bill Logic** → Handled automatically by the system prompt when requested.
+5. **Display & Export** → Streamlit shows structured receipt + download option.
+---
+## Support
+For issues, questions, or collaboration, contact:
+**[syaeem26s@gmail.com](mailto:syaeem26s@gmail.com)**
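
Review note on the README: it states that bill splitting accepts both digits and number words and that the arithmetic is delegated to the system prompt. For comparison, the same rule can be sketched deterministically. The names `WORD_NUMBERS`, `parse_party_size`, `split_total`, and `format_split` are hypothetical and do not appear in this commit, and the word table only covers one through ten as an illustrative assumption:

```python
import re
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical lookup table; the README does not say which words are supported.
WORD_NUMBERS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
}


def parse_party_size(prompt):
    """Return the first positive head count found in the prompt, or None."""
    for token in re.findall(r"[a-z]+|\d+", prompt.lower()):
        count = int(token) if token.isdigit() else WORD_NUMBERS.get(token)
        if count:
            return count
    return None


def split_total(total, persons):
    """Divide a decimal total (given as a string) evenly, rounded to cents."""
    if persons < 1:
        raise ValueError("persons must be at least 1")
    share = Decimal(total) / persons
    return share.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def format_split(total, persons):
    """Render the footer in the format required by the system prompt."""
    share = split_total(total, persons)
    return f"---\nSplit Bill ({persons} persons): {share} each\n---"
```

Using `Decimal` rather than `float` keeps the per-person share exact to the cent; whether the author wants rounding half up is an assumption.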

app.py ADDED

	@@ -0,0 +1,72 @@

+import streamlit as st
+from PIL import Image
+from ocr_utils import extract_receipt_text, extract_from_text, transcribe_audio
+from streamlit_mic_recorder import mic_recorder
+import tempfile
+import os
+# ------------------ Streamlit UI ------------------
+st.set_page_config(page_title="SmartReceipt AI", layout="centered")
+st.title("SmartReceipt AI")
+st.write("Provide text or speech and upload a receipt image to extract structured plain text.")
+# Session state
+if "user_text" not in st.session_state:
+    st.session_state.user_text = ""
+if "uploaded_image" not in st.session_state:
+    st.session_state.uploaded_image = None
+if "ocr_result" not in st.session_state:
+    st.session_state.ocr_result = None
+# ---------------- Input: User Text or Speech ----------------
+st.subheader("Enter text or record speech")
+# Text input field
+st.session_state.user_text = st.text_area("Type your input here:", st.session_state.user_text, height=100)
+# Mic recorder
+audio = mic_recorder(
+    start_prompt="Start Recording",
+    stop_prompt="Stop Recording",
+    just_once=True,
+    use_container_width=True,
+    format="wav"  # the recorder defaults to webm; request WAV to match the .wav temp file below
+)
+if audio and "bytes" in audio:
+    with tempfile.NamedTemporaryFile(delete=False, suffix=".wav") as tmp_file:
+        tmp_file.write(audio["bytes"])
+        tmp_path = tmp_file.name
+    transcribed_text = transcribe_audio(tmp_path)
+    st.session_state.user_text = transcribed_text
+    st.text_area("Transcribed Text:", transcribed_text, height=100)
+    os.remove(tmp_path)
+# ---------------- Input: Receipt Image ----------------
+uploaded_file = st.file_uploader("Upload a receipt (JPG/PNG)", type=["jpg", "jpeg", "png"])
+if uploaded_file:
+    st.session_state.uploaded_image = uploaded_file
+    image = Image.open(uploaded_file)
+    st.image(image, caption="Uploaded Receipt", width=400)
+# ---------------- Run OCR ----------------
+if st.button("Analyze Receipt"):
+    if st.session_state.user_text.strip() and st.session_state.uploaded_image:
+        with st.spinner("Processing..."):
+            ocr_text = extract_receipt_text(st.session_state.uploaded_image)
+            model_input_text = st.session_state.user_text
+            final_result = extract_from_text(f"User Prompt: {model_input_text}\n\n{ocr_text}")
+            st.session_state.ocr_result = final_result
+    else:
+        st.warning("Please provide both a user prompt (text or speech) and a receipt image.")
+# ---------------- Show Result ----------------
+if st.session_state.ocr_result:
+    st.subheader("Extracted Receipt Text")
+    st.text_area("OCR Result", st.session_state.ocr_result, height=400)
+    st.download_button(
+        "Download Receipt as TXT",
+        data=st.session_state.ocr_result,
+        file_name="receipt_output.txt",
+        mime="text/plain"
+    )
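
Review note on `app.py`: the recording is written with `NamedTemporaryFile(delete=False)` and removed only after `transcribe_audio` returns, so an exception during transcription leaves the file behind. One possible way to guarantee cleanup is a small context manager. This is a sketch, and `temp_audio_file` is a hypothetical name that is not part of this commit:

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def temp_audio_file(data, suffix=".wav"):
    """Write bytes to a temporary file, yield its path, and always delete it."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        # Close the handle before yielding so other code can reopen the path.
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        yield path
    finally:
        # Runs on normal exit and when the body of the with-block raises.
        if os.path.exists(path):
            os.remove(path)
```

With this helper, the recorder branch could become `with temp_audio_file(audio["bytes"]) as tmp_path:` followed by the `transcribe_audio(tmp_path)` call inside the block.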

ocr_utils.py ADDED

	@@ -0,0 +1,89 @@

+import base64
+import os
+from dotenv import load_dotenv
+from langchain_google_genai import ChatGoogleGenerativeAI
+from langchain_core.messages import HumanMessage, SystemMessage
+from groq import Groq
+# Load API keys
+load_dotenv()
+GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")
+GROQ_API_KEY = os.getenv("GROQ_API_KEY")
+# Initialize Gemini LLM
+llm = ChatGoogleGenerativeAI(
+    model="gemini-2.5-pro",
+    temperature=0,
+    max_output_tokens=2048,
+    google_api_key=GOOGLE_API_KEY
+)
+# Groq client for Whisper
+groq_client = Groq(api_key=GROQ_API_KEY)
+# System prompt with strict splitting rules
+system_prompt = """
+You are a strict OCR analyst specialized in receipts.
+- Extract ALL text from the uploaded receipt image or provided transcription and represent the text exactly like the receipt (keep spacing/alignment).
+- Do not remove or skip fields that exist on the receipt.
+- Keep spacing aligned, totals right-justified.
+- TOTAL must always be uppercase.
+- If no receipt detected, reply: No receipt detected.
+--- SPLIT BILL INSTRUCTION ---
+If the user requests to split the bill (e.g., "split among 4", "divide bill in four", "split for five people", "guest 3", "3 persons", "two friends", etc.):
+1. Accept both digits (1, 2, 3, 4, etc.) and words ("one", "two", "three", "four", etc.).
+2. Extract the TOTAL from the receipt.
+3. Divide TOTAL by the requested number of persons.
+4. At the END of the receipt output, strictly append in this format:
+---
+Split Bill (N persons): X.XX each
+---
+Where N is the number of persons and X.XX is the per-person share.
+If no split is requested, do not add anything.
+"""
+def extract_receipt_text(uploaded_file):
+    """Convert uploaded receipt image to structured text using Gemini."""
+    img_bytes = uploaded_file.getvalue()
+    img_base64 = base64.b64encode(img_bytes).decode("utf-8")
+    # Use the upload's own MIME type (JPEG uploads are allowed too); fall back to PNG.
+    mime_type = getattr(uploaded_file, "type", None) or "image/png"
+    messages = [
+        SystemMessage(content=system_prompt),
+        HumanMessage(content=[
+            {"type": "text", "text": "Extract the receipt text in structured plain text."},
+            {"type": "image_url", "image_url": {"url": f"data:{mime_type};base64,{img_base64}"}}
+        ])
+    ]
+    response = llm.invoke(messages)
+    return response.content
+def extract_from_text(text_input: str):
+    """Send raw text (from transcription or manual input) to Gemini OCR pipeline."""
+    messages = [
+        SystemMessage(content=system_prompt),
+        HumanMessage(content=text_input)
+    ]
+    response = llm.invoke(messages)
+    return response.content
+def transcribe_audio(file_path: str) -> str:
+    """Transcribe audio in English using Groq Whisper API."""
+    with open(file_path, "rb") as f:
+        file_bytes = f.read()
+    transcription = groq_client.audio.transcriptions.create(
+        file=(os.path.basename(file_path), file_bytes),
+        model="whisper-large-v3",
+        response_format="verbose_json",
+        language="en"  # Hint that the audio is spoken in English
+    )
+    if hasattr(transcription, "text"):
+        return transcription.text
+    elif isinstance(transcription, dict):
+        return transcription.get("text") or transcription.get("transcription") or ""
+    return str(transcription)
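
Review note on `ocr_utils.py`: the system prompt asks the model to append a `Split Bill (N persons): X.XX each` footer, but nothing in the code checks the arithmetic the model returns. A minimal sketch of such a check follows. `SPLIT_LINE`, `parse_split_footer`, and `split_is_consistent` are hypothetical names, and the one-cent-per-person rounding tolerance is an assumption:

```python
import re
from decimal import Decimal

# Matches the footer format spelled out in the system prompt.
SPLIT_LINE = re.compile(
    r"^Split Bill \((\d+) persons\): (\d+(?:\.\d{1,2})?) each$",
    re.MULTILINE,
)


def parse_split_footer(output):
    """Return (persons, share) from the model output, or None if no footer."""
    match = SPLIT_LINE.search(output)
    if match is None:
        return None
    return int(match.group(1)), Decimal(match.group(2))


def split_is_consistent(output, total):
    """Check that share * persons equals total within one cent per person."""
    parsed = parse_split_footer(output)
    if parsed is None:
        return False
    persons, share = parsed
    tolerance = Decimal("0.01") * persons
    return abs(share * persons - Decimal(total)) <= tolerance
```

The caller would still need to supply the receipt total itself; extracting it reliably from free-form receipt text is outside the scope of this sketch.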

requirements.txt CHANGED

@@ -1,3 +1,21 @@
-altair
-pandas
-streamlit

+# --- Core Streamlit app ---
+streamlit
+pillow
+python-dotenv
+# --- LangChain + Gemini ---
+langchain
+langchain-google-genai
+google-generativeai
+# --- Groq Whisper API ---
+groq
+# --- Audio Recording (choose ONE) ---
+# For st_audiorec (GitHub install)
+#git+https://github.com/stefanrmmr/streamlit_audio_recorder
+# OR
+streamlit-mic-recorder
+# --- Helpers ---
+tqdm