Ekow24 committed · verified
Commit bf36031 · 1 parent: a11ca1f

Upload 6 files

Files changed (6):
  1. Dockerfile +9 -41
  2. README.md +44 -57
  3. __init__.py +4 -0
  4. ai_app.py +574 -0
  5. requirements.txt +4 -3
  6. utils.py +473 -0
Dockerfile CHANGED
@@ -1,52 +1,20 @@
-<<<<<<< HEAD
 FROM python:3.12-slim

 WORKDIR /app

-# Copy project files
-COPY . /app
-
-# Install build deps (kept minimal) and pip packages
-RUN apt-get update && apt-get install -y --no-install-recommends build-essential git && rm -rf /var/lib/apt/lists/*
-RUN pip install --no-cache-dir -r requirements.txt
-
-# Expose Streamlit port
-EXPOSE 8501
-
-# Start the Streamlit app
-CMD ["streamlit", "run", "app.py", "--server.port=8501", "--server.address=0.0.0.0"]
-FROM python:3.12-slim
-
-WORKDIR /app
-
-# Copy code
-COPY . /app

 # Install dependencies
 RUN pip install --no-cache-dir -r requirements.txt

-EXPOSE 8501
-
-CMD ["streamlit", "run", "app.py", "--server.port=8501", "--server.address=0.0.0.0"]
-=======
-FROM python:3.13.5-slim
-
-WORKDIR /app
-
-RUN apt-get update && apt-get install -y \
-    build-essential \
-    curl \
-    git \
-    && rm -rf /var/lib/apt/lists/*
-
-COPY requirements.txt ./
-COPY src/ ./src/
-
-RUN pip3 install -r requirements.txt

 EXPOSE 8501

-HEALTHCHECK CMD curl --fail http://localhost:8501/_stcore/health
-
-ENTRYPOINT ["streamlit", "run", "src/streamlit_app.py", "--server.port=8501", "--server.address=0.0.0.0"]
->>>>>>> hf/main

+# Use Python base image
 FROM python:3.12-slim

+# Set working directory
 WORKDIR /app

+# Copy dependencies
+COPY requirements.txt .

 # Install dependencies
 RUN pip install --no-cache-dir -r requirements.txt

+# Copy all code
+COPY . .

+# Expose Streamlit port
 EXPOSE 8501

+# Run app
+CMD ["streamlit", "run", "app.py", "--server.port=8501", "--server.address=0.0.0.0"]
 
README.md CHANGED
@@ -1,60 +1,47 @@
----
-title: My Streamlit App
-emoji: 🚀
-colorFrom: blue
-colorTo: green
-sdk: streamlit
-sdk_version: "1.39.0"
-app_file: app.py
-pinned: false
----
-
-<<<<<<< HEAD
-AI Spending Analyser — Hugging Face Spaces ready
-
-This repo contains a Streamlit app that analyses synthetic spending data and provides a small local LLM-powered summary by default.
-
-Key points for deployment on Hugging Face Spaces
-- The app entrypoint is `app.py` at the repository root. Spaces will run `streamlit run app.py`.
-- Dependencies are listed in `requirements.txt` at the repo root. They include `transformers` and `torch` so a small local model can be used.
-- Dependencies are listed in `requirements.txt` at the repo root. To keep builds fast on Spaces, `torch` has been removed; the app uses a small local model (`distilgpt2`) by default.
-- Default AI engine: `HuggingFace` (local). The app attempts to load `distilgpt2` by default — this is free and runs on CPU.
-- OpenAI: available as a sidebar option but shows "Coming soon" in the UI because this project avoids paid APIs by default.
-
-Environment variables you can set in the Space (optional)
-- `HF_LOCAL_MODEL` — change the local model name (e.g., `distilgpt2`, `gpt2`, or another HF Hub model compatible with causal LM inference).
-
-Secrets (add before running remote HF inference)
-- Add a secret named `streamlit` in your Space Settings → Secrets and set its value to your new Hugging Face token.
-- Alternatively set `HF_TOKEN_NAME` to the secret key name you used.
-
-How to deploy
-1. Push this repository to a git remote.
-2. Create a new Hugging Face Space (Streamlit) and point it to this repo.
-3. No API keys are required for the default mode. If you later enable cloud inference, add the appropriate secrets.
-
-Notes
-- Model download happens on first run and may take a moment.
-- If build times on Spaces are long because of `torch`, consider switching to a smaller model or using CPU-optimized wheels.
-- If build times on Spaces are long because of heavy dependencies, remove them from `requirements.txt` (we removed `torch` to speed builds).
-=======
----
-title: AI Spending Analyzer
-emoji: 🚀
-colorFrom: red
-colorTo: red
-sdk: docker
-app_port: 8501
-tags:
 - streamlit
-pinned: false
-short_description: Streamlit template space
----

-# Welcome to Streamlit!
-
-Edit `/src/streamlit_app.py` to customize this app to your heart's desire. :heart:

-If you have any questions, checkout our [documentation](https://docs.streamlit.io) and [community forums](https://discuss.streamlit.io).
->>>>>>> hf/main

+AI Spending Analyser (Streamlit)
+
+Features
+- Synthetic dataset (~900 rows) across ~1 year with realistic variability
+- Filters: date range, categories, merchant query
+- Metrics: total, average monthly, max/min transaction
+- Charts: daily trend (line), spend by category (bar), payment methods (donut)
+- AI summary: OpenAI GPT if OPENAI_API_KEY exists, else deterministic heuristic summary
+- CSV download of filtered data
+
+Quickstart
+1) Install
+   python -m venv .venv
+   . .venv/Scripts/activate  # Windows PowerShell
+   pip install -r ai_spending_analyser/requirements.txt
+
+2) Run locally
+   streamlit run ai_spending_analyser/app.py
+
+3) (Optional) Enable OpenAI summaries
+   # PowerShell
+   $env:OPENAI_API_KEY = "sk-..."
+
+Deploy to Streamlit Cloud
+1. Push this folder to a GitHub repo.
+2. On Streamlit Cloud, create a new app pointing to ai_spending_analyser/app.py.
+3. Add OPENAI_API_KEY as a secret if you want AI summaries.
+
+Libraries
 - streamlit
+- pandas
+- numpy
+- plotly
+- openai (optional)
+- ollama (optional; for free local LLM)
+
+Local LLM (Ollama)
+1) Install Ollama: https://ollama.com
+2) Run Ollama and pull a small model, e.g.:
+   ollama pull llama3.2
+3) In the app sidebar, set Engine to "Ollama" and (optionally) model to "llama3.2".
+4) No API keys needed; runs fully offline.
+
+Notes
+- The app gracefully handles empty filters by showing an info message.
+- Regenerate button synthesizes a fresh dataset.
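The README's summary behaviour (OpenAI GPT when `OPENAI_API_KEY` exists, otherwise a deterministic heuristic) can be sketched as follows. This is a simplified stand-in, not the app's exact implementation; the OpenAI branch is deliberately left as a comment since this sketch stays offline:

```python
import os

def heuristic_summary(stats: dict) -> str:
    # Deterministic fallback: built only from precomputed aggregates,
    # so identical stats always yield identical text.
    total = stats.get("total_spend", 0.0)
    parts = [f"Total spend: £{total:,.2f}."]
    per_cat = stats.get("spend_per_category", {})
    if per_cat:
        top = max(per_cat.items(), key=lambda kv: kv[1])[0]
        parts.append(f"Top category: {top}.")
    return " ".join(parts)

def summarize(stats: dict) -> str:
    # Prefer the paid engine only when a key is configured.
    if os.getenv("OPENAI_API_KEY"):
        # Hypothetical: an OpenAI chat-completion call would go here;
        # omitted so the sketch runs without network access or a key.
        pass
    return heuristic_summary(stats)
```

Because the fallback is pure string formatting over aggregates, it is trivially testable, which is exactly why the app keeps it around even when an LLM engine is selected.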
__init__.py ADDED
@@ -0,0 +1,4 @@
+# Make this directory a package so relative imports in app.py work when run as a module
+
+
+
ai_app.py ADDED
@@ -0,0 +1,574 @@
+# app.py
+import os
+import time
+import json
+from datetime import datetime
+from typing import Optional
+
+import pandas as pd
+import requests
+import streamlit as st
+
+# Support running as a module or script
+try:
+    from .utils import (
+        generate_synthetic_transactions,
+        filter_transactions,
+        compute_aggregations,
+        build_time_series_chart,
+        build_category_bar_chart,
+        build_payment_method_pie_chart,
+        summarize_with_ai,
+    )
+except Exception:  # ImportError or relative import context issues
+    from utils import (
+        generate_synthetic_transactions,
+        filter_transactions,
+        compute_aggregations,
+        build_time_series_chart,
+        build_category_bar_chart,
+        build_payment_method_pie_chart,
+        summarize_with_ai,
+    )
+
+
+st.set_page_config(
+    page_title="AI Spending Analyser",
+    page_icon="💳",
+    layout="wide",
+)
+
+
+def init_session_state():
+    if "data" not in st.session_state:
+        st.session_state.data = generate_synthetic_transactions(n_rows=900, seed=42)
+    if "filters" not in st.session_state:
+        min_date = st.session_state.data["Date"].min()
+        max_date = st.session_state.data["Date"].max()
+        st.session_state.filters = {
+            "date_range": (min_date, max_date),
+            "categories": [],
+            "merchant_query": "",
+        }
+
+
+def render_header():
+    """
+    Render a header with a blue ^ symbol and app title.
+    """
+    st.markdown(
+        """
+        <div style='display: flex; align-items: baseline; gap: 15px; margin-bottom: 20px;'>
+            <div style='font-size: 80px; color: #00AEEF; font-weight: bold; line-height: 1;'>^</div>
+            <div style='font-size: 36px; color: #697089; font-weight: 500; line-height: 1;'>AI Spending Analyser</div>
+        </div>
+        """,
+        unsafe_allow_html=True,
+    )
+
+
+def render_assistant_banner():
+    # Removed per request: no top assistant banner
+    return
+
+
+def render_chat_fab():
+    # Removed per request: no floating chat widget
+    return
+
+
+def render_sidebar(df: pd.DataFrame):
+    st.sidebar.header("Filters")
+    min_d = df["Date"].min()
+    max_d = df["Date"].max()
+
+    # Separate From and To date inputs
+    st.sidebar.subheader("Date Range")
+    col1, col2 = st.sidebar.columns(2)
+
+    with col1:
+        from_date = st.date_input(
+            "From",
+            value=min_d.date(),
+            min_value=min_d.date(),
+            max_value=max_d.date(),
+            key="from_date"
+        )
+
+    with col2:
+        to_date = st.date_input(
+            "To",
+            value=max_d.date(),
+            min_value=min_d.date(),
+            max_value=max_d.date(),
+            key="to_date"
+        )
+
+    # Validation for date range
+    date_error = None
+    if from_date > to_date:
+        date_error = "From date cannot be after To date"
+    elif from_date < min_d.date() or to_date > max_d.date():
+        date_error = f"Date range can only be between {min_d.date().strftime('%Y-%m-%d')} and {max_d.date().strftime('%Y-%m-%d')}"
+    elif from_date > max_d.date() or to_date < min_d.date():
+        date_error = f"Date range can only be between {min_d.date().strftime('%Y-%m-%d')} and {max_d.date().strftime('%Y-%m-%d')}"
+
+    if date_error:
+        st.sidebar.error(date_error)
+        # Use valid defaults when there's an error
+        from_date = min_d.date()
+        to_date = max_d.date()
+
+    all_categories = sorted(df["Category"].unique().tolist())
+    categories = st.sidebar.multiselect("Category", options=all_categories, default=[])
+
+    merchant_query = st.sidebar.text_input("Merchant search", value="", placeholder="Type a merchant name…")
+
+    st.sidebar.divider()
+    st.sidebar.header("AI")
+    # Default engine is now HuggingFace (not heuristic)
+    summary_mode = st.sidebar.radio("Summary", options=["Concise", "Detailed"], index=0, horizontal=True)
+    engine = st.sidebar.selectbox("Engine", options=["HuggingFace", "OpenAI", "Heuristic"], index=0)
+    ollama_model = None
+
+    st.sidebar.divider()
+    st.sidebar.header("Anomalies & Highlights")
+    show_spikes = st.sidebar.toggle("Show spike markers", value=True)
+    large_tx_threshold = st.sidebar.slider("Large transaction threshold (£)", 50, 1000, 250, step=25)
+
+    col1, col2 = st.sidebar.columns(2)
+    with col1:
+        regen = st.button("Regenerate")
+    with col2:
+        st.sidebar.write("")
+
+    if regen:
+        st.session_state.data = generate_synthetic_transactions(n_rows=900)
+
+    # Update filters
+    st.session_state.filters = {
+        "date_range": (
+            datetime.combine(from_date, datetime.min.time()),
+            datetime.combine(to_date, datetime.max.time()),
+        ),
+        "categories": categories,
+        "merchant_query": merchant_query.strip(),
+        "summary_mode": summary_mode,
+        "engine": engine,
+        "ollama_model": None,
+        "show_spikes": show_spikes,
+        "large_tx_threshold": large_tx_threshold,
+    }
+
+
+def render_metrics(agg: dict):
+    col1, col2, col3, col4 = st.columns(4)
+    with col1:
+        st.markdown(f"<div class='metric-card'><div class='metric-label'>Total Value</div><div class='kpi-value'><span style='font-size: 0.8em;'>£</span><span style='font-size: 1.2em; font-weight: bold;'>{agg['total_spend']:,.0f}</span></div></div>", unsafe_allow_html=True)
+    with col2:
+        st.markdown(f"<div class='metric-card'><div class='metric-label'>Avg Monthly</div><div class='kpi-value'><span style='font-size: 0.8em;'>£</span><span style='font-size: 1.2em; font-weight: bold;'>{agg['avg_monthly_spend']:,.0f}</span></div></div>", unsafe_allow_html=True)
+    with col3:
+        st.markdown(f"<div class='metric-card'><div class='metric-label'>Max Transaction</div><div class='kpi-value kpi-accent'><span style='font-size: 0.8em;'>£</span><span style='font-size: 1.2em; font-weight: bold;'>{agg['max_transaction']['Amount']:,.0f}</span></div></div>", unsafe_allow_html=True)
+    with col4:
+        st.markdown(f"<div class='metric-card'><div class='metric-label'>Min Transaction</div><div class='kpi-value'><span style='font-size: 0.8em;'>£</span><span style='font-size: 1.2em; font-weight: bold;'>{agg['min_transaction']['Amount']:,.0f}</span></div></div>", unsafe_allow_html=True)
+
+
+def render_isa_widget(current_spend: float, allowance: float):
+    used = min(current_spend, allowance)
+    remaining = max(allowance - used, 0)
+    percent = 0 if allowance <= 0 else int((used / allowance) * 100)
+    st.markdown("<div class='isa-widget'>", unsafe_allow_html=True)
+    st.subheader("ISA allowance")
+    st.markdown(f"<div class='progress'><div style='width:{percent}%;'></div></div>", unsafe_allow_html=True)
+    col1, col2 = st.columns(2)
+    with col1:
+        st.markdown(f"<div><span class='kpi-accent' style='font-size: 1.1rem; font-weight: 600;'>USED</span><br/><span style='font-size: 1.8rem; font-weight: bold;'>£{used:,.2f}</span></div>", unsafe_allow_html=True)
+    with col2:
+        st.markdown(f"<div><span style='font-size: 1.1rem; font-weight: 600; color: rgba(255,255,255,0.8);'>REMAINING</span><br/><span style='font-size: 1.8rem; font-weight: bold;'>£{remaining:,.2f}</span></div>", unsafe_allow_html=True)
+    st.markdown("</div>", unsafe_allow_html=True)
+
+
+def render_charts(filtered_df: pd.DataFrame, agg: dict, template: str, show_spikes: bool):
+    t1, t2, t3 = st.tabs(["Trend", "By Category", "Payment Methods"])
+    with t1:
+        fig = build_time_series_chart(
+            filtered_df,
+            template=template,
+            spike_overlay=agg["spikes"] if show_spikes else None,
+        )
+        st.plotly_chart(fig, use_container_width=True)
+    with t2:
+        st.caption("Tip: Select categories in the sidebar to compare their total spend.")
+        brand_seq = ["#00AEEF", "#697089", "#005F7F", "#00CC99", "#7A7F87"]
+        fig = build_category_bar_chart(agg["spend_per_category"], template=template, color_sequence=brand_seq)
+        st.plotly_chart(fig, use_container_width=True)
+    with t3:
+        brand_seq = ["#00AEEF", "#00CC99", "#697089"]
+        fig = build_payment_method_pie_chart(agg["spend_per_payment"], template=template, color_sequence=brand_seq)
+        st.plotly_chart(fig, use_container_width=True)
+
+
+# Simple deterministic heuristic fallback (keeps behavior predictable)
+def heuristic_summary(agg: dict, mode: str) -> str:
+    # Produce a short, deterministic summary using aggregations
+    total = agg.get("total_spend", 0)
+    avg_month = agg.get("avg_monthly_spend", 0)
+    top_cat = None
+    if "spend_per_category" in agg and agg["spend_per_category"]:
+        top_cat = max(agg["spend_per_category"].items(), key=lambda x: x[1])[0]
+    spikes = agg.get("spikes", [])
+    lines = []
+    lines.append(f"Total spend in the selected period: £{total:,.2f}.")
+    lines.append(f"Average monthly spend: £{avg_month:,.2f}.")
+    if top_cat:
+        lines.append(f"Top category by spend: {top_cat}.")
+    lines.append(f"Detected {len(spikes)} spending spikes.")
+    if mode == "Detailed":
+        # Add a little more deterministic detail
+        items = list(agg.get("spend_per_category", {}).items())[:5]
+        lines.append("Spend per category: " + ", ".join(f"{k}: {chr(163)}{v:,.0f}" for k, v in items))
+    return " ".join(lines)
+
+
+def _get_hf_token() -> Optional[str]:
+    """Return a Hugging Face token using a configurable secret name.
+
+    Behavior:
+    - Look up env var HF_TOKEN_NAME to get the secret key name (default 'HF_TOKEN').
+    - Prefer Streamlit secrets (st.secrets[name]) when running on Spaces.
+    - Fall back to environment variable with that name, then to HUGGINGFACE_API_KEY or HF_TOKEN.
+    """
+    # First, allow an explicit env var to override the secret name
+    name = os.getenv("HF_TOKEN_NAME", None)
+    # If the user used the name 'streamlit' for their token, prefer that too
+    preferred_names = []
+    if name:
+        preferred_names.append(name)
+    # include the user-specified token name 'streamlit' as a high-priority fallback
+    preferred_names.append("streamlit")
+    # finally include the common default
+    preferred_names.append("HF_TOKEN")
+
+    try:
+        for n in preferred_names:
+            if isinstance(st.secrets, dict) and n in st.secrets:
+                return st.secrets[n]
+    except Exception:
+        pass
+
+    for n in preferred_names:
+        val = os.getenv(n)
+        if val:
+            return val
+
+    # last-resort fallbacks
+    return os.getenv("HUGGINGFACE_API_KEY") or os.getenv("HF_TOKEN")
+
+
+def _call_hf_inference(prompt: str, model: str = "tiiuae/falcon-7b-instruct", token: Optional[str] = None, max_tokens: int = 256) -> str:
+    """Call the Hugging Face Inference API and return generated text.
+
+    Raises RuntimeError on non-200 responses.
+    """
+    if not token:
+        raise RuntimeError("No Hugging Face token provided.")
+    url = f"https://api-inference.huggingface.co/models/{model}"
+    headers = {"Authorization": f"Bearer {token}"}
+    payload = {"inputs": prompt, "parameters": {"max_new_tokens": max_tokens, "temperature": 0.2}}
+    resp = requests.post(url, headers=headers, json=payload, timeout=60)
+    if resp.status_code != 200:
+        try:
+            msg = resp.json()
+        except Exception:
+            msg = resp.text
+        raise RuntimeError(f"Hugging Face inference error {resp.status_code}: {msg}")
+    data = resp.json()
+    if isinstance(data, dict):
+        if "error" in data:
+            raise RuntimeError(f"Hugging Face error: {data['error']}")
+        if "generated_text" in data:
+            return data["generated_text"]
+        for v in data.values():
+            if isinstance(v, dict) and "generated_text" in v:
+                return v["generated_text"]
+        return str(data)
+    if isinstance(data, list) and len(data) > 0:
+        if isinstance(data[0], dict) and "generated_text" in data[0]:
+            return data[0]["generated_text"]
+        return str(data[0])
+    return str(data)
+
+
+# External inference via Hugging Face API and OpenAI have been intentionally
+# removed to keep the app free to run on Hugging Face Spaces without paid APIs.
+
+
+def render_ai_summary(agg: dict, mode: str, engine: str, ollama_model: str | None):
+    st.subheader("AI Summary")
+    placeholder = st.empty()
+    placeholder.markdown("<div class='ai-card'>Generating summary…</div>", unsafe_allow_html=True)
+
+    # Build a short prompt from agg (keep it concise)
+    prompt = f"Provide a {mode.lower()} natural-language summary of these spending analytics: {json.dumps({'total_spend': agg.get('total_spend'), 'avg_monthly_spend': agg.get('avg_monthly_spend'), 'top_categories': agg.get('spend_per_category'), 'spikes': agg.get('spikes')}, default=str)}"
+
+    # Preferred: Hugging Face
+    if engine == "HuggingFace":
+        # Use the local summarizer which prefers a small HF model when available
+        try:
+            text = summarize_with_ai(agg, api_key=None, mode=mode, engine="HuggingFace")
+            if not text:
+                raise RuntimeError("No response from local Hugging Face summarizer.")
+            placeholder.markdown(f"<div class='ai-card'>{text}</div>", unsafe_allow_html=True)
+            return
+        except Exception as e:
+            # If local summarizer failed, try remote HF inference if a token is available
+            hf_token = _get_hf_token()
+            if hf_token:
+                try:
+                    prompt = f"Provide a {mode.lower()} natural-language summary of these spending analytics: {json.dumps({'total_spend': agg.get('total_spend'), 'avg_monthly_spend': agg.get('avg_monthly_spend'), 'top_categories': agg.get('spend_per_category'), 'spikes': agg.get('spikes')}, default=str)}"
+                    full_text = _call_hf_inference(prompt, model="gpt2", token=hf_token, max_tokens=256)
+                    placeholder.markdown(f"<div class='ai-card'>{full_text}</div>", unsafe_allow_html=True)
+                    return
+                except Exception:
+                    # Fall back to heuristic if remote inference fails
+                    text = heuristic_summary(agg, mode)
+                    placeholder.markdown(f"<div class='ai-card'>{text}</div>", unsafe_allow_html=True)
+                    return
+            else:
+                placeholder.markdown(f"<div class='ai-card'>Local summarizer error: {e}. No Hugging Face token configured; showing deterministic summary instead.</div>", unsafe_allow_html=True)
+                text = heuristic_summary(agg, mode)
+                placeholder.markdown(f"<div class='ai-card'>{text}</div>", unsafe_allow_html=True)
+                return
+
+    # If the user explicitly selected OpenAI, show Coming soon (we don't want to rely on paid APIs)
+    if engine == "OpenAI":
+        placeholder.markdown("<div class='ai-card'>OpenAI summaries are coming soon. Please select HuggingFace (default) or Ollama (local) instead.</div>", unsafe_allow_html=True)
+        # still provide deterministic fallback to keep UX
+        text = heuristic_summary(agg, mode)
+        placeholder.markdown(f"<div class='ai-card'>{text}</div>", unsafe_allow_html=True)
+        return
+
+    # Ollama support removed — local Hugging Face (distilgpt2) is the supported free option.
+
+    # If Heuristic selected explicitly
+    if engine == "Heuristic":
+        text = heuristic_summary(agg, mode)
+        placeholder.markdown(f"<div class='ai-card'>{text}</div>", unsafe_allow_html=True)
+        return
+
+    # Fallback
+    placeholder.markdown("<div class='ai-card'>Coming soon — selected engine not available.</div>", unsafe_allow_html=True)
+
+
+def main():
+    init_session_state()
+
+    # Inject custom CSS with hover animations (preserved exactly)
+    st.markdown("""
+    <style>
+    :root {
+        --t212: #00AEEF;
+        --t212-light: #33BFEF;
+        --t212-lighter: #66CFEF;
+    }
+
+    /* Base card styles */
+    .card {
+        background: rgba(0,0,0,0.25);
+        border: 1px solid rgba(255,255,255,0.08);
+        border-radius: 12px;
+        padding: 1.2rem;
+        transition: all 0.3s ease;
+        cursor: pointer;
+    }
+
+    .card:hover {
+        background: rgba(0,174,239,0.08);
+        border: 1px solid rgba(0,174,239,0.2);
+        transform: scale(1.02);
+        box-shadow: 0 8px 25px rgba(0,174,239,0.15);
+    }
+
+    /* Metric card styles with hover */
+    .metric-card {
+        background: rgba(0,0,0,0.20);
+        border-radius: 12px;
+        padding: 1.2rem;
+        border: 1px solid rgba(255,255,255,0.08);
+        transition: all 0.3s ease;
+        cursor: pointer;
+        text-align: center;
+    }
+
+    .metric-card:hover {
+        background: rgba(0,174,239,0.1);
+        border: 1px solid rgba(0,174,239,0.3);
+        transform: scale(1.03);
+        box-shadow: 0 10px 30px rgba(0,174,239,0.2);
+    }
+
+    /* AI card styles with hover */
+    .ai-card {
+        background: rgba(0, 204, 153, 0.06);
+        border-left: 4px solid #00CC99;
+        border-radius: 8px;
+        padding: 1.5rem;
+        transition: all 0.3s ease;
+        cursor: pointer;
+        font-size: 1.1rem;
+        line-height: 1.6;
+    }
+
+    .ai-card:hover {
+        background: rgba(0, 204, 153, 0.12);
+        border-left: 4px solid #33D9B3;
+        transform: scale(1.01);
+        box-shadow: 0 6px 20px rgba(0, 204, 153, 0.15);
+    }
+
+    /* ISA widget specific hover */
+    .isa-widget {
+        background: rgba(0,0,0,0.25);
+        border: 1px solid rgba(255,255,255,0.08);
+        border-radius: 12px;
+        padding: 1.5rem;
+        transition: all 0.3s ease;
+        cursor: pointer;
+    }
+
+    .isa-widget:hover {
+        background: rgba(0,174,239,0.08);
+        border: 1px solid rgba(0,174,239,0.2);
+        transform: scale(1.02);
+        box-shadow: 0 8px 25px rgba(0,174,239,0.15);
+    }
+
+    /* KPI value styles */
+    .kpi-value {
+        font-size: 2.2rem;
+        font-weight: 800;
+        margin-top: 0.5rem;
+        transition: all 0.2s ease;
+    }
+
+    .metric-card:hover .kpi-value {
+        color: var(--t212-light);
+    }
+
+    .kpi-accent {
+        color: var(--t212);
+        font-weight: 700;
+    }
+
+    .kpi-accent:hover {
+        color: var(--t212-lighter);
+    }
+
+    /* Progress bar styles */
+    .progress {
+        height: 8px;
+        background: rgba(255,255,255,0.1);
+        border-radius: 999px;
+        overflow: hidden;
+        width: 100%;
+        margin: 1rem 0;
+        transition: all 0.3s ease;
+    }
+
+    .progress > div {
+        height: 100%;
+        background: linear-gradient(90deg, var(--t212), var(--t212-light));
+        transition: all 0.3s ease;
+    }
+
+    .isa-widget:hover .progress {
+        height: 10px;
+        box-shadow: 0 2px 8px rgba(0,174,239,0.3);
+    }
+
+    /* Utility classes */
+    .pos { color: #1ECB4F; }
+    .neg { color: #FF4D4F; }
+
+    /* Enhanced text styles */
+    .metric-label {
+        font-size: 0.9rem;
+        color: rgba(255,255,255,0.7);
+        font-weight: 500;
+        margin-bottom: 0.5rem;
+    }
+
+    .metric-card:hover .metric-label {
+        color: rgba(255,255,255,0.9);
+    }
+
+    /* Subheader improvements */
+    h3 {
+        font-size: 1.4rem !important;
+        font-weight: 600 !important;
+        color: rgba(255,255,255,0.9) !important;
+        margin-bottom: 1rem !important;
+    }
+    </style>
+    """, unsafe_allow_html=True)
+    render_header()
+    render_assistant_banner()
+
+    # Floating chat button
+    render_chat_fab()
+
+    # Sidebar filters and regenerate
+    render_sidebar(st.session_state.data)
+
+    # Apply filters
+    filters = st.session_state.filters
+    filtered = filter_transactions(
+        st.session_state.data,
+        date_range=filters["date_range"],
+        categories=filters["categories"],
+        merchant_query=filters["merchant_query"],
+    )
+
+    if filtered.empty:
+        st.info("No data for selected filters. Adjust filters to see insights.")
+        return
+
+    agg = compute_aggregations(filtered)
+
+    # Top KPIs
+    st.markdown("<div class='card'>", unsafe_allow_html=True)
+    render_metrics(agg)
+    st.markdown("</div>", unsafe_allow_html=True)
+
+    # ISA-style allowance widget (configurable)
+    with st.expander("Allowance widget"):
+        allowance = st.number_input("Annual allowance (£)", min_value=0, value=20000, step=500)
+        render_isa_widget(current_spend=float(agg['total_spend']), allowance=float(allowance))
+
+    # Charts (use dark theme consistently as requested)
+    template = "plotly_dark"
+    render_charts(filtered, agg, template, show_spikes=filters["show_spikes"])
+
+    # AI Summary only
+    render_ai_summary(agg, mode=filters["summary_mode"], engine=filters["engine"], ollama_model=filters["ollama_model"])
+
+    # Large transactions table
+    threshold = filters["large_tx_threshold"]
+    large_df = filtered[filtered["Amount"] >= threshold].sort_values("Amount", ascending=False)
+    with st.expander(f"Show large transactions (≥ £{threshold}) [{len(large_df)}]"):
+        st.dataframe(large_df, use_container_width=True, hide_index=True)
+
+    # Downloads
+    st.divider()
+    col1, col2 = st.columns([2,1])
+    with col1:
+        st.caption("Download filtered data")
+        csv = filtered.to_csv(index=False).encode("utf-8")
+        st.download_button("Download CSV", csv, file_name="transactions_filtered.csv", mime="text/csv")
+    with col2:
+        st.caption("Dataset size")
+        st.write(f"{len(filtered):,} rows")
+
+
+if __name__ == "__main__":
+    main()
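The token lookup order that `_get_hf_token` implements (explicit `HF_TOKEN_NAME` override, then the `streamlit` secret, then `HF_TOKEN`, with secrets taking priority over environment variables and `HUGGINGFACE_API_KEY`/`HF_TOKEN` as last resorts) can be isolated into a pure function for testing. Here `env` and `secrets` are plain dicts standing in for `os.environ` and `st.secrets`:

```python
from typing import Optional

def resolve_hf_token(env: dict, secrets: dict) -> Optional[str]:
    # Build the candidate-name list: an explicit override first,
    # then 'streamlit', then the common default 'HF_TOKEN'.
    names = []
    if env.get("HF_TOKEN_NAME"):
        names.append(env["HF_TOKEN_NAME"])
    names += ["streamlit", "HF_TOKEN"]
    # Secrets win over environment variables, mirroring _get_hf_token.
    for n in names:
        if n in secrets:
            return secrets[n]
    for n in names:
        if env.get(n):
            return env[n]
    # Last-resort fallbacks.
    return env.get("HUGGINGFACE_API_KEY") or env.get("HF_TOKEN")
```

Pulling the precedence logic out of the Streamlit globals like this makes the fallback chain testable without a running app, which is useful when debugging why a Space is not picking up a token.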
requirements.txt CHANGED
@@ -2,8 +2,9 @@ streamlit>=1.34
 pandas>=2.2
 numpy>=1.26
 plotly>=5.22
+openai>=1.44
 transformers>=4.30
-altair
-pandas
-streamlit
+torch
+transformers>=4.30
+torch
utils.py ADDED
@@ -0,0 +1,473 @@
1
+ from __future__ import annotations
2
+
3
+ import math
4
+ import os
5
+ from dataclasses import dataclass
6
+ from datetime import datetime, timedelta
7
+ from typing import Dict, Iterable, List, Optional, Tuple
8
+
9
+ import numpy as np
10
+ import pandas as pd
11
+ import plotly.express as px
12
+
13
+
14
+ CATEGORIES = [
15
+ "Food",
16
+ "Travel",
17
+ "Shopping",
18
+ "Utilities",
19
+ "Entertainment",
20
+ "Health",
21
+ "Subscriptions",
22
+ "Transport",
23
+ ]
24
+
25
+ MERCHANTS = [
26
+ "SuperMart",
27
+ "QuickEats",
28
+ "Urban Cafe",
29
+ "MegaStore",
30
+ "Cinema City",
31
+ "Fit&Fine Gym",
32
+ "City Utilities",
33
+ "StreamFlix",
34
+ "RideNow",
35
+ "Book Haven",
36
+ "ElectroWorld",
37
+ "TravelCo",
38
+ "PharmaPlus",
39
+ "HomeNeeds",
40
+ ]
41
+
42
+ PAYMENT_METHODS = ["Debit Card", "Credit Card", "Digital Wallet"]
43
+
44
+ LOCATIONS = [
45
+ "London",
46
+ "Manchester",
47
+ "Birmingham",
48
+ "Leeds",
49
+ "Glasgow",
50
+ "Liverpool",
51
+ "Bristol",
52
+ "Edinburgh",
53
+ "Cardiff",
54
+ "Belfast",
55
+ ]
56
+
57
+
+ def _random_amounts(n: int, rng: np.random.Generator) -> np.ndarray:
+     # Mixture distribution for more realistic spend: many small, some medium, few large
+     choices = rng.choice(["small", "medium", "large"], size=n, p=[0.65, 0.28, 0.07])
+     amounts = np.empty(n)
+     for i, c in enumerate(choices):
+         if c == "small":
+             amounts[i] = max(1, rng.normal(15, 8))
+         elif c == "medium":
+             amounts[i] = max(5, rng.normal(60, 25))
+         else:
+             amounts[i] = max(20, rng.normal(180, 60))
+     # Random spikes
+     spike_idx = rng.choice(np.arange(n), size=max(1, n // 50), replace=False)
+     amounts[spike_idx] *= rng.uniform(2.5, 4.0, size=len(spike_idx))
+     return np.round(amounts, 2)
+
+
+ def generate_synthetic_transactions(n_rows: int = 900, seed: Optional[int] = None) -> pd.DataFrame:
+     rng = np.random.default_rng(seed)
+     end = pd.Timestamp.today().normalize()
+     start = end - pd.Timedelta(days=365)
+     dates = pd.date_range(start, end, freq="D")
+
+     # Draw dates with bias to weekends and month-ends; normalize so probabilities sum to 1
+     weights = np.array(
+         [1.2 if d.weekday() >= 5 else 1.0 for d in dates]
+     ) * np.array(
+         [1.3 if d.day > 25 else 1.0 for d in dates]
+     )
+     weights = np.clip(weights, a_min=0, a_max=None)
+     weights = weights / weights.sum()
+     date_choices = rng.choice(len(dates), size=n_rows, replace=True, p=weights)
+     chosen_dates = dates[date_choices]
+
+     categories = rng.choice(CATEGORIES, size=n_rows)
+     merchants = rng.choice(MERCHANTS, size=n_rows)
+     payment_methods = rng.choice(PAYMENT_METHODS, size=n_rows, p=[0.6, 0.25, 0.15])
+     locations = rng.choice(LOCATIONS, size=n_rows)
+     amts = _random_amounts(n_rows, rng)
+
+     df = pd.DataFrame(
+         {
+             "Date": pd.to_datetime(chosen_dates),
+             "Merchant": merchants,
+             "Category": categories,
+             "Amount": amts,
+             "Payment Method": payment_methods,
+             "Location": locations,
+         }
+     )
+     # Sort by date for better UX
+     df = df.sort_values("Date").reset_index(drop=True)
+     return df
+
+
+ def filter_transactions(
+     df: pd.DataFrame,
+     date_range: Tuple[datetime, datetime],
+     categories: Optional[Iterable[str]] = None,
+     merchant_query: str = "",
+ ) -> pd.DataFrame:
+     start, end = date_range
+     mask = (df["Date"] >= pd.to_datetime(start)) & (df["Date"] <= pd.to_datetime(end))
+     if categories:
+         mask &= df["Category"].isin(list(categories))
+     if merchant_query:
+         mask &= df["Merchant"].str.contains(merchant_query, case=False, na=False)
+     return df.loc[mask].copy()
+
+
+ def _month_key(s: pd.Series) -> pd.Series:
+     return pd.to_datetime(s).dt.to_period("M").dt.to_timestamp()
+
+
+ def compute_aggregations(df: pd.DataFrame) -> Dict:
+     if df.empty:
+         return {
+             "total_spend": 0.0,
+             "avg_monthly_spend": 0.0,
+             "spend_per_category": pd.Series(dtype=float),
+             "spend_per_payment": pd.Series(dtype=float),
+             "max_transaction": {"Amount": 0.0},
+             "min_transaction": {"Amount": 0.0},
+             "monthly": pd.DataFrame(columns=["Month", "Amount"]),
+             "category_share": pd.Series(dtype=float),
+             "rolling_28d": pd.DataFrame(columns=["Date", "Amount", "Rolling28"]),
+             "spikes": pd.DataFrame(columns=["Date", "Amount", "IsSpike"]),
+         }
+
+     total_spend = float(df["Amount"].sum())
+     spend_per_category = df.groupby("Category")["Amount"].sum().sort_values(ascending=False)
+     spend_per_payment = df.groupby("Payment Method")["Amount"].sum().sort_values(ascending=False)
+     max_txn = df.loc[df["Amount"].idxmax()].to_dict()
+     min_txn = df.loc[df["Amount"].idxmin()].to_dict()
+
+     monthly = (
+         df.assign(Month=_month_key(df["Date"]))
+         .groupby("Month")["Amount"].sum()
+         .reset_index()
+     )
+     avg_monthly_spend = float(monthly["Amount"].mean()) if not monthly.empty else 0.0
+
+     # Category share of total spend
+     category_share = (spend_per_category / max(total_spend, 1e-9)).round(4)
+
+     # Rolling 28-day spend for simple trend smoothing
+     df_daily = df.groupby(pd.to_datetime(df["Date"]).dt.date)["Amount"].sum().reset_index()
+     df_daily["Date"] = pd.to_datetime(df_daily["Date"])  # normalize to midnight
+     df_daily = df_daily.sort_values("Date")
+     df_daily["Rolling28"] = df_daily["Amount"].rolling(window=28, min_periods=7).mean()
+
+     # Naive anomaly detection: flag days above mean + 2.5 * std of daily totals
+     mu = df_daily["Amount"].mean()
+     sigma = df_daily["Amount"].std(ddof=0) or 0.0
+     threshold = mu + 2.5 * sigma
+     df_spikes = df_daily.assign(IsSpike=df_daily["Amount"] > threshold)
+
+     return {
+         "total_spend": total_spend,
+         "avg_monthly_spend": avg_monthly_spend,
+         "spend_per_category": spend_per_category,
+         "spend_per_payment": spend_per_payment,
+         "max_transaction": max_txn,
+         "min_transaction": min_txn,
+         "monthly": monthly,
+         "category_share": category_share,
+         "rolling_28d": df_daily,
+         "spikes": df_spikes,
+     }
+
+
+ def build_time_series_chart(
+     df: pd.DataFrame,
+     template: str = "plotly",
+     spike_overlay: Optional[pd.DataFrame] = None,
+ ):
+     if df.empty:
+         fig = px.line()
+         fig.update_layout(template=template)
+         return fig
+     daily = df.groupby(pd.to_datetime(df["Date"]).dt.date)["Amount"].sum().reset_index()
+     daily["Date"] = pd.to_datetime(daily["Date"])  # ensure datetime for plotly
+     fig = px.line(
+         daily,
+         x="Date",
+         y="Amount",
+         title="Daily Spend Over Time",
+         markers=True,
+     )
+     fig.update_traces(hovertemplate="%{x|%b %d, %Y}: £%{y:.2f}")
+     fig.update_layout(margin=dict(l=10, r=10, t=40, b=10), template=template)
+
+     # Optional spike overlay; guard against a missing IsSpike column
+     if isinstance(spike_overlay, pd.DataFrame) and "IsSpike" in spike_overlay.columns:
+         spike_points = spike_overlay[spike_overlay["IsSpike"]]
+         if not spike_points.empty:
+             fig.add_scatter(
+                 x=spike_points["Date"],
+                 y=spike_points["Amount"],
+                 mode="markers",
+                 name="Spikes",
+                 marker=dict(color="#EF553B", size=9, symbol="diamond"),
+                 hovertemplate="Spike %{x|%b %d, %Y}: £%{y:.2f}",
+             )
+     return fig
+
+
+ def build_category_bar_chart(
+     spend_per_category: pd.Series,
+     template: str = "plotly",
+     color_sequence: Optional[list] = None,
+ ):
+     if spend_per_category.empty:
+         fig = px.bar()
+         fig.update_layout(template=template)
+         return fig
+     data = spend_per_category.rename_axis("Category").reset_index(name="Amount")
+     fig = px.bar(
+         data,
+         x="Category",
+         y="Amount",
+         title="Spend by Category",
+         color="Category",
+         color_discrete_sequence=color_sequence,
+     )
+     fig.update_traces(hovertemplate="%{x}: £%{y:.2f}")
+     fig.update_layout(showlegend=False, margin=dict(l=10, r=10, t=40, b=10), template=template)
+     return fig
+
+
+ def build_payment_method_pie_chart(
+     spend_per_payment: pd.Series,
+     template: str = "plotly",
+     color_sequence: Optional[list] = None,
+ ):
+     if spend_per_payment.empty:
+         fig = px.pie()
+         fig.update_layout(template=template)
+         return fig
+     data = spend_per_payment.rename_axis("Payment Method").reset_index(name="Amount")
+     fig = px.pie(
+         data,
+         values="Amount",
+         names="Payment Method",
+         title="Payment Methods Distribution",
+         hole=0.45,
+         color_discrete_sequence=color_sequence,
+     )
+     fig.update_traces(hovertemplate="%{label}: £%{value:.2f} (%{percent})")
+     fig.update_layout(margin=dict(l=10, r=10, t=40, b=10), template=template)
+     return fig
+
+
+ def _format_number(n: float) -> str:
+     if n >= 1_000_000:
+         return f"£{n / 1_000_000:.1f}M"
+     if n >= 1_000:
+         return f"£{n / 1_000:.1f}k"
+     return f"£{n:,.0f}"
+
+
+ def summarize_with_ai(
+     agg: Dict,
+     api_key: Optional[str] = None,
+     mode: str = "Concise",
+     engine: str = "Heuristic",
+     ollama_model: Optional[str] = None,
+ ) -> str:
+     # Prepare a compact context
+     largest_cat = (
+         agg["spend_per_category"].idxmax() if not agg["spend_per_category"].empty else None
+     )
+     largest_cat_share = (
+         float(agg["category_share"].max()) if not agg["category_share"].empty else 0.0
+     )
+
+     spikes_df = agg.get("spikes")
+     spike_days = (
+         int(spikes_df["IsSpike"].sum())
+         if isinstance(spikes_df, pd.DataFrame) and "IsSpike" in spikes_df.columns
+         else 0
+     )
+
+     context = {
+         "total_spend": float(agg["total_spend"]),
+         "avg_monthly": float(agg["avg_monthly_spend"]),
+         "largest_category": largest_cat,
+         "largest_category_share": largest_cat_share,
+         "max_transaction": {
+             "amount": float(agg["max_transaction"].get("Amount", 0.0)),
+             "merchant": str(agg["max_transaction"].get("Merchant", "")),
+         },
+         "mom_change": _month_over_month_change(agg.get("monthly")),
+         "spike_days": spike_days,
+     }
+
+     # Engine selection
+     engine = (engine or "Heuristic").strip()
+     if engine == "Heuristic":
+         return _heuristic_summary(context, mode=mode)
+
+     # Local Hugging Face transformer model (small) - suitable for Spaces without paid APIs
+     if engine == "HuggingFace":
+         # Try a small, commonly available model for generation. `distilgpt2` is a
+         # reasonable CPU-friendly option on the HF Hub and produces better text
+         # than the ultra-tiny toy models.
+         model_name = os.getenv("HF_LOCAL_MODEL", "distilgpt2")
+         try:
+             from transformers import AutoModelForCausalLM, AutoTokenizer
+             import torch
+
+             # Load tokenizer & model (cached by Hugging Face inside the Space)
+             tokenizer = AutoTokenizer.from_pretrained(model_name)
+             model = AutoModelForCausalLM.from_pretrained(model_name)
+             prompt = _hf_prompt(context, mode)
+             inputs = tokenizer(prompt, return_tensors="pt")
+             with torch.no_grad():
+                 out = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
+             text = tokenizer.decode(out[0], skip_special_tokens=True)
+             # Post-process: return the generated tail after the prompt if present
+             if text.startswith(prompt):
+                 return text[len(prompt):].strip() or _heuristic_summary(context, mode=mode)
+             return text.strip() or _heuristic_summary(context, mode=mode)
+         except Exception:
+             # If the local HF model fails, fall back to heuristic (keeps the app running on Spaces)
+             return _heuristic_summary(context, mode=mode)
+
+     # Only local Hugging Face generation and the heuristic fallback are supported,
+     # keeping the app free and self-contained for Hugging Face Spaces.
+     return _heuristic_summary(context, mode=mode)
+
+
+ def _month_over_month_change(monthly: Optional[pd.DataFrame]) -> float:
+     if monthly is None or monthly.empty or len(monthly) < 2:
+         return 0.0
+     monthly_sorted = monthly.sort_values("Month")
+     last, prev = monthly_sorted["Amount"].iloc[-1], monthly_sorted["Amount"].iloc[-2]
+     if prev == 0:
+         return 0.0
+     return float((last - prev) / prev)
+
+
+ def _heuristic_summary(ctx: Dict, mode: str = "Concise") -> str:
+     total = _format_number(ctx.get("total_spend", 0.0))
+     avg = _format_number(ctx.get("avg_monthly", 0.0))
+     lcat = ctx.get("largest_category") or "N/A"
+     share = ctx.get("largest_category_share", 0.0) * 100
+     max_amt = ctx.get("max_transaction", {}).get("amount", 0.0)
+     max_merchant = ctx.get("max_transaction", {}).get("merchant", "")
+     mom = ctx.get("mom_change", 0.0) * 100
+     spikes = ctx.get("spike_days", 0)
+
+     parts = [
+         f"Total spend in the selected period is {total}, averaging {avg} per month.",
+         f"Top category is {lcat} at {share:.0f}% of spend." if lcat != "N/A" else "",
+         f"Month-over-month, spending changed by {mom:+.0f}%.",
+         f"Largest single transaction was £{max_amt:,.0f} at {max_merchant}." if max_amt else "",
+         f"Detected {spikes} unusually high daily spend day(s)." if spikes else "",
+     ]
+     text = " ".join([p for p in parts if p])
+
+     if mode == "Detailed":
+         # Add more comprehensive analysis for detailed mode
+         detailed_insights = []
+
+         # Spending pattern analysis
+         if mom > 10:
+             detailed_insights.append("Your spending has increased significantly this month, which may indicate lifestyle changes or seasonal variations.")
+         elif mom < -10:
+             detailed_insights.append("You've successfully reduced your spending this month, showing good financial discipline.")
+         else:
+             detailed_insights.append("Your spending patterns remain relatively stable month-over-month.")
+
+         # Category-specific recommendations
+         if lcat == "Food":
+             detailed_insights.append("Food represents your largest expense category. Consider meal planning and bulk shopping to optimize costs.")
+         elif lcat == "Shopping":
+             detailed_insights.append("Shopping is your primary spending category. Review purchases for necessities vs. wants to identify savings opportunities.")
+         elif lcat == "Entertainment":
+             detailed_insights.append("Entertainment spending dominates your budget. Look for free or low-cost alternatives to maintain your lifestyle within budget.")
+
+         # Spike analysis
+         if spikes > 5:
+             detailed_insights.append("Multiple spending spikes detected suggest irregular expense patterns. Consider smoothing these through better budgeting.")
+         elif spikes > 0:
+             detailed_insights.append("Some spending spikes were identified, which is normal but worth monitoring for budget planning.")
+
+         # General financial advice
+         detailed_insights.append("Consider setting category budgets and monitoring spikes to smooth cash flow and improve financial predictability.")
+
+         text += " " + " ".join(detailed_insights)
+
+     return text
+
+
+ # Ollama/OpenAI helpers removed to keep the app local-only and free.
+
+
+ def _hf_prompt(context: Dict, mode: str) -> str:
+     style = "concise (80-120 words)" if mode == "Concise" else "detailed (140-220 words)"
+     return (
+         "You are a helpful financial assistant. Produce a "
+         + style
+         + " natural-language summary of the provided spending analytics in plain English.\n\n"
+         + f"Context: {context}\n\nSummary:"
+     )
+
+
+ def chat_with_ai(
+     agg: Dict,
+     question: str,
+     engine: str = "Heuristic",
+     api_key: Optional[str] = None,
+     ollama_model: Optional[str] = None,
+ ) -> str:
+     # Provide compact context; reuse the aggregations from summarize
+     context = {
+         "totals": float(agg.get("total_spend", 0.0)),
+         "monthly": [
+             {"month": str(r["Month"]), "amount": float(r["Amount"])}
+             for _, r in agg.get("monthly", pd.DataFrame()).iterrows()
+         ],
+         "by_category": agg.get("spend_per_category", pd.Series(dtype=float)).to_dict(),
+         "by_payment": agg.get("spend_per_payment", pd.Series(dtype=float)).to_dict(),
+         "max_txn": agg.get("max_transaction", {}),
+     }
+
+     # Support a local Hugging Face model for Q&A if requested; otherwise answer heuristically.
+     engine = (engine or "Heuristic").strip()
+     if engine == "Heuristic" or not question.strip():
+         return (
+             "Here's what I can tell from your data: total spend is "
+             + _format_number(context["totals"])
+             + ". Ask about trends, categories, or months for more detail."
+         )
+
+     if engine == "HuggingFace":
+         model_name = os.getenv("HF_LOCAL_MODEL", "distilgpt2")
+         try:
+             from transformers import AutoModelForCausalLM, AutoTokenizer
+             import torch
+
+             tokenizer = AutoTokenizer.from_pretrained(model_name)
+             model = AutoModelForCausalLM.from_pretrained(model_name)
+             prompt = f"Context: {context}\n\nQuestion: {question}\nAnswer:"
+             inputs = tokenizer(prompt, return_tensors="pt")
+             with torch.no_grad():
+                 out = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
+             text = tokenizer.decode(out[0], skip_special_tokens=True)
+             if text.startswith(prompt):
+                 return text[len(prompt):].strip()
+             return text.strip()
+         except Exception:
+             return (
+                 "Local model unavailable. Falling back to heuristic answer: "
+                 "Here's what I can tell from your data: total spend is "
+                 + _format_number(context["totals"]) + "."
+             )
+
+     # Default fallback
+     return "I can't answer that right now. Try the Heuristic engine."
+
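The `IsSpike` flag that `compute_aggregations` attaches is a simple threshold rule: a day is flagged when its total exceeds the mean of daily totals plus 2.5 population standard deviations. A minimal self-contained sketch of that rule on made-up numbers (not data from the app):

```python
import pandas as pd

# Ten quiet days around £20 plus one £200 outlier.
daily = pd.DataFrame({
    "Date": pd.date_range("2024-01-01", periods=11, freq="D"),
    "Amount": [20.0, 22.0, 19.0, 21.0, 18.0, 20.0, 23.0, 19.0, 21.0, 20.0, 200.0],
})

mu = daily["Amount"].mean()
sigma = daily["Amount"].std(ddof=0)  # population std, as in compute_aggregations
threshold = mu + 2.5 * sigma
daily["IsSpike"] = daily["Amount"] > threshold

# Only the £200 day crosses the threshold in this toy series.
print(daily.loc[daily["IsSpike"], "Date"].dt.strftime("%Y-%m-%d").tolist())
```

The resulting frame has the same `Date`/`Amount`/`IsSpike` shape that `build_time_series_chart` expects for its `spike_overlay` argument.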