Ekow24 committed · verified
Commit bf36031 · 1 parent: a11ca1f

Upload 6 files

Files changed (6):
  1. Dockerfile +9 -41
  2. README.md +44 -57
  3. __init__.py +4 -0
  4. ai_app.py +574 -0
  5. requirements.txt +4 -3
  6. utils.py +473 -0
Dockerfile CHANGED
@@ -1,52 +1,20 @@
-<<<<<<< HEAD
 FROM python:3.12-slim

 WORKDIR /app

-# Copy project files
-COPY . /app
-
-# Install build deps (kept minimal) and pip packages
-RUN apt-get update && apt-get install -y --no-install-recommends build-essential git && rm -rf /var/lib/apt/lists/*
-RUN pip install --no-cache-dir -r requirements.txt
-
-# Expose Streamlit port
-EXPOSE 8501
-
-# Start the Streamlit app
-CMD ["streamlit", "run", "app.py", "--server.port=8501", "--server.address=0.0.0.0"]
-FROM python:3.12-slim
-
-WORKDIR /app
-
-# Copy code
-COPY . /app

 # Install dependencies
 RUN pip install --no-cache-dir -r requirements.txt

-EXPOSE 8501
-
-CMD ["streamlit", "run", "app.py", "--server.port=8501", "--server.address=0.0.0.0"]
-=======
-FROM python:3.13.5-slim
-
-WORKDIR /app
-
-RUN apt-get update && apt-get install -y \
-    build-essential \
-    curl \
-    git \
-    && rm -rf /var/lib/apt/lists/*
-
-COPY requirements.txt ./
-COPY src/ ./src/
-
-RUN pip3 install -r requirements.txt

 EXPOSE 8501

-HEALTHCHECK CMD curl --fail http://localhost:8501/_stcore/health
-
-ENTRYPOINT ["streamlit", "run", "src/streamlit_app.py", "--server.port=8501", "--server.address=0.0.0.0"]
->>>>>>> hf/main

+# Use Python base image
 FROM python:3.12-slim

+# Set working directory
 WORKDIR /app

+# Copy dependencies
+COPY requirements.txt .

 # Install dependencies
 RUN pip install --no-cache-dir -r requirements.txt

+# Copy all code
+COPY . .

+# Expose Streamlit port
 EXPOSE 8501

+# Run app
+CMD ["streamlit", "run", "app.py", "--server.port=8501", "--server.address=0.0.0.0"]
 
README.md CHANGED
@@ -1,60 +1,47 @@
----
-title: My Streamlit App
-emoji: 🚀
-colorFrom: blue
-colorTo: green
-sdk: streamlit
-sdk_version: "1.39.0"
-app_file: app.py
-pinned: false
----
-
-<<<<<<< HEAD
-AI Spending Analyser — Hugging Face Spaces ready
-
-This repo contains a Streamlit app that analyses synthetic spending data and provides a small local LLM-powered summary by default.
-
-Key points for deployment on Hugging Face Spaces
-- The app entrypoint is `app.py` at the repository root. Spaces will run `streamlit run app.py`.
-- Dependencies are listed in `requirements.txt` at the repo root. They include `transformers` and `torch` so a small local model can be used.
-- Dependencies are listed in `requirements.txt` at the repo root. To keep builds fast on Spaces, `torch` has been removed; the app uses a small local model (`distilgpt2`) by default.
-- Default AI engine: `HuggingFace` (local). The app attempts to load `distilgpt2` by default — this is free and runs on CPU.
-- OpenAI: available as a sidebar option but shows "Coming soon" in the UI because this project avoids paid APIs by default.
-
-Environment variables you can set in the Space (optional)
-- `HF_LOCAL_MODEL` — change the local model name (e.g., `distilgpt2`, `gpt2`, or another HF Hub model compatible with causal LM inference).
-
-Secrets (add before running remote HF inference)
-- Add a secret named `streamlit` in your Space Settings → Secrets and set its value to your new Hugging Face token.
-- Alternatively set `HF_TOKEN_NAME` to the secret key name you used.
-
-How to deploy
-1. Push this repository to a git remote.
-2. Create a new Hugging Face Space (Streamlit) and point it to this repo.
-3. No API keys are required for the default mode. If you later enable cloud inference, add the appropriate secrets.
-
-Notes
-- Model download happens on first run and may take a moment.
-- If build times on Spaces are long because of `torch`, consider switching to a smaller model or using CPU-optimized wheels.
-- If build times on Spaces are long because of heavy dependencies, remove them from `requirements.txt` (we removed `torch` to speed builds).
-=======
----
-title: AI Spending Analyzer
-emoji: 🚀
-colorFrom: red
-colorTo: red
-sdk: docker
-app_port: 8501
-tags:
 - streamlit
-pinned: false
-short_description: Streamlit template space
----

-# Welcome to Streamlit!
-
-Edit `/src/streamlit_app.py` to customize this app to your heart's desire. :heart:

-If you have any questions, checkout our [documentation](https://docs.streamlit.io) and [community forums](https://discuss.streamlit.io).
->>>>>>> hf/main

+AI Spending Analyser (Streamlit)
+
+Features
+- Synthetic dataset (~900 rows) across ~1 year with realistic variability
+- Filters: date range, categories, merchant query
+- Metrics: total, average monthly, max/min transaction
+- Charts: daily trend (line), spend by category (bar), payment methods (donut)
+- AI summary: OpenAI GPT if OPENAI_API_KEY exists, else deterministic heuristic summary
+- CSV download of filtered data
+
+Quickstart
+1) Install
+   python -m venv .venv
+   . .venv/Scripts/activate  # Windows PowerShell
+   pip install -r ai_spending_analyser/requirements.txt
+
+2) Run locally
+   streamlit run ai_spending_analyser/app.py
+
+3) (Optional) Enable OpenAI summaries
+   # PowerShell
+   $env:OPENAI_API_KEY = "sk-..."
+
+Deploy to Streamlit Cloud
+1. Push this folder to a GitHub repo.
+2. On Streamlit Cloud, create a new app pointing to ai_spending_analyser/app.py.
+3. Add OPENAI_API_KEY as a secret if you want AI summaries.
+
+Libraries
 - streamlit
+- pandas
+- numpy
+- plotly
+- openai (optional)
+- ollama (optional; for free local LLM)
+
+Local LLM (Ollama)
+1) Install Ollama: https://ollama.com
+2) Run Ollama and pull a small model, e.g.:
+   ollama pull llama3.2
+3) In the app sidebar, set Engine to "Ollama" and (optionally) model to "llama3.2".
+4) No API keys needed; runs fully offline.
+
+Notes
+- The app gracefully handles empty filters by showing an info message.
+- Regenerate button synthesizes a fresh dataset.
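The README's summary behaviour (OpenAI GPT when `OPENAI_API_KEY` exists, otherwise a deterministic heuristic) can be sketched as follows. This is a simplified stand-in, not the app's exact implementation; the OpenAI branch is deliberately left as a comment since this sketch stays offline:

```python
import os

def heuristic_summary(stats: dict) -> str:
    # Deterministic fallback: built only from precomputed aggregates,
    # so identical stats always yield identical text.
    total = stats.get("total_spend", 0.0)
    parts = [f"Total spend: £{total:,.2f}."]
    per_cat = stats.get("spend_per_category", {})
    if per_cat:
        top = max(per_cat.items(), key=lambda kv: kv[1])[0]
        parts.append(f"Top category: {top}.")
    return " ".join(parts)

def summarize(stats: dict) -> str:
    # Prefer the paid engine only when a key is configured.
    if os.getenv("OPENAI_API_KEY"):
        # Hypothetical: an OpenAI chat-completion call would go here;
        # omitted so the sketch runs without network access or a key.
        pass
    return heuristic_summary(stats)
```

Because the fallback is pure string formatting over aggregates, it is trivially testable, which is exactly why the app keeps it around even when an LLM engine is selected.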
__init__.py ADDED
@@ -0,0 +1,4 @@
+# Make this directory a package so relative imports in app.py work when run as a module
+
+
+
ai_app.py ADDED
@@ -0,0 +1,574 @@
+# app.py
+import os
+import time
+import json
+from datetime import datetime
+from typing import Optional
+
+import pandas as pd
+import requests
+import streamlit as st
+
+# Support running as a module or script
+try:
+    from .utils import (
+        generate_synthetic_transactions,
+        filter_transactions,
+        compute_aggregations,
+        build_time_series_chart,
+        build_category_bar_chart,
+        build_payment_method_pie_chart,
+        summarize_with_ai,
+    )
+except Exception:  # ImportError or relative import context issues
+    from utils import (
+        generate_synthetic_transactions,
+        filter_transactions,
+        compute_aggregations,
+        build_time_series_chart,
+        build_category_bar_chart,
+        build_payment_method_pie_chart,
+        summarize_with_ai,
+    )
+
+
+st.set_page_config(
+    page_title="AI Spending Analyser",
+    page_icon="💳",
+    layout="wide",
+)
+
+
+def init_session_state():
+    if "data" not in st.session_state:
+        st.session_state.data = generate_synthetic_transactions(n_rows=900, seed=42)
+    if "filters" not in st.session_state:
+        min_date = st.session_state.data["Date"].min()
+        max_date = st.session_state.data["Date"].max()
+        st.session_state.filters = {
+            "date_range": (min_date, max_date),
+            "categories": [],
+            "merchant_query": "",
+        }
+
+
+def render_header():
+    """
+    Render a header with a blue ^ symbol and app title.
+    """
+    st.markdown(
+        """
+        <div style='display: flex; align-items: baseline; gap: 15px; margin-bottom: 20px;'>
+            <div style='font-size: 80px; color: #00AEEF; font-weight: bold; line-height: 1;'>^</div>
+            <div style='font-size: 36px; color: #697089; font-weight: 500; line-height: 1;'>AI Spending Analyser</div>
+        </div>
+        """,
+        unsafe_allow_html=True,
+    )
+
+
+def render_assistant_banner():
+    # Removed per request: no top assistant banner
+    return
+
+
+def render_chat_fab():
+    # Removed per request: no floating chat widget
+    return
+
+
+def render_sidebar(df: pd.DataFrame):
+    st.sidebar.header("Filters")
+    min_d = df["Date"].min()
+    max_d = df["Date"].max()
+
+    # Separate From and To date inputs
+    st.sidebar.subheader("Date Range")
+    col1, col2 = st.sidebar.columns(2)
+
+    with col1:
+        from_date = st.date_input(
+            "From",
+            value=min_d.date(),
+            min_value=min_d.date(),
+            max_value=max_d.date(),
+            key="from_date"
+        )
+
+    with col2:
+        to_date = st.date_input(
+            "To",
+            value=max_d.date(),
+            min_value=min_d.date(),
+            max_value=max_d.date(),
+            key="to_date"
+        )
+
+    # Validation for date range
+    date_error = None
+    if from_date > to_date:
+        date_error = "From date cannot be after To date"
+    elif from_date < min_d.date() or to_date > max_d.date():
+        date_error = f"Date range can only be between {min_d.date().strftime('%Y-%m-%d')} and {max_d.date().strftime('%Y-%m-%d')}"
+    elif from_date > max_d.date() or to_date < min_d.date():
+        date_error = f"Date range can only be between {min_d.date().strftime('%Y-%m-%d')} and {max_d.date().strftime('%Y-%m-%d')}"
+
+    if date_error:
+        st.sidebar.error(date_error)
+        # Use valid defaults when there's an error
+        from_date = min_d.date()
+        to_date = max_d.date()
+
+    all_categories = sorted(df["Category"].unique().tolist())
+    categories = st.sidebar.multiselect("Category", options=all_categories, default=[])
+
+    merchant_query = st.sidebar.text_input("Merchant search", value="", placeholder="Type a merchant name…")
+
+    st.sidebar.divider()
+    st.sidebar.header("AI")
+    # Default engine is now HuggingFace (not heuristic)
+    summary_mode = st.sidebar.radio("Summary", options=["Concise", "Detailed"], index=0, horizontal=True)
+    engine = st.sidebar.selectbox("Engine", options=["HuggingFace", "OpenAI", "Heuristic"], index=0)
+    ollama_model = None
+
+    st.sidebar.divider()
+    st.sidebar.header("Anomalies & Highlights")
+    show_spikes = st.sidebar.toggle("Show spike markers", value=True)
+    large_tx_threshold = st.sidebar.slider("Large transaction threshold (£)", 50, 1000, 250, step=25)
+
+    col1, col2 = st.sidebar.columns(2)
+    with col1:
+        regen = st.button("Regenerate")
+    with col2:
+        st.sidebar.write("")
+
+    if regen:
+        st.session_state.data = generate_synthetic_transactions(n_rows=900)
+
+    # Update filters
+    st.session_state.filters = {
+        "date_range": (
+            datetime.combine(from_date, datetime.min.time()),
+            datetime.combine(to_date, datetime.max.time()),
+        ),
+        "categories": categories,
+        "merchant_query": merchant_query.strip(),
+        "summary_mode": summary_mode,
+        "engine": engine,
+        "ollama_model": None,
+        "show_spikes": show_spikes,
+        "large_tx_threshold": large_tx_threshold,
+    }
+
+
+def render_metrics(agg: dict):
+    col1, col2, col3, col4 = st.columns(4)
+    with col1:
+        st.markdown(f"<div class='metric-card'><div class='metric-label'>Total Value</div><div class='kpi-value'><span style='font-size: 0.8em;'>£</span><span style='font-size: 1.2em; font-weight: bold;'>{agg['total_spend']:,.0f}</span></div></div>", unsafe_allow_html=True)
+    with col2:
+        st.markdown(f"<div class='metric-card'><div class='metric-label'>Avg Monthly</div><div class='kpi-value'><span style='font-size: 0.8em;'>£</span><span style='font-size: 1.2em; font-weight: bold;'>{agg['avg_monthly_spend']:,.0f}</span></div></div>", unsafe_allow_html=True)
+    with col3:
+        st.markdown(f"<div class='metric-card'><div class='metric-label'>Max Transaction</div><div class='kpi-value kpi-accent'><span style='font-size: 0.8em;'>£</span><span style='font-size: 1.2em; font-weight: bold;'>{agg['max_transaction']['Amount']:,.0f}</span></div></div>", unsafe_allow_html=True)
+    with col4:
+        st.markdown(f"<div class='metric-card'><div class='metric-label'>Min Transaction</div><div class='kpi-value'><span style='font-size: 0.8em;'>£</span><span style='font-size: 1.2em; font-weight: bold;'>{agg['min_transaction']['Amount']:,.0f}</span></div></div>", unsafe_allow_html=True)
+
+
+def render_isa_widget(current_spend: float, allowance: float):
+    used = min(current_spend, allowance)
+    remaining = max(allowance - used, 0)
+    percent = 0 if allowance <= 0 else int((used / allowance) * 100)
+    st.markdown("<div class='isa-widget'>", unsafe_allow_html=True)
+    st.subheader("ISA allowance")
+    st.markdown(f"<div class='progress'><div style='width:{percent}%;'></div></div>", unsafe_allow_html=True)
+    col1, col2 = st.columns(2)
+    with col1:
+        st.markdown(f"<div><span class='kpi-accent' style='font-size: 1.1rem; font-weight: 600;'>USED</span><br/><span style='font-size: 1.8rem; font-weight: bold;'>£{used:,.2f}</span></div>", unsafe_allow_html=True)
+    with col2:
+        st.markdown(f"<div><span style='font-size: 1.1rem; font-weight: 600; color: rgba(255,255,255,0.8);'>REMAINING</span><br/><span style='font-size: 1.8rem; font-weight: bold;'>£{remaining:,.2f}</span></div>", unsafe_allow_html=True)
+    st.markdown("</div>", unsafe_allow_html=True)
+
+
+def render_charts(filtered_df: pd.DataFrame, agg: dict, template: str, show_spikes: bool):
+    t1, t2, t3 = st.tabs(["Trend", "By Category", "Payment Methods"])
+    with t1:
+        fig = build_time_series_chart(
+            filtered_df,
+            template=template,
+            spike_overlay=agg["spikes"] if show_spikes else None,
+        )
+        st.plotly_chart(fig, use_container_width=True)
+    with t2:
+        st.caption("Tip: Select categories in the sidebar to compare their total spend.")
+        brand_seq = ["#00AEEF", "#697089", "#005F7F", "#00CC99", "#7A7F87"]
+        fig = build_category_bar_chart(agg["spend_per_category"], template=template, color_sequence=brand_seq)
+        st.plotly_chart(fig, use_container_width=True)
+    with t3:
+        brand_seq = ["#00AEEF", "#00CC99", "#697089"]
+        fig = build_payment_method_pie_chart(agg["spend_per_payment"], template=template, color_sequence=brand_seq)
+        st.plotly_chart(fig, use_container_width=True)
+
+
+# Simple deterministic heuristic fallback (keeps behavior predictable)
+def heuristic_summary(agg: dict, mode: str) -> str:
+    # Produce a short, deterministic summary using aggregations
+    total = agg.get("total_spend", 0)
+    avg_month = agg.get("avg_monthly_spend", 0)
+    top_cat = None
+    if "spend_per_category" in agg and agg["spend_per_category"]:
+        top_cat = max(agg["spend_per_category"].items(), key=lambda x: x[1])[0]
+    spikes = agg.get("spikes", [])
+    lines = []
+    lines.append(f"Total spend in the selected period: £{total:,.2f}.")
+    lines.append(f"Average monthly spend: £{avg_month:,.2f}.")
+    if top_cat:
+        lines.append(f"Top category by spend: {top_cat}.")
+    lines.append(f"Detected {len(spikes)} spending spikes.")
+    if mode == "Detailed":
+        # Add a little more deterministic detail
+        items = list(agg.get("spend_per_category", {}).items())[:5]
+        lines.append("Spend per category: " + ", ".join(f"{k}: {chr(163)}{v:,.0f}" for k, v in items))
+    return " ".join(lines)
+
+
+def _get_hf_token() -> Optional[str]:
+    """Return a Hugging Face token using a configurable secret name.
+
+    Behavior:
+    - Look up env var HF_TOKEN_NAME to get the secret key name (default 'HF_TOKEN').
+    - Prefer Streamlit secrets (st.secrets[name]) when running on Spaces.
+    - Fall back to environment variable with that name, then to HUGGINGFACE_API_KEY or HF_TOKEN.
+    """
+    # First, allow an explicit env var to override the secret name
+    name = os.getenv("HF_TOKEN_NAME", None)
+    # If the user used the name 'streamlit' for their token, prefer that too
+    preferred_names = []
+    if name:
+        preferred_names.append(name)
+    # include the user-specified token name 'streamlit' as a high-priority fallback
+    preferred_names.append("streamlit")
+    # finally include the common default
+    preferred_names.append("HF_TOKEN")
+
+    try:
+        for n in preferred_names:
+            if isinstance(st.secrets, dict) and n in st.secrets:
+                return st.secrets[n]
+    except Exception:
+        pass
+
+    for n in preferred_names:
+        val = os.getenv(n)
+        if val:
+            return val
+
+    # last-resort fallbacks
+    return os.getenv("HUGGINGFACE_API_KEY") or os.getenv("HF_TOKEN")
+
+
+def _call_hf_inference(prompt: str, model: str = "tiiuae/falcon-7b-instruct", token: Optional[str] = None, max_tokens: int = 256) -> str:
+    """Call the Hugging Face Inference API and return generated text.
+
+    Raises RuntimeError on non-200 responses.
+    """
+    if not token:
+        raise RuntimeError("No Hugging Face token provided.")
+    url = f"https://api-inference.huggingface.co/models/{model}"
+    headers = {"Authorization": f"Bearer {token}"}
+    payload = {"inputs": prompt, "parameters": {"max_new_tokens": max_tokens, "temperature": 0.2}}
+    resp = requests.post(url, headers=headers, json=payload, timeout=60)
+    if resp.status_code != 200:
+        try:
+            msg = resp.json()
+        except Exception:
+            msg = resp.text
+        raise RuntimeError(f"Hugging Face inference error {resp.status_code}: {msg}")
+    data = resp.json()
+    if isinstance(data, dict):
+        if "error" in data:
+            raise RuntimeError(f"Hugging Face error: {data['error']}")
+        if "generated_text" in data:
+            return data["generated_text"]
+        for v in data.values():
+            if isinstance(v, dict) and "generated_text" in v:
+                return v["generated_text"]
+        return str(data)
+    if isinstance(data, list) and len(data) > 0:
+        if isinstance(data[0], dict) and "generated_text" in data[0]:
+            return data[0]["generated_text"]
+        return str(data[0])
+    return str(data)
+
+
+# External inference via Hugging Face API and OpenAI have been intentionally
+# removed to keep the app free to run on Hugging Face Spaces without paid APIs.
+
+
+def render_ai_summary(agg: dict, mode: str, engine: str, ollama_model: str | None):
+    st.subheader("AI Summary")
+    placeholder = st.empty()
+    placeholder.markdown("<div class='ai-card'>Generating summary…</div>", unsafe_allow_html=True)
+
+    # Build a short prompt from agg (keep it concise)
+    prompt = f"Provide a {mode.lower()} natural-language summary of these spending analytics: {json.dumps({'total_spend': agg.get('total_spend'), 'avg_monthly_spend': agg.get('avg_monthly_spend'), 'top_categories': agg.get('spend_per_category'), 'spikes': agg.get('spikes')}, default=str)}"
+
+    # Preferred: Hugging Face
+    if engine == "HuggingFace":
+        # Use the local summarizer which prefers a small HF model when available
+        try:
+            text = summarize_with_ai(agg, api_key=None, mode=mode, engine="HuggingFace")
+            if not text:
+                raise RuntimeError("No response from local Hugging Face summarizer.")
+            placeholder.markdown(f"<div class='ai-card'>{text}</div>", unsafe_allow_html=True)
+            return
+        except Exception as e:
+            # If local summarizer failed, try remote HF inference if a token is available
+            hf_token = _get_hf_token()
+            if hf_token:
+                try:
+                    prompt = f"Provide a {mode.lower()} natural-language summary of these spending analytics: {json.dumps({'total_spend': agg.get('total_spend'), 'avg_monthly_spend': agg.get('avg_monthly_spend'), 'top_categories': agg.get('spend_per_category'), 'spikes': agg.get('spikes')}, default=str)}"
+                    full_text = _call_hf_inference(prompt, model="gpt2", token=hf_token, max_tokens=256)
+                    placeholder.markdown(f"<div class='ai-card'>{full_text}</div>", unsafe_allow_html=True)
+                    return
+                except Exception:
+                    # Fall back to heuristic if remote inference fails
+                    text = heuristic_summary(agg, mode)
+                    placeholder.markdown(f"<div class='ai-card'>{text}</div>", unsafe_allow_html=True)
+                    return
+            else:
+                placeholder.markdown(f"<div class='ai-card'>Local summarizer error: {e}. No Hugging Face token configured; showing deterministic summary instead.</div>", unsafe_allow_html=True)
+                text = heuristic_summary(agg, mode)
+                placeholder.markdown(f"<div class='ai-card'>{text}</div>", unsafe_allow_html=True)
+                return
+
+    # If the user explicitly selected OpenAI, show Coming soon (we don't want to rely on paid APIs)
+    if engine == "OpenAI":
+        placeholder.markdown("<div class='ai-card'>OpenAI summaries are coming soon. Please select HuggingFace (default) or Ollama (local) instead.</div>", unsafe_allow_html=True)
+        # still provide deterministic fallback to keep UX
+        text = heuristic_summary(agg, mode)
+        placeholder.markdown(f"<div class='ai-card'>{text}</div>", unsafe_allow_html=True)
+        return
+
+    # Ollama support removed — local Hugging Face (distilgpt2) is the supported free option.
+
+    # If Heuristic selected explicitly
+    if engine == "Heuristic":
+        text = heuristic_summary(agg, mode)
+        placeholder.markdown(f"<div class='ai-card'>{text}</div>", unsafe_allow_html=True)
+        return
+
+    # Fallback
+    placeholder.markdown("<div class='ai-card'>Coming soon — selected engine not available.</div>", unsafe_allow_html=True)
+
+
+def main():
+    init_session_state()
+
+    # Inject custom CSS with hover animations (preserved exactly)
+    st.markdown("""
+    <style>
+    :root {
+        --t212: #00AEEF;
+        --t212-light: #33BFEF;
+        --t212-lighter: #66CFEF;
+    }
+
+    /* Base card styles */
+    .card {
+        background: rgba(0,0,0,0.25);
+        border: 1px solid rgba(255,255,255,0.08);
+        border-radius: 12px;
+        padding: 1.2rem;
+        transition: all 0.3s ease;
+        cursor: pointer;
+    }
+
+    .card:hover {
+        background: rgba(0,174,239,0.08);
+        border: 1px solid rgba(0,174,239,0.2);
+        transform: scale(1.02);
+        box-shadow: 0 8px 25px rgba(0,174,239,0.15);
+    }
+
+    /* Metric card styles with hover */
+    .metric-card {
+        background: rgba(0,0,0,0.20);
+        border-radius: 12px;
+        padding: 1.2rem;
+        border: 1px solid rgba(255,255,255,0.08);
+        transition: all 0.3s ease;
+        cursor: pointer;
+        text-align: center;
+    }
+
+    .metric-card:hover {
+        background: rgba(0,174,239,0.1);
+        border: 1px solid rgba(0,174,239,0.3);
+        transform: scale(1.03);
+        box-shadow: 0 10px 30px rgba(0,174,239,0.2);
+    }
+
+    /* AI card styles with hover */
+    .ai-card {
+        background: rgba(0, 204, 153, 0.06);
+        border-left: 4px solid #00CC99;
+        border-radius: 8px;
+        padding: 1.5rem;
+        transition: all 0.3s ease;
+        cursor: pointer;
+        font-size: 1.1rem;
+        line-height: 1.6;
+    }
+
+    .ai-card:hover {
+        background: rgba(0, 204, 153, 0.12);
+        border-left: 4px solid #33D9B3;
+        transform: scale(1.01);
+        box-shadow: 0 6px 20px rgba(0, 204, 153, 0.15);
+    }
+
+    /* ISA widget specific hover */
+    .isa-widget {
+        background: rgba(0,0,0,0.25);
+        border: 1px solid rgba(255,255,255,0.08);
+        border-radius: 12px;
+        padding: 1.5rem;
+        transition: all 0.3s ease;
+        cursor: pointer;
+    }
+
+    .isa-widget:hover {
+        background: rgba(0,174,239,0.08);
+        border: 1px solid rgba(0,174,239,0.2);
+        transform: scale(1.02);
+        box-shadow: 0 8px 25px rgba(0,174,239,0.15);
+    }
+
+    /* KPI value styles */
+    .kpi-value {
+        font-size: 2.2rem;
+        font-weight: 800;
+        margin-top: 0.5rem;
+        transition: all 0.2s ease;
+    }
+
+    .metric-card:hover .kpi-value {
+        color: var(--t212-light);
+    }
+
+    .kpi-accent {
+        color: var(--t212);
+        font-weight: 700;
+    }
+
+    .kpi-accent:hover {
+        color: var(--t212-lighter);
+    }
+
+    /* Progress bar styles */
+    .progress {
+        height: 8px;
+        background: rgba(255,255,255,0.1);
+        border-radius: 999px;
+        overflow: hidden;
+        width: 100%;
+        margin: 1rem 0;
+        transition: all 0.3s ease;
+    }
+
+    .progress > div {
+        height: 100%;
+        background: linear-gradient(90deg, var(--t212), var(--t212-light));
+        transition: all 0.3s ease;
+    }
+
+    .isa-widget:hover .progress {
+        height: 10px;
+        box-shadow: 0 2px 8px rgba(0,174,239,0.3);
+    }
+
+    /* Utility classes */
+    .pos { color: #1ECB4F; }
+    .neg { color: #FF4D4F; }
+
+    /* Enhanced text styles */
+    .metric-label {
+        font-size: 0.9rem;
+        color: rgba(255,255,255,0.7);
+        font-weight: 500;
+        margin-bottom: 0.5rem;
+    }
+
+    .metric-card:hover .metric-label {
+        color: rgba(255,255,255,0.9);
+    }
+
+    /* Subheader improvements */
+    h3 {
+        font-size: 1.4rem !important;
+        font-weight: 600 !important;
+        color: rgba(255,255,255,0.9) !important;
+        margin-bottom: 1rem !important;
+    }
+    </style>
+    """, unsafe_allow_html=True)
+    render_header()
+    render_assistant_banner()
+
+    # Floating chat button
+    render_chat_fab()
+
+    # Sidebar filters and regenerate
+    render_sidebar(st.session_state.data)
+
+    # Apply filters
+    filters = st.session_state.filters
+    filtered = filter_transactions(
+        st.session_state.data,
+        date_range=filters["date_range"],
+        categories=filters["categories"],
+        merchant_query=filters["merchant_query"],
+    )
+
+    if filtered.empty:
+        st.info("No data for selected filters. Adjust filters to see insights.")
+        return
+
+    agg = compute_aggregations(filtered)
+
+    # Top KPIs
+    st.markdown("<div class='card'>", unsafe_allow_html=True)
+    render_metrics(agg)
+    st.markdown("</div>", unsafe_allow_html=True)
+
+    # ISA-style allowance widget (configurable)
+    with st.expander("Allowance widget"):
+        allowance = st.number_input("Annual allowance (£)", min_value=0, value=20000, step=500)
+        render_isa_widget(current_spend=float(agg['total_spend']), allowance=float(allowance))
+
+    # Charts (use dark theme consistently as requested)
+    template = "plotly_dark"
+    render_charts(filtered, agg, template, show_spikes=filters["show_spikes"])
+
+    # AI Summary only
+    render_ai_summary(agg, mode=filters["summary_mode"], engine=filters["engine"], ollama_model=filters["ollama_model"])
+
+    # Large transactions table
+    threshold = filters["large_tx_threshold"]
+    large_df = filtered[filtered["Amount"] >= threshold].sort_values("Amount", ascending=False)
+    with st.expander(f"Show large transactions (≥ £{threshold}) [{len(large_df)}]"):
+        st.dataframe(large_df, use_container_width=True, hide_index=True)
+
+    # Downloads
+    st.divider()
+    col1, col2 = st.columns([2,1])
+    with col1:
+        st.caption("Download filtered data")
+        csv = filtered.to_csv(index=False).encode("utf-8")
+        st.download_button("Download CSV", csv, file_name="transactions_filtered.csv", mime="text/csv")
+    with col2:
+        st.caption("Dataset size")
+        st.write(f"{len(filtered):,} rows")
+
+
+if __name__ == "__main__":
+    main()
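The token lookup order that `_get_hf_token` implements (explicit `HF_TOKEN_NAME` override, then the `streamlit` secret, then `HF_TOKEN`, with secrets taking priority over environment variables and `HUGGINGFACE_API_KEY`/`HF_TOKEN` as last resorts) can be isolated into a pure function for testing. Here `env` and `secrets` are plain dicts standing in for `os.environ` and `st.secrets`:

```python
from typing import Optional

def resolve_hf_token(env: dict, secrets: dict) -> Optional[str]:
    # Build the candidate-name list: an explicit override first,
    # then 'streamlit', then the common default 'HF_TOKEN'.
    names = []
    if env.get("HF_TOKEN_NAME"):
        names.append(env["HF_TOKEN_NAME"])
    names += ["streamlit", "HF_TOKEN"]
    # Secrets win over environment variables, mirroring _get_hf_token.
    for n in names:
        if n in secrets:
            return secrets[n]
    for n in names:
        if env.get(n):
            return env[n]
    # Last-resort fallbacks.
    return env.get("HUGGINGFACE_API_KEY") or env.get("HF_TOKEN")
```

Pulling the precedence logic out of the Streamlit globals like this makes the fallback chain testable without a running app, which is useful when debugging why a Space is not picking up a token.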
requirements.txt CHANGED
@@ -2,8 +2,9 @@ streamlit>=1.34
 pandas>=2.2
 numpy>=1.26
 plotly>=5.22
+openai>=1.44
 transformers>=4.30
-altair
-pandas
-streamlit
+torch
+transformers>=4.30
+torch
utils.py ADDED
@@ -0,0 +1,473 @@
1
+ from __future__ import annotations
2
+
3
+ import math
4
+ import os
5
+ from dataclasses import dataclass
6
+ from datetime import datetime, timedelta
7
+ from typing import Dict, Iterable, List, Optional, Tuple
8
+
9
+ import numpy as np
10
+ import pandas as pd
11
+ import plotly.express as px
12
+
13
+
14
+ CATEGORIES = [
15
+ "Food",
16
+ "Travel",
17
+ "Shopping",
18
+ "Utilities",
19
+ "Entertainment",
20
+ "Health",
21
+ "Subscriptions",
22
+ "Transport",
23
+ ]
24
+
25
+ MERCHANTS = [
26
+ "SuperMart",
27
+ "QuickEats",
28
+ "Urban Cafe",
29
+ "MegaStore",
30
+ "Cinema City",
31
+ "Fit&Fine Gym",
32
+ "City Utilities",
33
+ "StreamFlix",
34
+ "RideNow",
35
+ "Book Haven",
36
+ "ElectroWorld",
37
+ "TravelCo",
38
+ "PharmaPlus",
39
+ "HomeNeeds",
40
+ ]
41
+
42
+ PAYMENT_METHODS = ["Debit Card", "Credit Card", "Digital Wallet"]
43
+
44
+ LOCATIONS = [
45
+ "London",
46
+ "Manchester",
47
+ "Birmingham",
48
+ "Leeds",
49
+ "Glasgow",
50
+ "Liverpool",
51
+ "Bristol",
52
+ "Edinburgh",
53
+ "Cardiff",
54
+ "Belfast",
55
+ ]
56
+
57
+
+ def _random_amounts(n: int, rng: np.random.Generator) -> np.ndarray:
+     # Mixture distribution for more realistic spend: many small, some medium, few large
+     choices = rng.choice(["small", "medium", "large"], size=n, p=[0.65, 0.28, 0.07])
+     amounts = np.empty(n)
+     for i, c in enumerate(choices):
+         if c == "small":
+             amounts[i] = max(1, rng.normal(15, 8))
+         elif c == "medium":
+             amounts[i] = max(5, rng.normal(60, 25))
+         else:
+             amounts[i] = max(20, rng.normal(180, 60))
+     # Random spikes
+     spike_idx = rng.choice(np.arange(n), size=max(1, n // 50), replace=False)
+     amounts[spike_idx] *= rng.uniform(2.5, 4.0, size=len(spike_idx))
+     return np.round(amounts, 2)
+
+
+ def generate_synthetic_transactions(n_rows: int = 900, seed: Optional[int] = None) -> pd.DataFrame:
+     rng = np.random.default_rng(seed)
+     end = pd.Timestamp.today().normalize()
+     start = end - pd.Timedelta(days=365)
+     dates = pd.date_range(start, end, freq="D")
+
+     # Draw dates with bias to weekends and month-ends; normalize so probabilities sum to 1
+     weights = np.array(
+         [1.2 if d.weekday() >= 5 else 1.0 for d in dates]
+     ) * np.array(
+         [1.3 if d.day > 25 else 1.0 for d in dates]
+     )
+     weights = np.clip(weights, a_min=0, a_max=None)
+     weights = weights / weights.sum()
+     date_choices = rng.choice(len(dates), size=n_rows, replace=True, p=weights)
+     chosen_dates = dates[date_choices]
+
+     categories = rng.choice(CATEGORIES, size=n_rows)
+     merchants = rng.choice(MERCHANTS, size=n_rows)
+     payment_methods = rng.choice(PAYMENT_METHODS, size=n_rows, p=[0.6, 0.25, 0.15])
+     locations = rng.choice(LOCATIONS, size=n_rows)
+     amts = _random_amounts(n_rows, rng)
+
+     df = pd.DataFrame(
+         {
+             "Date": pd.to_datetime(chosen_dates),
+             "Merchant": merchants,
+             "Category": categories,
+             "Amount": amts,
+             "Payment Method": payment_methods,
+             "Location": locations,
+         }
+     )
+     # Sort by date for better UX
+     df = df.sort_values("Date").reset_index(drop=True)
+     return df
+
+
+ def filter_transactions(
+     df: pd.DataFrame,
+     date_range: Tuple[datetime, datetime],
+     categories: Optional[Iterable[str]] = None,
+     merchant_query: str = "",
+ ) -> pd.DataFrame:
+     start, end = date_range
+     mask = (df["Date"] >= pd.to_datetime(start)) & (df["Date"] <= pd.to_datetime(end))
+     if categories:
+         mask &= df["Category"].isin(list(categories))
+     if merchant_query:
+         mask &= df["Merchant"].str.contains(merchant_query, case=False, na=False)
+     return df.loc[mask].copy()
+
+
+ def _month_key(s: pd.Series) -> pd.Series:
+     return pd.to_datetime(s).dt.to_period("M").dt.to_timestamp()
+
+
+ def compute_aggregations(df: pd.DataFrame) -> Dict:
+     if df.empty:
+         return {
+             "total_spend": 0.0,
+             "avg_monthly_spend": 0.0,
+             "spend_per_category": pd.Series(dtype=float),
+             "spend_per_payment": pd.Series(dtype=float),
+             "max_transaction": {"Amount": 0.0},
+             "min_transaction": {"Amount": 0.0},
+             "monthly": pd.DataFrame(columns=["Month", "Amount"]),
+             "category_share": pd.Series(dtype=float),
+             "rolling_28d": pd.DataFrame(columns=["Date", "Amount", "Rolling28"]),
+             "spikes": pd.DataFrame(columns=["Date", "Amount", "IsSpike"]),
+         }
+
+     total_spend = float(df["Amount"].sum())
+     spend_per_category = df.groupby("Category")["Amount"].sum().sort_values(ascending=False)
+     spend_per_payment = df.groupby("Payment Method")["Amount"].sum().sort_values(ascending=False)
+     max_txn = df.loc[df["Amount"].idxmax()].to_dict()
+     min_txn = df.loc[df["Amount"].idxmin()].to_dict()
+
+     monthly = (
+         df.assign(Month=_month_key(df["Date"]))
+         .groupby("Month")["Amount"].sum()
+         .reset_index()
+     )
+     avg_monthly_spend = float(monthly["Amount"].mean()) if not monthly.empty else 0.0
+
+     # Category share of total spend
+     category_share = (spend_per_category / max(total_spend, 1e-9)).round(4)
+
+     # Rolling 28-day spend for simple trend smoothing
+     df_daily = df.groupby(pd.to_datetime(df["Date"]).dt.date)["Amount"].sum().reset_index()
+     df_daily["Date"] = pd.to_datetime(df_daily["Date"])  # normalize to midnight
+     df_daily = df_daily.sort_values("Date")
+     df_daily["Rolling28"] = df_daily["Amount"].rolling(window=28, min_periods=7).mean()
+
+     # Naive anomaly detection: flag days above mean + 2.5 * std of daily totals
+     mu = df_daily["Amount"].mean()
+     sigma = df_daily["Amount"].std(ddof=0) or 0.0
+     threshold = mu + 2.5 * sigma
+     df_spikes = df_daily.assign(IsSpike=df_daily["Amount"] > threshold)
+
+     return {
+         "total_spend": total_spend,
+         "avg_monthly_spend": avg_monthly_spend,
+         "spend_per_category": spend_per_category,
+         "spend_per_payment": spend_per_payment,
+         "max_transaction": max_txn,
+         "min_transaction": min_txn,
+         "monthly": monthly,
+         "category_share": category_share,
+         "rolling_28d": df_daily,
+         "spikes": df_spikes,
+     }
+
+
+ def build_time_series_chart(
+     df: pd.DataFrame,
+     template: str = "plotly",
+     spike_overlay: Optional[pd.DataFrame] = None,
+ ):
+     if df.empty:
+         fig = px.line()
+         fig.update_layout(template=template)
+         return fig
+     daily = df.groupby(pd.to_datetime(df["Date"]).dt.date)["Amount"].sum().reset_index()
+     daily["Date"] = pd.to_datetime(daily["Date"])  # ensure datetime for plotly
+     fig = px.line(
+         daily,
+         x="Date",
+         y="Amount",
+         title="Daily Spend Over Time",
+         markers=True,
+     )
+     fig.update_traces(hovertemplate="%{x|%b %d, %Y}: £%{y:.2f}")
+     fig.update_layout(margin=dict(l=10, r=10, t=40, b=10), template=template)
+
+     # Optional spike overlay; guard against a missing IsSpike column
+     if isinstance(spike_overlay, pd.DataFrame) and "IsSpike" in spike_overlay.columns:
+         spike_points = spike_overlay[spike_overlay["IsSpike"]]
+         if not spike_points.empty:
+             fig.add_scatter(
+                 x=spike_points["Date"],
+                 y=spike_points["Amount"],
+                 mode="markers",
+                 name="Spikes",
+                 marker=dict(color="#EF553B", size=9, symbol="diamond"),
+                 hovertemplate="Spike %{x|%b %d, %Y}: £%{y:.2f}",
+             )
+     return fig
+
+
+ def build_category_bar_chart(
+     spend_per_category: pd.Series,
+     template: str = "plotly",
+     color_sequence: Optional[list] = None,
+ ):
+     if spend_per_category.empty:
+         fig = px.bar()
+         fig.update_layout(template=template)
+         return fig
+     data = spend_per_category.rename_axis("Category").reset_index(name="Amount")
+     fig = px.bar(
+         data,
+         x="Category",
+         y="Amount",
+         title="Spend by Category",
+         color="Category",
+         color_discrete_sequence=color_sequence,
+     )
+     fig.update_traces(hovertemplate="%{x}: £%{y:.2f}")
+     fig.update_layout(showlegend=False, margin=dict(l=10, r=10, t=40, b=10), template=template)
+     return fig
+
+
+ def build_payment_method_pie_chart(
+     spend_per_payment: pd.Series,
+     template: str = "plotly",
+     color_sequence: Optional[list] = None,
+ ):
+     if spend_per_payment.empty:
+         fig = px.pie()
+         fig.update_layout(template=template)
+         return fig
+     data = spend_per_payment.rename_axis("Payment Method").reset_index(name="Amount")
+     fig = px.pie(
+         data,
+         values="Amount",
+         names="Payment Method",
+         title="Payment Methods Distribution",
+         hole=0.45,
+         color_discrete_sequence=color_sequence,
+     )
+     fig.update_traces(hovertemplate="%{label}: £%{value:.2f} (%{percent})")
+     fig.update_layout(margin=dict(l=10, r=10, t=40, b=10), template=template)
+     return fig
+
+
+ def _format_number(n: float) -> str:
+     if n >= 1_000_000:
+         return f"£{n / 1_000_000:.1f}M"
+     if n >= 1_000:
+         return f"£{n / 1_000:.1f}k"
+     return f"£{n:,.0f}"
+
+
+ def summarize_with_ai(
+     agg: Dict,
+     api_key: Optional[str] = None,
+     mode: str = "Concise",
+     engine: str = "Heuristic",
+     ollama_model: Optional[str] = None,
+ ) -> str:
+     # Prepare a compact context
+     largest_cat = (
+         agg["spend_per_category"].idxmax() if not agg["spend_per_category"].empty else None
+     )
+     largest_cat_share = (
+         float(agg["category_share"].max()) if not agg["category_share"].empty else 0.0
+     )
+
+     spikes_df = agg.get("spikes")
+     spike_days = (
+         int(spikes_df["IsSpike"].sum())
+         if isinstance(spikes_df, pd.DataFrame) and "IsSpike" in spikes_df.columns
+         else 0
+     )
+
+     context = {
+         "total_spend": float(agg["total_spend"]),
+         "avg_monthly": float(agg["avg_monthly_spend"]),
+         "largest_category": largest_cat,
+         "largest_category_share": largest_cat_share,
+         "max_transaction": {
+             "amount": float(agg["max_transaction"].get("Amount", 0.0)),
+             "merchant": str(agg["max_transaction"].get("Merchant", "")),
+         },
+         "mom_change": _month_over_month_change(agg.get("monthly")),
+         "spike_days": spike_days,
+     }
+
+     # Engine selection
+     engine = (engine or "Heuristic").strip()
+     if engine == "Heuristic":
+         return _heuristic_summary(context, mode=mode)
+
+     # Local Hugging Face transformer model (small) - suitable for Spaces without paid APIs
+     if engine == "HuggingFace":
+         # Try a small, commonly available model for generation. `distilgpt2` is a
+         # reasonable CPU-friendly option on the HF Hub and produces better text
+         # than the ultra-tiny toy models.
+         model_name = os.getenv("HF_LOCAL_MODEL", "distilgpt2")
+         try:
+             from transformers import AutoModelForCausalLM, AutoTokenizer
+             import torch
+
+             # Load tokenizer & model (cached by Hugging Face inside the Space)
+             tokenizer = AutoTokenizer.from_pretrained(model_name)
+             model = AutoModelForCausalLM.from_pretrained(model_name)
+             prompt = _hf_prompt(context, mode)
+             inputs = tokenizer(prompt, return_tensors="pt")
+             with torch.no_grad():
+                 out = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
+             text = tokenizer.decode(out[0], skip_special_tokens=True)
+             # Post-process: return the generated tail after the prompt if present
+             if text.startswith(prompt):
+                 return text[len(prompt):].strip() or _heuristic_summary(context, mode=mode)
+             return text.strip() or _heuristic_summary(context, mode=mode)
+         except Exception:
+             # If the local HF model fails, fall back to heuristic (keeps the app running on Spaces)
+             return _heuristic_summary(context, mode=mode)
+
+     # Only local Hugging Face generation and the heuristic fallback are supported,
+     # keeping the app free and self-contained for Hugging Face Spaces.
+     return _heuristic_summary(context, mode=mode)
+
+
+ def _month_over_month_change(monthly: Optional[pd.DataFrame]) -> float:
+     if monthly is None or monthly.empty or len(monthly) < 2:
+         return 0.0
+     monthly_sorted = monthly.sort_values("Month")
+     last, prev = monthly_sorted["Amount"].iloc[-1], monthly_sorted["Amount"].iloc[-2]
+     if prev == 0:
+         return 0.0
+     return float((last - prev) / prev)
+
+
+ def _heuristic_summary(ctx: Dict, mode: str = "Concise") -> str:
+     total = _format_number(ctx.get("total_spend", 0.0))
+     avg = _format_number(ctx.get("avg_monthly", 0.0))
+     lcat = ctx.get("largest_category") or "N/A"
+     share = ctx.get("largest_category_share", 0.0) * 100
+     max_amt = ctx.get("max_transaction", {}).get("amount", 0.0)
+     max_merchant = ctx.get("max_transaction", {}).get("merchant", "")
+     mom = ctx.get("mom_change", 0.0) * 100
+     spikes = ctx.get("spike_days", 0)
+
+     parts = [
+         f"Total spend in the selected period is {total}, averaging {avg} per month.",
+         f"Top category is {lcat} at {share:.0f}% of spend." if lcat != "N/A" else "",
+         f"Month-over-month, spending changed by {mom:+.0f}%.",
+         f"Largest single transaction was £{max_amt:,.0f} at {max_merchant}." if max_amt else "",
+         f"Detected {spikes} unusually high daily spend day(s)." if spikes else "",
+     ]
+     text = " ".join([p for p in parts if p])
+
+     if mode == "Detailed":
+         # Add more comprehensive analysis for detailed mode
+         detailed_insights = []
+
+         # Spending pattern analysis
+         if mom > 10:
+             detailed_insights.append("Your spending has increased significantly this month, which may indicate lifestyle changes or seasonal variations.")
+         elif mom < -10:
+             detailed_insights.append("You've successfully reduced your spending this month, showing good financial discipline.")
+         else:
+             detailed_insights.append("Your spending patterns remain relatively stable month-over-month.")
+
+         # Category-specific recommendations
+         if lcat == "Food":
+             detailed_insights.append("Food represents your largest expense category. Consider meal planning and bulk shopping to optimize costs.")
+         elif lcat == "Shopping":
+             detailed_insights.append("Shopping is your primary spending category. Review purchases for necessities vs. wants to identify savings opportunities.")
+         elif lcat == "Entertainment":
+             detailed_insights.append("Entertainment spending dominates your budget. Look for free or low-cost alternatives to maintain your lifestyle within budget.")
+
+         # Spike analysis
+         if spikes > 5:
+             detailed_insights.append("Multiple spending spikes detected suggest irregular expense patterns. Consider smoothing these through better budgeting.")
+         elif spikes > 0:
+             detailed_insights.append("Some spending spikes were identified, which is normal but worth monitoring for budget planning.")
+
+         # General financial advice
+         detailed_insights.append("Consider setting category budgets and monitoring spikes to smooth cash flow and improve financial predictability.")
+
+         text += " " + " ".join(detailed_insights)
+
+     return text
+
+
+ # Ollama/OpenAI helpers removed to keep the app local-only and free.
+
+
+ def _hf_prompt(context: Dict, mode: str) -> str:
+     style = "concise (80-120 words)" if mode == "Concise" else "detailed (140-220 words)"
+     return (
+         "You are a helpful financial assistant. Produce a "
+         + style
+         + " natural-language summary of the provided spending analytics in plain English.\n\n"
+         + f"Context: {context}\n\nSummary:"
+     )
+
+
+ def chat_with_ai(
+     agg: Dict,
+     question: str,
+     engine: str = "Heuristic",
+     api_key: Optional[str] = None,
+     ollama_model: Optional[str] = None,
+ ) -> str:
+     # Provide compact context; reuse the aggregations from summarize
+     context = {
+         "totals": float(agg.get("total_spend", 0.0)),
+         "monthly": [
+             {"month": str(r["Month"]), "amount": float(r["Amount"])}
+             for _, r in agg.get("monthly", pd.DataFrame()).iterrows()
+         ],
+         "by_category": agg.get("spend_per_category", pd.Series(dtype=float)).to_dict(),
+         "by_payment": agg.get("spend_per_payment", pd.Series(dtype=float)).to_dict(),
+         "max_txn": agg.get("max_transaction", {}),
+     }
+
+     # Support a local Hugging Face model for Q&A if requested; otherwise answer heuristically.
+     engine = (engine or "Heuristic").strip()
+     if engine == "Heuristic" or not question.strip():
+         return (
+             "Here's what I can tell from your data: total spend is "
+             + _format_number(context["totals"])
+             + ". Ask about trends, categories, or months for more detail."
+         )
+
+     if engine == "HuggingFace":
+         model_name = os.getenv("HF_LOCAL_MODEL", "distilgpt2")
+         try:
+             from transformers import AutoModelForCausalLM, AutoTokenizer
+             import torch
+
+             tokenizer = AutoTokenizer.from_pretrained(model_name)
+             model = AutoModelForCausalLM.from_pretrained(model_name)
+             prompt = f"Context: {context}\n\nQuestion: {question}\nAnswer:"
+             inputs = tokenizer(prompt, return_tensors="pt")
+             with torch.no_grad():
+                 out = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
+             text = tokenizer.decode(out[0], skip_special_tokens=True)
+             if text.startswith(prompt):
+                 return text[len(prompt):].strip()
+             return text.strip()
+         except Exception:
+             return (
+                 "Local model unavailable. Falling back to heuristic answer: "
+                 "Here's what I can tell from your data: total spend is "
+                 + _format_number(context["totals"]) + "."
+             )
+
+     # Default fallback
+     return "I can't answer that right now. Try the Heuristic engine."
+
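The `IsSpike` flag that `compute_aggregations` attaches is a simple threshold rule: a day is flagged when its total exceeds the mean of daily totals plus 2.5 population standard deviations. A minimal self-contained sketch of that rule on made-up numbers (not data from the app):

```python
import pandas as pd

# Ten quiet days around £20 plus one £200 outlier.
daily = pd.DataFrame({
    "Date": pd.date_range("2024-01-01", periods=11, freq="D"),
    "Amount": [20.0, 22.0, 19.0, 21.0, 18.0, 20.0, 23.0, 19.0, 21.0, 20.0, 200.0],
})

mu = daily["Amount"].mean()
sigma = daily["Amount"].std(ddof=0)  # population std, as in compute_aggregations
threshold = mu + 2.5 * sigma
daily["IsSpike"] = daily["Amount"] > threshold

# Only the £200 day crosses the threshold in this toy series.
print(daily.loc[daily["IsSpike"], "Date"].dt.strftime("%Y-%m-%d").tolist())
```

The resulting frame has the same `Date`/`Amount`/`IsSpike` shape that `build_time_series_chart` expects for its `spike_overlay` argument.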