eliakuassi committed on commit 6aacac5 · verified · 1 parent: fffa58a

Upload 8 files

DEPLOY.md ADDED
@@ -0,0 +1,23 @@
+ # Deployment Guide
+
+ ## Web UI upload path
+
+ 1. Go to Hugging Face and create a new **Gradio Space**.
+ 2. Download and unzip this package.
+ 3. Upload all files from the folder to the Space root.
+ 4. Wait for the build to install dependencies from `requirements.txt`.
+ 5. Open the Space. On first load, the notebook runs automatically if no saved outputs exist yet.
+ 6. Use the **Notebook Runner** tab if you want to re-run the notebook manually.
+
+ ## Expected runtime behavior
+
+ - The app reads `synthetic_sales_data.csv` and `synthetic_book_reviews.csv`.
+ - It runs `pythonanalysis.ipynb` with Papermill.
+ - Saved figures appear in the gallery.
+ - Saved CSV and JSON outputs appear in the table preview and KPI cards.
+
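The gallery and table preview described above depend on which files the notebook has saved. A minimal sketch of that discovery step, modeled on the `_ls` and `artifacts_index` helpers in `app.py`; the name `list_artifacts` is an illustrative assumption, and the folder layout is the `artifacts/py/figures` and `artifacts/py/tables` structure named in the README.

```python
from pathlib import Path
from typing import Dict, List, Tuple

FIGURE_EXTS: Tuple[str, ...] = (".png", ".jpg", ".jpeg")
TABLE_EXTS: Tuple[str, ...] = (".csv", ".json")


def _ls(dir_path: Path, exts: Tuple[str, ...]) -> List[str]:
    """Return sorted file names in dir_path whose suffix is in exts."""
    if not dir_path.is_dir():
        return []
    return sorted(
        p.name for p in dir_path.iterdir()
        if p.is_file() and p.suffix.lower() in exts
    )


def list_artifacts(base_dir: Path) -> Dict[str, List[str]]:
    """Hypothetical helper: index figures and tables under base_dir."""
    root = base_dir / "artifacts" / "py"
    return {
        "figures": _ls(root / "figures", FIGURE_EXTS),
        "tables": _ls(root / "tables", TABLE_EXTS),
    }
```

An empty result for both keys is the condition under which the app treats the Space as having no outputs yet.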
+ ## Common fixes
+
+ - If the build fails, confirm the Space SDK is **Gradio**.
+ - If notebook execution fails, check the **Execution Log** output in the **Notebook Runner** tab.
+ - If the AI tab does not use an LLM, add `HF_API_KEY` as a Space secret under **Settings → Variables and secrets**.
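When no `HF_API_KEY` is available, the AI tab in `app.py` routes questions by keyword instead of calling a model. A condensed sketch of that routing, assuming the keyword groups and file names used by `_keyword_fallback`; the `ROUTES` table and the name `route_question` are illustrative, not part of the app.

```python
from typing import Dict, List, Tuple

# (keywords, display directive) pairs, checked in order; first match wins.
ROUTES: List[Tuple[Tuple[str, ...], Dict[str, str]]] = [
    (("trend", "sales", "forecast", "arima", "predict"),
     {"show": "figure", "filename": "df_dashboard.csv"}),
    (("sentiment", "positive", "negative", "review"),
     {"show": "figure", "filename": "sentiment_counts_sampled.csv"}),
    (("top", "best", "popular", "rank"),
     {"show": "table", "filename": "top_titles_by_units_sold.csv"}),
    (("price", "pricing", "decision"),
     {"show": "table", "filename": "pricing_decisions.csv"}),
    (("overview", "dashboard", "summary", "kpi"),
     {"show": "table", "filename": "df_dashboard.csv"}),
]


def route_question(message: str) -> Dict[str, str]:
    """Hypothetical helper: map a question to a display directive."""
    lowered = message.lower()
    for keywords, directive in ROUTES:
        # Substring match, so "stop" would also trigger the "top" route.
        if any(word in lowered for word in keywords):
            return dict(directive)
    return {"show": "none"}
```

Because matching is by substring, a question has to contain one of the listed keywords: "Which titles sell the most?" matches none of them and falls through to `{"show": "none"}`.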
README.md CHANGED
@@ -1,12 +1,51 @@
  ---
- title: Handson5
- emoji: 📊
- colorFrom: gray
- colorTo: blue
+ title: ESCP Book Analytics Space
+ emoji: 📚
+ colorFrom: purple
+ colorTo: indigo
  sdk: gradio
- sdk_version: 6.10.0
+ sdk_version: 5.23.3
  app_file: app.py
  pinned: false
+ python_version: "3.10"
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # ESCP Book Analytics Space
+
+ This Hugging Face Space runs the included Jupyter notebook automatically and turns the saved outputs into:
+
+ - KPI cards
+ - Interactive sales charts
+ - Sentiment charts
+ - Top-seller analysis
+ - Table previews
+ - A simple AI dashboard for natural-language questions
+
+ ## Files included
+
+ - `app.py`: the Gradio application
+ - `pythonanalysis.ipynb`: your uploaded analysis notebook
+ - `synthetic_sales_data.csv`: sales dataset
+ - `synthetic_book_reviews.csv`: reviews dataset
+ - `style.css`: local styling with no background images
+ - `requirements.txt`: pinned Python dependencies
+
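+ ## Environment variables
+
+ All variables are optional:
+
+ - `HF_API_KEY`: Hugging Face token; enables LLM mode in the AI tab (keyword matching is used without it)
+ - `MODEL_NAME`: chat model for the AI tab (default `meta-llama/Llama-3.1-8B-Instruct`)
+ - `HF_PROVIDER`: inference provider for the AI tab (default `hf-inference`)
+ - `PAPERMILL_TIMEOUT`: Papermill `execution_timeout` in seconds (default `1800`)
+ - `AUTO_RUN_ON_LOAD`: run the notebook on first load when no outputs exist (default `true`)
+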
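A minimal sketch of how these variables could be read, following the defaults in the CONFIG block of `app.py`. The function name `read_settings` and the returned keys are assumptions for illustration; the app itself uses module-level constants and additionally requires `huggingface_hub` to be importable before enabling LLM mode.

```python
import os
from typing import Any, Dict, Mapping


def read_settings(env: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Hypothetical helper: parse the optional variables with app.py's defaults."""
    api_key = env.get("HF_API_KEY", "").strip()
    return {
        # LLM mode needs a non-empty key; otherwise keyword matching is used.
        "llm_enabled": bool(api_key),
        "model_name": env.get("MODEL_NAME", "meta-llama/Llama-3.1-8B-Instruct").strip(),
        "hf_provider": env.get("HF_PROVIDER", "hf-inference").strip(),
        "papermill_timeout": int(env.get("PAPERMILL_TIMEOUT", "1800")),
        # Only the literal string "true" (any case) enables auto-run.
        "auto_run_on_load": env.get("AUTO_RUN_ON_LOAD", "true").lower() == "true",
    }
```

Note that values such as `1` or `yes` for `AUTO_RUN_ON_LOAD` disable auto-run under this parsing, since only `true` is recognized.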
+ ## Notes
+
+ The app creates notebook outputs inside:
+
+ - `artifacts/py/figures`
+ - `artifacts/py/tables`
+ - `runs/`
+
+ The app also sanitizes the notebook before execution: `!pip install ...` lines are removed, and cells that start with one are skipped entirely. Dependencies are handled by `requirements.txt`.
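The sanitizing step can be sketched on the raw notebook JSON with the standard library only. This is an approximation of `sanitize_notebook` in `app.py`, which uses `nbformat` instead; the name `strip_pip_installs` is hypothetical, and the input is assumed to follow the nbformat v4 layout (a `cells` list whose entries carry `cell_type` and `source`).

```python
import copy
from typing import Any, Dict


def strip_pip_installs(notebook: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: return a copy of the notebook without !pip install."""
    cleaned = copy.deepcopy(notebook)
    kept = []
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") != "code":
            kept.append(cell)
            continue
        source = cell.get("source", "")
        if isinstance(source, list):  # nbformat JSON may store a list of lines
            source = "".join(source)
        # A cell that starts with an install command is dropped as a whole.
        if source.strip().startswith("!pip install"):
            continue
        # Otherwise only the install lines themselves are removed.
        lines = [
            line for line in source.splitlines()
            if not line.strip().startswith("!pip install")
        ]
        cell["source"] = "\n".join(lines).strip()
        kept.append(cell)
    cleaned["cells"] = kept
    return cleaned
```

One consequence worth knowing: a cell that begins with `!pip install` is skipped even if it also contains imports or other code, so such cells are better split in the notebook itself.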
app.py ADDED
@@ -0,0 +1,736 @@
+ # AI-Assisted Code: Academic Integrity Notice
+ # Generated with The App Builder. ESCP coursework.
+ # Student must be able to explain all code when asked.
+
+ import json
+ import os
+ import re
+ import time
+ import traceback
+ from pathlib import Path
+ from typing import Any, Dict, List, Tuple
+
+ import gradio as gr
+ import nbformat
+ import pandas as pd
+ import papermill as pm
+ import plotly.graph_objects as go
+
+ # Optional LLM support via Hugging Face Inference API
+ try:
+     from huggingface_hub import InferenceClient
+ except Exception:
+     InferenceClient = None
+
+
+ # =========================================================
+ # CONFIG
+ # =========================================================
+
+ BASE_DIR = Path(__file__).resolve().parent
+ NOTEBOOK_NAME = os.environ.get("NB2", "pythonanalysis.ipynb").strip()
+
+ RUNS_DIR = BASE_DIR / "runs"
+ ART_DIR = BASE_DIR / "artifacts"
+ PY_FIG_DIR = ART_DIR / "py" / "figures"
+ PY_TAB_DIR = ART_DIR / "py" / "tables"
+ TMP_DIR = BASE_DIR / "tmp"
+
+ PAPERMILL_TIMEOUT = int(os.environ.get("PAPERMILL_TIMEOUT", "1800"))
+ MAX_PREVIEW_ROWS = int(os.environ.get("MAX_FILE_PREVIEW_ROWS", "50"))
+ AUTO_RUN_ON_LOAD = os.environ.get("AUTO_RUN_ON_LOAD", "true").lower() == "true"
+
+ HF_API_KEY = os.environ.get("HF_API_KEY", "").strip()
+ MODEL_NAME = os.environ.get("MODEL_NAME", "meta-llama/Llama-3.1-8B-Instruct").strip()
+ HF_PROVIDER = os.environ.get("HF_PROVIDER", "hf-inference").strip()
+
+ LLM_ENABLED = bool(HF_API_KEY) and InferenceClient is not None
+ llm_client = (
+     InferenceClient(provider=HF_PROVIDER, api_key=HF_API_KEY)
+     if LLM_ENABLED
+     else None
+ )
+
+
+ # =========================================================
+ # HELPERS
+ # =========================================================
+
+ def ensure_dirs() -> None:
+     """Create all required folders."""
+     for folder in [RUNS_DIR, ART_DIR, PY_FIG_DIR, PY_TAB_DIR, TMP_DIR]:
+         folder.mkdir(parents=True, exist_ok=True)
+
+
+ def stamp() -> str:
+     """Generate a timestamp for run files."""
+     return time.strftime("%Y%m%d-%H%M%S")
+
+
+ def _ls(dir_path: Path, exts: Tuple[str, ...]) -> List[str]:
+     """List files in a folder filtered by extension."""
+     if not dir_path.is_dir():
+         return []
+     return sorted(
+         p.name for p in dir_path.iterdir()
+         if p.is_file() and p.suffix.lower() in exts
+     )
+
+
+ def _read_csv(path: Path) -> pd.DataFrame:
+     """Read a CSV safely for preview."""
+     return pd.read_csv(path, nrows=MAX_PREVIEW_ROWS)
+
+
+ def _read_json(path: Path) -> Any:
+     """Read JSON safely."""
+     with path.open(encoding="utf-8") as file:
+         return json.load(file)
+
+
+ def artifacts_index() -> Dict[str, Any]:
+     """Return the currently available artifact files."""
+     return {
+         "python": {
+             "figures": _ls(PY_FIG_DIR, (".png", ".jpg", ".jpeg")),
+             "tables": _ls(PY_TAB_DIR, (".csv", ".json")),
+         }
+     }
+
+
+ def has_artifacts() -> bool:
+     """Check whether the notebook already produced outputs."""
+     idx = artifacts_index()
+     return bool(idx["python"]["figures"] or idx["python"]["tables"])
+
+
+ def sanitize_notebook(source_path: Path) -> Path:
+     """Create a runtime copy of the notebook without !pip install cells."""
+     notebook = nbformat.read(source_path, as_version=4)
+     cleaned_cells = []
+
+     for cell in notebook.cells:
+         if cell.cell_type != "code":
+             cleaned_cells.append(cell)
+             continue
+
+         source = cell.source.strip()
+         if source.startswith("!pip install"):
+             continue
+
+         lines = [
+             line for line in cell.source.splitlines()
+             if not line.strip().startswith("!pip install")
+         ]
+         cell.source = "\n".join(lines).strip()
+         cleaned_cells.append(cell)
+
+     notebook.cells = cleaned_cells
+     cleaned_path = TMP_DIR / f"cleaned_{source_path.name}"
+     nbformat.write(notebook, cleaned_path)
+     return cleaned_path
+
+
+ def _load_table_safe(path: Path) -> pd.DataFrame:
+     """Load CSV or JSON table for display."""
+     try:
+         if path.suffix.lower() == ".json":
+             obj = _read_json(path)
+             if isinstance(obj, dict):
+                 return pd.DataFrame([obj])
+             return pd.DataFrame(obj)
+         return _read_csv(path)
+     except Exception as exc:
+         return pd.DataFrame([{"error": str(exc)}])
+
+
+ # =========================================================
+ # NOTEBOOK RUNNER
+ # =========================================================
+
+ def run_notebook() -> str:
+     """Execute the uploaded notebook with Papermill."""
+     ensure_dirs()
+     notebook_path = BASE_DIR / NOTEBOOK_NAME
+
+     if not notebook_path.exists():
+         return f"ERROR: {NOTEBOOK_NAME} not found in the Space root folder."
+
+     cleaned_nb = sanitize_notebook(notebook_path)
+     output_nb = RUNS_DIR / f"run_{stamp()}_{NOTEBOOK_NAME}"
+
+     pm.execute_notebook(
+         input_path=str(cleaned_nb),
+         output_path=str(output_nb),
+         cwd=str(BASE_DIR),
+         log_output=True,
+         progress_bar=False,
+         request_save_on_cell_execute=True,
+         execution_timeout=PAPERMILL_TIMEOUT,
+     )
+     return f"Executed notebook: {NOTEBOOK_NAME}"
+
+
+ def run_pipeline() -> str:
+     """Run the full notebook and summarize generated outputs."""
+     try:
+         log = run_notebook()
+         idx = artifacts_index()
+         figures = idx["python"]["figures"]
+         tables = idx["python"]["tables"]
+
+         lines = [
+             "✅ Notebook execution completed.",
+             log,
+             "",
+             f"Figures: {', '.join(figures) or '(none)'}",
+             f"Tables: {', '.join(tables) or '(none)'}",
+         ]
+         return "\n".join(lines)
+     except Exception as exc:
+         return f"❌ Notebook execution failed: {exc}\n\n{traceback.format_exc()[-3000:]}"
+
+
+ def maybe_autorun() -> Tuple[str, str, go.Figure, go.Figure, go.Figure, List[Tuple[str, str]], gr.Dropdown, pd.DataFrame]:
+     """Auto-run once when the app loads if no artifacts exist yet."""
+     ensure_dirs()
+     if AUTO_RUN_ON_LOAD and not has_artifacts():
+         log = run_pipeline()
+     elif has_artifacts():
+         log = "Ready. Existing artifacts found, so auto-run was skipped."
+     else:
+         # Auto-run is disabled and nothing has been generated yet.
+         log = "Ready. AUTO_RUN_ON_LOAD is disabled, so auto-run was skipped."
+
+     kpi_html = render_kpi_cards()
+     sales_chart = build_sales_chart()
+     sentiment_chart = build_sentiment_chart()
+     top_chart = build_top_sellers_chart()
+     figures, dropdown_update, default_df = refresh_gallery()
+     return log, kpi_html, sales_chart, sentiment_chart, top_chart, figures, dropdown_update, default_df
+
+
+ # =========================================================
+ # GALLERY + KPI LOADERS
+ # =========================================================
+
+ def _load_all_figures() -> List[Tuple[str, str]]:
+     """Return all saved figures for the Gradio gallery."""
+     items = []
+     # Same extensions as artifacts_index(), so indexed .jpeg files also show up.
+     for pattern in ("*.png", "*.jpg", "*.jpeg"):
+         for path in sorted(PY_FIG_DIR.glob(pattern)):
+             items.append((str(path), path.stem.replace("_", " ").title()))
+     return items
+
+
+ def refresh_gallery():
+     """Refresh gallery content and the table selector."""
+     figures = _load_all_figures()
+     idx = artifacts_index()
+     table_choices = list(idx["python"]["tables"])
+
+     default_df = pd.DataFrame([{"hint": "Run the notebook to generate tables."}])
+     if table_choices:
+         default_df = _load_table_safe(PY_TAB_DIR / table_choices[0])
+
+     return (
+         figures,
+         gr.update(
+             choices=table_choices,
+             value=table_choices[0] if table_choices else None,
+         ),
+         default_df,
+     )
+
+
+ def on_table_select(choice: str) -> pd.DataFrame:
+     """Load a selected table into the preview grid."""
+     if not choice:
+         return pd.DataFrame([{"hint": "Select a table above."}])
+
+     path = PY_TAB_DIR / choice
+     if not path.exists():
+         return pd.DataFrame([{"error": f"File not found: {choice}"}])
+
+     return _load_table_safe(path)
+
+
+ def load_kpis() -> Dict[str, Any]:
+     """Load KPI JSON if the notebook generated one."""
+     candidates = [
+         PY_TAB_DIR / "kpis.json",
+         PY_FIG_DIR / "kpis.json",
+     ]
+     for candidate in candidates:
+         if candidate.exists():
+             try:
+                 return _read_json(candidate)
+             except Exception:
+                 pass
+     return {}
+
+
+ # =========================================================
+ # AI DASHBOARD
+ # =========================================================
+
+ DASHBOARD_SYSTEM = """You are an AI dashboard assistant for a book analytics app.
+ The user asks questions about notebook outputs. You have access to precomputed artifacts.
+
+ AVAILABLE ARTIFACTS:
+ {artifacts_json}
+
+ KPI SUMMARY:
+ {kpis_json}
+
+ At the end of every answer, output a fenced JSON block:
+ {{"show": "figure"|"table"|"none", "scope": "python", "filename": "..."}}
+
+ Rules:
+ - Use figure for sales trends, forecast plots, or sentiment plots.
+ - Use table for top sellers, pricing decisions, or dashboard tables.
+ - Use none if nothing relevant exists.
+ - Keep the natural-language answer to 2-4 sentences.
+ """
+
+ JSON_BLOCK_RE = re.compile(r"```json\s*(\{.*?\})\s*```", re.DOTALL)
+ FALLBACK_JSON_RE = re.compile(r"\{[^{}]*\"show\"[^{}]*\}", re.DOTALL)
+
+
+ def _parse_display_directive(text: str) -> Dict[str, str]:
+     """Extract the JSON display instruction from the model response."""
+     match = JSON_BLOCK_RE.search(text)
+     if match:
+         try:
+             return json.loads(match.group(1))
+         except json.JSONDecodeError:
+             pass
+
+     match = FALLBACK_JSON_RE.search(text)
+     if match:
+         try:
+             return json.loads(match.group(0))
+         except json.JSONDecodeError:
+             pass
+
+     return {"show": "none"}
+
+
+ def _clean_response(text: str) -> str:
+     """Remove the JSON display block from the shown answer."""
+     # Strip the fenced block first, then any bare directive the fallback regex would parse.
+     without_fence = JSON_BLOCK_RE.sub("", text)
+     return FALLBACK_JSON_RE.sub("", without_fence).strip()
+
+
+ def _keyword_fallback(msg: str, idx: Dict[str, Any], kpis: Dict[str, Any]) -> Tuple[str, Dict[str, str]]:
+     """Fallback routing when no LLM key is configured."""
+     lowered = msg.lower()
+
+     if not idx["python"]["figures"] and not idx["python"]["tables"]:
+         return (
+             "No notebook artifacts exist yet. Run the notebook first, then ask again.",
+             {"show": "none"},
+         )
+
+     summary = ""
+     if kpis:
+         total_units = kpis.get("total_units_sold", 0)
+         summary = (
+             f"Quick summary: {kpis.get('n_titles', '?')} titles across "
+             f"{kpis.get('n_months', '?')} months with {total_units:,.0f} units sold."
+         )
+
+     if any(word in lowered for word in ["trend", "sales", "forecast", "arima", "predict"]):
+         return (
+             f"Here is the sales view. {summary}",
+             {"show": "figure", "scope": "python", "filename": "df_dashboard.csv"},
+         )
+
+     if any(word in lowered for word in ["sentiment", "positive", "negative", "review"]):
+         return (
+             f"Here is the sentiment view. {summary}",
+             {"show": "figure", "scope": "python", "filename": "sentiment_counts_sampled.csv"},
+         )
+
+     if any(word in lowered for word in ["top", "best", "popular", "rank"]):
+         return (
+             f"Here are the top-selling titles. {summary}",
+             {"show": "table", "scope": "python", "filename": "top_titles_by_units_sold.csv"},
+         )
+
+     if any(word in lowered for word in ["price", "pricing", "decision"]):
+         return (
+             f"Here are the pricing decisions. {summary}",
+             {"show": "table", "scope": "python", "filename": "pricing_decisions.csv"},
+         )
+
+     if any(word in lowered for word in ["overview", "dashboard", "summary", "kpi"]):
+         return (
+             f"Here is the notebook dashboard overview. {summary}",
+             {"show": "table", "scope": "python", "filename": "df_dashboard.csv"},
+         )
+
+     return (
+         "I can answer questions about sales trends, sentiment, forecasts, pricing decisions, and top sellers.",
+         {"show": "none"},
+     )
+
+
+ def ai_chat(user_msg: str, history: List[Dict[str, str]]):
+     """Drive the AI dashboard using either an LLM or keyword fallback."""
+     if not user_msg or not user_msg.strip():
+         return history, "", None, None
+
+     idx = artifacts_index()
+     kpis = load_kpis()
+
+     if not LLM_ENABLED:
+         reply, directive = _keyword_fallback(user_msg, idx, kpis)
+     else:
+         system_prompt = DASHBOARD_SYSTEM.format(
+             artifacts_json=json.dumps(idx, indent=2),
+             kpis_json=json.dumps(kpis, indent=2) if kpis else "(no KPIs yet)",
+         )
+         messages = [{"role": "system", "content": system_prompt}]
+         # Forward only role and content; Gradio history entries may carry extra keys.
+         messages.extend(
+             {"role": m["role"], "content": m["content"]}
+             for m in (history or [])[-6:]
+         )
+         messages.append({"role": "user", "content": user_msg})
+
+         try:
+             response = llm_client.chat_completion(
+                 model=MODEL_NAME,
+                 messages=messages,
+                 temperature=0.3,
+                 max_tokens=500,
+                 stream=False,
+             )
+             raw = (
+                 response["choices"][0]["message"]["content"]
+                 if isinstance(response, dict)
+                 else response.choices[0].message.content
+             )
+             directive = _parse_display_directive(raw)
+             reply = _clean_response(raw)
+         except Exception as exc:
+             reply, directive = _keyword_fallback(user_msg, idx, kpis)
+             reply = f"LLM error: {exc}\n\n{reply}"
+
+     chart_out = None
+     table_out = None
+     # The model may return null for filename; normalize to a string.
+     filename = str(directive.get("filename") or "")
+     show = directive.get("show", "none")
+
+     if show == "figure":
+         if "sentiment" in filename:
+             chart_out = build_sentiment_chart()
+         elif "top_titles" in filename:
+             chart_out = build_top_sellers_chart()
+         else:
+             chart_out = build_sales_chart()
+
+     if show == "table" and filename:
+         # Serve only files listed in the artifact index, so a model-supplied
+         # name containing "../" cannot point outside PY_TAB_DIR.
+         if filename in idx["python"]["tables"]:
+             table_out = _load_table_safe(PY_TAB_DIR / filename)
+         else:
+             table_out = pd.DataFrame([{"error": f"Missing table: {filename}"}])
+
+     new_history = (history or []) + [
+         {"role": "user", "content": user_msg},
+         {"role": "assistant", "content": reply},
+     ]
+     return new_history, "", chart_out, table_out
+
+
+ # =========================================================
+ # KPI CARDS
+ # =========================================================
+
+ def render_kpi_cards() -> str:
+     """Render KPI cards as HTML."""
+     kpis = load_kpis()
+     if not kpis:
+         return """
+         <div class="card-grid">
+             <div class="kpi-card">
+                 <div class="kpi-icon">📊</div>
+                 <div class="kpi-label">No data yet</div>
+                 <div class="kpi-value">Run the notebook first</div>
+             </div>
+         </div>
+         """
+
+     def format_value(value: Any) -> str:
+         if isinstance(value, (int, float)) and value > 100:
+             return f"{value:,.0f}"
+         return str(value)
+
+     html = ['<div class="card-grid">']
+     for key, value in kpis.items():
+         label = key.replace("_", " ").title()
+         html.append(
+             f"""
+             <div class="kpi-card">
+                 <div class="kpi-icon">📈</div>
+                 <div class="kpi-label">{label}</div>
+                 <div class="kpi-value">{format_value(value)}</div>
+             </div>
+             """
+         )
+     html.append("</div>")
+     return "".join(html)
+
+
+ # =========================================================
+ # CHART BUILDERS
+ # =========================================================
+
+ CHART_PALETTE = [
+     "#7c5cbf", "#2ec4a0", "#e8537a", "#e8a230", "#5e8fef",
+     "#c45ea8", "#3dbacc", "#a0522d", "#6aaa3a", "#d46060",
+ ]
+
+
+ def _styled_layout(**kwargs) -> Dict[str, Any]:
+     """Apply a consistent Plotly style."""
+     defaults = dict(
+         template="plotly_white",
+         paper_bgcolor="rgba(255,255,255,0.98)",
+         plot_bgcolor="rgba(255,255,255,0.98)",
+         font=dict(family="system-ui, sans-serif", color="#2d1f4e", size=12),
+         margin=dict(l=60, r=20, t=70, b=60),
+         legend=dict(
+             orientation="h",
+             yanchor="bottom",
+             y=1.02,
+             xanchor="right",
+             x=1,
+         ),
+         title=dict(font=dict(size=16, color="#4b2d8a")),
+     )
+     defaults.update(kwargs)
+     return defaults
+
+
+ def _empty_chart(title: str) -> go.Figure:
+     """Return a placeholder chart."""
+     fig = go.Figure()
+     fig.update_layout(
+         title=title,
+         height=420,
+         template="plotly_white",
+         annotations=[
+             dict(
+                 text="Run the notebook to generate this chart",
+                 x=0.5,
+                 y=0.5,
+                 xref="paper",
+                 yref="paper",
+                 showarrow=False,
+                 font=dict(size=14),
+             )
+         ],
+     )
+     return fig
+
+
+ def build_sales_chart() -> go.Figure:
+     """Build the monthly sales chart from df_dashboard.csv."""
+     path = PY_TAB_DIR / "df_dashboard.csv"
+     if not path.exists():
+         return _empty_chart("Monthly Overview")
+
+     df = pd.read_csv(path)
+     date_col = next((c for c in df.columns if "month" in c.lower() or "date" in c.lower()), None)
+     value_cols = [
+         c for c in df.columns
+         if c != date_col and pd.api.types.is_numeric_dtype(df[c])
+     ]
+
+     if not date_col or not value_cols:
+         return _empty_chart("Monthly Overview")
+
+     df[date_col] = pd.to_datetime(df[date_col], errors="coerce")
+     fig = go.Figure()
+
+     for idx, col in enumerate(value_cols):
+         fig.add_trace(
+             go.Scatter(
+                 x=df[date_col],
+                 y=df[col],
+                 mode="lines+markers",
+                 name=col.replace("_", " ").title(),
+                 line=dict(color=CHART_PALETTE[idx % len(CHART_PALETTE)], width=2),
+                 marker=dict(size=5),
+             )
+         )
+
+     fig.update_layout(**_styled_layout(height=450, hovermode="x unified", title=dict(text="Monthly Overview")))
+     return fig
+
+
+ def build_sentiment_chart() -> go.Figure:
+     """Build the sentiment chart from sentiment_counts_sampled.csv."""
+     path = PY_TAB_DIR / "sentiment_counts_sampled.csv"
+     if not path.exists():
+         return _empty_chart("Sentiment Distribution")
+
+     df = pd.read_csv(path)
+     title_col = df.columns[0]
+     sentiment_cols = [c for c in ["negative", "neutral", "positive"] if c in df.columns]
+     if not sentiment_cols:
+         return _empty_chart("Sentiment Distribution")
+
+     fig = go.Figure()
+     colors = {"negative": "#e8537a", "neutral": "#5e8fef", "positive": "#2ec4a0"}
+
+     for col in sentiment_cols:
+         fig.add_trace(
+             go.Bar(
+                 y=df[title_col],
+                 x=df[col],
+                 orientation="h",
+                 name=col.title(),
+                 marker_color=colors[col],
+             )
+         )
+
+     fig.update_layout(
+         **_styled_layout(
+             height=max(400, len(df) * 28),
+             barmode="stack",
+             title=dict(text="Sentiment Distribution by Book"),
+         )
+     )
+     fig.update_yaxes(autorange="reversed")
+     return fig
+
+
+ def build_top_sellers_chart() -> go.Figure:
+     """Build the top sellers chart from top_titles_by_units_sold.csv."""
+     path = PY_TAB_DIR / "top_titles_by_units_sold.csv"
+     if not path.exists():
+         return _empty_chart("Top Sellers")
+
+     df = pd.read_csv(path).head(15)
+     title_col = next((c for c in df.columns if "title" in c.lower()), df.columns[0])
+     value_col = next((c for c in df.columns if "unit" in c.lower() or "sold" in c.lower()), df.columns[-1])
+
+     fig = go.Figure(
+         go.Bar(
+             y=df[title_col],
+             x=df[value_col],
+             orientation="h",
+             marker=dict(color=df[value_col], colorscale=[[0, "#c5b4f0"], [1, "#7c5cbf"]]),
+         )
+     )
+     fig.update_layout(**_styled_layout(height=max(400, len(df) * 30), showlegend=False, title=dict(text="Top Selling Titles")))
+     fig.update_yaxes(autorange="reversed")
+     return fig
+
+
+ def refresh_dashboard():
+     """Refresh all dashboard widgets."""
+     return render_kpi_cards(), build_sales_chart(), build_sentiment_chart(), build_top_sellers_chart()
+
+
+ # =========================================================
+ # UI
+ # =========================================================
+
+ def load_css() -> str:
+     """Read the local CSS file if present."""
+     css_path = BASE_DIR / "style.css"
+     return css_path.read_text(encoding="utf-8") if css_path.exists() else ""
+
+
+ ensure_dirs()
+
+ with gr.Blocks(title="ESCP Book Analytics Space", css=load_css(), theme=gr.themes.Soft()) as demo:
+     gr.Markdown(
+         """
+         # ESCP Book Analytics Space
+         This Space automatically runs your uploaded notebook and turns the saved outputs
+         into an interactive dashboard for sales, sentiment, top sellers, and pricing decisions.
+         """,
+         elem_id="app_title",
+     )
+
+     with gr.Tab("Notebook Runner"):
+         gr.Markdown("Run the notebook manually whenever you want to refresh the outputs.")
+         run_btn = gr.Button("Run Notebook", variant="primary")
+         run_log = gr.Textbox(label="Execution Log", lines=18, max_lines=30, interactive=False)
+         run_btn.click(run_pipeline, outputs=run_log)
+
+     with gr.Tab("Dashboard"):
+         kpi_html = gr.HTML(value=render_kpi_cards())
+         refresh_btn = gr.Button("Refresh Dashboard", variant="primary")
+
+         gr.Markdown("### Interactive Charts")
+         chart_sales = gr.Plot(label="Monthly Overview")
+         chart_sentiment = gr.Plot(label="Sentiment Distribution")
+         chart_top = gr.Plot(label="Top Sellers")
+
+         gr.Markdown("### Static Figures")
+         gallery = gr.Gallery(label="Generated Notebook Figures", columns=2, height=420, object_fit="contain")
+
+         gr.Markdown("### Data Tables")
+         table_dropdown = gr.Dropdown(label="Select a table", choices=[], interactive=True)
+         table_display = gr.Dataframe(label="Table Preview", interactive=False)
+
+         def _on_refresh():
+             kpi, c1, c2, c3 = refresh_dashboard()
+             figs, dd, df = refresh_gallery()
+             return kpi, c1, c2, c3, figs, dd, df
+
+         refresh_btn.click(
+             _on_refresh,
+             outputs=[kpi_html, chart_sales, chart_sentiment, chart_top, gallery, table_dropdown, table_display],
+         )
+         table_dropdown.change(on_table_select, inputs=table_dropdown, outputs=table_display)
+
+     with gr.Tab("AI Dashboard"):
+         status_text = (
+             "LLM mode is active." if LLM_ENABLED
+             else "Keyword matching mode is active. Set HF_API_KEY later if you want natural-language routing."
+         )
+         gr.Markdown(
+             f"""
+             ### Ask questions about your notebook outputs
+             {status_text}
+             """
+         )
+
+         with gr.Row(equal_height=True):
+             with gr.Column(scale=1):
+                 chatbot = gr.Chatbot(label="Conversation", height=380, type="messages")
+                 user_input = gr.Textbox(
+                     label="Ask about your data",
+                     placeholder="Show me the sales trends / Which titles sell the most? / What is the sentiment distribution?",
+                     lines=1,
+                 )
+                 gr.Examples(
+                     examples=[
+                         "Show me the sales trends",
+                         "What does the sentiment look like?",
+                         "Which titles sell the most?",
+                         "Show the pricing decisions",
+                         "Give me a dashboard overview",
+                     ],
+                     inputs=user_input,
+                 )
+
+             with gr.Column(scale=1):
+                 ai_figure = gr.Plot(label="Interactive Chart")
+                 ai_table = gr.Dataframe(label="Data Table", interactive=False)
+
+         user_input.submit(
+             ai_chat,
+             inputs=[user_input, chatbot],
+             outputs=[chatbot, user_input, ai_figure, ai_table],
+         )
+
+     demo.load(
+         maybe_autorun,
+         outputs=[run_log, kpi_html, chart_sales, chart_sentiment, chart_top, gallery, table_dropdown, table_display],
+     )
+
+
+ if __name__ == "__main__":
+     demo.launch()
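The AI tab depends on pulling a small JSON "display directive" out of the model's reply. A minimal standalone sketch of that parsing step, based on the two regular expressions in `app.py`; the name `parse_directive` and the `FENCE` constant are illustrative, and the extra `isinstance` check is an added assumption rather than something the app does.

```python
import json
import re
from typing import Any, Dict

FENCE = "`" * 3  # three backticks, built at runtime to keep this example readable

# Preferred form: a fenced json block. Fallback: any bare object mentioning "show".
JSON_BLOCK_RE = re.compile(FENCE + r"json\s*(\{.*?\})\s*" + FENCE, re.DOTALL)
FALLBACK_JSON_RE = re.compile(r"\{[^{}]*\"show\"[^{}]*\}", re.DOTALL)


def parse_directive(text: str) -> Dict[str, Any]:
    """Hypothetical helper: extract the display directive, or default to 'none'."""
    for pattern, group in ((JSON_BLOCK_RE, 1), (FALLBACK_JSON_RE, 0)):
        match = pattern.search(text)
        if not match:
            continue
        try:
            parsed = json.loads(match.group(group))
        except json.JSONDecodeError:
            continue  # malformed JSON: try the next pattern
        if isinstance(parsed, dict):
            return parsed
    return {"show": "none"}
```

Defaulting to `{"show": "none"}` means a reply with no usable directive still shows the text answer and simply leaves the chart and table panels empty.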
pythonanalysis.ipynb ADDED
The diff for this file is too large to render. See raw diff
 
requirements.txt ADDED
@@ -0,0 +1,17 @@
+ gradio==5.23.3
+ pandas==2.2.3
+ papermill==2.6.0
+ plotly==6.0.1
+ nbformat==5.10.4
+ matplotlib==3.10.1
+ seaborn==0.13.2
+ numpy==2.2.4
+ vaderSentiment==3.3.2
+ statsmodels==0.14.4
+ textblob==0.19.0
+ transformers==4.49.0
+ huggingface_hub==0.29.3
+ requests==2.32.3
+ faker==37.1.0
+ jupyter==1.1.1
+ ipykernel==6.29.5
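The `gradio` pin here matches the `sdk_version: 5.23.3` set in the README front matter, and keeping the two in step presumably avoids the build resolving two different Gradio versions. A small sketch of a check for that, assuming plain `name==version` lines as in this file; `parse_pins` and `gradio_pin_matches` are hypothetical helper names.

```python
from typing import Dict


def parse_pins(requirements_text: str) -> Dict[str, str]:
    """Map lower-cased package names to their exact '==' pins."""
    pins: Dict[str, str] = {}
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if "==" not in line:
            continue  # skip blank lines and unpinned requirements
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def gradio_pin_matches(requirements_text: str, sdk_version: str) -> bool:
    """True when requirements.txt pins gradio to the Space's sdk_version."""
    return parse_pins(requirements_text).get("gradio") == sdk_version
```

This handles only exact pins; version ranges, extras, and environment markers would need a real requirements parser.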
style.css ADDED
@@ -0,0 +1,70 @@
+
+ :root {
+   --bg: linear-gradient(135deg, #f7f3ff 0%, #eef4ff 100%);
+   --card: rgba(255, 255, 255, 0.86);
+   --border: rgba(124, 92, 191, 0.18);
+   --text: #2d1f4e;
+   --muted: #6f60a8;
+   --accent: #7c5cbf;
+ }
+
+ body, .gradio-container {
+   background: var(--bg) !important;
+   color: var(--text) !important;
+   font-family: Inter, system-ui, sans-serif !important;
+ }
+
+ #app_title {
+   background: var(--card);
+   border: 1px solid var(--border);
+   border-radius: 22px;
+   padding: 10px 18px;
+   box-shadow: 0 10px 30px rgba(124, 92, 191, 0.08);
+ }
+
+ .card-grid {
+   display: grid;
+   grid-template-columns: repeat(auto-fit, minmax(150px, 1fr));
+   gap: 14px;
+   margin: 10px 0 22px 0;
+ }
+
+ .kpi-card {
+   background: var(--card);
+   border: 1px solid var(--border);
+   border-radius: 20px;
+   padding: 18px 14px;
+   text-align: center;
+   box-shadow: 0 8px 20px rgba(124, 92, 191, 0.08);
+   backdrop-filter: blur(12px);
+ }
+
+ .kpi-icon {
+   font-size: 26px;
+   margin-bottom: 8px;
+ }
+
+ .kpi-label {
+   color: var(--muted);
+   font-size: 11px;
+   text-transform: uppercase;
+   letter-spacing: 1.2px;
+   font-weight: 700;
+   margin-bottom: 6px;
+ }
+
+ .kpi-value {
+   color: var(--text);
+   font-size: 18px;
+   font-weight: 800;
+ }
+
+ button.primary, button.lg.primary {
+   background: linear-gradient(135deg, #7c5cbf 0%, #5e8fef 100%) !important;
+   border: none !important;
+ }
+
+ .gradio-container .tabitem {
+   background: rgba(255, 255, 255, 0.72);
+   border-radius: 16px;
+ }
synthetic_book_reviews.csv ADDED
The diff for this file is too large to render. See raw diff
 
synthetic_sales_data.csv ADDED
The diff for this file is too large to render. See raw diff