johnnytay committed on
Commit
c7449d2
·
verified ·
1 Parent(s): 9e39f44

Upload folder using huggingface_hub

Files changed (4)
  1. knowledge_transfer_week1_part3.ipynb +183 -178
  2. pyproject.toml +1 -1
  3. requirements.txt +1 -1
  4. uv.lock +74 -4
knowledge_transfer_week1_part3.ipynb CHANGED
@@ -2,6 +2,7 @@
2
  "cells": [
3
  {
4
  "cell_type": "markdown",
 
5
  "metadata": {},
6
  "source": [
7
  "# KT1 Part 3 — Streaming, Filesystem MCP, and Gradio\n",
@@ -17,8 +18,7 @@
17
  "| **Gradio** | Wrap the streaming agent in a chat UI you can share with a URL |\n",
18
  "\n",
19
  "By the end you'll have a KB-grounded customer support assistant running in a local Gradio app, streaming its reasoning and tool calls token-by-token."
20
- ],
21
- "id": "9c686d08"
22
  },
23
  {
24
  "cell_type": "markdown",
@@ -33,7 +33,10 @@
33
  },
34
  {
35
  "cell_type": "code",
 
 
36
  "metadata": {},
 
37
  "source": [
38
  "import os\n",
39
  "import warnings\n",
@@ -51,13 +54,11 @@
51
  " azure_deployment=os.getenv(\"AZURE_OPENAI_CHAT_DEPLOYMENT_NAME\"),\n",
52
  " api_version=os.getenv(\"AZURE_OPENAI_API_VERSION\"),\n",
53
  ")"
54
- ],
55
- "execution_count": null,
56
- "outputs": [],
57
- "id": "f92d6179"
58
  },
59
  {
60
  "cell_type": "markdown",
 
61
  "metadata": {},
62
  "source": [
63
  "---\n",
@@ -78,11 +79,11 @@
78
  "| `custom` | Whatever you emit via `get_stream_writer()` | You want to surface app-specific progress events |\n",
79
  "\n",
80
  "You can combine modes: `stream_mode=[\"updates\", \"messages\"]` yields `(mode_name, event)` tuples and you route by mode."
81
- ],
82
- "id": "8991899e"
83
  },
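A minimal sketch of the "route by mode" idea from the cell above. It uses a hand-written list of `(mode_name, event)` tuples in place of a real LangGraph stream, so it runs without any dependencies. The `route_events` helper, the `classify` node name, and the payload fields are illustrative assumptions; only `polish_response` and `langgraph_node` appear in the notebook itself.

```python
from typing import Any, Iterable


def route_events(stream: Iterable[tuple[str, Any]]) -> dict[str, Any]:
    """Split a combined stream into node progress and reply text."""
    progress: list[str] = []  # node names seen in "updates" events
    tokens: list[str] = []    # text fragments seen in "messages" events
    for mode, event in stream:
        if mode == "updates":
            # An updates event maps node name -> the fields that node changed.
            progress.extend(event.keys())
        elif mode == "messages":
            # A messages event is a (chunk, metadata) pair; chunk is a plain str here.
            chunk, metadata = event
            if metadata.get("langgraph_node") == "polish_response":
                tokens.append(chunk)
    return {"progress": progress, "reply": "".join(tokens)}


# Hypothetical stand-in for a stream opened with stream_mode=["updates", "messages"].
fake_stream = [
    ("updates", {"classify": {"category": "billing"}}),
    ("messages", ("Hello", {"langgraph_node": "polish_response"})),
    ("messages", (" there", {"langgraph_node": "polish_response"})),
    ("updates", {"polish_response": {"final_response": "Hello there"}}),
]
result = route_events(fake_stream)
print(result)
```

One loop feeds both a progress indicator and the reply text, which is the point of subscribing to several modes at once.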
84
  {
85
  "cell_type": "markdown",
 
86
  "metadata": {},
87
  "source": [
88
  "### A workflow to stream\n",
@@ -90,12 +91,14 @@
90
  "To keep the focus on streaming, we'll reuse the **support-ticket triage** workflow from Part 2 Pattern 1. The demo ticket below routes down the auto-respond path; its three LLM calls (classify → generate → polish) give us plenty of events to observe.\n",
91
  "\n",
92
  "If you already ran Part 2, this is the same shape — skim it."
93
- ],
94
- "id": "4918396a"
95
  },
96
  {
97
  "cell_type": "code",
 
 
98
  "metadata": {},
 
99
  "source": [
100
  "from typing_extensions import TypedDict\n",
101
  "from typing import Annotated, Literal\n",
@@ -227,24 +230,24 @@
227
  "\n",
228
  "support_chain = workflow.compile()\n",
229
  "display(Image(support_chain.get_graph().draw_mermaid_png()))"
230
- ],
231
- "execution_count": null,
232
- "outputs": [],
233
- "id": "795c4f3d"
234
  },
235
  {
236
  "cell_type": "markdown",
 
237
  "metadata": {},
238
  "source": [
239
  "### Mode 1 — `updates`: node-level progress\n",
240
  "\n",
241
  "Yields one event per node, containing only what that node **changed**. The simplest mode — a clean \"step by step\" feed."
242
- ],
243
- "id": "bb33e148"
244
  },
245
  {
246
  "cell_type": "code",
 
 
247
  "metadata": {},
 
248
  "source": [
249
  "ticket = {\n",
250
  " \"customer_message\": \"My subscription was double-charged this month — please refund the extra charge.\",\n",
@@ -256,36 +259,34 @@
256
  " for node, delta in event.items():\n",
257
  " keys = list(delta.keys())\n",
258
  " print(f\" {node:10s} -> updated {keys}\")"
259
- ],
260
- "execution_count": null,
261
- "outputs": [],
262
- "id": "d2f79113"
263
  },
264
  {
265
  "cell_type": "markdown",
 
266
  "metadata": {},
267
  "source": [
268
  "### Mode 2 — `values`: cumulative state snapshots\n",
269
  "\n",
270
  "One event per checkpoint, each containing the **full state**. You get one snapshot for the initial input plus one after every node runs — so on our 3-node happy path, that's 4 snapshots. Useful when you want to render the latest known values to a UI without tracking deltas yourself."
271
- ],
272
- "id": "d1a0556e"
273
  },
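To make the snapshot count concrete, here is a dependency-free sketch that imitates `updates` and `values` with plain dicts. The `simulate` helper, the node names, and the field names are placeholders chosen for illustration; a real run gets its state from LangGraph rather than from this function.

```python
def simulate(initial: dict, deltas: list[tuple[str, dict]]):
    """Return (updates_events, values_snapshots) for a linear run of nodes."""
    updates = []
    snapshots = [dict(initial)]        # values mode starts with the input state
    state = dict(initial)
    for node, delta in deltas:
        updates.append({node: delta})  # updates mode: only what the node changed
        state = {**state, **delta}
        snapshots.append(dict(state))  # values mode: full state after the node
    return updates, snapshots


updates, snapshots = simulate(
    {"customer_message": "double-charged"},
    [
        ("classify", {"category": "billing"}),
        ("generate", {"draft": "We will refund you."}),
        ("polish", {"final": "We'll refund the extra charge."}),
    ],
)
print(len(updates), len(snapshots))
```

Three nodes produce three `updates` events but four `values` snapshots, because the initial input counts as the first snapshot.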
274
  {
275
  "cell_type": "code",
 
 
276
  "metadata": {},
 
277
  "source": [
278
  "print(\"Streaming 'values' mode:\\n\")\n",
279
  "for i, snapshot in enumerate(support_chain.stream(ticket, stream_mode=\"values\")):\n",
280
  " filled = [k for k, v in snapshot.items() if v not in (None, \"\", [], {})]\n",
281
  " print(f\" step {i}: filled fields = {filled}\")"
282
- ],
283
- "execution_count": null,
284
- "outputs": [],
285
- "id": "bf7b3c36"
286
  },
287
  {
288
  "cell_type": "markdown",
 
289
  "metadata": {},
290
  "source": [
291
  "### Mode 3 — `messages`: token-by-token streaming\n",
@@ -293,12 +294,14 @@
293
  "This is the one you want for ChatGPT-style UX. Each event is a `(chunk, metadata)` tuple where `chunk` is an `AIMessageChunk` holding a few tokens. `metadata` tells you which **node** emitted the chunk — so you can filter to just the node whose output you want to render.\n",
294
  "\n",
295
  "Use the **async** variant (`astream`) — token streaming is async-native on the SDK side."
296
- ],
297
- "id": "8bf9bad5"
298
  },
299
  {
300
  "cell_type": "code",
 
 
301
  "metadata": {},
 
302
  "source": [
303
  "print(\"Streaming 'messages' mode — tokens from the 'polish_response' node only:\\n\")\n",
304
  "\n",
@@ -306,24 +309,24 @@
306
  " if metadata.get(\"langgraph_node\") == \"polish_response\" and chunk.content:\n",
307
  " print(chunk.content, end=\"\", flush=True)\n",
308
  "print()"
309
- ],
310
- "execution_count": null,
311
- "outputs": [],
312
- "id": "eb480690"
313
  },
314
  {
315
  "cell_type": "markdown",
 
316
  "metadata": {},
317
  "source": [
318
  "### Mode 4 — `custom`: emit your own events\n",
319
  "\n",
320
  "Sometimes a node does meaningful work **between** LLM calls — a DB query, a long loop, a parse — and you want the UI to know. Call `get_stream_writer()` inside the node to push a custom event; the caller receives it in `stream_mode=\"custom\"`."
321
- ],
322
- "id": "75bd4132"
323
  },
324
  {
325
  "cell_type": "code",
 
 
326
  "metadata": {},
 
327
  "source": [
328
  "from langgraph.config import get_stream_writer\n",
329
  "\n",
@@ -354,13 +357,11 @@
354
  "print(\"Streaming 'custom' events:\\n\")\n",
355
  "for event in demo_chain.stream(ticket, stream_mode=\"custom\"):\n",
356
  " print(f\" {event}\")"
357
- ],
358
- "execution_count": null,
359
- "outputs": [],
360
- "id": "9c14989f"
361
  },
362
  {
363
  "cell_type": "markdown",
 
364
  "metadata": {},
365
  "source": [
366
  "### What to notice\n",
@@ -369,11 +370,11 @@
369
  "- **Modes are cheap to combine.** `stream_mode=[\"updates\", \"messages\"]` returns `(mode, event)` tuples — drive multiple UI elements from one subscription.\n",
370
  "- **`astream` pairs with `messages`.** Token streaming is async on the SDK side; use the async variant so tokens flow promptly.\n",
371
  "- **`get_stream_writer()` is the escape hatch.** Anything that isn't an LLM token or a node boundary — progress text, tool latency, partial parses — belongs in `custom`."
372
- ],
373
- "id": "c29456c8"
374
  },
375
  {
376
  "cell_type": "markdown",
 
377
  "metadata": {},
378
  "source": [
379
  "---\n",
@@ -402,11 +403,11 @@
402
  "| `list_allowed_directories` | Show which paths this server is permitted to access |\n",
403
  "\n",
404
  "We'll only wire `read_text_file` to the agent — the others aren't needed once the file inventory is baked into the prompt. Like Playwright, the server ships as a Node process you run with `npx` — no Python package to install."
405
- ],
406
- "id": "04f00e37"
407
  },
408
  {
409
  "cell_type": "markdown",
 
410
  "metadata": {},
411
  "source": [
412
  "### One-time install\n",
@@ -419,22 +420,24 @@
419
  "```\n",
420
  "\n",
421
  "That's it. The first time `MultiServerMCPClient` connects, `npx` fetches and caches the server package — no separate install step."
422
- ],
423
- "id": "eb4331ea"
424
  },
425
  {
426
  "cell_type": "markdown",
 
427
  "metadata": {},
428
  "source": [
429
  "### Build a mini knowledge base\n",
430
  "\n",
431
  "Three small markdown files covering the ticket types from Section 1: billing, password/access, shipping. In a real deployment this would be hundreds of articles versioned in git — the MCP boundary stays identical."
432
- ],
433
- "id": "e6d91476"
434
  },
435
  {
436
  "cell_type": "code",
 
 
437
  "metadata": {},
 
438
  "source": [
439
  "from pathlib import Path\n",
440
  "\n",
@@ -497,24 +500,24 @@
497
  "print(\"KB files in\", kb_dir.resolve(), \":\")\n",
498
  "for f in sorted(kb_dir.iterdir()):\n",
499
  " print(f\" - {f.name} ({f.stat().st_size} bytes)\")"
500
- ],
501
- "execution_count": null,
502
- "outputs": [],
503
- "id": "21f7349f"
504
  },
505
  {
506
  "cell_type": "markdown",
 
507
  "metadata": {},
508
  "source": [
509
  "### The notebook compatibility shim (same as Part 2)\n",
510
  "\n",
511
  "MCP's stdio transport expects real file descriptors, which Jupyter's captured streams don't always provide. This shim gets us past that — only needed inside notebooks."
512
- ],
513
- "id": "20f8f26b"
514
  },
515
  {
516
  "cell_type": "code",
 
 
517
  "metadata": {},
 
518
  "source": [
519
  "import sys, io\n",
520
  "\n",
@@ -528,13 +531,11 @@
528
  " stream.fileno()\n",
529
  " except (AttributeError, io.UnsupportedOperation):\n",
530
  " stream.fileno = _safe_fileno"
531
- ],
532
- "execution_count": null,
533
- "outputs": [],
534
- "id": "7f9070ed"
535
  },
536
  {
537
  "cell_type": "markdown",
 
538
  "metadata": {},
539
  "source": [
540
  "### Connect and list the tools\n",
@@ -542,12 +543,14 @@
542
  "We pass the absolute path to `support_kb/` as the one allowed directory, and set the server's `cwd` to the same path so bare basenames (e.g. `\"billing.md\"`) resolve correctly inside the sandbox. The server **refuses any read outside that path** — a simple but real sandbox; if the agent ever tried to read `/etc/passwd`, the call would error at the MCP boundary rather than in our Python code.\n",
543
  "\n",
544
  "After pulling the full tool list for reference, we keep only `read_text_file` and hand that to the agent. Trimming the surface is the cheapest guardrail there is: fewer tools = fewer things the LLM can get wrong."
545
- ],
546
- "id": "25a4c631"
547
  },
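The containment rule can be pictured with a few lines of `pathlib`. This is only an illustration of the idea, built around a hypothetical `is_inside_sandbox` helper. It is not the Filesystem MCP server's actual check, which runs inside the Node process.

```python
from pathlib import Path


def is_inside_sandbox(requested: str, allowed_dir: Path) -> bool:
    """True if `requested`, resolved relative to allowed_dir, stays inside it."""
    root = allowed_dir.resolve()
    # Relative names resolve against the sandbox root, mirroring cwd == allowed dir.
    # An absolute `requested` path replaces the root entirely when joined.
    target = (root / requested).resolve()
    return target == root or root in target.parents


kb_root = Path("support_kb")
print(is_inside_sandbox("billing.md", kb_root))
print(is_inside_sandbox("../secrets.txt", kb_root))
print(is_inside_sandbox("/etc/passwd", kb_root))
```

A bare basename stays inside the folder, while a `..` hop or an absolute path resolves outside it and is rejected.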
548
  {
549
  "cell_type": "code",
 
 
550
  "metadata": {},
 
551
  "source": [
552
  "from langchain_mcp_adapters.client import MultiServerMCPClient\n",
553
  "\n",
@@ -565,13 +568,11 @@
565
  "# Narrow the surface: only read_text_file. Discovery is done in the prompt.\n",
566
  "kb_tools = [t for t in all_kb_tools if t.name == \"read_text_file\"]\n",
567
  "print(f\"Exposing {len(kb_tools)} tool(s): {[t.name for t in kb_tools]}\")"
568
- ],
569
- "execution_count": null,
570
- "outputs": [],
571
- "id": "52cc53df"
572
  },
573
  {
574
  "cell_type": "markdown",
 
575
  "metadata": {},
576
  "source": [
577
  "### A small KB-lookup helper agent\n",
@@ -584,12 +585,14 @@
584
  "- **The LLM only ever passes a basename.** Combined with the server's CWD being the KB folder, that means paths can't drift outside the sandbox — no path sanitizer needed.\n",
585
  "\n",
586
  "The prompt also pins a sentinel (`NO_RELEVANT_ARTICLE`) so the downstream node can branch cleanly on a hit vs. a miss."
587
- ],
588
- "id": "97ad3345"
589
  },
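A sketch of how a downstream node might branch on the sentinel. The `extract_kb_context` helper and its substring-matching rule are assumptions made for illustration; the notebook's own node may check the sentinel differently.

```python
from typing import Optional

NO_RELEVANT_ARTICLE = "NO_RELEVANT_ARTICLE"


def extract_kb_context(agent_reply: str) -> Optional[str]:
    """Return the article text, or None when the helper agent reported a miss."""
    text = agent_reply.strip()
    if not text or NO_RELEVANT_ARTICLE in text:
        return None  # miss: the caller can fall back to an ungrounded reply
    return text      # hit: the caller can quote from this text


print(extract_kb_context("NO_RELEVANT_ARTICLE"))
print(extract_kb_context("  Refunds take 5 days.  "))
```

A fixed sentinel keeps the hit-or-miss decision a plain string check instead of a second LLM call.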
590
  {
591
  "cell_type": "code",
 
 
592
  "metadata": {},
 
593
  "source": [
594
  "from langchain.agents import create_agent\n",
595
  "\n",
@@ -613,13 +616,11 @@
613
  "what you guess the customer needs — the downstream node will pick the right variant.\"\"\"\n",
614
  "\n",
615
  "kb_lookup_agent = create_agent(llm, kb_tools, system_prompt=kb_lookup_prompt)"
616
- ],
617
- "execution_count": null,
618
- "outputs": [],
619
- "id": "1a05c23a"
620
  },
621
  {
622
  "cell_type": "markdown",
 
623
  "metadata": {},
624
  "source": [
625
  "### Plug it into the triage graph\n",
@@ -631,12 +632,14 @@
631
  "```\n",
632
  "\n",
633
  "The escalation path is unchanged — human agents don't need the auto-grounding step. We also redefine `generate_response` as `generate_grounded_response`, which reads `kb_context` and tells the LLM to quote from it. Compile the new graph and render the mermaid — you'll see the extra node slotted in."
634
- ],
635
- "id": "bf2a23fe"
636
  },
637
  {
638
  "cell_type": "code",
 
 
639
  "metadata": {},
 
640
  "source": [
641
  "from langgraph.checkpoint.memory import MemorySaver\n",
642
  "\n",
@@ -825,13 +828,11 @@
825
  "\n",
826
  "support_chain_v2 = workflow_v2.compile(checkpointer=MemorySaver())\n",
827
  "display(Image(support_chain_v2.get_graph().draw_mermaid_png()))"
828
- ],
829
- "execution_count": null,
830
- "outputs": [],
831
- "id": "ceaf62b6"
832
  },
833
  {
834
  "cell_type": "markdown",
 
835
  "metadata": {},
836
  "source": [
837
  "### Streaming the extended graph\n",
@@ -839,12 +840,14 @@
839
  "Same `stream_mode=[\"updates\", \"messages\"]` trick from Section 1. Now the `updates` stream also fires for `fetch_kb_context` — proof the new node ran — and we filter `messages` to `polish_response` so only the final reply tokens render inline.\n",
840
  "\n",
841
  "The KB helper's single `read_text_file` call happens inside `fetch_kb_context`; we don't surface it in the outer stream here, but you could by emitting `get_stream_writer()` events from the node (Mode 4 from Section 1)."
842
- ],
843
- "id": "238049af"
844
  },
845
  {
846
  "cell_type": "code",
 
 
847
  "metadata": {},
 
848
  "source": [
849
  "from langchain_core.messages import AIMessageChunk\n",
850
  "import uuid\n",
@@ -890,13 +893,11 @@
890
  " printed_reply_header = True\n",
891
  " print(chunk.content, end=\"\", flush=True)\n",
892
  "print()"
893
- ],
894
- "execution_count": null,
895
- "outputs": [],
896
- "id": "00ad5451"
897
  },
898
  {
899
  "cell_type": "markdown",
 
900
  "metadata": {},
901
  "source": [
902
  "### What to notice\n",
@@ -906,11 +907,11 @@
906
  "- **The sandbox is the config.** The MCP server only sees the directory you named in `args`, and with `cwd` pointed at the same folder, bare basenames resolve correctly. Different agent, different folder — no code changes.\n",
907
  "- **Grounding beats general knowledge.** Quotes from `billing.md` in the final reply are auditable; invented policy isn't.\n",
908
  "- **Scales naturally.** Swap the markdown folder for a vector-store MCP (e.g. Chroma) once the KB outgrows keyword lookups — the node's code barely changes."
909
- ],
910
- "id": "a8c27f08"
911
  },
912
  {
913
  "cell_type": "markdown",
 
914
  "metadata": {},
915
  "source": [
916
  "---\n",
@@ -926,35 +927,35 @@
926
  "- **Native streaming**: if your function is an `async` generator that yields partial responses, Gradio renders each yield as an in-place update\n",
927
  "\n",
928
  "We'll wire the **KB-grounded triage graph** from Section 2 (`support_chain_v2`) into a `ChatInterface`. Each message the user types becomes a fresh ticket; the graph runs end-to-end (classify → fetch_kb_context → generate → polish) and the stream surfaces node progress and polished tokens."
929
- ],
930
- "id": "25ed67ff"
931
  },
932
  {
933
  "cell_type": "markdown",
 
934
  "metadata": {},
935
  "source": [
936
  "### Make sure Gradio is installed\n",
937
  "\n",
938
  "Already added to `pyproject.toml` — the line below is a safety net in case you're on an older sync."
939
- ],
940
- "id": "e7ffdf12"
941
  },
942
  {
943
  "cell_type": "code",
 
 
944
  "metadata": {},
 
945
  "source": [
946
  "import importlib, subprocess, sys\n",
947
  "if importlib.util.find_spec(\"gradio\") is None:\n",
948
  " subprocess.check_call([sys.executable, \"-m\", \"pip\", \"install\", \"-q\", \"gradio\"])\n",
949
  "import gradio as gr\n",
950
  "print(\"gradio version:\", gr.__version__)"
951
- ],
952
- "execution_count": null,
953
- "outputs": [],
954
- "id": "a67347ff"
955
  },
956
  {
957
  "cell_type": "markdown",
 
958
  "metadata": {},
959
  "source": [
960
  "### The streaming response function\n",
@@ -962,12 +963,14 @@
962
  "For streaming, we give `gr.ChatInterface` an `async def` generator with signature `(message, history, request) -> AsyncIterator[str]`. We accumulate tokens in a **buffer** and `yield` the running total on every chunk. Gradio replaces the displayed message with each yield, so yielding only the deltas would show just the latest fragment instead of the growing reply.\n",
963
  "\n",
964
  "We derive `thread_id` from `request.session_hash`, which Gradio mints once per browser session. Combined with the graph's `MemorySaver` checkpointer and the new `conversation` field on `TicketStateV2`, that's what gives us multi-turn memory: each session gets an isolated thread, `generate_grounded_response` sees prior turns, `polish_and_log` appends the current one."
965
- ],
966
- "id": "02f11d8f"
967
  },
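The buffer-and-yield pattern, reduced to a standalone sketch. `fake_token_stream` is a hypothetical stand-in for the graph's token chunks, and `respond_sketch` omits the `message`, `history`, and `request` parameters that the real `respond` function takes.

```python
import asyncio
from typing import AsyncIterator


async def fake_token_stream() -> AsyncIterator[str]:
    """Stand-in for streamed token chunks (made-up data)."""
    for piece in ["Your ", "refund ", "is on its way."]:
        await asyncio.sleep(0)  # hand control back to the loop, as real I/O would
        yield piece


async def respond_sketch() -> AsyncIterator[str]:
    reply_buffer = ""
    async for piece in fake_token_stream():
        reply_buffer += piece
        yield reply_buffer      # running total, not the delta


async def collect() -> list[str]:
    return [frame async for frame in respond_sketch()]


frames = asyncio.run(collect())
print(frames)
```

Each yielded frame is the full reply so far, so the last frame is the complete message. Inside a notebook, where an event loop is already running, `await collect()` takes the place of `asyncio.run(collect())`.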
968
  {
969
  "cell_type": "code",
 
 
970
  "metadata": {},
 
971
  "source": [
972
  "async def respond(message: str, history: list[tuple[str, str]], request: gr.Request):\n",
973
  " ticket = {\n",
@@ -1012,13 +1015,11 @@
1012
  " streaming_reply = True\n",
1013
  " reply_buffer += chunk.content\n",
1014
  " yield reply_buffer"
1015
- ],
1016
- "execution_count": null,
1017
- "outputs": [],
1018
- "id": "a3614a3b"
1019
  },
1020
  {
1021
  "cell_type": "markdown",
 
1022
  "metadata": {},
1023
  "source": [
1024
  "### Launch the app\n",
@@ -1026,12 +1027,14 @@
1026
  "`demo.launch()` starts a local server and prints a URL. Open it in a browser, paste in a support ticket, and watch the agent consult the KB live.\n",
1027
  "\n",
1028
  "> To stop the server, run `demo.close()` in a later cell."
1029
- ],
1030
- "id": "a7bb9501"
1031
  },
1032
  {
1033
  "cell_type": "code",
 
 
1034
  "metadata": {},
 
1035
  "source": [
1036
  "demo = gr.ChatInterface(\n",
1037
  " fn=respond,\n",
@@ -1045,13 +1048,11 @@
1045
  ")\n",
1046
  "\n",
1047
  "demo.launch(inline=False, share=False)"
1048
- ],
1049
- "execution_count": null,
1050
- "outputs": [],
1051
- "id": "cd4df22f"
1052
  },
1053
  {
1054
  "cell_type": "markdown",
 
1055
  "metadata": {},
1056
  "source": [
1057
  "### What to notice\n",
@@ -1060,11 +1061,11 @@
1060
  "- **The reply wipes the last status.** The moment polish_response tokens arrive, we stop yielding status markers and start yielding the accumulated reply buffer. That buffer replaces whatever status was showing, so the final UI state is just the clean reply.\n",
1061
  "- **Session-scoped memory.** `request.session_hash` drives the `thread_id`, so each browser session gets its own thread in the `MemorySaver` checkpointer. Open a second tab and you start a fresh conversation — no leakage.\n",
1062
  "- **History accumulates via reducer.** `TicketStateV2.conversation` uses `Annotated[list, operator.add]`. `polish_and_log` appends `{user, assistant}` each turn; `generate_grounded_response` reads the list on the next turn so the prompt sees the full thread."
1063
- ],
1064
- "id": "f2ea03da"
1065
  },
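What the `operator.add` reducer does to a list field can be shown without LangGraph. `apply_reducer` is a hypothetical helper that mimics the merge step, and the turn contents are made up.

```python
import operator


def apply_reducer(existing: list, update: list) -> list:
    """Mimic how an operator.add reducer merges a node's list update into state."""
    return operator.add(existing, update)  # list + list -> concatenation


conversation: list = []
conversation = apply_reducer(
    conversation, [{"user": "I was double-charged", "assistant": "Refund issued."}]
)
conversation = apply_reducer(
    conversation, [{"user": "When will it arrive?", "assistant": "In 3-5 days."}]
)
print(len(conversation))
```

Because the reducer concatenates rather than overwrites, a node returns only the newest turn and the state keeps the earlier ones.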
1066
  {
1067
  "cell_type": "markdown",
 
1068
  "metadata": {},
1069
  "source": [
1070
  "---\n",
@@ -1089,11 +1090,11 @@
1089
  "> → write requirements.txt + packages.txt → gradio deploy\n",
1090
  "> → set Space secrets → test the live URL\n",
1091
  "> ```"
1092
- ],
1093
- "id": "13f3f73f"
1094
  },
1095
  {
1096
  "cell_type": "markdown",
 
1097
  "metadata": {},
1098
  "source": [
1099
  "### 4.1 — Create a Hugging Face account\n",
@@ -1115,11 +1116,11 @@
1115
  " - **Type / Role**: choose **\"Write\"** — you need write access to create and update Spaces. A \"Read\" token is not enough.\n",
1116
  "4. Click **Create Token**. Copy the token (it starts with `hf_...`) **right now** — HF only shows it once. If you close the page without copying, you'll have to create a new one.\n",
1117
  "5. Paste it somewhere safe temporarily (a password manager is ideal). **Do not commit it to git or paste it into a notebook cell** — tokens in notebooks tend to leak into screenshots and commit history."
1118
- ],
1119
- "id": "0e74c0d3"
1120
  },
1121
  {
1122
  "cell_type": "markdown",
 
1123
  "metadata": {},
1124
  "source": [
1125
  "### 4.3 — Install the CLI tools and log in\n",
@@ -1148,11 +1149,11 @@
1148
  "Success is confirmed when the CLI reports that your token was saved (typically under `~/.cache/huggingface/token`).\n",
1149
  "\n",
1150
  "> If you ever need to revoke access (e.g. laptop stolen), go back to the tokens page on the HF website and delete that named token — it invalidates immediately across all your machines."
1151
- ],
1152
- "id": "7bd9a531"
1153
  },
1154
  {
1155
  "cell_type": "markdown",
 
1156
  "metadata": {},
1157
  "source": [
1158
  "### 4.4 — Prepare the deployable app\n",
@@ -1169,29 +1170,16 @@
1169
  "- `demo.launch()` takes no arguments — HF injects `GRADIO_SERVER_NAME=0.0.0.0` and the right port via env vars, so default launch works.\n",
1170
  "\n",
1171
  "The cell below is just a sanity check — it confirms `app.py` is present in your working directory before you try to deploy."
1172
- ],
1173
- "id": "63f79ba3"
1174
  },
1175
  {
1176
  "cell_type": "code",
1177
- "metadata": {},
1178
- "source": [
1179
- "from pathlib import Path\n",
1180
- "\n",
1181
- "app_file = Path(\"app.py\")\n",
1182
- "assert app_file.exists(), (\n",
1183
- " \"app.py not found in the current directory. It ships alongside this notebook — \"\n",
1184
- " \"make sure you're running from the repo root.\"\n",
1185
- ")\n",
1186
- "size = app_file.stat().st_size\n",
1187
- "lines = app_file.read_text().count(\"\\n\") + 1\n",
1188
- "print(f\"app.py found: {size:,} bytes, {lines} lines\")\n",
1189
- "print(\"\\nFirst 15 lines:\\n\")\n",
1190
- "print(\"\\n\".join(app_file.read_text().splitlines()[:15]))"
1191
- ],
1192
  "execution_count": 27,
 
 
1193
  "outputs": [
1194
  {
 
1195
  "output_type": "stream",
1196
  "text": [
1197
  "app.py found: 15,310 bytes, 406 lines\n",
@@ -1216,44 +1204,43 @@
1216
  ]
1217
  }
1218
  ],
1219
- "id": "9a69bead"
1220
  },
1221
  {
1222
  "cell_type": "markdown",
 
1223
  "metadata": {},
1224
  "source": [
1225
  "### 4.5 — Declare Python dependencies\n",
1226
  "\n",
1227
  "HF Spaces uses `requirements.txt` (one package per line) to rebuild your environment. The `pyproject.toml` for this repo carries packages we used across *all three* notebooks — we only need the subset that `app.py` actually imports.\n",
1228
  "\n",
 
 
1229
  "Run the cell below — it writes a minimal `requirements.txt` next to `app.py`."
1230
- ],
1231
- "id": "87ad3526"
1232
  },
1233
  {
1234
  "cell_type": "code",
1235
- "metadata": {},
1236
- "source": [
1237
- "requirements_txt = \"\"\"gradio>=5.0.0\n",
1238
- "langchain==1.2.11\n",
1239
- "langchain-openai==1.1.12\n",
1240
- "langchain-community>=0.4.1\n",
1241
- "langchain-mcp-adapters==0.1.14\n",
1242
- "langgraph>=1.1.2\n",
1243
- "mcp==1.15.0\n",
1244
- "pydantic==2.12.5\n",
1245
- "python-dotenv>=1.2.1\n",
1246
- "\"\"\"\n",
1247
- "\n",
1248
- "with open(\"requirements.txt\", \"w\") as f:\n",
1249
- " f.write(requirements_txt)\n",
1250
- "\n",
1251
- "print(\"Wrote requirements.txt:\")\n",
1252
- "print(requirements_txt)"
1253
- ],
1254
  "execution_count": 28,
 
 
1255
  "outputs": [
1256
  {
 
1257
  "output_type": "stream",
1258
  "text": [
1259
  "Wrote requirements.txt:\n",
@@ -1270,10 +1257,28 @@
1270
  ]
1271
  }
1272
  ],
1273
- "id": "729400ed"
1274
  },
1275
  {
1276
  "cell_type": "markdown",
 
1277
  "metadata": {},
1278
  "source": [
1279
  "### 4.6 — Declare system packages (Node.js for MCP)\n",
@@ -1281,26 +1286,16 @@
1281
  "Our app uses the **Filesystem MCP server**, which runs under Node via `npx`. The default Python Space image doesn't have Node. HF lets you install **system packages** (anything available via `apt-get`) by listing them in a `packages.txt` file at the project root.\n",
1282
  "\n",
1283
  "The cell below writes `packages.txt` with the two entries we need. Run it, then verify the file is in your project root."
1284
- ],
1285
- "id": "52da9fe7"
1286
  },
1287
  {
1288
  "cell_type": "code",
1289
- "metadata": {},
1290
- "source": [
1291
- "packages_txt = \"\"\"nodejs\n",
1292
- "npm\n",
1293
- "\"\"\"\n",
1294
- "\n",
1295
- "with open(\"packages.txt\", \"w\") as f:\n",
1296
- " f.write(packages_txt)\n",
1297
- "\n",
1298
- "print(\"Wrote packages.txt:\")\n",
1299
- "print(packages_txt)"
1300
- ],
1301
  "execution_count": 29,
 
 
1302
  "outputs": [
1303
  {
 
1304
  "output_type": "stream",
1305
  "text": [
1306
  "Wrote packages.txt:\n",
@@ -1310,10 +1305,21 @@
1310
  ]
1311
  }
1312
  ],
1313
- "id": "d0d3edab"
1314
  },
1315
  {
1316
  "cell_type": "markdown",
 
1317
  "metadata": {},
1318
  "source": [
1319
  "### 4.7 — Keep secrets out of the repo\n",
@@ -1334,11 +1340,11 @@
1334
  "The `gradio deploy` CLI respects `.gitignore`, so excluded files will not be uploaded. Double-check by running `git status --ignored` — your `.env` should appear under \"Ignored files\".\n",
1335
  "\n",
1336
  "> **Rule of thumb**: if the value would embarrass you on a billboard, it's a secret. API keys, endpoints, connection strings, tenant IDs — all secrets. These go into **Space secrets** (step 4.9), never into the repo."
1337
- ],
1338
- "id": "14a3f503"
1339
  },
1340
  {
1341
  "cell_type": "markdown",
 
1342
  "metadata": {},
1343
  "source": [
1344
  "### 4.8 — Run `gradio deploy`\n",
@@ -1371,11 +1377,11 @@
1371
  "> The first build takes **3–10 minutes** — HF has to install Python packages *and* Node. Watch progress in the **Logs → Build** tab on the Space page. You'll see pip installs, then apt installs for `nodejs`, then your app starting up.\n",
1372
  "\n",
1373
  "If anything looks wrong mid-build, it's fine to let it finish and iterate — pushes are fast after the first time because layers are cached."
1374
- ],
1375
- "id": "5c5b3dd5"
1376
  },
1377
  {
1378
  "cell_type": "markdown",
 
1379
  "metadata": {},
1380
  "source": [
1381
  "### 4.9 — Configure Space secrets\n",
@@ -1394,11 +1400,11 @@
1394
  "After saving all four, click **Restart this Space** (or **Factory reboot** for a full rebuild). The Space re-reads env vars on every restart; without the restart, your app is still running with `None` values and Azure calls will fail with authentication errors.\n",
1395
  "\n",
1396
  "> **Good hygiene**: each deployment env gets its own Azure deployment / key. Don't reuse your laptop `.env` values for production — rotate to a fresh set."
1397
- ],
1398
- "id": "97cb1558"
1399
  },
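A fail-fast check along these lines can make a missing secret obvious at startup instead of surfacing later as an authentication error. This is a sketch under assumptions: `missing_env_vars` is a hypothetical helper, and `REQUIRED_VARS` lists only the two variable names visible in this notebook's setup cell, so the remaining secrets would need to be added.

```python
import os

# Only the two names visible in the notebook; extend with the other secrets you set.
REQUIRED_VARS = ["AZURE_OPENAI_CHAT_DEPLOYMENT_NAME", "AZURE_OPENAI_API_VERSION"]


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


# Demonstration against an explicit mapping rather than the real environment.
missing = missing_env_vars(REQUIRED_VARS, env={"AZURE_OPENAI_API_VERSION": "placeholder"})
print(missing)
```

Calling `missing_env_vars(REQUIRED_VARS)` with no `env` argument checks the live process environment, which is what a Space sees after a restart.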
1400
  {
1401
  "cell_type": "markdown",
 
1402
  "metadata": {},
1403
  "source": [
1404
  "### 4.10 — Verify and iterate\n",
@@ -1423,11 +1429,11 @@
1423
  "HF auto-rebuilds the Space on every push — you'll see the build progress under the **Logs → Build** tab.\n",
1424
  "\n",
1425
  "You do **not** need to re-run `uv run gradio deploy` on subsequent updates. It's only for the initial create. Editing files in the HF web UI also works and triggers a rebuild."
1426
- ],
1427
- "id": "c560a9d9"
1428
  },
1429
  {
1430
  "cell_type": "markdown",
 
1431
  "metadata": {},
1432
  "source": [
1433
  "### 4.11 — Troubleshooting\n",
@@ -1445,11 +1451,11 @@
1445
  "| Want to pause the Space to stop accruing quota | Space is using CPU time you don't need right now | Settings → **Pause Space**. It won't serve traffic but your code + secrets are preserved. |\n",
1446
  "\n",
1447
  "**If you get truly stuck**, delete the Space (Settings → Delete this Space) and run `uv run gradio deploy` again from a clean directory. Spaces are cheap — starting over is often faster than debugging a bad state."
1448
- ],
1449
- "id": "473c6e08"
1450
  },
1451
  {
1452
  "cell_type": "markdown",
 
1453
  "metadata": {},
1454
  "source": [
1455
  "---\n",
@@ -1472,8 +1478,7 @@
1472
  "- **Observability** — pipe the same stream into Langfuse (Part 2's monitoring section) for latency + token breakdowns\n",
1473
  "\n",
1474
  "With these you've closed the loop: a LangGraph workflow → a KB-grounded tool-using agent → a live web UI, all streaming."
1475
- ],
1476
- "id": "88d623f2"
1477
  }
1478
  ],
1479
  "metadata": {
@@ -1497,4 +1502,4 @@
1497
  },
1498
  "nbformat": 4,
1499
  "nbformat_minor": 5
1500
- }
 
2
  "cells": [
3
  {
4
  "cell_type": "markdown",
5
+ "id": "9c686d08",
6
  "metadata": {},
7
  "source": [
8
  "# KT1 Part 3 — Streaming, Filesystem MCP, and Gradio\n",
 
18
  "| **Gradio** | Wrap the streaming agent in a chat UI you can share with a URL |\n",
19
  "\n",
20
  "By the end you'll have a KB-grounded customer support assistant running in a local Gradio app, streaming its reasoning and tool calls token-by-token."
21
+ ]
 
22
  },
23
  {
24
  "cell_type": "markdown",
 
33
  },
34
  {
35
  "cell_type": "code",
36
+ "execution_count": null,
37
+ "id": "f92d6179",
38
  "metadata": {},
39
+ "outputs": [],
40
  "source": [
41
  "import os\n",
42
  "import warnings\n",
 
54
  " azure_deployment=os.getenv(\"AZURE_OPENAI_CHAT_DEPLOYMENT_NAME\"),\n",
55
  " api_version=os.getenv(\"AZURE_OPENAI_API_VERSION\"),\n",
56
  ")"
57
+ ]
 
 
 
58
  },
59
  {
60
  "cell_type": "markdown",
61
+ "id": "8991899e",
62
  "metadata": {},
63
  "source": [
64
  "---\n",
 
79
  "| `custom` | Whatever you emit via `get_stream_writer()` | You want to surface app-specific progress events |\n",
80
  "\n",
81
  "You can combine modes: `stream_mode=[\"updates\", \"messages\"]` yields `(mode_name, event)` tuples and you route by mode."
82
+ ]
 
83
  },
84
  {
85
  "cell_type": "markdown",
86
+ "id": "4918396a",
87
  "metadata": {},
88
  "source": [
89
  "### A workflow to stream\n",
 
91
  "To keep the focus on streaming, we'll reuse the **support-ticket triage** workflow from Part 2 Pattern 1. The demo ticket below routes down the auto-respond path; its three LLM calls (classify → generate → polish) give us plenty of events to observe.\n",
92
  "\n",
93
  "If you already ran Part 2, this is the same shape — skim it."
94
+ ]
 
95
  },
96
  {
97
  "cell_type": "code",
98
+ "execution_count": null,
99
+ "id": "795c4f3d",
100
  "metadata": {},
101
+ "outputs": [],
102
  "source": [
103
  "from typing_extensions import TypedDict\n",
104
  "from typing import Annotated, Literal\n",
 
230
  "\n",
231
  "support_chain = workflow.compile()\n",
232
  "display(Image(support_chain.get_graph().draw_mermaid_png()))"
233
+ ]
 
 
 
234
  },
235
  {
236
  "cell_type": "markdown",
237
+ "id": "bb33e148",
238
  "metadata": {},
239
  "source": [
240
  "### Mode 1 — `updates`: node-level progress\n",
241
  "\n",
242
  "Yields one event per node, containing only what that node **changed**. The simplest mode — a clean \"step by step\" feed."
243
+ ]
 
244
  },
245
  {
246
  "cell_type": "code",
247
+ "execution_count": null,
248
+ "id": "d2f79113",
249
  "metadata": {},
250
+ "outputs": [],
251
  "source": [
252
  "ticket = {\n",
253
  " \"customer_message\": \"My subscription was double-charged this month — please refund the extra charge.\",\n",
 
259
  " for node, delta in event.items():\n",
260
  " keys = list(delta.keys())\n",
261
  " print(f\" {node:10s} -> updated {keys}\")"
262
+ ]
 
 
 
263
  },
264
  {
265
  "cell_type": "markdown",
266
+ "id": "d1a0556e",
267
  "metadata": {},
268
  "source": [
269
  "### Mode 2 — `values`: cumulative state snapshots\n",
270
  "\n",
271
  "One event per checkpoint, each containing the **full state**. You get one snapshot for the initial input plus one after every node runs — so on our 3-node happy path, that's 4 snapshots. Useful when you want to render the latest known values to a UI without tracking deltas yourself."
272
+ ]
 
273
  },
274
  {
275
  "cell_type": "code",
276
+ "execution_count": null,
277
+ "id": "bf7b3c36",
278
  "metadata": {},
279
+ "outputs": [],
280
  "source": [
281
  "print(\"Streaming 'values' mode:\\n\")\n",
282
  "for i, snapshot in enumerate(support_chain.stream(ticket, stream_mode=\"values\")):\n",
283
  " filled = [k for k, v in snapshot.items() if v not in (None, \"\", [], {})]\n",
284
  " print(f\" step {i}: filled fields = {filled}\")"
285
+ ]
 
 
 
286
  },
287
  {
288
  "cell_type": "markdown",
289
+ "id": "8bf9bad5",
290
  "metadata": {},
291
  "source": [
292
  "### Mode 3 — `messages`: token-by-token streaming\n",
 
294
  "This is the one you want for ChatGPT-style UX. Each event is a `(chunk, metadata)` tuple where `chunk` is an `AIMessageChunk` holding a few tokens. `metadata` tells you which **node** emitted the chunk — so you can filter to just the node whose output you want to render.\n",
295
  "\n",
296
  "Use the **async** variant (`astream`) — token streaming is async-native on the SDK side."
297
+ ]
 
298
  },
299
  {
300
  "cell_type": "code",
301
+ "execution_count": null,
302
+ "id": "eb480690",
303
  "metadata": {},
304
+ "outputs": [],
305
  "source": [
306
  "print(\"Streaming 'messages' mode — tokens from the 'polish_response' node only:\\n\")\n",
307
  "\n",
 
309
  " if metadata.get(\"langgraph_node\") == \"polish_response\" and chunk.content:\n",
310
  " print(chunk.content, end=\"\", flush=True)\n",
311
  "print()"
312
+ ]
 
 
 
313
  },
314
  {
315
  "cell_type": "markdown",
316
+ "id": "75bd4132",
317
  "metadata": {},
318
  "source": [
319
  "### Mode 4 — `custom`: emit your own events\n",
320
  "\n",
321
  "Sometimes a node does meaningful work **between** LLM calls — a DB query, a long loop, a parse — and you want the UI to know. Call `get_stream_writer()` inside the node to push a custom event; the caller receives it in `stream_mode=\"custom\"`."
322
+ ]
 
323
  },
324
  {
325
  "cell_type": "code",
326
+ "execution_count": null,
327
+ "id": "9c14989f",
328
  "metadata": {},
329
+ "outputs": [],
330
  "source": [
331
  "from langgraph.config import get_stream_writer\n",
332
  "\n",
 
357
  "print(\"Streaming 'custom' events:\\n\")\n",
358
  "for event in demo_chain.stream(ticket, stream_mode=\"custom\"):\n",
359
  " print(f\" {event}\")"
360
+ ]
 
 
 
361
  },
362
  {
363
  "cell_type": "markdown",
364
+ "id": "c29456c8",
365
  "metadata": {},
366
  "source": [
367
  "### What to notice\n",
 
370
  "- **Modes are cheap to combine.** `stream_mode=[\"updates\", \"messages\"]` returns `(mode, event)` tuples — drive multiple UI elements from one subscription.\n",
371
  "- **`astream` pairs with `messages`.** Token streaming is async on the SDK side; use the async variant so tokens flow promptly.\n",
372
  "- **`get_stream_writer()` is the escape hatch.** Anything that isn't an LLM token or a node boundary — progress text, tool latency, partial parses — belongs in `custom`."
373
+ ]
 
374
  },
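  {
  "cell_type": "markdown",
  "metadata": {},
  "source": [
  "The combined-mode bullet above can be tried without any LLM calls. The following is a minimal sketch, assuming a hypothetical one-node graph (`ToyState`, `shout`, and `toy_chain` are illustrative names, not part of this notebook's workflow). It pairs `updates` with `values` rather than `messages`, because the toy node emits no tokens; the `(mode, event)` routing is the same.\n",
  "\n",
  "```python\n",
  "from typing_extensions import TypedDict\n",
  "from langgraph.graph import StateGraph, START, END\n",
  "\n",
  "\n",
  "class ToyState(TypedDict):  # hypothetical state, not the ticket state\n",
  "    text: str\n",
  "    shouted: str\n",
  "\n",
  "\n",
  "def shout(state: ToyState) -> dict:\n",
  "    # Return only the key this node changes\n",
  "    return {'shouted': state['text'].upper()}\n",
  "\n",
  "\n",
  "toy = StateGraph(ToyState)\n",
  "toy.add_node('shout', shout)\n",
  "toy.add_edge(START, 'shout')\n",
  "toy.add_edge('shout', END)\n",
  "toy_chain = toy.compile()\n",
  "\n",
  "# With a list of modes, each item is a (mode_name, event) tuple\n",
  "for mode, event in toy_chain.stream({'text': 'hello'}, stream_mode=['updates', 'values']):\n",
  "    if mode == 'updates':\n",
  "        print('node delta :', event)\n",
  "    else:\n",
  "        print('full state :', event)\n",
  "```\n",
  "\n",
  "Routing on `mode` lets one subscription feed several UI elements, for example a step log and a state panel."
  ]
  },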
375
  {
376
  "cell_type": "markdown",
377
+ "id": "04f00e37",
378
  "metadata": {},
379
  "source": [
380
  "---\n",
 
403
  "| `list_allowed_directories` | Show which paths this server is permitted to access |\n",
404
  "\n",
405
  "We'll only wire `read_text_file` to the agent — the others aren't needed once the file inventory is baked into the prompt. Like Playwright, the server ships as a Node process you run with `npx` — no Python package to install."
406
+ ]
 
407
  },
408
  {
409
  "cell_type": "markdown",
410
+ "id": "eb4331ea",
411
  "metadata": {},
412
  "source": [
413
  "### One-time install\n",
 
420
  "```\n",
421
  "\n",
422
  "That's it. The first time `MultiServerMCPClient` connects, `npx` fetches and caches the server package — no separate install step."
423
+ ]
 
424
  },
425
  {
426
  "cell_type": "markdown",
427
+ "id": "e6d91476",
428
  "metadata": {},
429
  "source": [
430
  "### Build a mini knowledge base\n",
431
  "\n",
432
  "Three small markdown files covering the ticket types from Section 1: billing, password/access, shipping. In a real deployment this would be hundreds of articles versioned in git — the MCP boundary stays identical."
433
+ ]
 
434
  },
435
  {
436
  "cell_type": "code",
437
+ "execution_count": null,
438
+ "id": "21f7349f",
439
  "metadata": {},
440
+ "outputs": [],
441
  "source": [
442
  "from pathlib import Path\n",
443
  "\n",
 
500
  "print(\"KB files in\", kb_dir.resolve(), \":\")\n",
501
  "for f in sorted(kb_dir.iterdir()):\n",
502
  " print(f\" - {f.name} ({f.stat().st_size} bytes)\")"
503
+ ]
 
 
 
504
  },
505
  {
506
  "cell_type": "markdown",
507
+ "id": "20f8f26b",
508
  "metadata": {},
509
  "source": [
510
  "### The notebook compatibility shim (same as Part 2)\n",
511
  "\n",
512
  "MCP's stdio transport expects real file descriptors, which Jupyter's captured streams don't always provide. This shim gets us past that — only needed inside notebooks."
513
+ ]
 
514
  },
515
  {
516
  "cell_type": "code",
517
+ "execution_count": null,
518
+ "id": "7f9070ed",
519
  "metadata": {},
520
+ "outputs": [],
521
  "source": [
522
  "import sys, io\n",
523
  "\n",
 
531
  " stream.fileno()\n",
532
  " except (AttributeError, io.UnsupportedOperation):\n",
533
  " stream.fileno = _safe_fileno"
534
+ ]
 
 
 
535
  },
536
  {
537
  "cell_type": "markdown",
538
+ "id": "25a4c631",
539
  "metadata": {},
540
  "source": [
541
  "### Connect and list the tools\n",
 
543
  "We pass the absolute path to `support_kb/` as the one allowed directory, and set the server's `cwd` to the same path so bare basenames (e.g. `\"billing.md\"`) resolve correctly inside the sandbox. The server **refuses any read outside that path** — a simple but real sandbox; if the agent ever tried to read `/etc/passwd`, the call would error at the MCP boundary rather than in our Python code.\n",
544
  "\n",
545
  "After pulling the full tool list for reference, we keep only `read_text_file` and hand that to the agent. Trimming the surface is the cheapest guardrail there is: fewer tools = fewer things the LLM can get wrong."
546
+ ]
 
547
  },
548
  {
549
  "cell_type": "code",
550
+ "execution_count": null,
551
+ "id": "52cc53df",
552
  "metadata": {},
553
+ "outputs": [],
554
  "source": [
555
  "from langchain_mcp_adapters.client import MultiServerMCPClient\n",
556
  "\n",
 
568
  "# Narrow the surface: only read_text_file. Discovery is done in the prompt.\n",
569
  "kb_tools = [t for t in all_kb_tools if t.name == \"read_text_file\"]\n",
570
  "print(f\"Exposing {len(kb_tools)} tool(s): {[t.name for t in kb_tools]}\")"
571
+ ]
 
 
 
572
  },
573
  {
574
  "cell_type": "markdown",
575
+ "id": "97ad3345",
576
  "metadata": {},
577
  "source": [
578
  "### A small KB-lookup helper agent\n",
 
585
  "- **The LLM only ever passes a basename.** Combined with the server's CWD being the KB folder, that means paths can't drift outside the sandbox — no path sanitizer needed.\n",
586
  "\n",
587
  "The prompt also pins a sentinel (`NO_RELEVANT_ARTICLE`) so the downstream node can branch cleanly on a hit vs. a miss."
588
+ ]
 
589
  },
590
  {
591
  "cell_type": "code",
592
+ "execution_count": null,
593
+ "id": "1a05c23a",
594
  "metadata": {},
595
+ "outputs": [],
596
  "source": [
597
  "from langchain.agents import create_agent\n",
598
  "\n",
 
616
  "what you guess the customer needs — the downstream node will pick the right variant.\"\"\"\n",
617
  "\n",
618
  "kb_lookup_agent = create_agent(llm, kb_tools, system_prompt=kb_lookup_prompt)"
619
+ ]
 
 
 
620
  },
621
  {
622
  "cell_type": "markdown",
623
+ "id": "bf2a23fe",
624
  "metadata": {},
625
  "source": [
626
  "### Plug it into the triage graph\n",
 
632
  "```\n",
633
  "\n",
634
  "The escalation path is unchanged — human agents don't need the auto-grounding step. We also redefine `generate_response` as `generate_grounded_response`, which reads `kb_context` and tells the LLM to quote from it. Compile the new graph and render the mermaid — you'll see the extra node slotted in."
635
+ ]
 
636
  },
637
  {
638
  "cell_type": "code",
639
+ "execution_count": null,
640
+ "id": "ceaf62b6",
641
  "metadata": {},
642
+ "outputs": [],
643
  "source": [
644
  "from langgraph.checkpoint.memory import MemorySaver\n",
645
  "\n",
 
828
  "\n",
829
  "support_chain_v2 = workflow_v2.compile(checkpointer=MemorySaver())\n",
830
  "display(Image(support_chain_v2.get_graph().draw_mermaid_png()))"
831
+ ]
 
 
 
832
  },
833
  {
834
  "cell_type": "markdown",
835
+ "id": "238049af",
836
  "metadata": {},
837
  "source": [
838
  "### Streaming the extended graph\n",
 
840
  "Same `stream_mode=[\"updates\", \"messages\"]` trick from Section 1. Now the `updates` stream also fires for `fetch_kb_context` — proof the new node ran — and we filter `messages` to `polish_response` so only the final reply tokens render inline.\n",
841
  "\n",
842
  "The KB helper's single `read_text_file` call happens inside `fetch_kb_context`; we don't surface it in the outer stream here, but you could by emitting `get_stream_writer()` events from the node (Mode 4 from Section 1)."
843
+ ]
 
844
  },
845
  {
846
  "cell_type": "code",
847
+ "execution_count": null,
848
+ "id": "00ad5451",
849
  "metadata": {},
850
+ "outputs": [],
851
  "source": [
852
  "from langchain_core.messages import AIMessageChunk\n",
853
  "import uuid\n",
 
893
  " printed_reply_header = True\n",
894
  " print(chunk.content, end=\"\", flush=True)\n",
895
  "print()"
896
+ ]
 
 
 
897
  },
898
  {
899
  "cell_type": "markdown",
900
+ "id": "a8c27f08",
901
  "metadata": {},
902
  "source": [
903
  "### What to notice\n",
 
907
  "- **The sandbox is the config.** The MCP server only sees the directory you named in `args`, and with `cwd` pointed at the same folder, bare basenames resolve correctly. Different agent, different folder — no code changes.\n",
908
  "- **Grounding beats general knowledge.** Quotes from `billing.md` in the final reply are auditable; invented policy isn't.\n",
909
  "- **Scales naturally.** Swap the markdown folder for a vector-store MCP (e.g. Chroma) once the KB outgrows keyword lookups — the node's code barely changes."
910
+ ]
 
911
  },
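  {
  "cell_type": "markdown",
  "metadata": {},
  "source": [
  "To make the sandbox idea concrete, here is a rough sketch of the kind of containment check an allowed-directory server performs. It is an illustration under our own assumptions, not the Filesystem MCP server's actual implementation; `is_inside` is a hypothetical helper.\n",
  "\n",
  "```python\n",
  "from pathlib import Path\n",
  "\n",
  "\n",
  "def is_inside(allowed_dir: str, requested: str) -> bool:\n",
  "    # Resolve both paths so '..' segments and absolute paths are normalised\n",
  "    root = Path(allowed_dir).resolve()\n",
  "    target = (root / requested).resolve()\n",
  "    return target == root or root in target.parents\n",
  "\n",
  "\n",
  "print(is_inside('support_kb', 'billing.md'))      # True\n",
  "print(is_inside('support_kb', '../secrets.env'))  # False\n",
  "print(is_inside('support_kb', '/etc/passwd'))     # False\n",
  "```\n",
  "\n",
  "A bare basename always lands inside the allowed folder, which is why the prompt asks the LLM to pass nothing else."
  ]
  },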
912
  {
913
  "cell_type": "markdown",
914
+ "id": "25ed67ff",
915
  "metadata": {},
916
  "source": [
917
  "---\n",
 
927
  "- **Native streaming**: if your function is an `async` generator that yields partial responses, Gradio renders each yield as an in-place update\n",
928
  "\n",
929
  "We'll wire the **KB-grounded triage graph** from Section 2 (`support_chain_v2`) into a `ChatInterface`. Each message the user types becomes a fresh ticket; the graph runs end-to-end (classify → fetch_kb_context → generate → polish) and the stream surfaces node progress and polished tokens."
930
+ ]
 
931
  },
932
  {
933
  "cell_type": "markdown",
934
+ "id": "e7ffdf12",
935
  "metadata": {},
936
  "source": [
937
  "### Make sure Gradio is installed\n",
938
  "\n",
939
  "Already added to `pyproject.toml` — the line below is a safety net in case you're on an older sync."
940
+ ]
 
941
  },
942
  {
943
  "cell_type": "code",
944
+ "execution_count": null,
945
+ "id": "a67347ff",
946
  "metadata": {},
947
+ "outputs": [],
948
  "source": [
949
  "import importlib, subprocess, sys\n",
950
  "if importlib.util.find_spec(\"gradio\") is None:\n",
951
  " subprocess.check_call([sys.executable, \"-m\", \"pip\", \"install\", \"-q\", \"gradio\"])\n",
952
  "import gradio as gr\n",
953
  "print(\"gradio version:\", gr.__version__)"
954
+ ]
 
 
 
955
  },
956
  {
957
  "cell_type": "markdown",
958
+ "id": "02f11d8f",
959
  "metadata": {},
960
  "source": [
961
  "### The streaming response function\n",
 
963
  "`gr.ChatInterface` expects an `async def` generator with signature `(message, history, request) -> AsyncIterator[str]`. We accumulate tokens in a **buffer** and `yield` the running total on every chunk — Gradio replaces the message with each yield, so yielding deltas would flicker.\n",
964
  "\n",
965
  "We derive `thread_id` from `request.session_hash`, which Gradio mints once per browser session. Combined with the graph's `MemorySaver` checkpointer and the new `conversation` field on `TicketStateV2`, that's what gives us multi-turn memory: each session gets an isolated thread, `generate_grounded_response` sees prior turns, `polish_and_log` appends the current one."
966
+ ]
 
967
  },
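  {
  "cell_type": "markdown",
  "metadata": {},
  "source": [
  "The buffer pattern can be seen in isolation with a small sketch. `fake_tokens` and `respond_sketch` are hypothetical stand-ins for the graph's token stream and the real `respond` function; nothing here touches Gradio or the LLM.\n",
  "\n",
  "```python\n",
  "import asyncio\n",
  "\n",
  "\n",
  "async def fake_tokens():\n",
  "    # Hypothetical stand-in for token chunks arriving from the graph\n",
  "    for tok in ['Your ', 'refund ', 'is ', 'on ', 'its ', 'way.']:\n",
  "        await asyncio.sleep(0)\n",
  "        yield tok\n",
  "\n",
  "\n",
  "async def respond_sketch(message: str):\n",
  "    reply_buffer = ''\n",
  "    async for tok in fake_tokens():\n",
  "        reply_buffer += tok\n",
  "        yield reply_buffer  # the running total, not the delta\n",
  "\n",
  "\n",
  "async def collect():\n",
  "    return [frame async for frame in respond_sketch('hi')]\n",
  "\n",
  "\n",
  "# Inside a notebook, use `frames = await collect()` instead of asyncio.run\n",
  "frames = asyncio.run(collect())\n",
  "print(frames[0])   # Your \n",
  "print(frames[-1])  # Your refund is on its way.\n",
  "```\n",
  "\n",
  "Each frame is a complete message so far, so whichever yield Gradio renders last is the full reply."
  ]
  },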
968
  {
969
  "cell_type": "code",
970
+ "execution_count": null,
971
+ "id": "a3614a3b",
972
  "metadata": {},
973
+ "outputs": [],
974
  "source": [
975
  "async def respond(message: str, history: list[tuple[str, str]], request: gr.Request):\n",
976
  " ticket = {\n",
 
1015
  " streaming_reply = True\n",
1016
  " reply_buffer += chunk.content\n",
1017
  " yield reply_buffer"
1018
+ ]
 
 
 
1019
  },
1020
  {
1021
  "cell_type": "markdown",
1022
+ "id": "a7bb9501",
1023
  "metadata": {},
1024
  "source": [
1025
  "### Launch the app\n",
 
1027
  "`demo.launch()` starts a local server and prints a URL. Open it in a browser, paste in a support ticket, and watch the agent consult the KB live.\n",
1028
  "\n",
1029
  "> To stop the server, run `demo.close()` in a later cell."
1030
+ ]
 
1031
  },
1032
  {
1033
  "cell_type": "code",
1034
+ "execution_count": null,
1035
+ "id": "cd4df22f",
1036
  "metadata": {},
1037
+ "outputs": [],
1038
  "source": [
1039
  "demo = gr.ChatInterface(\n",
1040
  " fn=respond,\n",
 
1048
  ")\n",
1049
  "\n",
1050
  "demo.launch(inline=False, share=False)"
1051
+ ]
 
 
 
1052
  },
1053
  {
1054
  "cell_type": "markdown",
1055
+ "id": "f2ea03da",
1056
  "metadata": {},
1057
  "source": [
1058
  "### What to notice\n",
 
1061
  "- **The reply wipes the last status.** The moment polish_response tokens arrive, we stop yielding status markers and start yielding the accumulated reply buffer. That buffer replaces whatever status was showing, so the final UI state is just the clean reply.\n",
1062
  "- **Session-scoped memory.** `request.session_hash` drives the `thread_id`, so each browser session gets its own thread in the `MemorySaver` checkpointer. Open a second tab and you start a fresh conversation — no leakage.\n",
1063
  "- **History accumulates via reducer.** `TicketStateV2.conversation` uses `Annotated[list, operator.add]`. `polish_and_log` appends `{user, assistant}` each turn; `generate_grounded_response` reads the list on the next turn so the prompt sees the full thread."
1064
+ ]
 
1065
  },
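  {
  "cell_type": "markdown",
  "metadata": {},
  "source": [
  "The reducer bullet above comes down to list concatenation. This plain-Python sketch mimics what an `operator.add` reducer does when a node returns a new `conversation` entry; `apply_update` and the sample turns are hypothetical, and the real merging is done by LangGraph, not by this helper.\n",
  "\n",
  "```python\n",
  "import operator\n",
  "\n",
  "\n",
  "def apply_update(current: list, update: list) -> list:\n",
  "    # An operator.add reducer appends the update instead of overwriting\n",
  "    return operator.add(current, update)\n",
  "\n",
  "\n",
  "conversation = []  # stand-in for the checkpointed channel of one thread\n",
  "conversation = apply_update(\n",
  "    conversation, [{'user': 'I was double-charged', 'assistant': 'Refund issued.'}]\n",
  ")\n",
  "conversation = apply_update(\n",
  "    conversation, [{'user': 'When will it arrive?', 'assistant': 'Within a week.'}]\n",
  ")\n",
  "\n",
  "print(len(conversation))        # 2\n",
  "print(conversation[0]['user'])  # I was double-charged\n",
  "```\n",
  "\n",
  "Without the reducer, each turn's return value would replace the list and earlier turns would be lost."
  ]
  },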
1066
  {
1067
  "cell_type": "markdown",
1068
+ "id": "13f3f73f",
1069
  "metadata": {},
1070
  "source": [
1071
  "---\n",
 
1090
  "> → write requirements.txt + packages.txt → gradio deploy\n",
1091
  "> → set Space secrets → test the live URL\n",
1092
  "> ```"
1093
+ ]
 
1094
  },
1095
  {
1096
  "cell_type": "markdown",
1097
+ "id": "0e74c0d3",
1098
  "metadata": {},
1099
  "source": [
1100
  "### 4.1 — Create a Hugging Face account\n",
 
1116
  " - **Type / Role**: choose **\"Write\"** — you need write access to create and update Spaces. A \"Read\" token is not enough.\n",
1117
  "4. Click **Create Token**. Copy the token (it starts with `hf_...`) **right now** — HF only shows it once. If you close the page without copying, you'll have to create a new one.\n",
1118
  "5. Paste it somewhere safe temporarily (a password manager is ideal). **Do not commit it to git or paste it into a notebook cell** — tokens in notebooks tend to leak into screenshots and commit history."
1119
+ ]
 
1120
  },
1121
  {
1122
  "cell_type": "markdown",
1123
+ "id": "7bd9a531",
1124
  "metadata": {},
1125
  "source": [
1126
  "### 4.3 — Install the CLI tools and log in\n",
 
1149
  "Success is confirmed when the CLI reports that your token was saved (typically under `~/.cache/huggingface/token`).\n",
1150
  "\n",
1151
  "> If you ever need to revoke access (e.g. laptop stolen), go back to the tokens page on the HF website and delete that named token — it invalidates immediately across all your machines."
1152
+ ]
 
1153
  },
1154
  {
1155
  "cell_type": "markdown",
1156
+ "id": "63f79ba3",
1157
  "metadata": {},
1158
  "source": [
1159
  "### 4.4 — Prepare the deployable app\n",
 
1170
  "- `demo.launch()` takes no arguments — HF injects `GRADIO_SERVER_NAME=0.0.0.0` and the right port via env vars, so default launch works.\n",
1171
  "\n",
1172
  "The cell below is just a sanity check — it confirms `app.py` is present in your working directory before you try to deploy."
1173
+ ]
 
1174
  },
1175
  {
1176
  "cell_type": "code",
 
1177
  "execution_count": 27,
1178
+ "id": "9a69bead",
1179
+ "metadata": {},
1180
  "outputs": [
1181
  {
1182
+ "name": "stdout",
1183
  "output_type": "stream",
1184
  "text": [
1185
  "app.py found: 15,310 bytes, 406 lines\n",
 
1204
  ]
1205
  }
1206
  ],
1207
+ "source": [
1208
+ "from pathlib import Path\n",
1209
+ "\n",
1210
+ "app_file = Path(\"app.py\")\n",
1211
+ "assert app_file.exists(), (\n",
1212
+ " \"app.py not found in the current directory. It ships alongside this notebook — \"\n",
1213
+ " \"make sure you're running from the repo root.\"\n",
1214
+ ")\n",
1215
+ "size = app_file.stat().st_size\n",
1216
+ "lines = app_file.read_text().count(\"\\n\") + 1\n",
1217
+ "print(f\"app.py found: {size:,} bytes, {lines} lines\")\n",
1218
+ "print(\"\\nFirst 15 lines:\\n\")\n",
1219
+ "print(\"\\n\".join(app_file.read_text().splitlines()[:15]))"
1220
+ ]
1221
  },
1222
  {
1223
  "cell_type": "markdown",
1224
+ "id": "87ad3526",
1225
  "metadata": {},
1226
  "source": [
1227
  "### 4.5 — Declare Python dependencies\n",
1228
  "\n",
1229
  "HF Spaces uses `requirements.txt` (one package per line) to rebuild your environment. The `pyproject.toml` for this repo carries packages we used across *all three* notebooks — we only need the subset that `app.py` actually imports.\n",
1230
  "\n",
1231
+ "> Note: when deploying with modern Gradio (`gradio[mcp,oauth]`), keep `mcp` at `>=1.21.0,<2.0.0` to avoid resolver conflicts.\n",
1232
+ "\n",
1233
  "Run the cell below — it writes a minimal `requirements.txt` next to `app.py`."
1234
+ ]
 
1235
  },
1236
  {
1237
  "cell_type": "code",
1238
  "execution_count": 28,
1239
+ "id": "729400ed",
1240
+ "metadata": {},
1241
  "outputs": [
1242
  {
1243
+ "name": "stdout",
1244
  "output_type": "stream",
1245
  "text": [
1246
  "Wrote requirements.txt:\n",
 
1257
  ]
1258
  }
1259
  ],
1260
+ "source": [
1261
+ "requirements_txt = \"\"\"gradio>=5.0.0\n",
1262
+ "langchain==1.2.11\n",
1263
+ "langchain-openai==1.1.12\n",
1264
+ "langchain-community>=0.4.1\n",
1265
+ "langchain-mcp-adapters==0.1.14\n",
1266
+ "langgraph>=1.1.2\n",
1267
+ "mcp>=1.21.0,<2.0.0\n",
1268
+ "pydantic==2.12.5\n",
1269
+ "python-dotenv>=1.2.1\n",
1270
+ "\"\"\"\n",
1271
+ "\n",
1272
+ "with open(\"requirements.txt\", \"w\") as f:\n",
1273
+ " f.write(requirements_txt)\n",
1274
+ "\n",
1275
+ "print(\"Wrote requirements.txt:\")\n",
1276
+ "print(requirements_txt)"
1277
+ ]
1278
  },
1279
  {
1280
  "cell_type": "markdown",
1281
+ "id": "52da9fe7",
1282
  "metadata": {},
1283
  "source": [
1284
  "### 4.6 — Declare system packages (Node.js for MCP)\n",
 
1286
  "Our app uses the **Filesystem MCP server**, which runs under Node via `npx`. The default Python Space image doesn't have Node. HF lets you install **system packages** (anything available via `apt-get`) by listing them in a `packages.txt` file at the project root.\n",
1287
  "\n",
1288
  "The cell below writes `packages.txt` with the two entries we need. Run it, then verify the file is in your project root."
1289
+ ]
 
1290
  },
1291
  {
1292
  "cell_type": "code",
1293
  "execution_count": 29,
1294
+ "id": "d0d3edab",
1295
+ "metadata": {},
1296
  "outputs": [
1297
  {
1298
+ "name": "stdout",
1299
  "output_type": "stream",
1300
  "text": [
1301
  "Wrote packages.txt:\n",
 
1305
  ]
1306
  }
1307
  ],
1308
+ "source": [
1309
+ "packages_txt = \"\"\"nodejs\n",
1310
+ "npm\n",
1311
+ "\"\"\"\n",
1312
+ "\n",
1313
+ "with open(\"packages.txt\", \"w\") as f:\n",
1314
+ " f.write(packages_txt)\n",
1315
+ "\n",
1316
+ "print(\"Wrote packages.txt:\")\n",
1317
+ "print(packages_txt)"
1318
+ ]
1319
  },
1320
  {
1321
  "cell_type": "markdown",
1322
+ "id": "14a3f503",
1323
  "metadata": {},
1324
  "source": [
1325
  "### 4.7 — Keep secrets out of the repo\n",
 
1340
  "The `gradio deploy` CLI respects `.gitignore`, so excluded files will not be uploaded. Double-check by running `git status --ignored` — your `.env` should appear under \"Ignored files\".\n",
1341
  "\n",
1342
  "> **Rule of thumb**: if the value would embarrass you on a billboard, it's a secret. API keys, endpoints, connection strings, tenant IDs — all secrets. These go into **Space secrets** (step 4.9), never into the repo."
1343
+ ]
 
1344
  },
1345
  {
1346
  "cell_type": "markdown",
1347
+ "id": "5c5b3dd5",
1348
  "metadata": {},
1349
  "source": [
1350
  "### 4.8 — Run `gradio deploy`\n",
 
1377
  "> The first build takes **3–10 minutes** — HF has to install Python packages *and* Node. Watch progress in the **Logs → Build** tab on the Space page. You'll see pip installs, then apt installs for `nodejs`, then your app starting up.\n",
1378
  "\n",
1379
  "If anything looks wrong mid-build, it's fine to let it finish and iterate — pushes are fast after the first time because layers are cached."
1380
+ ]
 
1381
  },
1382
  {
1383
  "cell_type": "markdown",
1384
+ "id": "97cb1558",
1385
  "metadata": {},
1386
  "source": [
1387
  "### 4.9 — Configure Space secrets\n",
 
1400
  "After saving all four, click **Restart this Space** (or **Factory reboot** for a full rebuild). The Space re-reads env vars on every restart; without the restart, your app is still running with `None` values and Azure calls will fail with authentication errors.\n",
1401
  "\n",
1402
  "> **Good hygiene**: each deployment env gets its own Azure deployment / key. Don't reuse your laptop `.env` values for production — rotate to a fresh set."
1403
+ ]
 
1404
  },
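  {
  "cell_type": "markdown",
  "metadata": {},
  "source": [
  "A startup check makes missing secrets obvious in the Space logs instead of surfacing later as authentication errors. This is a sketch only: `missing_env_vars` is a hypothetical helper, and the list holds just the two variable names that appear earlier in this notebook. Extend it with the other variables your `app.py` reads.\n",
  "\n",
  "```python\n",
  "import os\n",
  "\n",
  "# Assumption: extend this list with the remaining secrets your app needs\n",
  "REQUIRED_VARS = ['AZURE_OPENAI_CHAT_DEPLOYMENT_NAME', 'AZURE_OPENAI_API_VERSION']\n",
  "\n",
  "\n",
  "def missing_env_vars(names, environ=os.environ):\n",
  "    # Treat unset and empty values the same way\n",
  "    return [name for name in names if not environ.get(name)]\n",
  "\n",
  "\n",
  "missing = missing_env_vars(REQUIRED_VARS)\n",
  "if missing:\n",
  "    print('Missing Space secrets:', ', '.join(missing))\n",
  "else:\n",
  "    print('All required variables are set.')\n",
  "```\n",
  "\n",
  "Printing the names, never the values, keeps the check safe to leave in the build logs."
  ]
  },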
1405
  {
1406
  "cell_type": "markdown",
1407
+ "id": "c560a9d9",
1408
  "metadata": {},
1409
  "source": [
1410
  "### 4.10 — Verify and iterate\n",
 
1429
  "HF auto-rebuilds the Space on every push — you'll see the build progress under the **Logs → Build** tab.\n",
1430
  "\n",
1431
  "You do **not** need to re-run `uv run gradio deploy` on subsequent updates. It's only for the initial create. Editing files in the HF web UI also works and triggers a rebuild."
1432
+ ]
 
1433
  },
1434
  {
1435
  "cell_type": "markdown",
1436
+ "id": "473c6e08",
1437
  "metadata": {},
1438
  "source": [
1439
  "### 4.11 — Troubleshooting\n",
 
1451
  "| Want to pause the Space to stop accruing quota | Space is using CPU time you don't need right now | Settings → **Pause Space**. It won't serve traffic but your code + secrets are preserved. |\n",
1452
  "\n",
1453
  "**If you get truly stuck**, delete the Space (Settings → Delete this Space) and run `uv run gradio deploy` again from a clean directory. Spaces are cheap — starting over is often faster than debugging a bad state."
1454
+ ]
 
1455
  },
1456
  {
1457
  "cell_type": "markdown",
1458
+ "id": "88d623f2",
1459
  "metadata": {},
1460
  "source": [
1461
  "---\n",
 
1478
  "- **Observability** — pipe the same stream into Langfuse (Part 2's monitoring section) for latency + token breakdowns\n",
1479
  "\n",
1480
  "With these you've closed the loop: a LangGraph workflow → a KB-grounded tool-using agent → a live web UI, all streaming."
1481
+ ]
 
1482
  }
1483
  ],
1484
  "metadata": {
 
1502
  },
1503
  "nbformat": 4,
1504
  "nbformat_minor": 5
1505
+ }
pyproject.toml CHANGED
@@ -15,7 +15,7 @@ dependencies = [
15
  "langchain-community>=0.4.1",
16
  "PyMuPDF==1.26.3",
17
  "langchain-mcp-adapters==0.1.14",
18
- "mcp==1.15.0",
19
  "openevals",
20
  "yfinance>=0.2.54,<0.3",
21
  "gradio>=5.0.0",
 
15
  "langchain-community>=0.4.1",
16
  "PyMuPDF==1.26.3",
17
  "langchain-mcp-adapters==0.1.14",
18
+ "mcp>=1.21.0,<2.0.0",
19
  "openevals",
20
  "yfinance>=0.2.54,<0.3",
21
  "gradio>=5.0.0",
requirements.txt CHANGED
@@ -4,6 +4,6 @@ langchain-openai==1.1.12
4
  langchain-community>=0.4.1
5
  langchain-mcp-adapters==0.1.14
6
  langgraph>=1.1.2
7
- mcp==1.15.0
8
  pydantic==2.12.5
9
  python-dotenv>=1.2.1
 
4
  langchain-community>=0.4.1
5
  langchain-mcp-adapters==0.1.14
6
  langgraph>=1.1.2
7
+ mcp>=1.21.0,<2.0.0
8
  pydantic==2.12.5
9
  python-dotenv>=1.2.1
uv.lock CHANGED
@@ -463,6 +463,59 @@ wheels = [
463
  { url = "https://files.pythonhosted.org/packages/60/97/891a0971e1e4a8c5d2b20bbe0e524dc04548d2307fee33cdeba148fd4fc7/comm-0.2.3-py3-none-any.whl", hash = "sha256:c615d91d75f7f04f095b30d1c1711babd43bdc6419c1be9886a85f2f4e489417", size = 7294, upload-time = "2025-07-25T14:02:02.896Z" },
464
  ]
465
 
466
  [[package]]
467
  name = "curl-cffi"
468
  version = "0.15.0"
@@ -1216,7 +1269,7 @@ requires-dist = [
1216
  { name = "langchain-openai", specifier = "==1.1.12" },
1217
  { name = "langfuse", specifier = "==4.0.4" },
1218
  { name = "langgraph", specifier = ">=1.1.2" },
1219
- { name = "mcp", specifier = "==1.15.0" },
1220
  { name = "openevals" },
1221
  { name = "pydantic", specifier = "==2.12.5" },
1222
  { name = "pymupdf", specifier = "==1.26.3" },
@@ -1537,7 +1590,7 @@ wheels = [
1537
 
1538
  [[package]]
1539
  name = "mcp"
1540
- version = "1.15.0"
1541
  source = { registry = "https://pypi.org/simple" }
1542
  dependencies = [
1543
  { name = "anyio" },
@@ -1546,15 +1599,18 @@ dependencies = [
1546
  { name = "jsonschema" },
1547
  { name = "pydantic" },
1548
  { name = "pydantic-settings" },
 
1549
  { name = "python-multipart" },
1550
  { name = "pywin32", marker = "sys_platform == 'win32'" },
1551
  { name = "sse-starlette" },
1552
  { name = "starlette" },
 
 
1553
  { name = "uvicorn", marker = "sys_platform != 'emscripten'" },
1554
  ]
1555
- sdist = { url = "https://files.pythonhosted.org/packages/0c/9e/e65114795f359f314d7061f4fcb50dfe60026b01b52ad0b986b4631bf8bb/mcp-1.15.0.tar.gz", hash = "sha256:5bda1f4d383cf539d3c035b3505a3de94b20dbd7e4e8b4bd071e14634eeb2d72", size = 469622, upload-time = "2025-09-25T15:39:51.995Z" }
1556
  wheels = [
1557
- { url = "https://files.pythonhosted.org/packages/c9/82/4d0df23d5ff5bb982a59ad597bc7cb9920f2650278ccefb8e0d85c5ce3d4/mcp-1.15.0-py3-none-any.whl", hash = "sha256:314614c8addc67b663d6c3e4054db0a5c3dedc416c24ef8ce954e203fdc2333d", size = 166963, upload-time = "2025-09-25T15:39:50.538Z" },
1558
  ]
1559
 
1560
  [[package]]
@@ -2411,6 +2467,20 @@ wheels = [
2411
  { url = "https://files.pythonhosted.org/packages/f4/7e/a72dd26f3b0f4f2bf1dd8923c85f7ceb43172af56d63c7383eb62b332364/pygments-2.20.0-py3-none-any.whl", hash = "sha256:81a9e26dd42fd28a23a2d169d86d7ac03b46e2f8b59ed4698fb4785f946d0176", size = 1231151, upload-time = "2026-03-29T13:29:30.038Z" },
2412
  ]
2413
 
2414
  [[package]]
2415
  name = "pymupdf"
2416
  version = "1.26.3"
 
463
  { url = "https://files.pythonhosted.org/packages/60/97/891a0971e1e4a8c5d2b20bbe0e524dc04548d2307fee33cdeba148fd4fc7/comm-0.2.3-py3-none-any.whl", hash = "sha256:c615d91d75f7f04f095b30d1c1711babd43bdc6419c1be9886a85f2f4e489417", size = 7294, upload-time = "2025-07-25T14:02:02.896Z" },
464
  ]
465
 
466
+ [[package]]
467
+ name = "cryptography"
468
+ version = "46.0.7"
469
+ source = { registry = "https://pypi.org/simple" }
470
+ dependencies = [
471
+ { name = "cffi", marker = "platform_python_implementation != 'PyPy'" },
472
+ ]
473
+ sdist = { url = "https://files.pythonhosted.org/packages/47/93/ac8f3d5ff04d54bc814e961a43ae5b0b146154c89c61b47bb07557679b18/cryptography-46.0.7.tar.gz", hash = "sha256:e4cfd68c5f3e0bfdad0d38e023239b96a2fe84146481852dffbcca442c245aa5", size = 750652, upload-time = "2026-04-08T01:57:54.692Z" }
474
+ wheels = [
475
+ { url = "https://files.pythonhosted.org/packages/0b/5d/4a8f770695d73be252331e60e526291e3df0c9b27556a90a6b47bccca4c2/cryptography-46.0.7-cp311-abi3-macosx_10_9_universal2.whl", hash = "sha256:ea42cbe97209df307fdc3b155f1b6fa2577c0defa8f1f7d3be7d31d189108ad4", size = 7179869, upload-time = "2026-04-08T01:56:17.157Z" },
476
+ { url = "https://files.pythonhosted.org/packages/5f/45/6d80dc379b0bbc1f9d1e429f42e4cb9e1d319c7a8201beffd967c516ea01/cryptography-46.0.7-cp311-abi3-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:b36a4695e29fe69215d75960b22577197aca3f7a25b9cf9d165dcfe9d80bc325", size = 4275492, upload-time = "2026-04-08T01:56:19.36Z" },
477
+ { url = "https://files.pythonhosted.org/packages/4a/9a/1765afe9f572e239c3469f2cb429f3ba7b31878c893b246b4b2994ffe2fe/cryptography-46.0.7-cp311-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:5ad9ef796328c5e3c4ceed237a183f5d41d21150f972455a9d926593a1dcb308", size = 4426670, upload-time = "2026-04-08T01:56:21.415Z" },
478
+ { url = "https://files.pythonhosted.org/packages/8f/3e/af9246aaf23cd4ee060699adab1e47ced3f5f7e7a8ffdd339f817b446462/cryptography-46.0.7-cp311-abi3-manylinux_2_28_aarch64.whl", hash = "sha256:73510b83623e080a2c35c62c15298096e2a5dc8d51c3b4e1740211839d0dea77", size = 4280275, upload-time = "2026-04-08T01:56:23.539Z" },
479
+ { url = "https://files.pythonhosted.org/packages/0f/54/6bbbfc5efe86f9d71041827b793c24811a017c6ac0fd12883e4caa86b8ed/cryptography-46.0.7-cp311-abi3-manylinux_2_28_ppc64le.whl", hash = "sha256:cbd5fb06b62bd0721e1170273d3f4d5a277044c47ca27ee257025146c34cbdd1", size = 4928402, upload-time = "2026-04-08T01:56:25.624Z" },
480
+ { url = "https://files.pythonhosted.org/packages/2d/cf/054b9d8220f81509939599c8bdbc0c408dbd2bdd41688616a20731371fe0/cryptography-46.0.7-cp311-abi3-manylinux_2_28_x86_64.whl", hash = "sha256:420b1e4109cc95f0e5700eed79908cef9268265c773d3a66f7af1eef53d409ef", size = 4459985, upload-time = "2026-04-08T01:56:27.309Z" },
481
+ { url = "https://files.pythonhosted.org/packages/f9/46/4e4e9c6040fb01c7467d47217d2f882daddeb8828f7df800cb806d8a2288/cryptography-46.0.7-cp311-abi3-manylinux_2_31_armv7l.whl", hash = "sha256:24402210aa54baae71d99441d15bb5a1919c195398a87b563df84468160a65de", size = 3990652, upload-time = "2026-04-08T01:56:29.095Z" },
482
+ { url = "https://files.pythonhosted.org/packages/36/5f/313586c3be5a2fbe87e4c9a254207b860155a8e1f3cca99f9910008e7d08/cryptography-46.0.7-cp311-abi3-manylinux_2_34_aarch64.whl", hash = "sha256:8a469028a86f12eb7d2fe97162d0634026d92a21f3ae0ac87ed1c4a447886c83", size = 4279805, upload-time = "2026-04-08T01:56:30.928Z" },
483
+ { url = "https://files.pythonhosted.org/packages/69/33/60dfc4595f334a2082749673386a4d05e4f0cf4df8248e63b2c3437585f2/cryptography-46.0.7-cp311-abi3-manylinux_2_34_ppc64le.whl", hash = "sha256:9694078c5d44c157ef3162e3bf3946510b857df5a3955458381d1c7cfc143ddb", size = 4892883, upload-time = "2026-04-08T01:56:32.614Z" },
484
+ { url = "https://files.pythonhosted.org/packages/c7/0b/333ddab4270c4f5b972f980adef4faa66951a4aaf646ca067af597f15563/cryptography-46.0.7-cp311-abi3-manylinux_2_34_x86_64.whl", hash = "sha256:42a1e5f98abb6391717978baf9f90dc28a743b7d9be7f0751a6f56a75d14065b", size = 4459756, upload-time = "2026-04-08T01:56:34.306Z" },
485
+ { url = "https://files.pythonhosted.org/packages/d2/14/633913398b43b75f1234834170947957c6b623d1701ffc7a9600da907e89/cryptography-46.0.7-cp311-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:91bbcb08347344f810cbe49065914fe048949648f6bd5c2519f34619142bbe85", size = 4410244, upload-time = "2026-04-08T01:56:35.977Z" },
486
+ { url = "https://files.pythonhosted.org/packages/10/f2/19ceb3b3dc14009373432af0c13f46aa08e3ce334ec6eff13492e1812ccd/cryptography-46.0.7-cp311-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:5d1c02a14ceb9148cc7816249f64f623fbfee39e8c03b3650d842ad3f34d637e", size = 4674868, upload-time = "2026-04-08T01:56:38.034Z" },
487
+ { url = "https://files.pythonhosted.org/packages/1a/bb/a5c213c19ee94b15dfccc48f363738633a493812687f5567addbcbba9f6f/cryptography-46.0.7-cp311-abi3-win32.whl", hash = "sha256:d23c8ca48e44ee015cd0a54aeccdf9f09004eba9fc96f38c911011d9ff1bd457", size = 3026504, upload-time = "2026-04-08T01:56:39.666Z" },
488
+ { url = "https://files.pythonhosted.org/packages/2b/02/7788f9fefa1d060ca68717c3901ae7fffa21ee087a90b7f23c7a603c32ae/cryptography-46.0.7-cp311-abi3-win_amd64.whl", hash = "sha256:397655da831414d165029da9bc483bed2fe0e75dde6a1523ec2fe63f3c46046b", size = 3488363, upload-time = "2026-04-08T01:56:41.893Z" },
489
+ { url = "https://files.pythonhosted.org/packages/7b/56/15619b210e689c5403bb0540e4cb7dbf11a6bf42e483b7644e471a2812b3/cryptography-46.0.7-cp314-cp314t-macosx_10_9_universal2.whl", hash = "sha256:d151173275e1728cf7839aaa80c34fe550c04ddb27b34f48c232193df8db5842", size = 7119671, upload-time = "2026-04-08T01:56:44Z" },
+ { url = "https://files.pythonhosted.org/packages/74/66/e3ce040721b0b5599e175ba91ab08884c75928fbeb74597dd10ef13505d2/cryptography-46.0.7-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:db0f493b9181c7820c8134437eb8b0b4792085d37dbb24da050476ccb664e59c", size = 4268551, upload-time = "2026-04-08T01:56:46.071Z" },
+ { url = "https://files.pythonhosted.org/packages/03/11/5e395f961d6868269835dee1bafec6a1ac176505a167f68b7d8818431068/cryptography-46.0.7-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:ebd6daf519b9f189f85c479427bbd6e9c9037862cf8fe89ee35503bd209ed902", size = 4408887, upload-time = "2026-04-08T01:56:47.718Z" },
+ { url = "https://files.pythonhosted.org/packages/40/53/8ed1cf4c3b9c8e611e7122fb56f1c32d09e1fff0f1d77e78d9ff7c82653e/cryptography-46.0.7-cp314-cp314t-manylinux_2_28_aarch64.whl", hash = "sha256:b7b412817be92117ec5ed95f880defe9cf18a832e8cafacf0a22337dc1981b4d", size = 4271354, upload-time = "2026-04-08T01:56:49.312Z" },
+ { url = "https://files.pythonhosted.org/packages/50/46/cf71e26025c2e767c5609162c866a78e8a2915bbcfa408b7ca495c6140c4/cryptography-46.0.7-cp314-cp314t-manylinux_2_28_ppc64le.whl", hash = "sha256:fbfd0e5f273877695cb93baf14b185f4878128b250cc9f8e617ea0c025dfb022", size = 4905845, upload-time = "2026-04-08T01:56:50.916Z" },
+ { url = "https://files.pythonhosted.org/packages/c0/ea/01276740375bac6249d0a971ebdf6b4dc9ead0ee0a34ef3b5a88c1a9b0d4/cryptography-46.0.7-cp314-cp314t-manylinux_2_28_x86_64.whl", hash = "sha256:ffca7aa1d00cf7d6469b988c581598f2259e46215e0140af408966a24cf086ce", size = 4444641, upload-time = "2026-04-08T01:56:52.882Z" },
+ { url = "https://files.pythonhosted.org/packages/3d/4c/7d258f169ae71230f25d9f3d06caabcff8c3baf0978e2b7d65e0acac3827/cryptography-46.0.7-cp314-cp314t-manylinux_2_31_armv7l.whl", hash = "sha256:60627cf07e0d9274338521205899337c5d18249db56865f943cbe753aa96f40f", size = 3967749, upload-time = "2026-04-08T01:56:54.597Z" },
+ { url = "https://files.pythonhosted.org/packages/b5/2a/2ea0767cad19e71b3530e4cad9605d0b5e338b6a1e72c37c9c1ceb86c333/cryptography-46.0.7-cp314-cp314t-manylinux_2_34_aarch64.whl", hash = "sha256:80406c3065e2c55d7f49a9550fe0c49b3f12e5bfff5dedb727e319e1afb9bf99", size = 4270942, upload-time = "2026-04-08T01:56:56.416Z" },
+ { url = "https://files.pythonhosted.org/packages/41/3d/fe14df95a83319af25717677e956567a105bb6ab25641acaa093db79975d/cryptography-46.0.7-cp314-cp314t-manylinux_2_34_ppc64le.whl", hash = "sha256:c5b1ccd1239f48b7151a65bc6dd54bcfcc15e028c8ac126d3fada09db0e07ef1", size = 4871079, upload-time = "2026-04-08T01:56:58.31Z" },
+ { url = "https://files.pythonhosted.org/packages/9c/59/4a479e0f36f8f378d397f4eab4c850b4ffb79a2f0d58704b8fa0703ddc11/cryptography-46.0.7-cp314-cp314t-manylinux_2_34_x86_64.whl", hash = "sha256:d5f7520159cd9c2154eb61eb67548ca05c5774d39e9c2c4339fd793fe7d097b2", size = 4443999, upload-time = "2026-04-08T01:57:00.508Z" },
+ { url = "https://files.pythonhosted.org/packages/28/17/b59a741645822ec6d04732b43c5d35e4ef58be7bfa84a81e5ae6f05a1d33/cryptography-46.0.7-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:fcd8eac50d9138c1d7fc53a653ba60a2bee81a505f9f8850b6b2888555a45d0e", size = 4399191, upload-time = "2026-04-08T01:57:02.654Z" },
+ { url = "https://files.pythonhosted.org/packages/59/6a/bb2e166d6d0e0955f1e9ff70f10ec4b2824c9cfcdb4da772c7dd69cc7d80/cryptography-46.0.7-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:65814c60f8cc400c63131584e3e1fad01235edba2614b61fbfbfa954082db0ee", size = 4655782, upload-time = "2026-04-08T01:57:04.592Z" },
+ { url = "https://files.pythonhosted.org/packages/95/b6/3da51d48415bcb63b00dc17c2eff3a651b7c4fed484308d0f19b30e8cb2c/cryptography-46.0.7-cp314-cp314t-win32.whl", hash = "sha256:fdd1736fed309b4300346f88f74cd120c27c56852c3838cab416e7a166f67298", size = 3002227, upload-time = "2026-04-08T01:57:06.91Z" },
+ { url = "https://files.pythonhosted.org/packages/32/a8/9f0e4ed57ec9cebe506e58db11ae472972ecb0c659e4d52bbaee80ca340a/cryptography-46.0.7-cp314-cp314t-win_amd64.whl", hash = "sha256:e06acf3c99be55aa3b516397fe42f5855597f430add9c17fa46bf2e0fb34c9bb", size = 3475332, upload-time = "2026-04-08T01:57:08.807Z" },
+ { url = "https://files.pythonhosted.org/packages/a7/7f/cd42fc3614386bc0c12f0cb3c4ae1fc2bbca5c9662dfed031514911d513d/cryptography-46.0.7-cp38-abi3-macosx_10_9_universal2.whl", hash = "sha256:462ad5cb1c148a22b2e3bcc5ad52504dff325d17daf5df8d88c17dda1f75f2a4", size = 7165618, upload-time = "2026-04-08T01:57:10.645Z" },
+ { url = "https://files.pythonhosted.org/packages/a5/d0/36a49f0262d2319139d2829f773f1b97ef8aef7f97e6e5bd21455e5a8fb5/cryptography-46.0.7-cp38-abi3-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:84d4cced91f0f159a7ddacad249cc077e63195c36aac40b4150e7a57e84fffe7", size = 4270628, upload-time = "2026-04-08T01:57:12.885Z" },
+ { url = "https://files.pythonhosted.org/packages/8a/6c/1a42450f464dda6ffbe578a911f773e54dd48c10f9895a23a7e88b3e7db5/cryptography-46.0.7-cp38-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:128c5edfe5e5938b86b03941e94fac9ee793a94452ad1365c9fc3f4f62216832", size = 4415405, upload-time = "2026-04-08T01:57:14.923Z" },
+ { url = "https://files.pythonhosted.org/packages/9a/92/4ed714dbe93a066dc1f4b4581a464d2d7dbec9046f7c8b7016f5286329e2/cryptography-46.0.7-cp38-abi3-manylinux_2_28_aarch64.whl", hash = "sha256:5e51be372b26ef4ba3de3c167cd3d1022934bc838ae9eaad7e644986d2a3d163", size = 4272715, upload-time = "2026-04-08T01:57:16.638Z" },
+ { url = "https://files.pythonhosted.org/packages/b7/e6/a26b84096eddd51494bba19111f8fffe976f6a09f132706f8f1bf03f51f7/cryptography-46.0.7-cp38-abi3-manylinux_2_28_ppc64le.whl", hash = "sha256:cdf1a610ef82abb396451862739e3fc93b071c844399e15b90726ef7470eeaf2", size = 4918400, upload-time = "2026-04-08T01:57:19.021Z" },
+ { url = "https://files.pythonhosted.org/packages/c7/08/ffd537b605568a148543ac3c2b239708ae0bd635064bab41359252ef88ed/cryptography-46.0.7-cp38-abi3-manylinux_2_28_x86_64.whl", hash = "sha256:1d25aee46d0c6f1a501adcddb2d2fee4b979381346a78558ed13e50aa8a59067", size = 4450634, upload-time = "2026-04-08T01:57:21.185Z" },
+ { url = "https://files.pythonhosted.org/packages/16/01/0cd51dd86ab5b9befe0d031e276510491976c3a80e9f6e31810cce46c4ad/cryptography-46.0.7-cp38-abi3-manylinux_2_31_armv7l.whl", hash = "sha256:cdfbe22376065ffcf8be74dc9a909f032df19bc58a699456a21712d6e5eabfd0", size = 3985233, upload-time = "2026-04-08T01:57:22.862Z" },
+ { url = "https://files.pythonhosted.org/packages/92/49/819d6ed3a7d9349c2939f81b500a738cb733ab62fbecdbc1e38e83d45e12/cryptography-46.0.7-cp38-abi3-manylinux_2_34_aarch64.whl", hash = "sha256:abad9dac36cbf55de6eb49badd4016806b3165d396f64925bf2999bcb67837ba", size = 4271955, upload-time = "2026-04-08T01:57:24.814Z" },
+ { url = "https://files.pythonhosted.org/packages/80/07/ad9b3c56ebb95ed2473d46df0847357e01583f4c52a85754d1a55e29e4d0/cryptography-46.0.7-cp38-abi3-manylinux_2_34_ppc64le.whl", hash = "sha256:935ce7e3cfdb53e3536119a542b839bb94ec1ad081013e9ab9b7cfd478b05006", size = 4879888, upload-time = "2026-04-08T01:57:26.88Z" },
+ { url = "https://files.pythonhosted.org/packages/b8/c7/201d3d58f30c4c2bdbe9b03844c291feb77c20511cc3586daf7edc12a47b/cryptography-46.0.7-cp38-abi3-manylinux_2_34_x86_64.whl", hash = "sha256:35719dc79d4730d30f1c2b6474bd6acda36ae2dfae1e3c16f2051f215df33ce0", size = 4449961, upload-time = "2026-04-08T01:57:29.068Z" },
+ { url = "https://files.pythonhosted.org/packages/a5/ef/649750cbf96f3033c3c976e112265c33906f8e462291a33d77f90356548c/cryptography-46.0.7-cp38-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:7bbc6ccf49d05ac8f7d7b5e2e2c33830d4fe2061def88210a126d130d7f71a85", size = 4401696, upload-time = "2026-04-08T01:57:31.029Z" },
+ { url = "https://files.pythonhosted.org/packages/41/52/a8908dcb1a389a459a29008c29966c1d552588d4ae6d43f3a1a4512e0ebe/cryptography-46.0.7-cp38-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:a1529d614f44b863a7b480c6d000fe93b59acee9c82ffa027cfadc77521a9f5e", size = 4664256, upload-time = "2026-04-08T01:57:33.144Z" },
+ { url = "https://files.pythonhosted.org/packages/4b/fa/f0ab06238e899cc3fb332623f337a7364f36f4bb3f2534c2bb95a35b132c/cryptography-46.0.7-cp38-abi3-win32.whl", hash = "sha256:f247c8c1a1fb45e12586afbb436ef21ff1e80670b2861a90353d9b025583d246", size = 3013001, upload-time = "2026-04-08T01:57:34.933Z" },
+ { url = "https://files.pythonhosted.org/packages/d2/f1/00ce3bde3ca542d1acd8f8cfa38e446840945aa6363f9b74746394b14127/cryptography-46.0.7-cp38-abi3-win_amd64.whl", hash = "sha256:506c4ff91eff4f82bdac7633318a526b1d1309fc07ca76a3ad182cb5b686d6d3", size = 3472985, upload-time = "2026-04-08T01:57:36.714Z" },
+ ]
+
  [[package]]
  name = "curl-cffi"
  version = "0.15.0"
 
  { name = "langchain-openai", specifier = "==1.1.12" },
  { name = "langfuse", specifier = "==4.0.4" },
  { name = "langgraph", specifier = ">=1.1.2" },
+ { name = "mcp", specifier = ">=1.21.0,<2.0.0" },
  { name = "openevals" },
  { name = "pydantic", specifier = "==2.12.5" },
  { name = "pymupdf", specifier = "==1.26.3" },
 
 
  [[package]]
  name = "mcp"
+ version = "1.27.0"
  source = { registry = "https://pypi.org/simple" }
  dependencies = [
  { name = "anyio" },
 
  { name = "jsonschema" },
  { name = "pydantic" },
  { name = "pydantic-settings" },
+ { name = "pyjwt", extra = ["crypto"] },
  { name = "python-multipart" },
  { name = "pywin32", marker = "sys_platform == 'win32'" },
  { name = "sse-starlette" },
  { name = "starlette" },
+ { name = "typing-extensions" },
+ { name = "typing-inspection" },
  { name = "uvicorn", marker = "sys_platform != 'emscripten'" },
  ]
+ sdist = { url = "https://files.pythonhosted.org/packages/8b/eb/c0cfc62075dc6e1ec1c64d352ae09ac051d9334311ed226f1f425312848a/mcp-1.27.0.tar.gz", hash = "sha256:d3dc35a7eec0d458c1da4976a48f982097ddaab87e278c5511d5a4a56e852b83", size = 607509, upload-time = "2026-04-02T14:48:08.88Z" }
  wheels = [
+ { url = "https://files.pythonhosted.org/packages/9c/46/f6b4ad632c67ef35209a66127e4bddc95759649dd595f71f13fba11bdf9a/mcp-1.27.0-py3-none-any.whl", hash = "sha256:5ce1fa81614958e267b21fb2aa34e0aea8e2c6ede60d52aba45fd47246b4d741", size = 215967, upload-time = "2026-04-02T14:48:07.24Z" },
  ]
 
  [[package]]
 
  { url = "https://files.pythonhosted.org/packages/f4/7e/a72dd26f3b0f4f2bf1dd8923c85f7ceb43172af56d63c7383eb62b332364/pygments-2.20.0-py3-none-any.whl", hash = "sha256:81a9e26dd42fd28a23a2d169d86d7ac03b46e2f8b59ed4698fb4785f946d0176", size = 1231151, upload-time = "2026-03-29T13:29:30.038Z" },
  ]
 
+ [[package]]
+ name = "pyjwt"
+ version = "2.12.1"
+ source = { registry = "https://pypi.org/simple" }
+ sdist = { url = "https://files.pythonhosted.org/packages/c2/27/a3b6e5bf6ff856d2509292e95c8f57f0df7017cf5394921fc4e4ef40308a/pyjwt-2.12.1.tar.gz", hash = "sha256:c74a7a2adf861c04d002db713dd85f84beb242228e671280bf709d765b03672b", size = 102564, upload-time = "2026-03-13T19:27:37.25Z" }
+ wheels = [
+ { url = "https://files.pythonhosted.org/packages/e5/7a/8dd906bd22e79e47397a61742927f6747fe93242ef86645ee9092e610244/pyjwt-2.12.1-py3-none-any.whl", hash = "sha256:28ca37c070cad8ba8cd9790cd940535d40274d22f80ab87f3ac6a713e6e8454c", size = 29726, upload-time = "2026-03-13T19:27:35.677Z" },
+ ]
+
+ [package.optional-dependencies]
+ crypto = [
+ { name = "cryptography" },
+ ]
+
  [[package]]
  name = "pymupdf"
  version = "1.26.3"