mbudisic committed on
Commit
b5100ba
·
1 Parent(s): d89987a

Agent notebook start

Files changed (1)
  1. notebooks/transcript_agents.ipynb +328 -0
notebooks/transcript_agents.ipynb ADDED
@@ -0,0 +1,328 @@
+ {
+ "cells": [
+ {
+ "cell_type": "code",
+ "execution_count": 36,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import os\n",
+ "from getpass import getpass\n",
+ "\n",
+ "from dotenv import load_dotenv\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 37,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import pstuts_rag"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 38,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "The autoreload extension is already loaded. To reload it, use:\n",
+ " %reload_ext autoreload\n"
+ ]
+ }
+ ],
+ "source": [
+ "%load_ext autoreload\n",
+ "%autoreload 2\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 39,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from dataclasses import dataclass\n",
+ "@dataclass\n",
+ "class ApplicationParameters:\n",
+ "    filename: str = \"data/test.json\"\n",
+ "    embedding_model: str = \"text-embedding-3-small\"\n",
+ "    n_context_docs: int = 2\n",
+ "\n",
+ "params = ApplicationParameters()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 40,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "\n",
+ "load_dotenv()\n",
+ "\n",
+ "def set_api_key_if_not_present(key_name, prompt_message=\"\"):\n",
+ "    if len(prompt_message) == 0:\n",
+ "        prompt_message = key_name\n",
+ "    if key_name not in os.environ or not os.environ[key_name]:\n",
+ "        os.environ[key_name] = getpass(prompt_message)\n",
+ "\n",
+ "set_api_key_if_not_present(\"OPENAI_API_KEY\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Data Preparation\n",
+ "\n",
+ "First, we will read in the transcripts of the videos and convert them to Documents\n",
+ "with appropriate metadata."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 53,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from typing import Any, Dict, List\n",
+ "import json\n",
+ "\n",
+ "from pstuts_rag.loader import load_json_files\n",
+ "filename = [\"../data/test.json\", \"../data/dev.json\"]\n",
+ "\n",
+ "data: List[Dict[str, Any]] = await load_json_files(filename)\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 56,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "['Get organized with layer groups',\n",
+ " 'Remove unwanted objects from photos',\n",
+ " 'Include vector graphics',\n",
+ " 'Remove unwanted content',\n",
+ " 'Add a central element',\n",
+ " 'Set the resolution',\n",
+ " 'Understand layers',\n",
+ " 'Adjust brightness and contrast',\n",
+ " 'Remove a large object',\n",
+ " 'Add text',\n",
+ " 'Replace a background using a layer mask']"
+ ]
+ },
+ "execution_count": 56,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "[d[\"title\"] for d in data]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## R - Retrieval"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Let's hit it with a semantic chunker."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 43,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from pstuts_rag.datastore import DatastoreManager\n",
+ "from qdrant_client import QdrantClient\n",
+ "\n",
+ "client = QdrantClient(\":memory:\")\n",
+ "\n",
+ "retriever_factory = DatastoreManager(qdrant_client=client, name=\"local_test\")\n",
+ "if retriever_factory.count_docs() == 0:\n",
+ "    await retriever_factory.populate_database(raw_docs=data)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## A - Augmentation\n",
+ "\n",
+ "We need to populate a prompt for the LLM.\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Generation\n",
+ "\n",
+ "We will use GPT-4.1-mini to generate answers."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 44,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from pstuts_rag.rag import RAGChainFactory\n",
+ "\n",
+ "rag_factory = RAGChainFactory(retriever=retriever_factory.get_retriever())"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 45,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from langchain_openai import ChatOpenAI\n",
+ "\n",
+ "llm = ChatOpenAI(model=\"gpt-4.1-mini\", temperature=0)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 46,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "get_videos = rag_factory.get_rag_chain(llm)\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 47,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "val = await get_videos.ainvoke({\"question\": \"What are layers\"})"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 48,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "{'refusal': None,\n",
+ " 'context': [Document(metadata={'video_id': 19172, 'title': 'Understand layers', 'desc': 'Learn what layers are and why they are so useful.', 'length': '00:04:44.75', 'group': 'test.json', 'source': 'https://images-tv.adobe.com/avp/vr/b758b4c4-2a74-41f4-8e67-e2f2eab83c6a/f810fc5b-2b04-4e23-8fa4-5c532e7de6f8/e268fe4d-e5c7-415c-9f5c-d34d024b14d8_20170727011753.1280x720at2400_h264.mp4', 'speech_start_stop_times': [[0.47, 3.41], [3.81, 9.13], [9.309999, 15.01], [15.299999, 20.57], [20.88, 23.3], [23.83, 27.93], [29.38, 32.79], [32.96, 33.92], [34.43, 40.21], [41.91, 45.37], [45.88, 49.01], [49.54, 55.130001], [55.72, 58.49], [58.72, 62.14]], 'start': 0.47, 'stop': 62.14, '_id': 21, '_collection_name': 'local_test'}, page_content=\"Layers are the building blocks of any image in Photoshop CC. So, it's important to understand, what layers are and why to use them - which we'll cover in this video. If you're following along, open this layered image from the downloadable practice files for this tutorial. You might think of layers like separate flat pints of glass, stacked one on top of the other. Each layer contains separate pieces of content. To get a sense of how layers are constructed, let's take a look at this Layers panel. I've closed my other panels, so that we can focus on the Layers panel. But you can skip that. By the way: If your Layers panel isn't showing, go up to the Window menu and choose Layers from there. The Layers panel is where you go to select and work with layers. In this image there are 4 layers, each with separate content. If you click the Eye icon to the left of a layer, you can toggle the visibility of that layer off and on. So, I'm going to turn off the visibility of the tailor layer. And keep your eye on the image, so you can see what's on that layer.\"),\n",
+ " Document(metadata={'video_id': 19172, 'title': 'Understand layers', 'desc': 'Learn what layers are and why they are so useful.', 'length': '00:04:44.75', 'group': 'test.json', 'source': 'https://images-tv.adobe.com/avp/vr/b758b4c4-2a74-41f4-8e67-e2f2eab83c6a/f810fc5b-2b04-4e23-8fa4-5c532e7de6f8/e268fe4d-e5c7-415c-9f5c-d34d024b14d8_20170727011753.1280x720at2400_h264.mp4', 'speech_start_stop_times': [[85.75, 88.659999], [89.42, 100.11], [101.469999, 108.64], [109.09, 117.459999], [117.75, 129.45], [129.97, 133.37], [133.73, 143.98], [144.76, 152.97]], 'start': 85.75, 'stop': 152.97, '_id': 23, '_collection_name': 'local_test'}, page_content=\"Now let's take a look at just one layer, the tailor layer. A quick way to turn off all the layers except the tailor layer, is to hold down the Option key on the Mac, or the ALT key on the PC, and click on the Eye icon to the left of the tailor layer. In the Document window, you can see that this layer contains just the one small photo surrounded by a gray and white checkerboard pattern. That pattern represents transparent pixels, which allow us to see down through the corresponding part of this layer to the content of the layers below. So, let's turn that content back on by going back to the Layers panel, again holding the Option key on the Mac or the ALT key on the PC and clicking on the Eye icon to the left of the tailor layer. And all the other layers and their Eye icons come back into view. So again: You might think of layers like a stack of pints of glass, each with its own artwork and in some cases transparent areas that let you see down through to the layers below. The biggest benefit of having items on separate layers like this, is that you'll be able to edit pieces of an image independently without affecting the rest of the image.\")],\n",
+ " 'question': 'What are layers'}"
+ ]
+ },
+ "execution_count": 48,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "val.additional_kwargs"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 49,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "==================================\u001b[1m Ai Message \u001b[0m==================================\n",
+ "\n",
+ "Layers are the building blocks of any image in Photoshop CC. You can think of layers like separate flat panes of glass stacked on top of each other. Each layer contains separate pieces of content. Some parts of a layer can be transparent, allowing you to see through to the layers below. This setup lets you edit parts of an image independently without affecting the rest of the image. You manage and work with layers in the Layers panel, where you can toggle their visibility on and off using the Eye icon. (See explanation around 0:28–1:00 and 1:25–2:32) 🎨🖼️\n",
+ "**References**:\n",
+ "[\n",
+ " {\n",
+ " \"title\": \"Understand layers\",\n",
+ " \"source\": \"https://images-tv.adobe.com/avp/vr/b758b4c4-2a74-41f4-8e67-e2f2eab83c6a/f810fc5b-2b04-4e23-8fa4-5c532e7de6f8/e268fe4d-e5c7-415c-9f5c-d34d024b14d8_20170727011753.1280x720at2400_h264.mp4\",\n",
+ " \"start\": 0.47,\n",
+ " \"stop\": 62.14\n",
+ " },\n",
+ " {\n",
+ " \"title\": \"Understand layers\",\n",
+ " \"source\": \"https://images-tv.adobe.com/avp/vr/b758b4c4-2a74-41f4-8e67-e2f2eab83c6a/f810fc5b-2b04-4e23-8fa4-5c532e7de6f8/e268fe4d-e5c7-415c-9f5c-d34d024b14d8_20170727011753.1280x720at2400_h264.mp4\",\n",
+ " \"start\": 85.75,\n",
+ " \"stop\": 152.97\n",
+ " }\n",
+ "]\n"
+ ]
+ }
+ ],
+ "source": [
+ "val.pretty_print()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 50,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "'Layers are the building blocks of any image in Photoshop CC. You can think of layers like separate flat panes of glass stacked on top of each other. Each layer contains separate pieces of content. Some parts of a layer can be transparent, allowing you to see through to the layers below. This setup lets you edit parts of an image independently without affecting the rest of the image. You manage and work with layers in the Layers panel, where you can toggle their visibility on and off using the Eye icon. (See explanation around 0:28–1:00 and 1:25–2:32) 🎨🖼️\\n**References**:\\n[\\n {\\n \"title\": \"Understand layers\",\\n \"source\": \"https://images-tv.adobe.com/avp/vr/b758b4c4-2a74-41f4-8e67-e2f2eab83c6a/f810fc5b-2b04-4e23-8fa4-5c532e7de6f8/e268fe4d-e5c7-415c-9f5c-d34d024b14d8_20170727011753.1280x720at2400_h264.mp4\",\\n \"start\": 0.47,\\n \"stop\": 62.14\\n },\\n {\\n \"title\": \"Understand layers\",\\n \"source\": \"https://images-tv.adobe.com/avp/vr/b758b4c4-2a74-41f4-8e67-e2f2eab83c6a/f810fc5b-2b04-4e23-8fa4-5c532e7de6f8/e268fe4d-e5c7-415c-9f5c-d34d024b14d8_20170727011753.1280x720at2400_h264.mp4\",\\n \"start\": 85.75,\\n \"stop\": 152.97\\n }\\n]'"
+ ]
+ },
+ "execution_count": 50,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "val.content"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": ".venv",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.13.2"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+ }
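
The "A - Augmentation" step above is delegated to `RAGChainFactory`, whose internals the notebook does not show. As a rough illustration of what that step does conceptually, here is a minimal, stdlib-only sketch that stuffs retrieved transcript chunks into a single prompt string. The `Chunk` class, `build_rag_prompt` function, and the template wording are all hypothetical stand-ins, not the actual `pstuts_rag` API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


# Hypothetical stand-in for a retrieved document: the notebook's real
# chunks carry metadata (title, source, start, stop) plus transcript
# text in page_content, as seen in the val.additional_kwargs output.
@dataclass
class Chunk:
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def build_rag_prompt(question: str, context: List[Chunk]) -> str:
    """Stuff retrieved transcript chunks into one prompt string.

    A sketch of the augmentation step only; the real RAGChainFactory
    prompt template is not shown in the notebook and may differ.
    """
    blocks = []
    for i, doc in enumerate(context, start=1):
        title = doc.metadata.get("title", "unknown")
        start = doc.metadata.get("start", "?")
        stop = doc.metadata.get("stop", "?")
        # Label each excerpt with its video title and timestamps so the
        # model can cite sources, as the notebook's answer does.
        blocks.append(f"[{i}] {title} ({start}-{stop}s)\n{doc.page_content}")
    context_text = "\n\n".join(blocks)
    return (
        "Answer the question using only the transcript excerpts below.\n\n"
        f"{context_text}\n\n"
        f"Question: {question}\nAnswer:"
    )


chunk = Chunk(
    page_content="Layers are the building blocks of any image in Photoshop CC.",
    metadata={"title": "Understand layers", "start": 0.47, "stop": 62.14},
)
prompt = build_rag_prompt("What are layers", [chunk])
```

The resulting `prompt` would then be sent to the `ChatOpenAI` model in the generation step, with the model's answer grounded in the labeled excerpts.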