ArSenic04 committed on
Commit
925fc24
·
verified ·
1 Parent(s): 9e44f76

Upload LLM_+_RAG_for_Finance.ipynb

Files changed (1)
  1. LLM_+_RAG_for_Finance.ipynb +1615 -0
LLM_+_RAG_for_Finance.ipynb ADDED
@@ -0,0 +1,1615 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "markdown",
5
+ "metadata": {
6
+ "id": "se7JaGtMP27J"
7
+ },
8
+ "source": [
9
+ "# **LLM + RAG Projects on Finance Domain**\n",
10
+ "**Author**: Shivam Ardeshna\n",
11
+ "\n",
12
+ "This notebook presents use cases of RAG and LLMs in the finance domain, built with Python, LangChain, open-source LLMs, and vector databases. The resulting pipelines work as a *recommendation engine*.\n",
13
+ "\n"
14
+ ]
15
+ },
16
+ {
17
+ "cell_type": "markdown",
18
+ "metadata": {
19
+ "id": "atrU6fljvQUB"
20
+ },
21
+ "source": [
22
+ "# **Build a Short Financial Report Using Economic Indicators from the API**\n",
23
+ "This project uses the Financial Modeling Prep API to fetch market economic indicators for a given ticker.\n",
24
+ "\n",
25
+ "**Problem Statement:** Build a financial report for a company or stock from the latest stock market or economic data, without training or fine-tuning any LLM or ML model.\n",
26
+ "\n",
27
+ "**Project Methodology**\n",
28
+ "- The project uses the Financial Modeling Prep API to fetch the latest financial modeling data on company metrics and market economic indicators.\n",
29
+ "- The fetched data is pre-processed in Python and saved to a CSV file.\n",
30
+ "- The same CSV file is loaded, embedded with a Hugging Face embedding model, and inserted into a vector DB.\n",
31
+ "- A RAG QA chain is built with LangChain, using the open-source Falcon 7B LLM as the generator.\n",
32
+ "- Finally, the response is inspected.\n",
33
+ "\n",
34
+ "**NOTE:** This course uses open-source LLMs throughout, so the responses in these projects may be less accurate than they could be. Swapping in OpenAI GPT or Meta Llama models can substantially improve output quality with the same code.\n",
35
+ "\n",
36
+ "\n",
37
+ "![](https://media.licdn.com/dms/image/D5622AQFvnkgDSWCi4A/feedshare-shrink_800/0/1695081465240?e=2147483647&v=beta&t=mu9zgB9y-_sReXMyF9tyALz7bdUla2laZBEHPtm4glE)"
38
+ ]
39
+ },
40
+ {
41
+ "cell_type": "code",
42
+ "execution_count": null,
43
+ "metadata": {
44
+ "colab": {
45
+ "base_uri": "https://localhost:8080/",
46
+ "height": 183
47
+ },
48
+ "id": "UejqRrVqvQCT",
49
+ "outputId": "e37972cd-8a41-446c-f52a-b79c68571ef0"
50
+ },
51
+ "outputs": [
52
+ {
53
+ "name": "stderr",
54
+ "output_type": "stream",
55
+ "text": [
56
+ "<ipython-input-48-565457cae0e7>:15: DeprecationWarning: cafile, capath and cadefault are deprecated, use a custom context instead.\n",
57
+ " response = urlopen(url, cafile=certifi.where())\n"
58
+ ]
59
+ },
60
+ {
61
+ "data": {
62
+ "application/vnd.google.colaboratory.intrinsic+json": {
63
+ "type": "dataframe",
64
+ "variable_name": "eco_ind"
65
+ },
66
+ "text/html": [
67
+ "\n",
68
+ " <div id=\"df-f327fc75-d759-4379-b5a7-5c8c09e5880a\" class=\"colab-df-container\">\n",
69
+ " <div>\n",
70
+ "<style scoped>\n",
71
+ " .dataframe tbody tr th:only-of-type {\n",
72
+ " vertical-align: middle;\n",
73
+ " }\n",
74
+ "\n",
75
+ " .dataframe tbody tr th {\n",
76
+ " vertical-align: top;\n",
77
+ " }\n",
78
+ "\n",
79
+ " .dataframe thead th {\n",
80
+ " text-align: right;\n",
81
+ " }\n",
82
+ "</style>\n",
83
+ "<table border=\"1\" class=\"dataframe\">\n",
84
+ " <thead>\n",
85
+ " <tr style=\"text-align: right;\">\n",
86
+ " <th></th>\n",
87
+ " <th>symbol</th>\n",
88
+ " <th>name</th>\n",
89
+ " <th>price</th>\n",
90
+ " <th>changesPercentage</th>\n",
91
+ " <th>change</th>\n",
92
+ " <th>dayLow</th>\n",
93
+ " <th>dayHigh</th>\n",
94
+ " <th>yearHigh</th>\n",
95
+ " <th>yearLow</th>\n",
96
+ " <th>marketCap</th>\n",
97
+ " <th>...</th>\n",
98
+ " <th>exchange</th>\n",
99
+ " <th>volume</th>\n",
100
+ " <th>avgVolume</th>\n",
101
+ " <th>open</th>\n",
102
+ " <th>previousClose</th>\n",
103
+ " <th>eps</th>\n",
104
+ " <th>pe</th>\n",
105
+ " <th>earningsAnnouncement</th>\n",
106
+ " <th>sharesOutstanding</th>\n",
107
+ " <th>timestamp</th>\n",
108
+ " </tr>\n",
109
+ " </thead>\n",
110
+ " <tbody>\n",
111
+ " <tr>\n",
112
+ " <th>0</th>\n",
113
+ " <td>MSFT</td>\n",
114
+ " <td>Microsoft Corporation</td>\n",
115
+ " <td>423.85</td>\n",
116
+ " <td>-0.1578</td>\n",
117
+ " <td>-0.67</td>\n",
118
+ " <td>423.05</td>\n",
119
+ " <td>426.28</td>\n",
120
+ " <td>433.6</td>\n",
121
+ " <td>309.45</td>\n",
122
+ " <td>3150184593500</td>\n",
123
+ " <td>...</td>\n",
124
+ " <td>NASDAQ</td>\n",
125
+ " <td>11920235</td>\n",
126
+ " <td>19701822</td>\n",
127
+ " <td>426.2</td>\n",
128
+ " <td>424.52</td>\n",
129
+ " <td>11.55</td>\n",
130
+ " <td>36.7</td>\n",
131
+ " <td>2024-07-23T00:00:00.000+0000</td>\n",
132
+ " <td>7432310000</td>\n",
133
+ " <td>1717790401</td>\n",
134
+ " </tr>\n",
135
+ " </tbody>\n",
136
+ "</table>\n",
137
+ "<p>1 rows × 22 columns</p>\n",
138
+ "</div>\n",
139
+ " <div class=\"colab-df-buttons\">\n",
140
+ "\n",
141
+ " <div class=\"colab-df-container\">\n",
142
+ " <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-f327fc75-d759-4379-b5a7-5c8c09e5880a')\"\n",
143
+ " title=\"Convert this dataframe to an interactive table.\"\n",
144
+ " style=\"display:none;\">\n",
145
+ "\n",
146
+ " <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
147
+ " <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
148
+ " </svg>\n",
149
+ " </button>\n",
150
+ "\n",
151
+ " <style>\n",
152
+ " .colab-df-container {\n",
153
+ " display:flex;\n",
154
+ " gap: 12px;\n",
155
+ " }\n",
156
+ "\n",
157
+ " .colab-df-convert {\n",
158
+ " background-color: #E8F0FE;\n",
159
+ " border: none;\n",
160
+ " border-radius: 50%;\n",
161
+ " cursor: pointer;\n",
162
+ " display: none;\n",
163
+ " fill: #1967D2;\n",
164
+ " height: 32px;\n",
165
+ " padding: 0 0 0 0;\n",
166
+ " width: 32px;\n",
167
+ " }\n",
168
+ "\n",
169
+ " .colab-df-convert:hover {\n",
170
+ " background-color: #E2EBFA;\n",
171
+ " box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
172
+ " fill: #174EA6;\n",
173
+ " }\n",
174
+ "\n",
175
+ " .colab-df-buttons div {\n",
176
+ " margin-bottom: 4px;\n",
177
+ " }\n",
178
+ "\n",
179
+ " [theme=dark] .colab-df-convert {\n",
180
+ " background-color: #3B4455;\n",
181
+ " fill: #D2E3FC;\n",
182
+ " }\n",
183
+ "\n",
184
+ " [theme=dark] .colab-df-convert:hover {\n",
185
+ " background-color: #434B5C;\n",
186
+ " box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
187
+ " filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
188
+ " fill: #FFFFFF;\n",
189
+ " }\n",
190
+ " </style>\n",
191
+ "\n",
192
+ " <script>\n",
193
+ " const buttonEl =\n",
194
+ " document.querySelector('#df-f327fc75-d759-4379-b5a7-5c8c09e5880a button.colab-df-convert');\n",
195
+ " buttonEl.style.display =\n",
196
+ " google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
197
+ "\n",
198
+ " async function convertToInteractive(key) {\n",
199
+ " const element = document.querySelector('#df-f327fc75-d759-4379-b5a7-5c8c09e5880a');\n",
200
+ " const dataTable =\n",
201
+ " await google.colab.kernel.invokeFunction('convertToInteractive',\n",
202
+ " [key], {});\n",
203
+ " if (!dataTable) return;\n",
204
+ "\n",
205
+ " const docLinkHtml = 'Like what you see? Visit the ' +\n",
206
+ " '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
207
+ " + ' to learn more about interactive tables.';\n",
208
+ " element.innerHTML = '';\n",
209
+ " dataTable['output_type'] = 'display_data';\n",
210
+ " await google.colab.output.renderOutput(dataTable, element);\n",
211
+ " const docLink = document.createElement('div');\n",
212
+ " docLink.innerHTML = docLinkHtml;\n",
213
+ " element.appendChild(docLink);\n",
214
+ " }\n",
215
+ " </script>\n",
216
+ " </div>\n",
217
+ "\n",
218
+ "\n",
219
+ " <div id=\"id_620a8df5-983e-4490-8a25-0d1df357381d\">\n",
220
+ " <style>\n",
221
+ " .colab-df-generate {\n",
222
+ " background-color: #E8F0FE;\n",
223
+ " border: none;\n",
224
+ " border-radius: 50%;\n",
225
+ " cursor: pointer;\n",
226
+ " display: none;\n",
227
+ " fill: #1967D2;\n",
228
+ " height: 32px;\n",
229
+ " padding: 0 0 0 0;\n",
230
+ " width: 32px;\n",
231
+ " }\n",
232
+ "\n",
233
+ " .colab-df-generate:hover {\n",
234
+ " background-color: #E2EBFA;\n",
235
+ " box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
236
+ " fill: #174EA6;\n",
237
+ " }\n",
238
+ "\n",
239
+ " [theme=dark] .colab-df-generate {\n",
240
+ " background-color: #3B4455;\n",
241
+ " fill: #D2E3FC;\n",
242
+ " }\n",
243
+ "\n",
244
+ " [theme=dark] .colab-df-generate:hover {\n",
245
+ " background-color: #434B5C;\n",
246
+ " box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
247
+ " filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
248
+ " fill: #FFFFFF;\n",
249
+ " }\n",
250
+ " </style>\n",
251
+ " <button class=\"colab-df-generate\" onclick=\"generateWithVariable('eco_ind')\"\n",
252
+ " title=\"Generate code using this dataframe.\"\n",
253
+ " style=\"display:none;\">\n",
254
+ "\n",
255
+ " <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
256
+ " width=\"24px\">\n",
257
+ " <path d=\"M7,19H8.4L18.45,9,17,7.55,7,17.6ZM5,21V16.75L18.45,3.32a2,2,0,0,1,2.83,0l1.4,1.43a1.91,1.91,0,0,1,.58,1.4,1.91,1.91,0,0,1-.58,1.4L9.25,21ZM18.45,9,17,7.55Zm-12,3A5.31,5.31,0,0,0,4.9,8.1,5.31,5.31,0,0,0,1,6.5,5.31,5.31,0,0,0,4.9,4.9,5.31,5.31,0,0,0,6.5,1,5.31,5.31,0,0,0,8.1,4.9,5.31,5.31,0,0,0,12,6.5,5.46,5.46,0,0,0,6.5,12Z\"/>\n",
258
+ " </svg>\n",
259
+ " </button>\n",
260
+ " <script>\n",
261
+ " (() => {\n",
262
+ " const buttonEl =\n",
263
+ " document.querySelector('#id_620a8df5-983e-4490-8a25-0d1df357381d button.colab-df-generate');\n",
264
+ " buttonEl.style.display =\n",
265
+ " google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
266
+ "\n",
267
+ " buttonEl.onclick = () => {\n",
268
+ " google.colab.notebook.generateWithVariable('eco_ind');\n",
269
+ " }\n",
270
+ " })();\n",
271
+ " </script>\n",
272
+ " </div>\n",
273
+ "\n",
274
+ " </div>\n",
275
+ " </div>\n"
276
+ ],
277
+ "text/plain": [
278
+ " symbol name price changesPercentage change dayLow \\\n",
279
+ "0 MSFT Microsoft Corporation 423.85 -0.1578 -0.67 423.05 \n",
280
+ "\n",
281
+ " dayHigh yearHigh yearLow marketCap ... exchange volume \\\n",
282
+ "0 426.28 433.6 309.45 3150184593500 ... NASDAQ 11920235 \n",
283
+ "\n",
284
+ " avgVolume open previousClose eps pe earningsAnnouncement \\\n",
285
+ "0 19701822 426.2 424.52 11.55 36.7 2024-07-23T00:00:00.000+0000 \n",
286
+ "\n",
287
+ " sharesOutstanding timestamp \n",
288
+ "0 7432310000 1717790401 \n",
289
+ "\n",
290
+ "[1 rows x 22 columns]"
291
+ ]
292
+ },
293
+ "execution_count": 48,
294
+ "metadata": {},
295
+ "output_type": "execute_result"
296
+ }
297
+ ],
298
+ "source": [
299
+ "import json\n",
300
+ "import ssl\n",
301
+ "from urllib.request import urlopen\n",
302
+ "\n",
303
+ "import certifi\n",
304
+ "import pandas as pd\n",
305
+ "\n",
306
+ "def get_jsonparsed_data(ticker, api_key, exchange):\n",
307
+ "    if exchange == \"NSE\":\n",
308
+ "        url = f\"https://financialmodelingprep.com/api/v3/search?query={ticker}&exchange=NSE&apikey={api_key}\"\n",
309
+ "    else:\n",
310
+ "        url = f\"https://financialmodelingprep.com/api/v3/quote/{ticker}?apikey={api_key}\"\n",
311
+ "    # Pass an explicit SSL context; the cafile argument of urlopen is deprecated.\n",
312
+ "    context = ssl.create_default_context(cafile=certifi.where())\n",
313
+ "    response = urlopen(url, context=context)\n",
314
+ "    data = response.read().decode(\"utf-8\")\n",
315
+ "    return json.loads(data)\n",
316
+ "\n",
317
+ "# Placeholder: supply your own Financial Modeling Prep API key.\n",
318
+ "api_key = \"YOUR_FMP_API_KEY\"\n",
319
+ "ticker = \"MSFT\"\n",
320
+ "exchange = \"US\"\n",
321
+ "eco_ind = pd.DataFrame(get_jsonparsed_data(ticker, api_key, exchange))\n",
322
+ "eco_ind"
323
+ ]
324
+ },
325
+ {
326
+ "cell_type": "markdown",
327
+ "metadata": {
328
+ "id": "3_njvKT31FyT"
329
+ },
330
+ "source": [
331
+ "### Installing the Langchain Libraries"
332
+ ]
333
+ },
334
+ {
335
+ "cell_type": "code",
336
+ "execution_count": null,
337
+ "metadata": {
338
+ "id": "qO2ijrFvv1MR"
339
+ },
340
+ "outputs": [],
341
+ "source": [
342
+ "!pip install langchain langchain-community langchain-core transformers"
343
+ ]
344
+ },
345
+ {
346
+ "cell_type": "code",
347
+ "execution_count": null,
348
+ "metadata": {
349
+ "colab": {
350
+ "base_uri": "https://localhost:8080/",
351
+ "height": 147
352
+ },
353
+ "id": "4fE3u3ti0NkU",
354
+ "outputId": "7e14bbbf-9d85-4e7a-cf9e-87cbeb6abf49"
355
+ },
356
+ "outputs": [
357
+ {
358
+ "data": {
359
+ "application/vnd.google.colaboratory.intrinsic+json": {
360
+ "type": "dataframe",
361
+ "variable_name": "eco_ind"
362
+ },
363
+ "text/html": [
364
+ "\n",
365
+ " <div id=\"df-ff3f505e-b299-4aa0-a7de-e59572de4212\" class=\"colab-df-container\">\n",
366
+ " <div>\n",
367
+ "<style scoped>\n",
368
+ " .dataframe tbody tr th:only-of-type {\n",
369
+ " vertical-align: middle;\n",
370
+ " }\n",
371
+ "\n",
372
+ " .dataframe tbody tr th {\n",
373
+ " vertical-align: top;\n",
374
+ " }\n",
375
+ "\n",
376
+ " .dataframe thead th {\n",
377
+ " text-align: right;\n",
378
+ " }\n",
379
+ "</style>\n",
380
+ "<table border=\"1\" class=\"dataframe\">\n",
381
+ " <thead>\n",
382
+ " <tr style=\"text-align: right;\">\n",
383
+ " <th></th>\n",
384
+ " <th>symbol</th>\n",
385
+ " <th>name</th>\n",
386
+ " <th>price</th>\n",
387
+ " <th>changesPercentage</th>\n",
388
+ " <th>change</th>\n",
389
+ " <th>dayLow</th>\n",
390
+ " <th>dayHigh</th>\n",
391
+ " <th>yearHigh</th>\n",
392
+ " <th>yearLow</th>\n",
393
+ " <th>marketCap</th>\n",
394
+ " <th>...</th>\n",
395
+ " <th>exchange</th>\n",
396
+ " <th>volume</th>\n",
397
+ " <th>avgVolume</th>\n",
398
+ " <th>open</th>\n",
399
+ " <th>previousClose</th>\n",
400
+ " <th>eps</th>\n",
401
+ " <th>pe</th>\n",
402
+ " <th>earningsAnnouncement</th>\n",
403
+ " <th>sharesOutstanding</th>\n",
404
+ " <th>timestamp</th>\n",
405
+ " </tr>\n",
406
+ " </thead>\n",
407
+ " <tbody>\n",
408
+ " <tr>\n",
409
+ " <th>0</th>\n",
410
+ " <td>MSFT</td>\n",
411
+ " <td>Microsoft Corporation</td>\n",
412
+ " <td>423.85</td>\n",
413
+ " <td>-0.1578</td>\n",
414
+ " <td>-0.67</td>\n",
415
+ " <td>423.05</td>\n",
416
+ " <td>426.28</td>\n",
417
+ " <td>433.6</td>\n",
418
+ " <td>309.45</td>\n",
419
+ " <td>3150184593500</td>\n",
420
+ " <td>...</td>\n",
421
+ " <td>NASDAQ</td>\n",
422
+ " <td>11920235</td>\n",
423
+ " <td>19701822</td>\n",
424
+ " <td>426.2</td>\n",
425
+ " <td>424.52</td>\n",
426
+ " <td>11.55</td>\n",
427
+ " <td>36.7</td>\n",
428
+ " <td>2024-07-23 00:00:00+00:00</td>\n",
429
+ " <td>7432310000</td>\n",
430
+ " <td>1970-01-01 00:00:01.717790401</td>\n",
431
+ " </tr>\n",
432
+ " </tbody>\n",
433
+ "</table>\n",
434
+ "<p>1 rows × 22 columns</p>\n",
435
+ "</div>\n",
436
+ " <div class=\"colab-df-buttons\">\n",
437
+ "\n",
438
+ " <div class=\"colab-df-container\">\n",
439
+ " <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-ff3f505e-b299-4aa0-a7de-e59572de4212')\"\n",
440
+ " title=\"Convert this dataframe to an interactive table.\"\n",
441
+ " style=\"display:none;\">\n",
442
+ "\n",
443
+ " <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
444
+ " <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
445
+ " </svg>\n",
446
+ " </button>\n",
447
+ "\n",
448
+ " <style>\n",
449
+ " .colab-df-container {\n",
450
+ " display:flex;\n",
451
+ " gap: 12px;\n",
452
+ " }\n",
453
+ "\n",
454
+ " .colab-df-convert {\n",
455
+ " background-color: #E8F0FE;\n",
456
+ " border: none;\n",
457
+ " border-radius: 50%;\n",
458
+ " cursor: pointer;\n",
459
+ " display: none;\n",
460
+ " fill: #1967D2;\n",
461
+ " height: 32px;\n",
462
+ " padding: 0 0 0 0;\n",
463
+ " width: 32px;\n",
464
+ " }\n",
465
+ "\n",
466
+ " .colab-df-convert:hover {\n",
467
+ " background-color: #E2EBFA;\n",
468
+ " box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
469
+ " fill: #174EA6;\n",
470
+ " }\n",
471
+ "\n",
472
+ " .colab-df-buttons div {\n",
473
+ " margin-bottom: 4px;\n",
474
+ " }\n",
475
+ "\n",
476
+ " [theme=dark] .colab-df-convert {\n",
477
+ " background-color: #3B4455;\n",
478
+ " fill: #D2E3FC;\n",
479
+ " }\n",
480
+ "\n",
481
+ " [theme=dark] .colab-df-convert:hover {\n",
482
+ " background-color: #434B5C;\n",
483
+ " box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
484
+ " filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
485
+ " fill: #FFFFFF;\n",
486
+ " }\n",
487
+ " </style>\n",
488
+ "\n",
489
+ " <script>\n",
490
+ " const buttonEl =\n",
491
+ " document.querySelector('#df-ff3f505e-b299-4aa0-a7de-e59572de4212 button.colab-df-convert');\n",
492
+ " buttonEl.style.display =\n",
493
+ " google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
494
+ "\n",
495
+ " async function convertToInteractive(key) {\n",
496
+ " const element = document.querySelector('#df-ff3f505e-b299-4aa0-a7de-e59572de4212');\n",
497
+ " const dataTable =\n",
498
+ " await google.colab.kernel.invokeFunction('convertToInteractive',\n",
499
+ " [key], {});\n",
500
+ " if (!dataTable) return;\n",
501
+ "\n",
502
+ " const docLinkHtml = 'Like what you see? Visit the ' +\n",
503
+ " '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
504
+ " + ' to learn more about interactive tables.';\n",
505
+ " element.innerHTML = '';\n",
506
+ " dataTable['output_type'] = 'display_data';\n",
507
+ " await google.colab.output.renderOutput(dataTable, element);\n",
508
+ " const docLink = document.createElement('div');\n",
509
+ " docLink.innerHTML = docLinkHtml;\n",
510
+ " element.appendChild(docLink);\n",
511
+ " }\n",
512
+ " </script>\n",
513
+ " </div>\n",
514
+ "\n",
515
+ "\n",
516
+ " <div id=\"id_f7a8b978-a92b-4d43-929a-ac0d78b92c3e\">\n",
517
+ " <style>\n",
518
+ " .colab-df-generate {\n",
519
+ " background-color: #E8F0FE;\n",
520
+ " border: none;\n",
521
+ " border-radius: 50%;\n",
522
+ " cursor: pointer;\n",
523
+ " display: none;\n",
524
+ " fill: #1967D2;\n",
525
+ " height: 32px;\n",
526
+ " padding: 0 0 0 0;\n",
527
+ " width: 32px;\n",
528
+ " }\n",
529
+ "\n",
530
+ " .colab-df-generate:hover {\n",
531
+ " background-color: #E2EBFA;\n",
532
+ " box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
533
+ " fill: #174EA6;\n",
534
+ " }\n",
535
+ "\n",
536
+ " [theme=dark] .colab-df-generate {\n",
537
+ " background-color: #3B4455;\n",
538
+ " fill: #D2E3FC;\n",
539
+ " }\n",
540
+ "\n",
541
+ " [theme=dark] .colab-df-generate:hover {\n",
542
+ " background-color: #434B5C;\n",
543
+ " box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
544
+ " filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
545
+ " fill: #FFFFFF;\n",
546
+ " }\n",
547
+ " </style>\n",
548
+ " <button class=\"colab-df-generate\" onclick=\"generateWithVariable('eco_ind')\"\n",
549
+ " title=\"Generate code using this dataframe.\"\n",
550
+ " style=\"display:none;\">\n",
551
+ "\n",
552
+ " <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
553
+ " width=\"24px\">\n",
554
+ " <path d=\"M7,19H8.4L18.45,9,17,7.55,7,17.6ZM5,21V16.75L18.45,3.32a2,2,0,0,1,2.83,0l1.4,1.43a1.91,1.91,0,0,1,.58,1.4,1.91,1.91,0,0,1-.58,1.4L9.25,21ZM18.45,9,17,7.55Zm-12,3A5.31,5.31,0,0,0,4.9,8.1,5.31,5.31,0,0,0,1,6.5,5.31,5.31,0,0,0,4.9,4.9,5.31,5.31,0,0,0,6.5,1,5.31,5.31,0,0,0,8.1,4.9,5.31,5.31,0,0,0,12,6.5,5.46,5.46,0,0,0,6.5,12Z\"/>\n",
555
+ " </svg>\n",
556
+ " </button>\n",
557
+ " <script>\n",
558
+ " (() => {\n",
559
+ " const buttonEl =\n",
560
+ " document.querySelector('#id_f7a8b978-a92b-4d43-929a-ac0d78b92c3e button.colab-df-generate');\n",
561
+ " buttonEl.style.display =\n",
562
+ " google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
563
+ "\n",
564
+ " buttonEl.onclick = () => {\n",
565
+ " google.colab.notebook.generateWithVariable('eco_ind');\n",
566
+ " }\n",
567
+ " })();\n",
568
+ " </script>\n",
569
+ " </div>\n",
570
+ "\n",
571
+ " </div>\n",
572
+ " </div>\n"
573
+ ],
574
+ "text/plain": [
575
+ " symbol name price changesPercentage change dayLow \\\n",
576
+ "0 MSFT Microsoft Corporation 423.85 -0.1578 -0.67 423.05 \n",
577
+ "\n",
578
+ " dayHigh yearHigh yearLow marketCap ... exchange volume \\\n",
579
+ "0 426.28 433.6 309.45 3150184593500 ... NASDAQ 11920235 \n",
580
+ "\n",
581
+ " avgVolume open previousClose eps pe earningsAnnouncement \\\n",
582
+ "0 19701822 426.2 424.52 11.55 36.7 2024-07-23 00:00:00+00:00 \n",
583
+ "\n",
584
+ " sharesOutstanding timestamp \n",
585
+ "0 7432310000 1970-01-01 00:00:01.717790401 \n",
586
+ "\n",
587
+ "[1 rows x 22 columns]"
588
+ ]
589
+ },
590
+ "execution_count": 83,
591
+ "metadata": {},
592
+ "output_type": "execute_result"
593
+ }
594
+ ],
595
+ "source": [
596
+ "def preprocess_economic_data(df):\n",
597
+ " df['timestamp'] = pd.to_datetime(df['timestamp'], unit='s')  # Unix epoch in seconds; without unit='s' pandas reads it as nanoseconds (1970-01-01)\n",
598
+ " df['earningsAnnouncement'] = pd.to_datetime(df['earningsAnnouncement'])\n",
599
+ " return df\n",
600
+ "\n",
601
+ "preprocessed_economic_df = preprocess_economic_data(eco_ind)\n",
602
+ "preprocessed_economic_df"
603
+ ]
604
+ },
605
+ {
606
+ "cell_type": "markdown",
607
+ "metadata": {
608
+ "id": "vB6Pqxw-1Js2"
609
+ },
610
+ "source": [
611
+ "### Storing the Pre-Processed Data into CSV"
612
+ ]
613
+ },
614
+ {
615
+ "cell_type": "code",
616
+ "execution_count": null,
617
+ "metadata": {
618
+ "id": "SjBCSabk2MLl"
619
+ },
620
+ "outputs": [],
621
+ "source": [
622
+ "preprocessed_economic_df.to_csv(\"eco_ind.csv\")"
623
+ ]
624
+ },
625
+ {
626
+ "cell_type": "markdown",
627
+ "metadata": {
628
+ "id": "Mm6P8O0U1Mmh"
629
+ },
630
+ "source": [
631
+ "### Installing the Hugging Face Embedding Library"
632
+ ]
633
+ },
634
+ {
635
+ "cell_type": "code",
636
+ "execution_count": null,
637
+ "metadata": {
638
+ "colab": {
639
+ "base_uri": "https://localhost:8080/"
640
+ },
641
+ "id": "v4M_vC-n2qpM",
642
+ "outputId": "73e99b0c-9382-4b0f-a054-1a7f5fc9f942"
643
+ },
644
+ "outputs": [
645
+ {
646
+ "name": "stdout",
647
+ "output_type": "stream",
648
+ "text": [
649
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m227.1/227.1 kB\u001b[0m \u001b[31m6.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
650
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m21.3/21.3 MB\u001b[0m \u001b[31m43.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
651
+ "\u001b[?25h"
652
+ ]
653
+ }
654
+ ],
655
+ "source": [
656
+ "%pip install --upgrade --quiet langchain sentence_transformers"
657
+ ]
658
+ },
659
+ {
660
+ "cell_type": "code",
661
+ "execution_count": null,
662
+ "metadata": {
663
+ "id": "YG-jX0gK2zYh"
664
+ },
665
+ "outputs": [],
666
+ "source": [
667
+ "from langchain_community.embeddings import HuggingFaceEmbeddings\n",
668
+ "hg_embeddings = HuggingFaceEmbeddings()"
669
+ ]
670
+ },
671
+ {
672
+ "cell_type": "code",
673
+ "execution_count": null,
674
+ "metadata": {
675
+ "colab": {
676
+ "base_uri": "https://localhost:8080/"
677
+ },
678
+ "id": "f29_EHBt7NWi",
679
+ "outputId": "24dbd18a-b8d1-47a3-93eb-19f8db0d2769"
680
+ },
681
+ "outputs": [
682
+ {
683
+ "name": "stderr",
684
+ "output_type": "stream",
685
+ "text": [
686
+ "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.\n",
687
+ " warnings.warn(\n"
688
+ ]
689
+ }
690
+ ],
691
+ "source": [
692
+ "from langchain_community.document_loaders import CSVLoader\n",
693
+ "from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
694
+ "loader_eco = CSVLoader('eco_ind.csv')\n",
695
+ "documents_eco = loader_eco.load()\n",
696
+ "\n",
697
+ "# Get your splitter ready\n",
698
+ "text_splitter = RecursiveCharacterTextSplitter(chunk_size=50, chunk_overlap=5)\n",
699
+ "\n",
700
+ "# Split your docs into texts\n",
701
+ "texts_eco = text_splitter.split_documents(documents_eco)\n",
702
+ "\n",
703
+ "# Embeddings\n",
704
+ "embeddings = hg_embeddings  # reuse the embedding model loaded above instead of loading it twice"
705
+ ]
706
+ },
707
+ {
708
+ "cell_type": "markdown",
709
+ "metadata": {
710
+ "id": "3to6jsIL1Q8Z"
711
+ },
712
+ "source": [
713
+ "### Building the Vector DB for RAG"
714
+ ]
715
+ },
716
+ {
717
+ "cell_type": "code",
718
+ "execution_count": null,
719
+ "metadata": {
720
+ "id": "-_tOru081yRn"
721
+ },
722
+ "outputs": [],
723
+ "source": [
724
+ "from langchain_community.vectorstores import Chroma\n",
725
+ "\n",
726
+ "persist_directory = 'docs/chroma_rag/'"
727
+ ]
728
+ },
729
+ {
730
+ "cell_type": "code",
731
+ "execution_count": null,
732
+ "metadata": {
733
+ "id": "2k-xGbN08W0R"
734
+ },
735
+ "outputs": [],
736
+ "source": [
737
+ "economic_langchain_chroma = Chroma.from_documents(\n",
738
+ " documents=texts_eco,\n",
739
+ " collection_name=\"economic_data\",\n",
740
+ " embedding=hg_embeddings,\n",
741
+ " persist_directory=persist_directory\n",
742
+ ")"
743
+ ]
744
+ },
745
+ {
746
+ "cell_type": "code",
747
+ "execution_count": null,
748
+ "metadata": {
749
+ "id": "0Oyqi4X318LG"
750
+ },
751
+ "outputs": [],
752
+ "source": [
753
+ "question = \"Microsoft(MSFT)\"\n",
754
+ "docs_eco = economic_langchain_chroma.similarity_search(question,k=3)"
755
+ ]
756
+ },
757
+ {
758
+ "cell_type": "markdown",
759
+ "metadata": {
760
+ "id": "7C1duqTJ1T_K"
761
+ },
762
+ "source": [
763
+ "### Building RAG Chain using Vector DB and LLM"
764
+ ]
765
+ },
766
+ {
767
+ "cell_type": "code",
768
+ "execution_count": null,
769
+ "metadata": {
770
+ "id": "fCx35TE867Tu"
771
+ },
772
+ "outputs": [],
773
+ "source": [
774
+ "from langchain.chains import RetrievalQA\n",
775
+ "from langchain.prompts import PromptTemplate\n",
776
+ "from langchain_community.llms import HuggingFaceHub\n",
777
+ "from IPython.display import display, Markdown\n",
778
+ "import os\n",
779
+ "import warnings\n",
780
+ "warnings.filterwarnings('ignore')\n",
781
+ "\n",
782
+ "os.environ[\"HUGGINGFACEHUB_API_TOKEN\"] = \"YOUR_HF_API_TOKEN\"  # placeholder: supply your own Hugging Face token\n",
783
+ "\n",
784
+ "llm = HuggingFaceHub(\n",
785
+ " repo_id=\"tiiuae/falcon-7b-instruct\",\n",
786
+ " model_kwargs={\"temperature\": 0.1},\n",
787
+ ")\n",
788
+ "\n",
789
+ "retriever_eco = economic_langchain_chroma.as_retriever(search_kwargs={\"k\":2})\n",
790
+ "qs=\"Microsoft(MSFT) Financial Report\"\n",
791
+ "template = \"\"\"You are a Financial Market Expert and Get the Market Economic Data and Market News about Company and Build the Financial Report for me.\n",
792
+ " Understand this Market Information {context} and Answer the Query for this Company {question}. I also need the data in tabular form.\"\"\"\n",
793
+ "\n",
794
+ "PROMPT = PromptTemplate(input_variables=[\"context\",\"question\"], template=template)\n",
795
+ "qa_with_sources = RetrievalQA.from_chain_type(llm=llm, chain_type=\"stuff\",chain_type_kwargs = {\"prompt\": PROMPT}, retriever=retriever_eco, return_source_documents=True)\n",
796
+ "llm_response = qa_with_sources.invoke({\"query\": qs})"
797
+ ]
798
+ },
799
+ {
800
+ "cell_type": "code",
801
+ "execution_count": null,
802
+ "metadata": {
803
+ "colab": {
804
+ "base_uri": "https://localhost:8080/",
805
+ "height": 179
806
+ },
807
+ "id": "i62ZnW54D8Jr",
808
+ "outputId": "d97375d3-1a15-4f09-ef9c-2fd9ebbfbb5d"
809
+ },
810
+ "outputs": [
811
+ {
812
+ "data": {
813
+ "text/markdown": [
814
+ "You are a Financial Market Expert and Get the Market Economic Data and Market News about Company and Build the Financial Report for me.\n",
815
+ " Understand this Market Information : 0\n",
816
+ "symbol: MSFT\n",
817
+ "name: Microsoft Corporation\n",
818
+ "\n",
819
+ "earningsAnnouncement: 2024-07-23 00:00:00+00:00 and Answer the Query for this Company Microsoft(MSFT) Financial Report\n",
820
+ "\n",
821
+ "The following financial report is for Microsoft Corporation (MSFT). The report includes the latest financial data and market news about the company.\n",
822
+ "\n",
823
+ "Financial Report:\n",
824
+ "\n",
825
+ "For the fiscal year ended on June 30, 2024, Microsoft Corporation (MSFT) reported a total revenue of $2.5 trillion, an increase of $1.1 trillion from the previous year. The company's net income for the fiscal year was $128.1 billion"
826
+ ],
827
+ "text/plain": [
828
+ "<IPython.core.display.Markdown object>"
829
+ ]
830
+ },
831
+ "execution_count": 162,
832
+ "metadata": {},
833
+ "output_type": "execute_result"
834
+ }
835
+ ],
836
+ "source": [
837
+ "Markdown(llm_response['result'])"
838
+ ]
839
+ },
840
+ {
841
+ "cell_type": "markdown",
842
+ "metadata": {
843
+ "id": "cNpFaattGiMA"
844
+ },
845
+ "source": [
846
+ "# **Using NewsAPI to Build a Financial News Summarizer for Current Company Sentiment**"
847
+ ]
848
+ },
849
+ {
850
+ "cell_type": "markdown",
851
+ "metadata": {
852
+ "id": "465DE_zK1YUy"
853
+ },
854
+ "source": [
855
+ " ### Fetching the Latest Data from NewsAPI Using an API Key from Their Website\n",
856
+ "\n",
857
+ " **Problem Statement:** Build a GenAI-based system that analyses market news about the whole stock exchange or a single company and reports the market sentiment, along with an analysis based on that news.\n",
858
+ "\n",
859
+ "**Project Methodology**\n",
860
+ "- This project uses an open-source API to fetch the latest financial news about the company and the market.\n",
861
+ "- Using Python, the fetched data is pre-processed and saved to a CSV file.\n",
862
+ "- The same CSV file is loaded and inserted into a vector DB using an embedding model from Hugging Face.\n",
863
+ "- A RAG QA chain is built with LangChain, using the open-source Falcon 7B LLM.\n",
864
+ "- The response is checked.\n",
865
+ "\n",
866
+ "\n",
867
+ "![](https://img.freepik.com/premium-photo/bullseye-photography-bull-fighting-fight-generative-ai_901275-24479.jpg)"
868
+ ]
869
+ },
870
+ {
871
+ "cell_type": "code",
872
+ "execution_count": null,
873
+ "metadata": {
874
+ "colab": {
875
+ "base_uri": "https://localhost:8080/",
876
+ "height": 206
877
+ },
878
+ "id": "ZPvl-5XpD-BI",
879
+ "outputId": "b6c5f7dd-fe7a-46c6-c912-1dc77c35f9d0"
880
+ },
881
+ "outputs": [
882
+ {
883
+ "data": {
884
+ "application/vnd.google.colaboratory.intrinsic+json": {
885
+ "summary": "{\n \"name\": \"preprocessed_news_df\",\n \"rows\": 30,\n \"fields\": [\n {\n \"column\": \"author\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 27,\n \"samples\": [\n \"zac.bowden@futurenet.com (Zac Bowden)\",\n \"Hadlee Simons\",\n \"Blair Marnell\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"title\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 30,\n \"samples\": [\n \"Apple WWDC 2024: get ready for lots of AI news\",\n \"Wholesome Pokemon-like \\\"Creatures of Ava\\\" shows off a new trailer, with a playable demo coming soon\",\n \"Elon Musk Is Hurting Tesla To Help Twitter and xAI\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}",
886
+ "type": "dataframe",
887
+ "variable_name": "preprocessed_news_df"
888
+ },
889
+ "text/html": [
890
+ "\n",
891
+ " <div id=\"df-a74891a0-f2cd-40a3-b990-7d9e9ac6bdc9\" class=\"colab-df-container\">\n",
892
+ " <div>\n",
893
+ "<style scoped>\n",
894
+ " .dataframe tbody tr th:only-of-type {\n",
895
+ " vertical-align: middle;\n",
896
+ " }\n",
897
+ "\n",
898
+ " .dataframe tbody tr th {\n",
899
+ " vertical-align: top;\n",
900
+ " }\n",
901
+ "\n",
902
+ " .dataframe thead th {\n",
903
+ " text-align: right;\n",
904
+ " }\n",
905
+ "</style>\n",
906
+ "<table border=\"1\" class=\"dataframe\">\n",
907
+ " <thead>\n",
908
+ " <tr style=\"text-align: right;\">\n",
909
+ " <th></th>\n",
910
+ " <th>author</th>\n",
911
+ " <th>title</th>\n",
912
+ " </tr>\n",
913
+ " </thead>\n",
914
+ " <tbody>\n",
915
+ " <tr>\n",
916
+ " <th>0</th>\n",
917
+ " <td>Kris Holt</td>\n",
918
+ " <td>Summer Game Fest 2024: What to expect and how ...</td>\n",
919
+ " </tr>\n",
920
+ " <tr>\n",
921
+ " <th>1</th>\n",
922
+ " <td>Ali Rees</td>\n",
923
+ " <td>Get some popcorn ready for an extra-long Xbox ...</td>\n",
924
+ " </tr>\n",
925
+ " <tr>\n",
926
+ " <th>2</th>\n",
927
+ " <td>Ali Rees</td>\n",
928
+ " <td>Leaks suggest we could see a huge Starfield an...</td>\n",
929
+ " </tr>\n",
930
+ " <tr>\n",
931
+ " <th>3</th>\n",
932
+ " <td>Wesley Yin-Poole</td>\n",
933
+ " <td>Microsoft Confirms Xbox Game Pass June 2024 Wa...</td>\n",
934
+ " </tr>\n",
935
+ " <tr>\n",
936
+ " <th>4</th>\n",
937
+ " <td>Wesley Yin-Poole</td>\n",
938
+ " <td>Warzone Has a New Frank Woods Cutscene — Final...</td>\n",
939
+ " </tr>\n",
940
+ " </tbody>\n",
941
+ "</table>\n",
942
+ "</div>\n",
1151
+ " </div>\n"
1152
+ ],
1153
+ "text/plain": [
1154
+ " author title\n",
1155
+ "0 Kris Holt Summer Game Fest 2024: What to expect and how ...\n",
1156
+ "1 Ali Rees Get some popcorn ready for an extra-long Xbox ...\n",
1157
+ "2 Ali Rees Leaks suggest we could see a huge Starfield an...\n",
1158
+ "3 Wesley Yin-Poole Microsoft Confirms Xbox Game Pass June 2024 Wa...\n",
1159
+ "4 Wesley Yin-Poole Warzone Has a New Frank Woods Cutscene — Final..."
1160
+ ]
1161
+ },
1162
+ "execution_count": 168,
1163
+ "metadata": {},
1164
+ "output_type": "execute_result"
1165
+ }
1166
+ ],
1167
+ "source": [
1168
+ "import requests\n",
1169
+ "import pandas as pd\n",
1170
+ "from newsapi import NewsApiClient\n",
1171
+ "from datetime import datetime, timedelta\n",
1172
+ "\n",
1173
+ "def fetch_news(query, from_date, to_date, language='en', sort_by='relevancy', page_size=30, api_key='YOUR_API_KEY'):\n",
1174
+ " # Initialize the NewsAPI client\n",
1175
+ " newsapi = NewsApiClient(api_key=api_key)\n",
1176
+ " query = query.replace(' ','&')\n",
1177
+ " # Fetch all articles matching the query\n",
1178
+ " all_articles = newsapi.get_everything(\n",
1179
+ " q=query,\n",
1180
+ " from_param=from_date,\n",
1181
+ " to=to_date,\n",
1182
+ " language=language,\n",
1183
+ " sort_by=sort_by,\n",
1184
+ " page_size=page_size\n",
1185
+ " )\n",
1186
+ "\n",
1187
+ " # Extract articles\n",
1188
+ " articles = all_articles.get('articles', [])\n",
1189
+ "\n",
1190
+ " # Convert to DataFrame\n",
1191
+ " if articles:\n",
1192
+ " df = pd.DataFrame(articles)\n",
1193
+ " return df\n",
1194
+ " else:\n",
1195
+ " return pd.DataFrame() # Return an empty DataFrame if no articles are found\n",
1196
+ "\n",
1197
+ "# Get the current time\n",
1198
+ "current_time = datetime.now()\n",
1199
+ "# Get the time 10 days ago\n",
1200
+ "time_10_days_ago = current_time - timedelta(days=10)\n",
1201
+ "api_key = 'YOUR_API_KEY'  # placeholder: supply your own NewsAPI key and keep real keys out of the notebook\n",
1202
+ "q = \"Microsoft News June 2024\"\n",
1203
+ "df = fetch_news(q, time_10_days_ago, current_time, api_key=api_key)\n",
1204
+ "\n",
1205
+ "df_news = df.drop(\"source\", axis=1)\n",
1206
+ "\n",
1207
+ "def preprocess_news_data(df):\n",
1208
+ " # Convert publishedAt to datetime\n",
1209
+ " df['publishedAt'] = pd.to_datetime(df['publishedAt'])\n",
1210
+ " df = df[~df['author'].isna()]\n",
1211
+ " df = df[['author', 'title']]\n",
1212
+ " return df\n",
1213
+ "\n",
1214
+ "preprocessed_news_df = preprocess_news_data(df_news)\n",
1215
+ "preprocessed_news_df.head()"
1216
+ ]
1217
+ },
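The cell above assigns the NewsAPI key as a string literal. One alternative, sketched below, is to read the key from an environment variable. The variable name `NEWSAPI_KEY` and the helper `load_api_key` are hypothetical; they are not part of the notebook or of the `newsapi` package.

```python
import os


def load_api_key(env_var="NEWSAPI_KEY"):
    """Return the API key stored in the environment variable `env_var`.

    `NEWSAPI_KEY` is an assumed variable name chosen for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail early with a clear message instead of sending an empty key.
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key
```

The returned value could then be passed along as `fetch_news(q, time_10_days_ago, current_time, api_key=load_api_key())`.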
1218
+ {
1219
+ "cell_type": "markdown",
1220
+ "metadata": {
1221
+ "id": "VUf3dwOp1hJb"
1222
+ },
1223
+ "source": [
1224
+ "### Pre-Processing the Data"
1225
+ ]
1226
+ },
1227
+ {
1228
+ "cell_type": "code",
1229
+ "execution_count": null,
1230
+ "metadata": {
1231
+ "id": "BN5FV809Gtsk"
1232
+ },
1233
+ "outputs": [],
1234
+ "source": [
1235
+ "def build_prompt(news_df):\n",
1236
+ " prompt = \"You are a financial analyst tasked with providing insights into recent news articles related to the financial industry. Here are some recent news articles:\\n\\n\"\n",
1237
+ "\n",
1238
+ " for index, row in news_df.iterrows():\n",
1239
+ " title = row['title']\n",
1240
+ " prompt += f\" **News:** {title}\\n\\n\"\n",
1241
+ "\n",
1242
+ " prompt += \"Please analyze these articles and provide insights into any potential impacts on the financial industry Sentiment on the provided company.\"\n",
1243
+ "\n",
1244
+ " return prompt\n",
1245
+ "\n",
1246
+ "# Build the prompt\n",
1247
+ "prompt = build_prompt(preprocessed_news_df)\n",
1248
+ "print(prompt)"
1249
+ ]
1250
+ },
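`build_prompt` adds every title in the DataFrame to the prompt. The sketch below is a variant that drops repeated titles and caps how many are included, on the assumption that a shorter prompt suits a small model better. `build_prompt_from_titles` and `max_titles` are hypothetical names, the cap of 10 is arbitrary, and the closing instruction is reworded.

```python
def build_prompt_from_titles(titles, max_titles=10):
    """Build the analyst prompt from at most `max_titles` unique titles."""
    seen = set()
    unique_titles = []
    for title in titles:
        cleaned = str(title).strip()
        # Skip blanks and repeated headlines, preserving first-seen order.
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            unique_titles.append(cleaned)

    header = (
        "You are a financial analyst tasked with providing insights into "
        "recent news articles related to the financial industry. "
        "Here are some recent news articles:\n\n"
    )
    body = "".join(f"**News:** {t}\n\n" for t in unique_titles[:max_titles])
    footer = (
        "Please analyze these articles and provide insights into the "
        "sentiment around the company."
    )
    return header + body + footer
```

It takes a plain list, for example `build_prompt_from_titles(preprocessed_news_df['title'].tolist())`.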
1251
+ {
1252
+ "cell_type": "markdown",
1253
+ "metadata": {
1254
+ "id": "U9bpPeaE1jQi"
1255
+ },
1256
+ "source": [
1257
+ "### LLM from Hugging Face Open Source"
1258
+ ]
1259
+ },
1260
+ {
1261
+ "cell_type": "code",
1262
+ "execution_count": null,
1263
+ "metadata": {
1264
+ "id": "ugBKl98nHAwA"
1265
+ },
1266
+ "outputs": [],
1267
+ "source": [
1268
+ "llm = HuggingFaceHub(\n",
1269
+ " repo_id=\"tiiuae/falcon-7b-instruct\",\n",
1270
+ " model_kwargs={\"temperature\": 0.1},\n",
1271
+ ")"
1272
+ ]
1273
+ },
1274
+ {
1275
+ "cell_type": "code",
1276
+ "execution_count": null,
1277
+ "metadata": {
1278
+ "colab": {
1279
+ "base_uri": "https://localhost:8080/",
1280
+ "height": 885
1281
+ },
1282
+ "id": "OV_xMxsJHhhD",
1283
+ "outputId": "52525e65-b43a-415f-e0cf-5b5415220c20"
1284
+ },
1285
+ "outputs": [
1286
+ {
1287
+ "data": {
1288
+ "text/markdown": [
1289
+ "You are a financial analyst tasked with providing insights into recent news articles related to the financial industry. Here are some recent news articles:\n",
1290
+ "\n",
1291
+ " **News:** Summer Game Fest 2024: What to expect and how to watch games revealed live\n",
1292
+ "\n",
1293
+ " **News:** Get some popcorn ready for an extra-long Xbox Games June Showcase\n",
1294
+ "\n",
1295
+ " **News:** Leaks suggest we could see a huge Starfield announcement at Xbox Games Showcase\n",
1296
+ "\n",
1297
+ " **News:** Microsoft Confirms Xbox Game Pass June 2024 Wave 1 Lineup\n",
1298
+ "\n",
1299
+ " **News:** Warzone Has a New Frank Woods Cutscene — Finally Making a Crucial Moment in Call of Duty Black Ops Lore Canon\n",
1300
+ "\n",
1301
+ " **News:** WWDC 2024: What We're Expecting and How to Watch Apple's iOS 18 Event - CNET\n",
1302
+ "\n",
1303
+ " **News:** How to watch Intel’s big Computex 2024 keynote tonight\n",
1304
+ "\n",
1305
+ " **News:** How to watch Summer Game Fest 2024 — Not-E3, Xbox Games Showcase, Call of Duty: Black Ops 6 Direct, Wholesome Direct, and more\n",
1306
+ "\n",
1307
+ " **News:** Report: Microsoft is 'considering' bringing its flagship Xbox IP to PlayStation for the first time, but will it?\n",
1308
+ "\n",
1309
+ " **News:** Destiny 2 Developer Bungie ‘Truly Sorry’ for The Final Shape Launch Issues\n",
1310
+ "\n",
1311
+ " **News:** NVIDIA Splits 10-to-1; Non-farm Payrolls on Deck for Friday\n",
1312
+ "\n",
1313
+ " **News:** A PR disaster: Microsoft has lost trust with its users, and Windows Recall is the straw that broke the camel's back\n",
1314
+ "\n",
1315
+ " **News:** Sony Removes 8K Claim From PlayStation 5 Boxes\n",
1316
+ "\n",
1317
+ " **News:** Engadget Podcast: How AI will shape Apple's WWDC 2024\n",
1318
+ "\n",
1319
+ " **News:** This Week in Security: Recall, Modem Mysteries, and Flipping Pages\n",
1320
+ "\n",
1321
+ " **News:** Wholesome Pokemon-like \"Creatures of Ava\" shows off a new trailer, with a playable demo coming soon\n",
1322
+ "\n",
1323
+ " **News:** Microsoft Copilot Plus hands-on: Does it need a Recall?\n",
1324
+ "\n",
1325
+ " **News:** Surface Laptop 7 vs. Samsung Galaxy Book4 Edge: Which high-end Copilot+ PC works better for you?\n",
1326
+ "\n",
1327
+ " **News:** Nvidia was officially more valuable than Apple — for a couple of hours, at least\n",
1328
+ "\n",
1329
+ " **News:** New Windows 10 update gives it Windows 11’s photo-sharing capabilities with Android devices – but you might want to hang on\n",
1330
+ "\n",
1331
+ " **News:** Russian Influence Campaign Targeting Paris Olympics, Microsoft Warns\n",
1332
+ "\n",
1333
+ " **News:** iOS 18 is coming next week: Here’s everything we know\n",
1334
+ "\n",
1335
+ " **News:** Nvidia app beta offers warranty-safe GPU tuning and improved stream recording\n",
1336
+ "\n",
1337
+ " **News:** Elon Musk Is Hurting Tesla To Help Twitter and xAI\n",
1338
+ "\n",
1339
+ " **News:** Bill Gates Could Be The World's First Trillionaire If He Had 'Diamond Handed' His Microsoft Shares — He'd Be Sitting On $1.47 Trillion Today\n",
1340
+ "\n",
1341
+ " **News:** Nvidia stock crosses $3 trillion market cap, overtakes Apple as second-largest co. in US market\n",
1342
+ "\n",
1343
+ " **News:** Adafruit Weekly Editorial Round-Up: AANHPI Month, National Paper Airplane Day, Adafruit TRRS Trinkey & more!\n",
1344
+ "\n",
1345
+ " **News:** Apple WWDC 2024: get ready for lots of AI news\n",
1346
+ "\n",
1347
+ " **News:** Microsoft Issues New Warning For 70% Of All Windows Users\n",
1348
+ "\n",
1349
+ " **News:** Microsoft is again named the overall leader in the Forrester Wave for XDR\n",
1350
+ "\n",
1351
+ "Please analyze these articles and provide insights into any potential impacts on the financial industry Sentiment on the provided company.\n",
1352
+ "1. Microsoft's recent news articles regarding the Xbox Games Showcase and the upcoming Windows 11 update have been generally positive, with a focus on the company's continued push towards the gaming industry. This could potentially lead to increased sales and revenue for Microsoft, as well as increased brand awareness and loyalty among consumers.\n",
1353
+ "\n",
1354
+ "2. The news articles related to the new Starfield game from Bethesda have been generating a lot of buzz and excitement among gamers. The game's release date has been"
1355
+ ],
1356
+ "text/plain": [
1357
+ "<IPython.core.display.Markdown object>"
1358
+ ]
1359
+ },
1360
+ "execution_count": 171,
1361
+ "metadata": {},
1362
+ "output_type": "execute_result"
1363
+ }
1364
+ ],
1365
+ "source": [
1366
+ "Markdown(llm.invoke(prompt))"
1367
+ ]
1368
+ },
1369
+ {
1370
+ "cell_type": "markdown",
1371
+ "metadata": {
1372
+ "id": "-FdO0jQdLkaW"
1373
+ },
1374
+ "source": [
1375
+ "# **Financial Data Investment Advisor**\n",
1376
+ "\n",
1377
+ "**Problem Statement:** Build a financial advisor from a dataset of financial advice that covers stocks, mutual funds, and gold or silver bonds, using Python, LangChain, and an open-source LLM.\n",
1378
+ "\n",
1379
+ "**Project Methodology**\n",
1380
+ "- This project uses open-source data from Kaggle on financial advice.\n",
1381
+ "- Using Python, the data is loaded, pre-processed, and saved to a CSV file.\n",
1382
+ "- Loading that same CSV file to insert into Vector DB using Embedding Model from Hugging Face.\n",
1383
+ "- Building RAG QA Chain using Langchain and building the RAG architecture using Falcon 7B LLM (Open Source).\n",
1384
+ "- Checking the Response.\n",
1385
+ "\n",
1386
+ "\n",
1387
+ "![](https://media.licdn.com/dms/image/D5612AQFSyeoRrkC5fw/article-cover_image-shrink_720_1280/0/1701189671766?e=2147483647&v=beta&t=cpa6wlGMWG44ZyGW6MWyKZ2Vr0BT-G1zlb8RB0yio6w)"
1388
+ ]
1389
+ },
1390
+ {
1391
+ "cell_type": "markdown",
1392
+ "metadata": {
1393
+ "id": "c0IWX7yB1nAp"
1394
+ },
1395
+ "source": [
1396
+ "## **Loading the Financial Data from Kaggle or Any Open Source Platform**\n",
1397
+ "\n",
1398
+ "Data Source - https://www.kaggle.com/datasets/nitindatta/finance-data"
1399
+ ]
1400
+ },
1401
+ {
1402
+ "cell_type": "code",
1403
+ "execution_count": null,
1404
+ "metadata": {
1405
+ "id": "RfpUWvICHqSG"
1406
+ },
1407
+ "outputs": [],
1408
+ "source": [
1409
+ "data = pd.read_csv(\"Finance_data.csv\")\n",
1410
+ "data_fin = data.to_dict(orient='records')"
1411
+ ]
1412
+ },
1413
+ {
1414
+ "cell_type": "code",
1415
+ "execution_count": null,
1416
+ "metadata": {
1417
+ "id": "HwgRMUJ7LLup"
1418
+ },
1419
+ "outputs": [],
1420
+ "source": [
1421
+ "for entry in data_fin:\n",
1422
+ " prompt = f\"I'm a {entry['age']}-year-old {entry['gender']} looking to invest in {entry['Avenue']} for {entry['Purpose']} over the next {entry['Duration']}. What are my options?\"\n",
1423
+ " print(prompt)"
1424
+ ]
1425
+ },
1426
+ {
1427
+ "cell_type": "markdown",
1428
+ "metadata": {
1429
+ "id": "FBaVWGci1zce"
1430
+ },
1431
+ "source": [
1432
+ "### Pre-Processing the Data into Prompt-Response Format"
1433
+ ]
1434
+ },
1435
+ {
1436
+ "cell_type": "code",
1437
+ "execution_count": null,
1438
+ "metadata": {
1439
+ "colab": {
1440
+ "base_uri": "https://localhost:8080/"
1441
+ },
1442
+ "id": "Sv9VRzBDKjHM",
1443
+ "outputId": "30f59782-949f-4966-8ed6-ae5e529babc3"
1444
+ },
1445
+ "outputs": [
1446
+ {
1447
+ "data": {
1448
+ "text/plain": [
1449
+ "[{'prompt': \"I'm a 34-year-old Female looking to invest in Mutual Fund for Wealth Creation over the next 1-3 years. What are my options?\",\n",
1450
+ " 'response': 'Based on your preferences, here are your investment options:\\n- Mutual Funds: 1\\n- Equity Market: 2\\n- Debentures: 5\\n- Government Bonds: 3\\n- Fixed Deposits: 7\\n- PPF: 6\\n- Gold: 4\\nFactors considered: Returns\\nObjective: Capital Appreciation\\nExpected returns: 20%-30%\\nInvestment monitoring: Monthly\\nReasons for choices:\\n- Equity: Capital Appreciation\\n- Mutual Funds: Better Returns\\n- Bonds: Safe Investment\\n- Fixed Deposits: Fixed Returns\\nSource of information: Newspapers and Magazines\\n'},\n",
1451
+ " {'prompt': \"I'm a 23-year-old Female looking to invest in Mutual Fund for Wealth Creation over the next More than 5 years. What are my options?\",\n",
1452
+ " 'response': 'Based on your preferences, here are your investment options:\\n- Mutual Funds: 4\\n- Equity Market: 3\\n- Debentures: 2\\n- Government Bonds: 1\\n- Fixed Deposits: 5\\n- PPF: 6\\n- Gold: 7\\nFactors considered: Locking Period\\nObjective: Capital Appreciation\\nExpected returns: 20%-30%\\nInvestment monitoring: Weekly\\nReasons for choices:\\n- Equity: Dividend\\n- Mutual Funds: Better Returns\\n- Bonds: Safe Investment\\n- Fixed Deposits: High Interest Rates\\nSource of information: Financial Consultants\\n'},\n",
1453
+ " {'prompt': \"I'm a 30-year-old Male looking to invest in Equity for Wealth Creation over the next 3-5 years. What are my options?\",\n",
1454
+ " 'response': 'Based on your preferences, here are your investment options:\\n- Mutual Funds: 3\\n- Equity Market: 6\\n- Debentures: 4\\n- Government Bonds: 2\\n- Fixed Deposits: 5\\n- PPF: 1\\n- Gold: 7\\nFactors considered: Returns\\nObjective: Capital Appreciation\\nExpected returns: 20%-30%\\nInvestment monitoring: Daily\\nReasons for choices:\\n- Equity: Capital Appreciation\\n- Mutual Funds: Tax Benefits\\n- Bonds: Assured Returns\\n- Fixed Deposits: Fixed Returns\\nSource of information: Television\\n'},\n",
1455
+ " {'prompt': \"I'm a 22-year-old Male looking to invest in Equity for Wealth Creation over the next Less than 1 year. What are my options?\",\n",
1456
+ " 'response': 'Based on your preferences, here are your investment options:\\n- Mutual Funds: 2\\n- Equity Market: 1\\n- Debentures: 3\\n- Government Bonds: 7\\n- Fixed Deposits: 6\\n- PPF: 4\\n- Gold: 5\\nFactors considered: Returns\\nObjective: Income\\nExpected returns: 10%-20%\\nInvestment monitoring: Daily\\nReasons for choices:\\n- Equity: Dividend\\n- Mutual Funds: Fund Diversification\\n- Bonds: Tax Incentives\\n- Fixed Deposits: High Interest Rates\\nSource of information: Internet\\n'},\n",
1457
+ " {'prompt': \"I'm a 24-year-old Female looking to invest in Equity for Wealth Creation over the next Less than 1 year. What are my options?\",\n",
1458
+ " 'response': 'Based on your preferences, here are your investment options:\\n- Mutual Funds: 2\\n- Equity Market: 1\\n- Debentures: 3\\n- Government Bonds: 6\\n- Fixed Deposits: 4\\n- PPF: 5\\n- Gold: 7\\nFactors considered: Returns\\nObjective: Income\\nExpected returns: 20%-30%\\nInvestment monitoring: Daily\\nReasons for choices:\\n- Equity: Capital Appreciation\\n- Mutual Funds: Better Returns\\n- Bonds: Safe Investment\\n- Fixed Deposits: Risk Free\\nSource of information: Internet\\n'}]"
1459
+ ]
1460
+ },
1461
+ "execution_count": 188,
1462
+ "metadata": {},
1463
+ "output_type": "execute_result"
1464
+ }
1465
+ ],
1466
+ "source": [
1467
+ "# Convert the data to prompt-response format\n",
1468
+ "prompt_response_data = []\n",
1469
+ "for entry in data_fin:\n",
1470
+ " prompt = f\"I'm a {entry['age']}-year-old {entry['gender']} looking to invest in {entry['Avenue']} for {entry['Purpose']} over the next {entry['Duration']}. What are my options?\"\n",
1471
+ " response = (\n",
1472
+ " f\"Based on your preferences, here are your investment options:\\n\"\n",
1473
+ " f\"- Mutual Funds: {entry['Mutual_Funds']}\\n\"\n",
1474
+ " f\"- Equity Market: {entry['Equity_Market']}\\n\"\n",
1475
+ " f\"- Debentures: {entry['Debentures']}\\n\"\n",
1476
+ " f\"- Government Bonds: {entry['Government_Bonds']}\\n\"\n",
1477
+ " f\"- Fixed Deposits: {entry['Fixed_Deposits']}\\n\"\n",
1478
+ " f\"- PPF: {entry['PPF']}\\n\"\n",
1479
+ " f\"- Gold: {entry['Gold']}\\n\"\n",
1480
+ " f\"Factors considered: {entry['Factor']}\\n\"\n",
1481
+ " f\"Objective: {entry['Objective']}\\n\"\n",
1482
+ " f\"Expected returns: {entry['Expect']}\\n\"\n",
1483
+ " f\"Investment monitoring: {entry['Invest_Monitor']}\\n\"\n",
1484
+ " f\"Reasons for choices:\\n\"\n",
1485
+ " f\"- Equity: {entry['Reason_Equity']}\\n\"\n",
1486
+ " f\"- Mutual Funds: {entry['Reason_Mutual']}\\n\"\n",
1487
+ " f\"- Bonds: {entry['Reason_Bonds']}\\n\"\n",
1488
+ " f\"- Fixed Deposits: {entry['Reason_FD']}\\n\"\n",
1489
+ " f\"Source of information: {entry['Source']}\\n\"\n",
1490
+ " )\n",
1491
+ " prompt_response_data.append({\"prompt\": prompt, \"response\": response})\n",
1492
+ "\n",
1493
+ "prompt_response_data[:5]"
1494
+ ]
1495
+ },
1496
+ {
1497
+ "cell_type": "markdown",
1498
+ "metadata": {
1499
+ "id": "0nFwqS8q12qa"
1500
+ },
1501
+ "source": [
1502
+ "### Storing Data into Vector DB"
1503
+ ]
1504
+ },
1505
+ {
1506
+ "cell_type": "code",
1507
+ "execution_count": null,
1508
+ "metadata": {
1509
+ "id": "x-R2xXZzMV-7"
1510
+ },
1511
+ "outputs": [],
1512
+ "source": [
1513
+ "from langchain.docstore.document import Document\n",
1514
+ "documents = []\n",
1515
+ "for entry in prompt_response_data:\n",
1516
+ " combined_text = f\"Prompt: {entry['prompt']}\\nResponse: {entry['response']}\"\n",
1517
+ " documents.append(Document(page_content=combined_text))"
1518
+ ]
1519
+ },
1520
+ {
1521
+ "cell_type": "code",
1522
+ "execution_count": null,
1523
+ "metadata": {
1524
+ "id": "sHxeWfw7OSFF"
1525
+ },
1526
+ "outputs": [],
1527
+ "source": [
1528
+ "text_splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=10)\n",
1529
+ "texts = text_splitter.split_documents(documents)"
1530
+ ]
1531
+ },
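Each document built above holds one prompt and its response, and the responses shown earlier run to several hundred characters, so `chunk_size=100` splits every record into several fragments. The sketch below shows how the chunk size decides whether a prompt stays together with its response. It uses a plain fixed-width splitter written for illustration, which is a simplification and not the algorithm `RecursiveCharacterTextSplitter` uses. The function `fixed_size_chunks` and the shortened sample record are hypothetical.

```python
def fixed_size_chunks(text, chunk_size, overlap=0):
    """Split `text` into windows of `chunk_size` characters with `overlap`."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


# Sample record in the same "Prompt: ... Response: ..." layout as above.
record = (
    "Prompt: I'm a 34-year-old Female looking to invest in Mutual Fund "
    "for Wealth Creation over the next 1-3 years. What are my options?\n"
    "Response: Based on your preferences, here are your investment options:\n"
    "- Mutual Funds: 1\n- Equity Market: 2\n- Gold: 4\n"
)

small = fixed_size_chunks(record, chunk_size=100, overlap=10)
large = fixed_size_chunks(record, chunk_size=1000, overlap=10)

# With 100-character windows the first chunk stops before "Response:".
print(len(small), "Response:" in small[0])
# With 1000-character windows the whole record stays in one chunk.
print(len(large), "Response:" in large[0])
```

A chunk that contains only the prompt half of a record gives the retriever nothing from the response to pass to the LLM, so a chunk size at least as long as one record may be worth trying here.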
1532
+ {
1533
+ "cell_type": "code",
1534
+ "execution_count": null,
1535
+ "metadata": {
1536
+ "id": "sgj7HDzoOHmx"
1537
+ },
1538
+ "outputs": [],
1539
+ "source": [
1540
+ "from langchain.vectorstores import Chroma\n",
1541
+ "persist_directory = 'docs/chroma/'\n",
1542
+ "vectordb_fin = Chroma.from_documents(\n",
1543
+ " documents=texts,\n",
1544
+ " embedding=hg_embeddings,\n",
1545
+ " persist_directory=persist_directory\n",
1546
+ ")"
1547
+ ]
1548
+ },
1549
+ {
1550
+ "cell_type": "markdown",
1551
+ "metadata": {
1552
+ "id": "-GyrpvYx16I1"
1553
+ },
1554
+ "source": [
1555
+ "### Building RAG System using VectorDB and LLM"
1556
+ ]
1557
+ },
1558
+ {
1559
+ "cell_type": "code",
1560
+ "execution_count": null,
1561
+ "metadata": {
1562
+ "colab": {
1563
+ "base_uri": "https://localhost:8080/"
1564
+ },
1565
+ "id": "-_37Y0BlOfTL",
1566
+ "outputId": "ad6e2f57-bc6c-4481-c35a-84f35eae0a2f"
1567
+ },
1568
+ "outputs": [
1569
+ {
1570
+ "data": {
1571
+ "text/plain": [
1572
+ "{'query': \"I'm a 34-year-old female looking to invest in mutual funds for wealth creation over the next 1-3 years. What are my options?\",\n",
1573
+ " 'result': \"Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.\\n\\nPrompt: I'm a 34-year-old Female looking to invest in Mutual Fund for Wealth Creation over the next\\n\\nPrompt: I'm a 32-year-old Female looking to invest in Mutual Fund for Wealth Creation over the next\\n\\nPrompt: I'm a 28-year-old Female looking to invest in Mutual Fund for Wealth Creation over the next\\n\\nPrompt: I'm a 24-year-old Female looking to invest in Mutual Fund for Wealth Creation over the next\\n\\nPrompt: I'm a 29-year-old Male looking to invest in Mutual Fund for Wealth Creation over the next\\n\\nQuestion: I'm a 34-year-old female looking to invest in mutual funds for wealth creation over the next 1-3 years. What are my options?\\nHelpful Answer:\\n\\nAs a 34-year-old female, there are several options available for investing in mutual funds for wealth creation over the next 1-3 years. Some of the best options include:\\n\\n1. Diversify your portfolio: Invest in a mix of stocks, bonds, and other assets to spread your risk and potentially increase your returns.\\n\\n2. Consider a robo-advisor: These online platforms can help you create a personalized investment plan based on your goals,\"}"
1574
+ ]
1575
+ },
1576
+ "execution_count": 198,
1577
+ "metadata": {},
1578
+ "output_type": "execute_result"
1579
+ }
1580
+ ],
1581
+ "source": [
1582
+ "from langchain.chains import RetrievalQA\n",
1583
+ "retriever_fin = vectordb_fin.as_retriever(search_kwargs={\"k\":5})\n",
1584
+ "qa = RetrievalQA.from_chain_type(\n",
1585
+ " llm=llm, chain_type=\"stuff\", retriever=retriever_fin, return_source_documents=False)\n",
1586
+ "query = \"I'm a 34-year-old female looking to invest in mutual funds for wealth creation over the next 1-3 years. What are my options?\"\n",
1587
+ "result = qa({\"query\": query})\n",
1588
+ "result"
1589
+ ]
1590
+ },
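The `result` string above begins with the chain's prompt and the retrieved context; the model's own answer starts only after the `Helpful Answer:` marker. The sketch below keeps just that part, assuming the marker appears verbatim in the output as it does in this run. `extract_answer` is a hypothetical helper, not a LangChain function.

```python
def extract_answer(generated_text, marker="Helpful Answer:"):
    """Return the text after the last `marker`, or the input if absent."""
    _, found, tail = generated_text.rpartition(marker)
    # rpartition returns an empty separator when the marker is missing.
    return tail.strip() if found else generated_text.strip()
```

For the run above this would be called as `extract_answer(result["result"])`.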
1591
+ {
1592
+ "cell_type": "code",
1593
+ "execution_count": null,
1594
+ "metadata": {
1595
+ "id": "3B3iZomXk2_-"
1596
+ },
1597
+ "outputs": [],
1598
+ "source": []
1599
+ }
1600
+ ],
1601
+ "metadata": {
1602
+ "colab": {
1603
+ "provenance": []
1604
+ },
1605
+ "kernelspec": {
1606
+ "display_name": "Python 3",
1607
+ "name": "python3"
1608
+ },
1609
+ "language_info": {
1610
+ "name": "python"
1611
+ }
1612
+ },
1613
+ "nbformat": 4,
1614
+ "nbformat_minor": 0
1615
+ }